Deprecated extension API¶
This page documents the original asdf extension API, which has been
deprecated in favor of Extensions. Since support
for the deprecated API will be removed in asdf 3.0, we recommend that
all new extensions be implemented with the new API.
Extensions provide a way for ASDF to represent complex types that are not defined by the ASDF standard. Examples of types that require custom extensions include types from third-party libraries, user-defined types, and complex types that are part of the Python standard library but are not handled in the ASDF standard. From ASDF’s perspective, these are all considered ‘custom’ types.
Supporting new types in ASDF is easy. Three components are required:
A YAML Schema file for each new type.
A tag class (inheriting from asdf.CustomType) corresponding to each new custom type. The class must override to_tree and from_tree from asdf.CustomType in order to define how ASDF serializes and deserializes the custom type.
A Python class to define an “extension” to ASDF, which is a set of related types. This class must implement the asdf.extension.AsdfExtension abstract base class. In general, a third-party library that defines multiple custom types can group them all in the same extension.
Note
The mechanisms of tag classes and extension classes are specific to this particular implementation of ASDF. As of this writing, this is the only complete implementation of the ASDF Standard. However, other language implementations may use other mechanisms for processing custom types.
All implementations of ASDF, regardless of language, will make use of the same schemas for abstract data type definitions. This allows all ASDF files to be language-agnostic, and also enables interoperability.
An Example¶
As an example, we will write an extension for ASDF that allows us to represent
Python’s standard fractions.Fraction class for representing rational numbers.
We will call our new ASDF type fraction.
First, the YAML Schema, defining the type as a pair of integers:
%YAML 1.1
---
$schema: "http://stsci.edu/schemas/yaml-schema/draft-01"
id: "http://nowhere.org/schemas/custom/fraction-1.0.0"
title: An example custom type for handling fractions
tag: "tag:nowhere.org:custom/fraction-1.0.0"
type: array
items:
  type: integer
minItems: 2
maxItems: 2
...
Then, the Python implementation of the tag class and extension class. See the
asdf.CustomType and asdf.extension.AsdfExtension documentation for more information:
Note that the method to_tree of the tag class
FractionType defines how the library converts fractions.Fraction into a
tree that can be stored by ASDF. Conversely, the method
from_tree defines how the library reads a serialized
representation of the object and converts it back into an instance of
fractions.Fraction.
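As a minimal sketch of these two conversions, the stand-in class below mirrors the to_tree and from_tree logic without importing asdf. The name FractionTypeSketch and the ctx=None argument are assumptions made for illustration; a real tag class would inherit from asdf.CustomType and also set the name, organization, standard, version, and types attributes.

```python
import fractions


class FractionTypeSketch:
    """Stand-in for a tag class; a real one would inherit from asdf.CustomType."""

    @classmethod
    def to_tree(cls, node, ctx):
        # Store the fraction as a two-item list, matching the schema above.
        return [node.numerator, node.denominator]

    @classmethod
    def from_tree(cls, tree, ctx):
        # Rebuild the fraction from the two stored integers.
        return fractions.Fraction(tree[0], tree[1])


original = fractions.Fraction(2, 3)
tree = FractionTypeSketch.to_tree(original, ctx=None)
restored = FractionTypeSketch.from_tree(tree, ctx=None)
```

The round trip works because both methods agree on the order of the list items.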
Note that the values of the name,
organization, standard, and
version fields are all reflected in the id and tag
definitions in the schema.
Note also that the base of the tag value (up to the name and version
components) is reflected in tag_mapping property of the
FractionExtension type, which is used to map tags to URLs. The
url_mapping is used to map URLs (of the same form as the
id field in the schema) to the actual location of a schema file.
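The two-step lookup from tag to schema file can be sketched with plain string handling. This is an illustration only: it assumes each mapping is a list of (prefix, template) pairs whose template contains a {tag_suffix} or {url_suffix} placeholder, and the helper name apply_mapping and the directory /opt/mypackage/schemas are hypothetical. Check the asdf.extension.AsdfExtension documentation for the exact form the library expects.

```python
def apply_mapping(mapping, value, suffix_key):
    """Return the first template whose prefix matches value, with the suffix filled in."""
    for prefix, template in mapping:
        if value.startswith(prefix):
            return template.format(**{suffix_key: value[len(prefix):]})
    return None


# Hypothetical mappings for the nowhere.org example.
tag_mapping = [
    ("tag:nowhere.org:custom", "http://nowhere.org/schemas/custom{tag_suffix}"),
]
url_mapping = [
    (
        "http://nowhere.org/schemas/custom/",
        "file:///opt/mypackage/schemas/{url_suffix}.yaml",
    ),
]

tag = "tag:nowhere.org:custom/fraction-1.0.0"
schema_url = apply_mapping(tag_mapping, tag, "tag_suffix")
schema_location = apply_mapping(url_mapping, schema_url, "url_suffix")
```

Here schema_url matches the id field of the schema, and schema_location points at the file that holds it.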
Once these classes and the schema have been defined, we can save an ASDF file using them:
Defining custom types¶
In the example above, we showed how to create an extension that is capable of
serializing fractions.Fraction. The custom tag type that we created was
defined as a subclass of asdf.CustomType.
Custom type attributes¶
We overrode the following attributes of CustomType in order to define
FractionType: name, organization, version, standard, and types.
Each of these attributes is important, and each is described in more detail in the API documentation.
The choice of name should be descriptive of the custom type
that is being serialized. The choices of organization and
standard are fairly arbitrary, but also important. Custom
types that are provided by the same package should be grouped into the same
standard and organization.
These three values, along with the version, are used to
define the YAML tag that will mark the serialized type in ASDF files. In our
example, the tag becomes tag:nowhere.org:custom/fraction-1.0.0. The tag
is important when defining the asdf.extension.AsdfExtension subclass.
Critically, these values must all be reflected in the associated schema.
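How the four values combine into the tag can be sketched as follows. The helper make_tag is hypothetical and is not part of asdf; it only reproduces the tag string quoted above.

```python
def make_tag(organization, standard, name, version):
    """Build a tag string from the four tag class attributes."""
    version_string = ".".join(str(part) for part in version)
    return f"tag:{organization}:{standard}/{name}-{version_string}"


tag = make_tag("nowhere.org", "custom", "fraction", (1, 0, 0))
```

The same name and version also appear at the end of the schema id, which is why the schema and the tag class have to be kept in sync.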
Custom type methods¶
In addition to the attributes mentioned above, we also overrode the following
methods of CustomType: to_tree and from_tree.
The to_tree method defines how an instance of a custom data
type is converted into data structures that represent a YAML tree that can be
serialized to a file.
The from_tree method defines how a YAML tree can be
converted back into an instance of the original custom data type.
In the example above, we used a list to contain the important attributes of
fractions.Fraction. However, this choice is fairly arbitrary, as long as it
is consistent between the way that to_tree and
from_tree are defined. For example, we could have also
chosen to use a dict:
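A minimal sketch of the dict-based variant, written as a stand-in class that does not import asdf (the name FractionDictTypeSketch and ctx=None are assumptions; a real tag class would inherit from asdf.CustomType):

```python
import fractions


class FractionDictTypeSketch:
    """Stand-in for a tag class that stores a fraction as a mapping."""

    @classmethod
    def to_tree(cls, node, ctx):
        # Named keys instead of list positions.
        return {"numerator": node.numerator, "denominator": node.denominator}

    @classmethod
    def from_tree(cls, tree, ctx):
        # Read back the same keys that to_tree wrote.
        return fractions.Fraction(tree["numerator"], tree["denominator"])


dict_tree = FractionDictTypeSketch.to_tree(fractions.Fraction(2, 3), ctx=None)
dict_restored = FractionDictTypeSketch.from_tree(dict_tree, ctx=None)
```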
In this case, the associated schema would look like the following:
%YAML 1.1
---
$schema: "http://stsci.edu/schemas/yaml-schema/draft-01"
id: "http://nowhere.org/schemas/custom/fraction-1.0.0"
title: An example custom type for handling fractions
tag: "tag:nowhere.org:custom/fraction-1.0.0"
type: object
properties:
  numerator:
    type: integer
  denominator:
    type: integer
...
We can compare the output using this representation to the example above:
Serializing more complex types¶
Sometimes the custom types that we wish to represent in ASDF themselves have
attributes which are also custom types. As a somewhat contrived example,
consider a 2D cartesian coordinate that uses fractions.Fraction to represent
each of the components. We will call this type Fractional2DCoordinate.
First we need to define a schema to represent this new type:
%YAML 1.1
---
$schema: "http://stsci.edu/schemas/yaml-schema/draft-01"
id: "http://nowhere.org/schemas/custom/fractional_2d_coord-1.0.0"
title: An example custom type for handling components
tag: "tag:nowhere.org:custom/fractional_2d_coord-1.0.0"
type: object
properties:
  x:
    $ref: fraction-1.0.0
  y:
    $ref: fraction-1.0.0
...
Note that in the schema, the x and y attributes are expressed as
references to our fraction-1.0.0 schema. Since both of these schemas are
defined under the same standard and organization, we can simply use the name
and version of the fraction-1.0.0 schema to refer to it. However, if the
referenced type were defined in a different organization and standard, it would
be necessary to use the entire YAML tag in the reference (e.g.
tag:nowhere.org:custom/fraction-1.0.0). Relative tag references are also
allowed where appropriate.
We also need to define the custom tag type that corresponds to our new type:
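A minimal sketch of such a tag class, again as a stand-in that does not import asdf. The Fractional2DCoordinate dataclass and the name Fractional2DCoordinateTypeSketch are hypothetical; a real tag class would inherit from asdf.CustomType and set its attributes as in the earlier example.

```python
import fractions
from dataclasses import dataclass


@dataclass
class Fractional2DCoordinate:
    """Hypothetical coordinate type with two Fraction components."""

    x: fractions.Fraction
    y: fractions.Fraction


class Fractional2DCoordinateTypeSketch:
    """Stand-in for the tag class of the coordinate type."""

    @classmethod
    def to_tree(cls, node, ctx):
        # x and y are left as Fraction objects; converting nested custom
        # types is the library's job, not this method's.
        return {"x": node.x, "y": node.y}

    @classmethod
    def from_tree(cls, tree, ctx):
        # By the time this runs, x and y are expected to be Fractions again.
        return Fractional2DCoordinate(tree["x"], tree["y"])


coord = Fractional2DCoordinate(fractions.Fraction(2, 3), fractions.Fraction(7, 9))
coord_tree = Fractional2DCoordinateTypeSketch.to_tree(coord, ctx=None)
coord_restored = Fractional2DCoordinateTypeSketch.from_tree(coord_tree, ctx=None)
```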
In previous versions of this library, it was necessary for our
Fractional2DCoordinateType class to call yamlutil functions
explicitly to convert the x and y components to and from
their tree representations. Now, the library will automatically
convert nested custom types before calling from_tree,
and after receiving the result from to_tree.
Since Fractional2DCoordinateType shares the same
organization and standard as
FractionType, it can be added to the same extension class:
Now we can use this extension to create an ASDF file:
Note that in the resulting ASDF file, the x and y components of
our new fractional_2d_coord type are tagged as fraction-1.0.0.
Serializing reference cycles¶
Special considerations must be made when deserializing a custom type that
contains a reference to itself among its descendants. Consider a
fractions.Fraction subclass that maintains a reference to its multiplicative
inverse:
The inverse of the inverse of a fraction is the fraction itself, so you might wish to construct your objects in the following way:
This creates an “infinite loop” between the two fractions. An ordinary
CustomType wouldn’t be able to deserialize this, since each object
requires that the other be deserialized first! Let’s see what happens
when we define our from_tree method in a naive way:
After adding our type to the extension class, the tree will serialize correctly:
But upon deserialization, we notice a problem:
The presence of _PendingValue is asdf’s way of telling you
that the value corresponding to the key inverse was not fully deserialized
at the time that you retrieved it. We can handle this situation by making our
from_tree a generator function:
The generator version of from_tree yields the partially constructed
FractionWithInverse object before setting its inverse property. This allows
asdf to proceed to constructing the inverse FractionWithInverse object,
and resume the original from_tree execution only when the inverse
is actually available.
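The yield-then-finish pattern can be illustrated without asdf. In the sketch below, the FractionWithInverse class, the plain-function from_tree, and the hand-written driver at the bottom are all assumptions for illustration; the driver is a much simplified stand-in for the library's deserializer and only shows why yielding early breaks the cycle.

```python
import fractions


class FractionWithInverse(fractions.Fraction):
    """Hypothetical Fraction subclass that remembers its multiplicative inverse."""

    def __init__(self, *args, **kwargs):
        self._inverse = None

    @property
    def inverse(self):
        return self._inverse

    @inverse.setter
    def inverse(self, value):
        self._inverse = value


def from_tree(tree):
    # Build the object without its inverse and hand it back early...
    result = FractionWithInverse(tree["numerator"], tree["denominator"])
    yield result
    # ...then finish once the caller has filled in tree["inverse"].
    result.inverse = tree["inverse"]


# Hand-written driver standing in for the library's deserializer.
tree_a = {"numerator": 3, "denominator": 5, "inverse": None}
tree_b = {"numerator": 5, "denominator": 3, "inverse": None}
gen_a, gen_b = from_tree(tree_a), from_tree(tree_b)
a, b = next(gen_a), next(gen_b)  # both objects are partially constructed
tree_a["inverse"], tree_b["inverse"] = b, a
for gen in (gen_a, gen_b):
    next(gen, None)  # resume each generator so it can set the inverse
```

After the generators are resumed, each object refers to the other, which a from_tree that returns a finished object in one step could not arrange.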
With this new version of from_tree, we can successfully deserialize
our ASDF file:
Assigning schema and tag versions¶
Authors of new tags and schemas should strive to use the conventions described
by semantic versioning. Tags and schemas for types
that have not been serialized before should begin at 1.0.0. Versions for a
particular tag type need not move in lock-step with other tag types in the same
extension.
The patch version should be bumped for bug fixes and other minor, backwards-compatible changes. New features can be indicated with increments to the minor version, as long as they remain backwards compatible with older versions of the schema. Any changes that break backwards compatibility must be indicated by a major version update.
Since ASDF is intended to be an archival file format, authors of tags and schemas should work to ensure that ASDF files created with older extensions can continue to be processed. This means that every time a schema version is bumped (with the possible exception of patch updates), a new schema file should be created.
For example, if we currently have a schema for xyz-1.0.0, and we wish to
make changes and bump the version to xyz-1.1.0, we should leave the
original schema intact. A new schema file should be created for
xyz-1.1.0, which can exist in parallel with the old file. The version of
the corresponding tag type should be bumped to 1.1.0.
For more details on the behavior of schema and tag versioning from a user perspective, see Versioning and Compatibility, and also Custom types, extensions, and versioning.
Explicit version support¶
To some extent schemas and tag classes will be closely tied to the custom data types that they represent. This means that in some cases API changes or other changes to the representation of the underlying types will force us to modify our schemas and tag classes. ASDF’s schema versioning allows us to handle changes in schemas over time.
Let’s consider an imaginary custom type called Person that we want to
serialize in ASDF. The first version of Person was constructed using a
first and last name:
person = Person("James", "Webb")
print(person.first, person.last)
Our version 1.0.0 YAML schema for Person might look like the following:
%YAML 1.1
---
$schema: "http://stsci.edu/schemas/yaml-schema/draft-01"
id: "http://nowhere.org/schemas/custom/person-1.0.0"
title: An example custom type for representing a Person
tag: "tag:nowhere.org:custom/person-1.0.0"
type: array
items:
  type: string
minItems: 2
maxItems: 2
...
And our tag implementation would look something like this:
import asdf
from people import Person


class PersonType(asdf.CustomType):
    name = "person"
    organization = "nowhere.org"
    version = (1, 0, 0)
    standard = "custom"
    types = [Person]

    @classmethod
    def to_tree(cls, node, ctx):
        return [node.first, node.last]

    @classmethod
    def from_tree(cls, tree, ctx):
        return Person(tree[0], tree[1])
However, a newer version of Person now requires a middle name in the
constructor as well:
person = Person("James", "Edwin", "Webb")
print(person.first, person.middle, person.last)
So we update our YAML schema to version 1.1.0 in order to support newer versions of Person:
%YAML 1.1
---
$schema: "http://stsci.edu/schemas/yaml-schema/draft-01"
id: "http://nowhere.org/schemas/custom/person-1.1.0"
title: An example custom type for representing a Person
tag: "tag:nowhere.org:custom/person-1.1.0"
type: array
items:
  type: string
minItems: 3
maxItems: 3
...
We need to update our tag class implementation as well. However, we need to be
careful. We still want to be able to read version 1.0.0 of our schema and be
able to convert it to the newer version of Person objects. To accomplish
this, we will make use of the supported_versions attribute
for our tag class. This will allow us to declare explicit support for the
schema versions our tag class implements.
Under the hood, asdf creates multiple copies of our PersonType tag class,
each with a different version attribute corresponding to one
of the supported versions. This means that in our new tag class implementation,
we can condition our from_tree implementation on the value
of version to determine which schema version should be used when reading:
import asdf
from people import Person


class PersonType(asdf.CustomType):
    name = "person"
    organization = "nowhere.org"
    version = (1, 1, 0)
    supported_versions = [(1, 0, 0), (1, 1, 0)]
    standard = "custom"
    types = [Person]

    @classmethod
    def to_tree(cls, node, ctx):
        return [node.first, node.middle, node.last]

    @classmethod
    def from_tree(cls, tree, ctx):
        # Handle the older version of the person schema
        if cls.version == (1, 0, 0):
            # Construct a Person object with an empty middle name field
            return Person(tree[0], "", tree[1])
        else:
            # The newer version of the schema stores the middle name too
            return Person(tree[0], tree[1], tree[2])
Note that the implementation of to_tree is not conditioned on
cls.version since we do not need to convert new Person objects back to
the older version of the schema.
Handling subclasses¶
By default, if a custom type is serialized by an asdf tag class, then all
subclasses of that type can also be serialized. However, no attributes that are
specific to the subclass will be stored in the file. When reading the file, an
instance of the base custom type will be returned instead of the subclass that
was written.
To properly handle subclasses of custom types already recognized by asdf, it is
necessary to implement a separate tag class that is specific to the subclass to
be serialized.
Previous versions of this library implemented an experimental feature that allowed ASDF to serialize subclass attributes using the same tag class, but this feature was dropped as it produced files that were not portable.
Creating custom schemas¶
All custom types to be serialized by asdf require custom schemas. The best
resource for creating ASDF schemas can be found in the ASDF Standard documentation.
In most cases, ASDF schemas will be included as part of a packaged software
distribution. In these cases, it is important for the
url_mapping of the corresponding AsdfExtension
extension class to map the schema URL to an actual location on disk. However,
it is possible for schemas to be hosted online as well, in which case the URL
mapping can map (perhaps trivially) to an actual network location. See
Defining custom extension classes for more information.
It is also important for packages that provide custom schemas to test them, both to make sure that they are valid, and to ensure that any examples they provide are also valid. See Testing custom schemas for more information.
Adding custom validators¶
A new type may also add new validation keywords to the schema language. This can be used to impose type-specific restrictions on the values in an ASDF file. This feature is used internally so a schema can specify the required datatype of an array.
To support custom validation keywords, set the validators
member of a CustomType subclass to a dictionary where the keys are the
validation keyword name and the values are validation functions. The
validation functions are of the same form as the validation functions in the
underlying jsonschema library, and are passed the following arguments:
validator: A jsonschema.Validator instance.
value: The value of the schema keyword.
instance: The instance to validate. This will be made up of basic datatypes as represented in the YAML file (list, dict, number, strings), and not include any object types.
schema: The entire schema that applies to instance. Useful to get other related schema keywords.
The validation function should either return None if the instance
is valid or yield one or more jsonschema.ValidationError objects if
the instance is invalid.
To continue the example from above, for the FractionType say we
want to add a validation keyword “simplified” that, when true,
asserts that the corresponding fraction is in simplified form:
import fractions

from asdf import ValidationError


def validate_simplified(validator, simplified, instance, schema):
    if simplified:
        # Fraction reduces to lowest terms on construction
        reduced = fractions.Fraction(instance[0], instance[1])
        if reduced.numerator != instance[0] or reduced.denominator != instance[1]:
            yield ValidationError("Fraction is not in simplified form.")


FractionType.validators = {"simplified": validate_simplified}
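To see what such a validation function produces, it can be called directly. The sketch below repeats the function with a local ValidationError stand-in so that it runs without asdf; passing None as the validator argument is an assumption that only works because this particular function never uses it.

```python
import fractions


class ValidationError(Exception):
    """Stand-in for the ValidationError imported from asdf above."""


def validate_simplified(validator, simplified, instance, schema):
    if simplified:
        reduced = fractions.Fraction(instance[0], instance[1])
        if reduced.numerator != instance[0] or reduced.denominator != instance[1]:
            yield ValidationError("Fraction is not in simplified form.")


schema = {"simplified": True}
# 2/4 reduces to 1/2, so one error is yielded.
errors_for_2_4 = list(validate_simplified(None, True, [2, 4], schema))
# 1/2 is already in lowest terms, so nothing is yielded.
errors_for_1_2 = list(validate_simplified(None, True, [1, 2], schema))
```

Because the function is a generator, a valid instance simply yields nothing, which callers see as an empty sequence of errors.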
Defining custom extension classes¶
Extension classes are the mechanism that asdf uses to register custom tag types
so that they can be used when processing ASDF files. Packages that define their
own custom tag types must also define extensions in order for those types to be
used.
All extension classes must implement the asdf.extension.AsdfExtension abstract base
class. A custom extension will override each of the following properties of
asdf.extension.AsdfExtension: types, tag_mapping, and url_mapping.
Overriding built-in extensions¶
It is possible for externally defined extensions to override tag types that are
provided by asdf’s built-in extension. For example, maybe an external package
wants to provide a different implementation of NDArrayType.
In this case, the external package does not need to provide custom schemas
since the schema for the type to be overridden is already provided as part of
the ASDF standard.
Instead, the extension class may inherit from asdf’s
asdf.extension.BuiltinExtension and simply override the
types property to indicate the type that is being
overridden. Doing this preserves the tag_mapping and
url_mapping that are used by the BuiltinExtension, which
allows the schemas that are packaged by asdf to be located.
asdf will give precedence to the type that is provided by the external
extension, effectively overriding the corresponding type in the built-in
extension. Note that the behavior is currently undefined if multiple external
extensions override the same built-in type.
Packaging custom extensions¶
Packaging schemas¶
If a package provides custom schemas, the schema files must be installed as
part of that package distribution. In general, schema files must be installed
into a subdirectory of the package distribution. The asdf extension class must
supply a url_mapping that maps to the installed location
of the schemas. See Defining custom extension classes for more details.
Registering entry points¶
Packages that provide their own ASDF extensions can (and should!) install them
so that they are automatically detectable by the asdf Python package. This is
accomplished using Python’s setuptools
entry points. Entry points are registered in a package’s setup.py file.
Consider a package that provides an extension class MyPackageExtension in the
submodule mypackage.asdf.extensions. We need to register this class as an
extension entry point that asdf will recognize. First, we create a dictionary:
entry_points = {}
entry_points["asdf_extensions"] = [
    "mypackage = mypackage.asdf.extensions:MyPackageExtension"
]
The key used in the entry_points dictionary must be 'asdf_extensions'.
The value must be an array of one or more strings, each with the following
format:
extension_name = fully.specified.submodule:ExtensionClass
The extension name can be any arbitrary string, but it should be descriptive of the package and the extension. In most cases the package name itself will suffice.
Note that depending on individual package requirements, there may be other
entries in the entry_points dictionary.
The entry points must be passed to the call to setuptools.setup:
from setuptools import setup

entry_points = {}
entry_points["asdf_extensions"] = [
    "mypackage = mypackage.asdf.extensions:MyPackageExtension"
]

setup(
    # We omit other package-specific arguments that are not
    # relevant to this example
    entry_points=entry_points,
)
When running python setup.py install or python setup.py develop on this
package, the entry points will be registered automatically. This allows the
asdf package to recognize the extensions without any user intervention. Users
of your package that wish to read ASDF files using types that you have
registered will not need to use any extension explicitly. Instead, asdf will
automatically recognize the types you have registered and will process them
appropriately. See Extensions from other packages for more information on using
extensions.
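To check which entry points are registered in the asdf_extensions group of the current environment, the standard library's importlib.metadata can be used. This is a minimal sketch for inspection only; it does not show how asdf itself discovers extensions, and the helper name list_asdf_extension_entry_points is hypothetical.

```python
from importlib.metadata import entry_points


def list_asdf_extension_entry_points():
    """Return (name, value) pairs for entry points in the asdf_extensions group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Newer Python versions expose a selectable collection.
        group = eps.select(group="asdf_extensions")
    else:
        # Older Python versions return a plain dict keyed by group name.
        group = eps.get("asdf_extensions", [])
    return [(ep.name, ep.value) for ep in group]


registered = list_asdf_extension_entry_points()
```

For the package above, an installed environment would be expected to include an entry named mypackage whose value is mypackage.asdf.extensions:MyPackageExtension.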
Testing custom schemas¶
Packages that provide their own schemas can test them using asdf’s
pytest plugin for schema testing.
Schemas are tested for overall validity, and any examples given within the
schemas are also tested.
The schema tester plugin is automatically registered when the asdf package is
installed. In order to enable testing, it is necessary to add the directory
containing your schema files to the pytest section of your project’s build configuration
(pyproject.toml or setup.cfg). If you do not already have such a file, creating
one with the following should be sufficient:
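A minimal sketch of such a configuration in pyproject.toml, using the asdf_schema_root option described on this page. The path path/to/schemas is a placeholder and should be replaced with the directory that holds your schema files.

```toml
# pyproject.toml
[tool.pytest.ini_options]
# Placeholder path; point this at your package's schema directory.
asdf_schema_root = "path/to/schemas"
```

In setup.cfg, the same option would go in the [tool:pytest] section instead.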
The schema directory paths should be paths that are relative to the top of the
package directory when it is installed. If this is different from the path
in the source directory, then both paths can be used to facilitate in-place
testing (see asdf’s own pyproject.toml for an example of this).
Note
Older versions of asdf (prior to 2.4.0) required the plugin to be registered
in your project’s conftest.py file. As of 2.4.0, the plugin is
registered automatically, so that registration should be removed from your
conftest.py file, unless you need to retain compatibility with older
versions of asdf.
The asdf_schema_skip_names configuration variable can be used to skip
schema files that live within one of the asdf_schema_root directories but
should not be tested. The names should be given as simple base file names
(without directory paths or extensions). Again, see asdf’s own pyproject.toml file
for an example.
The schema tests do not run by default. In order to enable the tests by
default for your package, add asdf_schema_tests_enabled = 'true' to the
[tool.pytest.ini_options] section of your pyproject.toml file (or [tool:pytest] in setup.cfg).
If you do not wish to enable the schema tests by default, you can add the --asdf-tests option to
the pytest command line to enable tests on a per-run basis.