Scheduled service maintenance on November 22


On Friday, November 22, 2024, between 06:00 CET and 18:00 CET, GIN services will undergo planned maintenance. Extended service interruptions should be expected. We will try to keep downtimes to a minimum, but recommend that users avoid critical tasks, large data uploads, or DOI requests during this time.

We apologize for any inconvenience.

README.md 2.5 KB

Build status codecov PyPI version GitHub release (latest by date including pre-releases)

Datalad Metadata Model

This software implements the metadata model that datalad and datalad-metalad will use in the future (datalad-metalad>=0.3.0) to handle metadata.

Model Elements (the model layer)

The metadata model is defined by the API of the top-level classes. Those are:

  • MetadataRootRecord -- holds top-level metadata information for a single version of a datalad dataset

  • UUIDSet -- holds metadata root records for a set of datasets that are identified by their UUIDs and their version.

  • TreeVersionList -- holds metadata root records and a sub-dataset tree for a dataset version and its sub-datasets

  • Metadata -- represents metadata for a single item, i.e. dataset or file. Metadata is associated with extractor names and extraction parameters.

  • DatasetTree -- a representation of the sub-dataset hierarchy of a dataset

  • FileTree -- a representation of the file-tree of a dataset

  • ...

Because of the large size of some datalad-datasets, e.g. tens of thousands of sub-datasets and hundres of millions of files, the implementation allows focus-based operations on individual parts of the potentially very large metadata model. The implementation uses the proxy-pattern, that means, it loads, modifies, and saves only the minimal necessary model elements that are necessary to operate on the metadata-information that the user is interested in.

Storage layer

The model elements have to be persisted on a storage backend. How the model is mapped on storage backends is defined by the storage layer, that is to a large degree independent of the model layer. The intention is to support multiple storage backends in the past.

Currently only one storage backend is supported:

  • git-mapping -- a storage backend that stores a metadata model in a git repository. The model objects are stored outside of existing branches. They are referenced by datalad-specific git-references under refs/datalad/*