Datalad Metadata Model

This software implements the metadata model that datalad and datalad-metalad will use in the future (datalad-metalad>=0.3.0) to handle metadata.

Model Elements (the model layer)

The metadata model is defined by the API of the top-level classes. Those are:

  • MetadataRootRecord -- holds top-level metadata information for a single version of a datalad dataset

  • UUIDSet -- holds metadata root records for a set of datasets that are identified by their UUIDs and their version.

  • TreeVersionList -- holds metadata root records and a sub-dataset tree for a dataset version and its sub-datasets

  • Metadata -- represents metadata for a single item, i.e. dataset or file. Metadata is associated with extractor names and extraction parameters.

  • DatasetTree -- a representation of the sub-dataset hierarchy of a dataset

  • FileTree -- a representation of the file-tree of a dataset

  • ...

Some datalad datasets are very large, e.g. tens of thousands of sub-datasets and hundreds of millions of files. The implementation therefore allows focus-based operations on individual parts of the potentially very large metadata model. It uses the proxy pattern, i.e. it loads, modifies, and saves only the minimal set of model elements required to operate on the metadata that the user is interested in.
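The proxy idea described above can be illustrated with a minimal sketch. It is not the package's actual implementation: the names `LazyProxy` and `save_modified`, and the use of plain dictionaries as model elements and as storage, are assumptions made only for this illustration.

```python
from typing import Callable, Dict, List, Optional


class LazyProxy:
    """Hypothetical stand-in for a model element that is loaded on first access."""

    def __init__(self, loader: Callable[[], Dict[str, str]]):
        self._loader = loader
        self._element: Optional[Dict[str, str]] = None
        self.modified = False

    def is_loaded(self) -> bool:
        return self._element is not None

    def get(self) -> Dict[str, str]:
        # Load the real element only when somebody actually needs it.
        if self._element is None:
            self._element = self._loader()
        return self._element

    def set(self, key: str, value: str) -> None:
        self.get()[key] = value
        self.modified = True


def save_modified(proxies: Dict[str, LazyProxy],
                  store: Dict[str, Dict[str, str]]) -> List[str]:
    """Write back only elements that were loaded and changed."""
    written = []
    for name, proxy in proxies.items():
        if proxy.is_loaded() and proxy.modified:
            store[name] = dict(proxy.get())
            written.append(name)
    return written
```

With two stored elements, modifying one of them through its proxy loads and saves only that element; the other is never read from storage.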

Storage layer

The model elements have to be persisted on a storage backend. How the model is mapped onto storage backends is defined by the storage layer, which is to a large degree independent of the model layer. The intention is to support multiple storage backends in the future.
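One way to picture the separation between the two layers is a small backend interface that the model code talks to. The sketch below is an assumption for illustration only: `StorageBackend`, `InMemoryBackend`, `save_record`, and `load_record` are hypothetical names and do not describe the package's real classes.

```python
import hashlib
import json
from typing import Dict, Protocol


class StorageBackend(Protocol):
    """Hypothetical interface: the model layer needs only these two calls."""

    def write(self, data: bytes) -> str: ...

    def read(self, reference: str) -> bytes: ...


class InMemoryBackend:
    """Toy backend that keeps objects in a dictionary, keyed by content hash."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def write(self, data: bytes) -> str:
        reference = hashlib.sha1(data).hexdigest()
        self._objects[reference] = data
        return reference

    def read(self, reference: str) -> bytes:
        return self._objects[reference]


def save_record(backend: StorageBackend, record: dict) -> str:
    # The model side serializes; the backend decides where the bytes go.
    return backend.write(json.dumps(record, sort_keys=True).encode("utf-8"))


def load_record(backend: StorageBackend, reference: str) -> dict:
    return json.loads(backend.read(reference).decode("utf-8"))
```

Because `save_record` and `load_record` depend only on the interface, a different backend could be swapped in without touching the model code.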

Currently only one storage backend is supported:

  • git-mapping -- a storage backend that stores a metadata model in a git repository. The model objects are stored outside of existing branches. They are referenced by datalad-specific git-references under refs/datalad/*
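The following sketch shows how an object can live in a git repository outside of any branch and still be reachable through a reference under `refs/datalad/`. It uses only the standard git plumbing commands `hash-object`, `update-ref`, and `cat-file`. The helper names and the reference name `refs/datalad/example` are hypothetical; the reference layout that the git-mapping backend actually uses is not described here.

```python
import subprocess


def git(repo_dir: str, *args: str, stdin: bytes = None) -> str:
    """Run a git command in repo_dir and return its stripped stdout."""
    result = subprocess.run(
        ["git", "-C", repo_dir, *args],
        input=stdin, capture_output=True, check=True,
    )
    return result.stdout.decode().strip()


def store_object(repo_dir: str, data: bytes, ref_name: str) -> str:
    """Write data as a git blob and point ref_name at it."""
    object_id = git(repo_dir, "hash-object", "-w", "--stdin", stdin=data)
    # The reference lives outside refs/heads, so no branch is created or changed.
    git(repo_dir, "update-ref", ref_name, object_id)
    return object_id


def load_object(repo_dir: str, ref_name: str) -> bytes:
    """Read back the blob that ref_name points to."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "cat-file", "blob", ref_name],
        capture_output=True, check=True,
    )
    return result.stdout
```

Calling `store_object(repo, b"...", "refs/datalad/example")` on an existing repository leaves its branches untouched, and `load_object(repo, "refs/datalad/example")` returns the stored bytes.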