Datalad Metadata Model

This repository contains the metadata model that datalad and datalad-metalad (will) use for their metadata.

The model is separated into individual components that can be loaded and saved independently. This makes it possible to work on a focused part of a potentially very large metadata model instance (an application of the "proxy design pattern").
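As a rough illustration of the proxy idea, the sketch below shows a component that holds only a reference until its content is needed. The class name `LazyComponent`, the loader function, and the in-memory "storage" are hypothetical and are not taken from the datalad-metadata-model code base.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyComponent(Generic[T]):
    """Hypothetical proxy: keeps a reference and loads the real object on demand."""

    def __init__(self, reference: str, loader: Callable[[str], T]) -> None:
        self.reference = reference
        self._loader = loader
        self._obj: Optional[T] = None

    def is_loaded(self) -> bool:
        return self._obj is not None

    def load(self) -> T:
        # Load only on first access; later calls reuse the cached object.
        if self._obj is None:
            self._obj = self._loader(self.reference)
        return self._obj

    def purge(self) -> None:
        # Drop the in-memory copy; the reference is enough to reload later.
        self._obj = None


# Stand-in for a storage layer (assumption: references map to component content).
fake_storage = {"ref-1": {"path": "dataset-a", "files": 3}}
load_calls = []


def load_from_storage(reference: str) -> dict:
    load_calls.append(reference)
    return fake_storage[reference]


component = LazyComponent("ref-1", load_from_storage)
```

Only components that are actually accessed are read from storage, and `purge()` releases them again, so the in-memory part of the model can stay small.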

The implementation is divided into a user-facing API layer and a storage layer. The two are independent of each other (as long as the model does not change). The API layer defines an abstract data type that represents the metadata model. The storage layer is responsible for persisting the model instance.

The two layers communicate through a defined interface. This allows multiple storage layers to be used with the same model instance, even in parallel, for example to copy a model from one storage layer to another. It also allows storage backends to be developed independently.
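The following is a minimal sketch of how such a layer separation could look. The interface `StorageBackend`, the two backends, and `copy_model` are illustrative assumptions and do not reflect the actual interface of this package.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class StorageBackend(ABC):
    """Hypothetical interface between the API layer and a storage layer."""

    @abstractmethod
    def save(self, key: str, component: dict) -> None: ...

    @abstractmethod
    def load(self, key: str) -> dict: ...

    @abstractmethod
    def keys(self) -> list: ...


class MemoryBackend(StorageBackend):
    """Keeps components in a dictionary."""

    def __init__(self) -> None:
        self._data = {}

    def save(self, key, component):
        self._data[key] = dict(component)

    def load(self, key):
        return dict(self._data[key])

    def keys(self):
        return sorted(self._data)


class JsonFileBackend(StorageBackend):
    """Persists each component as one JSON file in a directory."""

    def __init__(self, directory: Path) -> None:
        self._dir = Path(directory)
        self._dir.mkdir(parents=True, exist_ok=True)

    def save(self, key, component):
        (self._dir / f"{key}.json").write_text(json.dumps(component))

    def load(self, key):
        return json.loads((self._dir / f"{key}.json").read_text())

    def keys(self):
        return sorted(p.stem for p in self._dir.glob("*.json"))


def copy_model(source: StorageBackend, destination: StorageBackend) -> None:
    # Uses only the shared interface, so any pair of backends works.
    for key in source.keys():
        destination.save(key, source.load(key))


source = MemoryBackend()
source.save("dataset-tree", {"datasets": ["a", "b"]})
source.save("uuid-set", {"uuids": ["1234"]})

with tempfile.TemporaryDirectory() as tmp:
    destination = JsonFileBackend(Path(tmp))
    copy_model(source, destination)
    copied_keys = destination.keys()
    copied_tree = destination.load("dataset-tree")
```

Because `copy_model` depends only on the interface, a new backend can be added without touching the code that uses the model.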

Test it with datalad-metalad

A fork of datalad-metalad (aka metalad) at https://github.com/christian-monch/datalad-metalad has a branch named "metadata_model". This branch uses the metadata model to operate on metadata.

Currently the metadata_model branch of datalad-metalad implements the following commands based on the model:

  • meta-dump
  • ... (more to come)

There is also a repository that contains "test" metadata (created with the mdc tool in this distribution).

Installation instructions

(These instructions were tested on Debian 10.) Create a virtual environment, activate it, and upgrade pip, e.g.:

python3 -m venv ~/venv/datalad-metadata-model
source ~/venv/datalad-metadata-model/bin/activate
pip install --upgrade pip

Clone datalad-metalad and check out the branch "metadata_model":

git clone https://github.com/christian-monch/datalad-metalad
cd datalad-metalad
git checkout metadata_model

Install the checked-out version of metalad:

pip install -r requirements.txt

Invoking datalad meta-dump should now output:

[WARNING] No git-mapped datalad metadata model found in: .

Now clone the demo-metadata repository into a directory of your choice:

git clone https://github.com/christian-monch/datalad-metadata-demo-2.git

Change into the directory and fetch the remote references that hold the metadata:

cd datalad-metadata-demo-2
git fetch origin refs/datalad/dataset-tree-version-list:refs/datalad/dataset-tree-version-list
git fetch origin refs/datalad/dataset-uuid-set:refs/datalad/dataset-uuid-set
git fetch origin refs/datalad/object-references/dataset-tree:refs/datalad/object-references/dataset-tree
git fetch origin refs/datalad/object-references/file-tree:refs/datalad/object-references/file-tree
git fetch origin refs/datalad/object-references/metadata:refs/datalad/object-references/metadata
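The five fetch commands follow one pattern: each reference under refs/datalad/ is fetched into a local reference with the same name. As a convenience, they could be generated by a small script like the sketch below. The helper names `fetch_command` and `fetch_all` are made up for this example, and `fetch_all` has to be called from inside the datalad-metadata-demo-2 clone.

```python
import subprocess

# Reference names below refs/datalad/, taken from the commands above.
REF_NAMES = [
    "dataset-tree-version-list",
    "dataset-uuid-set",
    "object-references/dataset-tree",
    "object-references/file-tree",
    "object-references/metadata",
]


def fetch_command(ref_name, remote="origin"):
    """Build the git command that fetches one reference under its own name."""
    ref = f"refs/datalad/{ref_name}"
    return ["git", "fetch", remote, f"{ref}:{ref}"]


def fetch_all(run=subprocess.run):
    """Fetch every reference; `run` can be replaced for a dry run."""
    for name in REF_NAMES:
        run(fetch_command(name), check=True)


# To run the fetches, call fetch_all() inside the cloned repository.
```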

Invocation

Now you are all set to give it a try. Execute:

datalad -f json_pp meta-dump -r

That should output a few JSON objects describing datasets and files.
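To process the output in a script, the JSON objects have to be split apart first. The sketch below assumes that the output is a plain sequence of pretty-printed JSON objects separated only by whitespace; that assumption, the helper name `iter_json_objects`, and the keys in `sample_output` are illustrative and do not describe the real meta-dump record format.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated values."""
    decoder = json.JSONDecoder()
    index = 0
    length = len(text)
    while index < length:
        # raw_decode does not skip leading whitespace, so skip it here.
        while index < length and text[index].isspace():
            index += 1
        if index >= length:
            break
        obj, index = decoder.raw_decode(text, index)
        yield obj


# Made-up sample with hypothetical keys, standing in for captured output.
sample_output = """
{
  "type": "dataset",
  "path": "."
}
{
  "type": "file",
  "path": "a.txt"
}
"""

records = list(iter_json_objects(sample_output))
```

Real output could be captured, for example by redirecting the command into a file and passing the file content to `iter_json_objects`.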

(The metadata was created with the "mdc" tool that comes with the datalad-metadata-model package. The dataset hierarchy and file names are taken from a local clone of the datasets.datalad.org dataset.)