some default

Christian Mönch ddaab0882c fix a lying comment 2 年之前
.github 1eed3cb1ba update nosetest workflow 2 年之前
dataladmetadatamodel ddaab0882c fix a lying comment 2 年之前
simple_test 21754ace05 TEMP 2 年之前
tools 262f8f1513 fix wrong instantiation, add asserts 2 年之前
.gitignore 6429186ba8 NF: add DatasetTree and its mapper 3 年之前
LICENSE 47efdc66f7 Initial commit 3 年之前
README.md b81185f6c5 DOC: minor fixup to steps in README.md which worked for me 3 年之前
pyproject.toml 96850a711f NF: improve package definition 3 年之前
requirements.txt 416e3d9f66 add nose to requirements 2 年之前
setup.cfg 99d451bbcf add proper PEP440 version 2 年之前
setup.py be3b85a8f7 fix versioneer import 2 年之前
versioneer.py 99d451bbcf add proper PEP440 version 2 年之前

README.md

Datalad Metadata Model

This repository contains the metadata model that datalad and datalad-metalad (will) use for their metadata.

The model is separated into individual components that can be independently loaded and saved in order to have a focus-based view on the potentially very large metadata model instance (an application of the "proxy design pattern").

The implementation is divided into a user facing API-layer and a storage layer. Both are independent from each other (as long as the model does not change). The API layer defines an abstract data type, which represents the metadata model. The storage layer is responsible for persisting the model instance.

The two layers communicate through a defined interface. This allows for the use of multiple different storage layers with the same model instance. This can be even done in parallel, for example, if you want to copy a model from one storage layer to another storage layer. It also allows for the independent development of storage backends

Test it with datalad-metalad

There is a datalad-metalad (aka metalad) fork, i.e. https://github.com/christian-monch/datalad-metalad with the branch "metadata_model". This branch uses the metadata model to operate on metadata.

Currently the metadata_model branch of datalad-metalad implements the following commands based on the model:

  • meta-dump
  • ... (more to come)

Consequently there is also a repository, that contains "test" metadata (which has been created with the mdc-tool in this distribution).

Installation instructions

(These instructions were tested on Debian 10) Create a virtual environment, activate it, and upgrade pip, e.g.:

python3 -m venv ~/venv/datalad-metadata-model
source ~/venv/datalad-metadata-model/bin/activate
pip install --upgrade pip

Clone datalad-metalad and checkout the branch "metadata_model".

git clone https://github.com/christian-monch/datalad-metalad
cd datalad-metalad
git checkout metadata_model

Install the checked out version of metalad, i.e.

pip install -r requirements.txt

Invoking datalad meta-dump should now output:

[WARNING] No git-mapped datalad metadata model found in: .

Now, clone the demo-metadata repository into a directory of your choice, change into it and fetch all remote references

git clone https://github.com/christian-monch/datalad-metadata-demo-2.git

Change into the directory and fetch some remote references

cd datalad-metadata-demo-2
git fetch origin refs/datalad/dataset-tree-version-list:refs/datalad/dataset-tree-version-list
git fetch origin refs/datalad/dataset-uuid-set:refs/datalad/dataset-uuid-set
git fetch origin refs/datalad/object-references/dataset-tree:refs/datalad/object-references/dataset-tree
git fetch origin refs/datalad/object-references/file-tree:refs/datalad/object-references/file-tree
git fetch origin refs/datalad/object-references/metadata:refs/datalad/object-references/metadata

Invocation

Now you are all set to give it a try. Execute:

datalad -f json_pp meta-dump -r

That should output a few JSON objects describing datasets and files.

(The metadata was created with the "mdc" tool that comes with the datalad-metadata-model package. The dataset hierarchy and file names are taken from a local clone of the datasets.datalad.org dataset.)