|
@@ -33,47 +33,77 @@ files contained, as well as a number of validation analyses.
|
|
|
How to obtain the dataset
|
|
|
-------------------------
|
|
|
|
|
|
-This repository contains metadata and information on the identity of all
|
|
|
-included files. However, the actual content of the (sometime large) data
|
|
|
-files is stored elsewhere. To obtain any dataset component, git-annex_ is
|
|
|
-required in addition to Git_.
|
|
|
+This repository is a `DataLad <https://www.datalad.org/>`__ dataset. It provides
|
|
|
+fine-grained data access down to the level of individual files, and allows for
|
|
|
+tracking future updates. To retrieve data from this repository,
|
|
|
+`DataLad <https://www.datalad.org>`__ is required.
|
|
|
+It is a free and open-source command-line tool, available for all
|
|
|
+major operating systems, and builds on Git and `git-annex
|
|
|
+<https://git-annex.branchable.com>`__ to allow sharing, synchronizing, and
|
|
|
+version-controlling collections of large files. You can find information on
|
|
|
+how to install DataLad at `handbook.datalad.org/en/latest/intro/installation.html
|
|
|
+<http://handbook.datalad.org/en/latest/intro/installation.html>`_.
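A quick way to confirm that the required tools are available before cloning is sketched below for a POSIX shell. ``check_datalad_tools`` is a hypothetical helper written for this example, not a DataLad command; it only checks that ``git``, ``git-annex``, and ``datalad`` can be found on ``PATH``, not which versions are installed.

```shell
# Hypothetical helper, not part of DataLad: report which of the tools that
# DataLad depends on (Git, git-annex) and DataLad itself are not on PATH.
check_datalad_tools() {
  missing=0
  for tool in git git-annex datalad; do
    # "command -v" succeeds only if the tool can be found on PATH
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  # Non-zero exit status if at least one tool was not found
  return "$missing"
}
```

For example, ``check_datalad_tools && datalad --version`` prints the DataLad version only if all three tools were found.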
|
|
|
|
|
|
-1. Clone this repository to the desired location.
|
|
|
-2. Enter the directory with the local clone and run::
|
|
|
+Get the dataset
|
|
|
+^^^^^^^^^^^^^^^
|
|
|
|
|
|
- git annex init
|
|
|
+To *clone* this dataset, run the following, replacing ``<url>`` with the URL of this repository::
|
|
|
|
|
|
- Older versions of git-annex may require you to run the following
|
|
|
- command immediately afterwards::
|
|
|
+ datalad clone <url>
|
|
|
|
|
|
- git annex enableremote mddatasrc
|
|
|
+Once a dataset is cloned, it is a lightweight directory on your local machine.
|
|
|
+At this point, it contains only small metadata and information on the
|
|
|
+identity of the files in the dataset, but not the actual *content* of the
|
|
|
+(sometimes large) data files.
|
|
|
|
|
|
-Now any desired dataset component can be obtained by using the ``git annex get``
|
|
|
-command. To obtain the entire dataset content run::
|
|
|
+Retrieve dataset content
|
|
|
+^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
|
|
- git annex get .
|
|
|
+After cloning a dataset, you can retrieve file contents by running::
|
|
|
|
|
|
+ datalad get <path/to/directory/or/file>
|
|
|
|
|
|
-Keep data up-to-date
|
|
|
---------------------
|
|
|
+This command downloads the content of the files, directories, or
|
|
|
+subdatasets you have specified.
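The clone and retrieval steps above can be combined in a small shell function. This is a minimal sketch assuming a POSIX shell; ``clone_and_get`` is a hypothetical name, and the URL and paths passed to it stand in for this repository's URL and the files you need.

```shell
# Hypothetical helper, not part of DataLad: clone a dataset, then download
# the content of selected files or directories inside it.
# Usage: clone_and_get <url> <target-dir> [path ...]
clone_and_get() {
  url=$1
  dest=$2
  shift 2
  # Cloning transfers only metadata and file identities, not file content
  datalad clone "$url" "$dest" || return
  # Remaining arguments are paths relative to the root of the new clone
  if [ "$#" -gt 0 ]; then
    (cd "$dest" && datalad get "$@")
  fi
}
```

For instance, ``clone_and_get <url> my-dataset path/to/file`` clones the dataset into ``my-dataset`` and downloads only ``path/to/file``; all other files stay metadata-only until requested.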
|
|
|
+
|
|
|
+DataLad datasets can contain other datasets, so-called *subdatasets*. If you
|
|
|
+clone the top-level dataset, its subdatasets appear as empty directories
|
|
|
+because they do not yet contain metadata or information on the identity of files. In
|
|
|
+order to retrieve this metadata for a subdataset without downloading any file content, run::
|
|
|
|
|
|
-If updates to this dataset are made in the future, update any local clone by
|
|
|
-running::
|
|
|
+ datalad get -n <path/to/subdataset>
|
|
|
|
|
|
- git pull
|
|
|
+Afterwards, you can browse the retrieved metadata to find out about
|
|
|
+subdataset contents, and retrieve individual files with ``datalad get``. If you
|
|
|
+use ``datalad get <path/to/subdataset>``, all contents of the subdataset will
|
|
|
+be downloaded at once.
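The subdataset workflow above might look like this as a shell sketch. ``browse_subdataset`` is a hypothetical helper; it assumes it is run from the root of the top-level dataset, with all paths given relative to that root.

```shell
# Hypothetical helper, not part of DataLad: install a subdataset without
# downloading file content, list what it contains, then fetch chosen files.
# Usage: browse_subdataset <subdataset-path> [file ...]
browse_subdataset() {
  sub=$1
  shift
  # -n: obtain the subdataset itself (file names, metadata) but no content
  datalad get -n "$sub" || return
  # File names are now visible, even though their content is not local yet
  ls "$sub"
  # Download only the requested files, if any were given
  if [ "$#" -gt 0 ]; then
    datalad get "$@"
  fi
}
```

Running ``datalad get -n`` first keeps the initial transfer small: only the subdataset's metadata is obtained until specific files are requested.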
|
|
|
|
|
|
-followed by::
|
|
|
+Keep data up-to-date
|
|
|
+^^^^^^^^^^^^^^^^^^^^
|
|
|
|
|
|
- git annex get .
|
|
|
+DataLad datasets can be updated. The command ``datalad update`` *fetches*
|
|
|
+updates and stores them on a separate branch (by default
|
|
|
+``remotes/origin/master``) without changing your local files. Running::
|
|
|
|
|
|
-to fetch all new files.
|
|
|
+ datalad update --merge
|
|
|
|
|
|
+will *pull* available updates, that is, fetch and merge them in one go. Afterwards, use ``datalad get`` to retrieve the content of new or changed files.
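Updating and then retrieving content can be wrapped into a single step. The sketch below assumes a POSIX shell; ``refresh_dataset`` is a hypothetical helper, and its ``datalad get .`` call downloads any content of this dataset that is not yet present locally, which can be large. Pass narrower paths instead if you only need part of the dataset.

```shell
# Hypothetical helper, not part of DataLad: update a local clone and then
# download file content that is not present locally yet.
# Usage: refresh_dataset [dataset-dir]   (defaults to the current directory)
refresh_dataset() {
  ds=${1:-.}
  # Fetch available updates and merge them into the local copy;
  # stop here if the update fails
  (cd "$ds" && datalad update --merge) || return
  # Merged changes to annexed files arrive without their content, so
  # retrieve it explicitly (content that is already present is skipped)
  (cd "$ds" && datalad get .)
}
```

Keeping the two commands in this order means a failed update never triggers a download of stale content.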
|
|
|
|
|
|
+Find out what has been done
|
|
|
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
|
|
+DataLad datasets record their complete history in Git.
|
|
|
+By running ``git log`` (or a tool that displays Git history) in the dataset or on
|
|
|
+specific files, you can find out what has been done to the dataset or to individual files,
|
|
|
+by whom, and when.
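As an illustration, a one-line-per-commit view of a file's history can be produced with standard ``git log`` options. ``file_history`` is a hypothetical helper name; the ``--format`` placeholders used here are ``%h`` (abbreviated commit hash), ``%ad`` (author date), ``%an`` (author name), and ``%s`` (commit subject).

```shell
# Hypothetical helper, not part of DataLad or Git: list the commits that
# touched the given paths, one line each: short hash, date, author, message.
# Run it from inside the dataset; with no arguments it covers the whole dataset.
file_history() {
  git log --date=short --format='%h %ad %an: %s' -- "$@"
}
```

Running ``file_history path/to/file`` from inside the dataset lists who changed that file and when.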
|
|
|
|
|
|
-.. _Git: http://www.git-scm.com
|
|
|
+More information
|
|
|
+^^^^^^^^^^^^^^^^
|
|
|
|
|
|
-.. _git-annex: http://git-annex.branchable.com/
|
|
|
+More information on DataLad and how to use it can be found in the DataLad Handbook at
|
|
|
+`handbook.datalad.org <http://handbook.datalad.org/en/latest/index.html>`_. The
|
|
|
+chapter "DataLad datasets" helps you familiarize yourself with the
|
|
|
+concept of a dataset.
|
|
|
|
|
|
.. |license| image:: https://img.shields.io/badge/license-PDDL-blue.svg
|