DOC: Add short DataLad intro as proposed in the handbook

'http://handbook.datalad.org/en/latest/basics/101-180-FAQ.html#how-can-i-help-others-get-started-with-a-shared-dataset'
Adina Wagner 4 years ago
parent
commit
99a1f71c03
1 changed files with 50 additions and 21 deletions

README.rst

@@ -58,40 +58,69 @@ Content
 How to obtain the data files
 ----------------------------
 
-This repository contains metadata and information on the identity of all
-included files. However, the actual content of the (sometime large) data
-files is stored elsewhere. To obtain any dataset component, git-annex_ is
-required in addition to Git_.
+This repository is a `DataLad <https://www.datalad.org/>`__ dataset. It provides
+fine-grained data access down to the level of individual files, and allows for
+tracking future updates at the level of single files. In order to use
+this repository for data retrieval, `DataLad <https://www.datalad.org>`_ is
+required. It is a free and open source command line tool, available for all
+major operating systems, and builds on Git and `git-annex
+<https://git-annex.branchable.com>`__ to allow sharing, synchronizing, and
+version controlling collections of large files. You can find information on
+how to install DataLad at `handbook.datalad.org/en/latest/intro/installation.html
+<http://handbook.datalad.org/en/latest/intro/installation.html>`_.
 
-1. Clone this repository to the desired location.
-2. Enter the directory with the local clone and run::
+Get the dataset
+^^^^^^^^^^^^^^^
 
-     git annex init
+A DataLad dataset can be ``cloned`` by running::
 
-   Older versions of git-annex may require you to run the following
-   command immediately afterwards::
+   datalad clone <url>
 
-     git annex enableremote mddatasrc
+Once a dataset is cloned, it is a lightweight directory on your local machine.
+At this point, it contains only small metadata and information on the
+identity of the files in the dataset, but not the actual *content* of the
+(sometimes large) data files.
 
-Now any desired dataset component can be obtained by using the ``git annex get``
-command. To obtain the entire dataset content run::
+Retrieve dataset content
+^^^^^^^^^^^^^^^^^^^^^^^^
 
-     git annex get .
+After cloning a dataset, you can retrieve file contents by running::
 
-Keep data up-to-date
---------------------
+   datalad get <path/to/directory/or/file>
 
-If updates to this dataset are made in the future, update any local clone by
-running::
+This command will trigger a download of the files, directories, or
+subdatasets you have specified.
 
-     git pull
+DataLad datasets can contain other datasets, so-called *subdatasets*. If you
+clone the top-level dataset, subdatasets do not yet contain metadata and
+information on the identity of files, but appear to be empty directories. In
+order to retrieve file availability metadata in subdatasets, run::
 
-followed by::
+   datalad get -n <path/to/subdataset>
 
-     git annex get .
+Afterwards, you can browse the retrieved metadata to find out about
+subdataset contents, and retrieve individual files with ``datalad get``. If you
+use ``datalad get <path/to/subdataset>``, all contents of the subdataset will
+be downloaded at once.
 
-to fetch all new files.
+Stay up-to-date
+^^^^^^^^^^^^^^^
 
+DataLad datasets can be updated. The command ``datalad update`` will *fetch*
+updates and store them on a different branch (by default
+``remotes/origin/master``). Running::
+
+   datalad update --merge
+
+will *pull* available updates and integrate them in one go.
+
+More information
+^^^^^^^^^^^^^^^^
+
+More information on DataLad and how to use it can be found in the DataLad Handbook at
+`handbook.datalad.org <http://handbook.datalad.org/en/latest/index.html>`_. The
+chapter "DataLad datasets" can help you to familiarize yourself with the
+concept of a dataset.
 
 
 .. _Git: http://www.git-scm.com
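
The commands introduced by the new README text can be chained into one workflow. The following is only a minimal sketch, not part of the commit: the function names ``run`` and ``fetch_dataset``, the ``DRY_RUN`` switch, and all argument values are hypothetical and chosen for illustration. By default the script only prints the commands it would execute, so it can be tried without DataLad installed or network access.

```shell
#!/bin/sh
# Sketch of the README workflow: clone, fetch subdataset metadata,
# retrieve one file, then pull updates.
# Assumption: run, fetch_dataset and DRY_RUN are illustrative names,
# not part of DataLad or of the repository.

# With DRY_RUN=1 (the default) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# fetch_dataset <url> <target-dir> <path/to/subdataset> <path/to/file>
fetch_dataset() {
    url="$1" target="$2" subds="$3" file="$4"

    # Clone the lightweight dataset (metadata only, no file content).
    run datalad clone "$url" "$target" || return 1

    # Work inside the clone in a subshell so the caller's directory is kept.
    (
        if [ "$DRY_RUN" != "1" ]; then
            cd "$target" || exit 1
        fi
        # Install the subdataset's metadata without downloading its content.
        run datalad get -n "$subds" &&
        # Download the content of a single file.
        run datalad get "$file" &&
        # Fetch and merge any upstream updates in one go.
        run datalad update --merge
    )
}

# Example call (all four values are placeholders):
# fetch_dataset <url> my-dataset <path/to/subdataset> <path/to/file>
```

Setting ``DRY_RUN=0`` before calling ``fetch_dataset`` would execute the commands for real. The subshell keeps the ``cd`` into the clone from leaking into the calling shell.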