|
@@ -1,108 +1,10 @@
|
|
|
-One can create a new dataset with 'datalad create [--description] PATH'.
|
|
|
-The dataset is created empty
|
|
|
|
|
|
-The command "datalad save [-m] PATH" saves the file
|
|
|
-(modifications) to history. Note to self:
|
|
|
-Always use informative, concise commit messages.
|
|
|
-
|
|
|
-The command 'datalad clone URL/PATH [PATH]'
|
|
|
-clones a dataset from e.g., a URL or a path.
|
|
|
-If you clone a dataset into an existing
|
|
|
-dataset (as a subdataset), remember to specify the
|
|
|
-root of the superdataset with the '-d' option.
|
|
|
-
|
|
|
-There are two useful commands to display changes between two
|
|
|
-states of a dataset: "datalad diff -f/--from COMMIT -t/--to COMMIT"
|
|
|
-and "git diff COMMIT COMMIT", where COMMIT is a shasum of a commit
|
|
|
-in the history.
|
|
|
-
|
|
|
-The datalad run command can record the impact a script or command has on a Dataset.
|
|
|
-In its simplest form, datalad run only takes a commit message and the command that
|
|
|
-should be executed.
|
|
|
-
|
|
|
-Any datalad run command can be re-executed by using its commit shasum as an argument
|
|
|
-in datalad rerun CHECKSUM. DataLad will take information from the run record of the original
|
|
|
-commit, and re-execute it. If no changes happen with a rerun, the command will not be written
|
|
|
-to history. Note: you can also rerun a datalad rerun command!
|
|
|
-
|
|
|
-You should specify all files that a command takes as input with an -i/--input flag. These
|
|
|
-files will be retrieved prior to the command execution. Any content that is modified or
|
|
|
-produced by the command should be specified with an -o/--output flag. Upon a run or rerun
|
|
|
-of the command, the contents of these files will get unlocked so that they can be modified.
|
|
|
-
|
|
|
-Important! If the dataset is not "clean" (i.e., the datalad status output is not empty),
|
|
|
-datalad run will not work - you will have to save modifications present in your
|
|
|
-dataset.
|
|
|
-A suboptimal alternative is the --explicit flag,
|
|
|
-used to record only those changes done
|
|
|
-to the files listed with --output flags.
|
|
|
-
|
|
|
-A source to clone a dataset from can also be a path,
|
|
|
-for example as in "datalad clone ../DataLad-101".
|
|
|
-
|
|
|
-Just as in creating datasets, you can add a
|
|
|
-description on the location of the new dataset clone
|
|
|
-with the -D/--description option.
|
|
|
-
|
|
|
-Note that subdatasets will not be installed by default,
|
|
|
-but are only registered in the superdataset -- you will
|
|
|
-have to do a "datalad get -n PATH/TO/SUBDATASET"
|
|
|
-to clone the subdataset for file availability metadata.
|
|
|
-The -n/--no-data option prevents file contents from
|
|
|
-also being downloaded.
|
|
|
-
|
|
|
-Note that a recursive "datalad get" would clone all further
|
|
|
-registered subdatasets underneath a subdataset, so a safer
|
|
|
-way to proceed is to set a decent --recursion-limit:
|
|
|
-"datalad get -n -r --recursion-limit 2 <subds>"
|
|
|
-
|
|
|
-The command "git annex whereis PATH" lists the repositories that have
|
|
|
-the file content of an annexed file. When using "datalad get" to retrieve
|
|
|
-file content, those repositories will be queried.
|
|
|
-
|
|
|
-To update a shared dataset, run the command "datalad update --merge".
|
|
|
-This command will query its origin for changes, and integrate the
|
|
|
-changes into the dataset.
|
|
|
-
|
|
|
-To update from a dataset with a shared history, you
|
|
|
-need to add this dataset as a sibling to your dataset.
|
|
|
-"Adding a sibling" means providing DataLad with info about
|
|
|
-the location of a dataset, and a name for it. Afterwards,
|
|
|
-a "datalad update --merge -s name" will integrate the changes
|
|
|
-made to the sibling into the dataset.
|
|
|
-A safe step in between is to do a "datalad update -s name"
|
|
|
-and inspect the changes with "git/datalad diff"
|
|
|
-to remotes/origin/master.
|
|
|
-
|
|
|
-Configurations for datasets exist on different levels
|
|
|
-(systemwide, global, and local), and in different types
|
|
|
-of files (not version controlled (git)config files, or
|
|
|
-version controlled .datalad/config, .gitattributes, or
|
|
|
-.gitmodules files), or environment variables.
|
|
|
-With the exception of .gitattributes, all configuration
|
|
|
-files share a common structure, and can be modified with
|
|
|
-the git config command, but also with an editor by hand.
|
|
|
-
|
|
|
-Depending on whether a configuration file is version
|
|
|
-controlled or not, the configurations will or will not be shared together
|
|
|
-with the dataset. More specific configurations and not-shared
|
|
|
-configurations will always take precedence over more global or
|
|
|
-shared configurations, and environment variables take precedence
|
|
|
-over configurations in files.
|
|
|
-
|
|
|
-The git config --list --show-origin command is a useful tool
|
|
|
-to give an overview of existing configurations. Particularly
|
|
|
-important may be the .gitattributes file, in which one can set
|
|
|
-rules for git-annex about which files should be version-controlled
|
|
|
-with Git instead of being annexed.
|
|
|
-
|
|
|
-It can be useful to use pre-configured procedures that can apply
|
|
|
-configurations, create files or file hierarchies, or perform
|
|
|
-arbitrary tasks in datasets. They can be shipped with DataLad,
|
|
|
-its extensions, or datasets, and you can even write your own
|
|
|
-procedures and distribute them. The "datalad run-procedure"
|
|
|
-command is used to apply such a procedure to a dataset. Procedures
|
|
|
-shipped with DataLad or its extensions starting with a "cfg" prefix
|
|
|
-can also be applied at the creation of a dataset with
|
|
|
-"datalad create -c <PROC-NAME> <PATH>" (omitting the "cfg" prefix).
|
|
|
+Git has many handy tools to go back and forth in
|
|
|
+time and work with the history of datasets.
|
|
|
+Among many other things, you can rewrite commit
|
|
|
+messages, undo changes, or look at previous versions
|
|
|
+of datasets. A superb resource to find out more about
|
|
|
+this and practice such Git operations is this
|
|
|
+chapter in the Pro Git book:
|
|
|
+https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History
|
|
|
|
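The three Git operations named in the added note (looking at a previous version, rewriting a commit message, and undoing a change) can be sketched in a throwaway repository. This is only a minimal illustration, not part of the notes file: the repository location, the file name notes.txt, the user identity, and the commit messages are all hypothetical, and only plain Git commands are used (no DataLad).

```shell
#!/bin/sh
# Minimal sketch in a temporary repository; every name below is made up for illustration.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

# Create two commits so there is some history to work with.
echo "first version" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
echo "second version" > notes.txt
git commit -q -am "Update notse"   # typo in the message on purpose

# Look at a previous version of a file without touching the working tree.
old_content=$(git show HEAD~1:notes.txt)

# Rewrite the most recent commit message (best done before the commit is shared).
git commit -q --amend -m "Update notes"
last_msg=$(git log -1 --format=%s)

# Undo the last change by adding a new commit that reverses it.
git revert --no-edit HEAD >/dev/null
current_content=$(cat notes.txt)
commit_count=$(git rev-list --count HEAD)

echo "previous version: $old_content"
echo "amended message:  $last_msg"
echo "after revert:     $current_content"
echo "commits:          $commit_count"
```

Here git revert is used for undoing because it adds a new commit and leaves the existing history intact; the linked Pro Git chapter covers the operations that actually rewrite history.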