4 years ago · 1b31c6bab2
--- a/notes.txt
+++ b/notes.txt
@@ -1,108 +1,10 @@
-One can create a new dataset with 'datalad create [--description] PATH'.
-The dataset is created empty.
 
-The command "datalad save [-m] PATH" saves the file
-(modifications) to history. Note to self:
-Always use informative, concise commit messages.
-
-The command 'datalad clone URL/PATH [PATH]'
-clones a dataset from, e.g., a URL or a path.
-If you clone a dataset into an existing
-dataset (as a subdataset), remember to specify the
-root of the superdataset with the '-d' option.
-
-There are two useful commands to display changes between two
-states of a dataset: "datalad diff -f/--from COMMIT -t/--to COMMIT"
-and "git diff COMMIT COMMIT", where COMMIT is the shasum of a commit
-in the history.
-
-The datalad run command can record the impact a script or command has on a dataset.
-In its simplest form, datalad run only takes a commit message and the command that
-should be executed.
-
-Any datalad run command can be re-executed by using its commit shasum as an argument
-in "datalad rerun SHASUM". DataLad will take information from the run record of the original
-commit and re-execute it. If no changes happen with a rerun, the command will not be written
-to history. Note: you can also rerun a datalad rerun command!
-
-You should specify all files that a command takes as input with an -i/--input flag. These
-files will be retrieved prior to the command execution. Any content that is modified or
-produced by the command should be specified with an -o/--output flag. Upon a run or rerun
-of the command, the contents of these files will get unlocked so that they can be modified.
-
-Important! If the dataset is not "clean" (i.e., if "datalad status" shows any output),
-datalad run will not work; you will have to save the modifications present in your
-dataset first.
-A suboptimal alternative is the --explicit flag,
-used to record only those changes made
-to the files listed with --output flags.
-
-A source to clone a dataset from can also be a path,
-for example "datalad clone ../DataLad-101".
-
-Just as when creating datasets, you can add a
-description of the location of the new dataset clone
-with the -D/--description option.
-
-Note that subdatasets will not be installed by default,
-but are only registered in the superdataset; you will
-have to do a "datalad get -n PATH/TO/SUBDATASET"
-to clone the subdataset for file availability metadata.
-The -n/--no-data option prevents file contents from
-also being downloaded.
-
-Note that a recursive "datalad get" would clone all further
-registered subdatasets underneath a subdataset, so a safer
-way to proceed is to set a decent --recursion-limit:
-"datalad get -n -r --recursion-limit 2 <subds>"
-
-The command "git annex whereis PATH" lists the repositories that have
-the file content of an annexed file. When using "datalad get" to retrieve
-file content, those repositories will be queried.
-
-To update a shared dataset, run the command "datalad update --merge".
-This command will query its origin for changes and integrate the
-changes into the dataset.
-
-To update from a dataset with a shared history, you
-need to add this dataset as a sibling to your dataset.
-"Adding a sibling" means providing DataLad with info about
-the location of a dataset, and a name for it. Afterwards,
-a "datalad update --merge -s name" will integrate the changes
-made to the sibling into the dataset.
-A safe step in between is to do a "datalad update -s name"
-and check out the changes with "git/datalad diff"
-to remotes/name/master.
-
-Configurations for datasets exist on different levels
-(systemwide, global, and local), in different types
-of files (non-version-controlled (git)config files, or
-version-controlled .datalad/config, .gitattributes, or
-.gitmodules files), or as environment variables.
-With the exception of .gitattributes, all configuration
-files share a common structure, and can be modified with
-the git config command, but also by hand with an editor.
-
-Depending on whether a configuration file is version
-controlled or not, its configurations will or will not be shared
-together with the dataset. More specific and not-shared
-configurations will always take precedence over more global or
-shared configurations, and environment variables take precedence
-over configurations in files.
-
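The file-level part of the precedence rule can be checked with plain Git. This sketch sets the same key globally and locally and shows that the more specific, local value wins. It points `HOME` and `XDG_CONFIG_HOME` at a throwaway directory so the real global configuration is not modified. The names are hypothetical, and the environment-variable rule is not demonstrated here:

```shell
#!/usr/bin/env bash
set -eu

demo="$(mktemp -d)"
# Throwaway home directory so the real global config stays untouched
export HOME="$demo/home"
export XDG_CONFIG_HOME="$HOME/.config"
mkdir -p "$HOME"

git init -q "$demo/repo"
cd "$demo/repo"

git config --global user.name "Global Name"   # writes $HOME/.gitconfig
git config --local user.name "Local Name"     # writes .git/config

# The more specific (local) value takes precedence
git config user.name
```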
-The "git config --list --show-origin" command is a useful tool
-to give an overview of existing configurations. Particularly
-important may be the .gitattributes file, in which one can set
-rules for git-annex about which files should be version-controlled
-with Git instead of being annexed.
-
-It can be useful to use pre-configured procedures that can apply
-configurations, create files or file hierarchies, or perform
-arbitrary tasks in datasets. They can be shipped with DataLad,
-its extensions, or datasets, and you can even write your own
-procedures and distribute them. The "datalad run-procedure"
-command is used to apply such a procedure to a dataset. Procedures
-shipped with DataLad or its extensions whose names start with "cfg_"
-can also be applied at the creation of a dataset with
-"datalad create -c <PROC-NAME> <PATH>" (omitting the "cfg_" prefix).
+Git has many handy tools to go back and forth in
+time and work with the history of datasets.
+Among many other things, you can rewrite commit
+messages, undo changes, or look at previous versions
+of datasets. A superb resource to find out more about
+this and practice such Git operations is this
+chapter in the Pro Git book:
+https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History