.. _summary_sharelocal:

Summary
-------

Together with your roommate, you have just discovered how
to share, update, and collaborate on a DataLad dataset on a shared file system.
In doing so, you have caught a glimpse of the principles and advantages of
sharing a dataset by means of a simple example.

* To obtain a dataset, one can also use :dlcmd:`clone` with a path.
  Potential subdatasets will not be installed right away. As they are registered in
  the superdataset, you can install them afterwards:

  - either run ``datalad get -n/--no-data <subds>``,
  - or add the ``-r``/``--recursive`` option, as in ``datalad get -n -r <subds>``,
    together with a sensible ``-R/--recursion-limit`` choice.

* The configuration of the original dataset determines which types
  of files will have their content available right after the installation of
  the dataset, and which types of files need to be retrieved via
  :dlcmd:`get`: Any file content stored in :term:`Git` will be available
  right away, while all file content that is ``annexed`` only has
  small metadata about its availability attached to it. The original
  ``DataLad-101`` dataset used the ``text2git`` configuration template
  to store text files such as ``notes.txt`` and ``code/list_titles.sh``
  in Git -- these files' content is therefore available right after
  installation.

* Annexed content can be retrieved via :dlcmd:`get` from the
  file content sources.

* :gitannexcmd:`whereis PATH` will list all locations known to contain file
  content for a particular file. It is a very
  helpful command to find out where file content resides, and how many
  locations with copies exist. :term:`git-annex` will try to retrieve file
  contents from those locations. If you want, you can describe locations with the
  ``--description`` provided during a :dlcmd:`create`.

* A shared copy of a dataset includes the dataset's history. If they were
  well specified, :dlcmd:`run` commands can then easily be ``rerun``.

* Because an installed dataset knows its origin -- the place it was
  originally installed from -- it can be kept up-to-date with the
  :dlcmd:`update` command. This command will query the origin of the
  dataset for updates, and a :dlcmd:`update --how merge` will integrate
  these changes into the dataset copy.

* Thus, using DataLad, data can be easily shared and kept up to date
  with only two commands: :dlcmd:`clone` and :dlcmd:`update`.

* By configuring a dataset as a :term:`sibling`, collaboration becomes easy.

* To avoid integrating conflicting modifications of a sibling dataset into your
  own dataset, a :dlcmd:`update -s SIBLINGNAME` will "``fetch``" modifications
  and store them on a different :term:`branch` of your dataset. The commands
  :dlcmd:`diff` and :gitcmd:`diff` can subsequently help to find
  out what changes have been made in the sibling.
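
The points above can be retraced in a single shell session. What follows is
a minimal sketch and not the example used in the previous sections: the names
``original``, ``copy``, ``subds``, ``blob.dat``, and ``roommate`` are made up
for illustration, everything happens in a temporary directory, and it assumes
that DataLad, git-annex, and a Git identity (``user.name`` and ``user.email``)
are set up.

.. code-block:: bash

   set -eu

   # Work in a throwaway directory so that no real data is touched.
   scratch=$(mktemp -d)
   cd "$scratch"

   # Stand-in for the original dataset (hypothetical names throughout).
   # With text2git, text files are stored in Git and binary files are annexed.
   datalad create -c text2git --description "original on shared disk" original
   cd original
   echo "one note" > notes.txt
   head -c 1024 /dev/urandom > blob.dat
   datalad save -m "Add a text file and a binary file"
   datalad create -d . subds
   branch=$(git rev-parse --abbrev-ref HEAD)

   # Obtain a copy by cloning from a path.
   cd "$scratch"
   datalad clone original copy
   cd copy
   cat notes.txt               # stored in Git: content is available right away
   datalad get -n subds        # install the subdataset without its data
   datalad get blob.dat        # retrieve annexed file content
   git annex whereis blob.dat  # list locations known to hold this content

   # Change the original ...
   cd "$scratch/original"
   echo "another note" >> notes.txt
   datalad save -m "Extend the notes"

   # ... and bring the copy up to date from its origin.
   cd "$scratch/copy"
   datalad update --how merge

   # The roommate modifies the copy.
   echo "a note from the roommate" >> notes.txt
   datalad save -m "Add a note in the copy"

   # Register the copy as a sibling of the original, fetch its changes
   # without merging them, and inspect what differs.
   cd "$scratch/original"
   datalad siblings add -d . --name roommate --url "$scratch/copy"
   datalad update -s roommate
   datalad diff --to "remotes/roommate/$branch"

Because ``datalad update -s roommate`` only fetches, ``notes.txt`` in
``original`` stays as it was until the fetched branch is merged explicitly.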
  46. Now what can I do with that?
  47. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Most importantly, you have experienced the first way of sharing
and updating a dataset.
The example here may strike you as too simplistic, but in later parts of
the book you will see examples in which datasets are shared on the same
file system in surprisingly useful ways.

Simultaneously, you have observed dataset properties you already knew
(for example, how annexed files need to be retrieved via :dlcmd:`get`),
but you have also seen novel aspects of a dataset -- for example, that
subdatasets are not automatically installed by default, how
:gitannexcmd:`whereis` can help you find out where file content might be stored,
how useful commands that capture provenance about the origin or creation of files
(such as :dlcmd:`run` or :dlcmd:`download-url`) are,
or how a shared dataset can be updated to reflect changes that were made
to the original dataset.

Also, you have successfully demonstrated a large number of DataLad dataset
principles to your roommate: how content stored in Git is present right
away and how annexed content first needs to be retrieved, how easy a
:dlcmd:`rerun` is if the original :dlcmd:`run` command was well
specified, and how a dataset's history is shared and not only its data.

Lastly, with the configuration of a sibling, you have experienced one
way to collaborate in a dataset, and with :dlcmd:`update --how merge`
and :dlcmd:`update`, you also caught a glimpse of more advanced aspects
of Git, namely the concept of a branch.

Therefore, these last few sections have hopefully been a good review
of what you already knew as well as a big knowledge gain, and they
hopefully spark joyful anticipation of collaborating in a real-world
setting on one of your own use cases.