.. index::
   pair: dataset nesting; with DataLad

.. _nesting:

Dataset nesting
---------------

Without noticing, the previous section demonstrated another core principle
and feature of DataLad datasets: *nesting*.
Within DataLad datasets, one can *nest* other DataLad
datasets arbitrarily deep. For example, we just installed one dataset, the
``longnow`` podcasts, *into* another dataset, the ``DataLad-101`` dataset.
This was done by supplying the ``--dataset``/``-d`` flag in the command call.

At first glance, nesting does not seem particularly spectacular:
after all, any directory on a file system can have other directories inside of it.
The ability to nest datasets, however, is one of the many advantages
of DataLad datasets.

One aspect of nested datasets is that every DataLad dataset
(*subdataset* or *superdataset*) keeps its own stand-alone
history. The top-level DataLad dataset (the *superdataset*) only stores
*which version* of the subdataset is currently used through an identifier.

Let's dive into that.
Remember how we had to navigate into ``recordings/longnow`` to see the history,
and how this history was completely independent of the ``DataLad-101``
superdataset history? This was the subdataset's own history.

Apart from stand-alone histories of super- or subdatasets, this highlights another
very important advantage that nesting provides: the ``longnow`` dataset
is a completely independent, stand-alone dataset that was once created and
published. Nesting allows for a modular reuse of any other DataLad dataset,
and this reuse is possible and simple precisely because all of the information
is kept within a (sub)dataset.

But now let's also check out what the *superdataset's* (``DataLad-101``) history
looks like after the addition of a subdataset. To do this, make sure you are
*outside* of the subdataset ``longnow``. Note that the first commit is our recent
addition to ``notes.txt``, so we'll look at the second most recent commit in
this excerpt.

.. index::
   pair: show commit patches; with Git

.. runrecord:: _examples/DL-101-106-101
   :language: console
   :workdir: dl-101/DataLad-101
   :lines: 1, 22-62
   :emphasize-lines: 25
   :realcommand: git log -p
   :cast: 01_dataset_basics
   :notes: The superdataset only stores the version of the subdataset. Let's take a look at what the superdataset's history looks like

   $ git log -p -n 3

We have highlighted the important part of this rather long commit summary.
Note that you cannot see any ``.mp3``\s being added to the dataset,
as was previously the case when we :dlcmd:`save`\d PDFs that we
downloaded into ``books/``. Instead,
DataLad stores what it calls a *subproject commit* of the subdataset.
The cryptic character sequence in this line is the :term:`shasum` we have briefly
mentioned before, and it is the identifier that
DataLad internally uses to identify the files and the changes to the files in the subdataset.
Exactly this :term:`shasum` is what identifies the state of the subdataset.

Navigate back into ``longnow`` and try to find the highlighted shasum in the
subdataset's history:

.. runrecord:: _examples/DL-101-106-102
   :language: console
   :workdir: dl-101/DataLad-101
   :lines: 1-9
   :emphasize-lines: 3
   :cast: 01_dataset_basics
   :notes: We can find this shasum in the subdataset's history: it's the most recent change

   $ cd recordings/longnow
   $ git log --oneline

We can see that it is the most recent commit shasum of the subdataset
(although only the first seven characters are shown here; a plain :gitcmd:`log`
would show you the full shasum). Thus, your dataset knows not only the origin
of its subdataset, but also which version of the subdataset to use,
i.e., it has the identifier of the stage/version in the subdataset's evolution to be used.
This is what is meant by "the top-level DataLad dataset (the *superdataset*) only stores
*which version* of the subdataset is currently used through an identifier".

Importantly, once we learn how to make use of the history of a dataset,
we can set subdatasets to previous states, or *update* them.

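The same comparison can be made without reading through logs.
The following is a minimal sketch, not part of the ``DataLad-101`` walkthrough:
it relies on the fact that subdatasets are registered the way Git registers
submodules, and it uses plain Git in a throwaway temporary directory.
The names ``sub``, ``super``, and ``nested`` are hypothetical and only serve the illustration.

.. code-block:: bash

   # Throwaway demonstration; nothing here touches DataLad-101.
   set -eu
   tmp=$(mktemp -d)
   cd "$tmp"

   # Provide an identity so that commits work on any machine.
   export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
   export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

   # A stand-alone repository that plays the role of the subdataset.
   git init -q sub
   git -C sub commit -q --allow-empty -m "first commit in sub"

   # A second repository that plays the role of the superdataset.
   # protocol.file.allow is only needed because the source is a local path.
   git init -q super
   git -C super -c protocol.file.allow=always submodule --quiet add "$tmp/sub" nested
   git -C super commit -q -m "register sub as nested"

   # Identifier recorded in the superproject for the path "nested" ...
   recorded=$(git -C super rev-parse HEAD:nested)
   # ... and the most recent commit inside the nested repository itself.
   actual=$(git -C super/nested rev-parse HEAD)

   # Both lines should show the same shasum.
   echo "recorded: $recorded"
   echo "actual:   $actual"

In ``DataLad-101``, the corresponding calls would be
``git rev-parse HEAD:recordings/longnow`` in the root of the superdataset and
``git -C recordings/longnow rev-parse HEAD`` for the subdataset
(``-C`` runs the command in the given directory).
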
.. index::
   pair: temporary working directory change; with Git

.. find-out-more:: Do I have to navigate into the subdataset to see its history?

   Previously, we used :shcmd:`cd` to navigate into the subdataset, and
   subsequently opened the Git log. This is necessary, because a :gitcmd:`log`
   in the superdataset would only return the superdataset's history.
   While moving around with ``cd`` is straightforward, you may also find it
   slightly annoying to use the ``cd`` command so often and
   to remember which directory you are currently in. There is one
   trick, though: ``git -C`` and ``datalad -C`` (note that it is a capital C) let you perform any
   Git or DataLad command in a provided path. Providing this option together with a path to
   a Git or DataLad command lets you run the command as if it was started in this path
   instead of the current working directory.
   Thus, from the root of ``DataLad-101``, this command would have given you the
   subdataset's history as well:

   .. code-block:: console

      $ git -C recordings/longnow log --oneline

In the upcoming sections, we'll experience the perks of dataset nesting
frequently, and everything that might seem vague at this point will become
clearer. To conclude this demonstration,
:numref:`fignesting` illustrates the current state of our dataset, ``DataLad-101``, with its nested subdataset.
Thus, without being consciously aware of it, by taking advantage of dataset
nesting, we took a dataset ``longnow`` and installed it as a
subdataset within the superdataset ``DataLad-101``.

.. _fignesting:

.. figure:: ../artwork/src/virtual_dstree_dl101.svg
   :width: 70%

   Virtual directory tree of a nested DataLad dataset

If you have executed the above code snippets, make sure to go back into the
root of the dataset again:

.. runrecord:: _examples/DL-101-106-103
   :language: console
   :workdir: dl-101/DataLad-101/recordings/longnow
   :cast: 01_dataset_basics

   $ cd ../../