  1. .. index::
  2. pair: dataset nesting; DataLad concept
  3. .. _nesting2:
  4. More on dataset nesting
  5. ^^^^^^^^^^^^^^^^^^^^^^^
  6. You may have noticed how working in the subdataset felt as if you were
  7. working in an independent dataset -- there was no information or influence at
  8. all from the top-level ``DataLad-101`` superdataset, and you built up a
  9. completely stand-alone history:
  10. .. runrecord:: _examples/DL-101-132-101
  11. :language: console
  12. :workdir: dl-101/DataLad-101/midterm_project
  13. $ git log --oneline
  14. In principle, this is no news to you. From section :ref:`nesting` and the
  15. YODA principles you already know that nesting allows for a modular reuse of
  16. any other DataLad dataset, and that this reuse is possible and simple
  17. precisely because all of the information is kept within a (sub)dataset.
  18. What is new now, however, is that you applied changes to the dataset. While
  19. you already explored the look and feel of the ``longnow`` subdataset in
  20. previous sections, you now *modified* the contents of the ``midterm_project``
  21. subdataset.
  22. How does this influence the superdataset, and what does this look like in the
  23. superdataset's history? You know from section :ref:`nesting` that the
  24. superdataset only stores the *state* of the subdataset. Upon creation of the
  25. dataset, the very first, initial state of the subdataset was thus recorded in
  26. the superdataset. But now, after you finished your project, your subdataset
  27. evolved. Let's ask the superdataset what it thinks about this.
  28. .. runrecord:: _examples/DL-101-132-102
  29. :language: console
  30. :workdir: dl-101/DataLad-101/midterm_project
  31. $ # move into the superdataset
  32. $ cd ../
  33. $ datalad status
  34. From the superdataset's perspective, the subdataset appears as being
  35. "modified". Note how it is not individual files that show up as "modified", but
  36. indeed the complete subdataset as a single entity.
  37. What this shows you is that the modifications you performed in the subdataset are not
  38. automatically recorded in the superdataset. This makes sense: after all, it
  39. should be up to you to decide whether you want to record something or not.
  40. But it is worth repeating: If you modify a subdataset, you will need to save
  41. this *in the superdataset* in order to have a clean superdataset status.
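The "single entity" behavior can be reproduced with plain Git, which DataLad builds on. The following is only a minimal sketch under stated assumptions: the repository names ``super`` and ``sub`` and the throwaway sandbox directory are hypothetical, and the nested repository is registered with a bare ``git add`` rather than with DataLad, so it illustrates the underlying mechanism and not the DataLad-101 setup itself.

```shell
#!/bin/sh
set -eu

# Work in a throwaway sandbox; "super" and "sub" are hypothetical names.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Create an outer repository with a nested repository inside it.
git init -q super
git init -q super/sub
cd super/sub
echo "first" > file.txt
git add file.txt
git commit -q -m "initial state of sub"
cd ..

# Record the nested repository in the outer one. Git stores a single
# commit reference for "sub", not its individual files.
git add sub 2>/dev/null   # suppress Git's hint about embedded repositories
git commit -q -m "register sub"

# Change and commit something inside the nested repository only.
cd sub
echo "second" >> file.txt
git commit -q -a -m "change inside sub"
cd ..

# The outer repository lists "sub" as one modified entry.
status_line=$(git status --porcelain)
echo "$status_line"
```

The outer repository reports one modified entry for ``sub`` and does not mention ``file.txt``, because it only compares the commit it has recorded against the commit the nested repository currently points to.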
  42. Let's save the modification of the subdataset into the history of the
  43. superdataset. For this, to avoid confusion, you can specify explicitly to
  44. which dataset you want to save a modification. ``-d .`` specifies the current
  45. dataset, i.e., ``DataLad-101``, as the dataset to save to:
  46. .. runrecord:: _examples/DL-101-132-103
  47. :language: console
  48. :workdir: dl-101/DataLad-101/
  49. $ datalad save -d . -m "finished my midterm project" midterm_project
  50. .. index::
  51. pair: save modification in nested dataset; with DataLad
  52. .. find-out-more:: More on how 'datalad save' can operate on nested datasets
  53. In a superdataset with subdatasets, :dlcmd:`save` by default
  54. tries to figure out on its own which of the available datasets'
  55. histories a :dlcmd:`save` should be written to. However, being very
  56. explicit in the command call and telling DataLad where to save which
  57. modifications can reduce confusion and enables specific operations.
  58. If you want to save the current state of the subdataset into the superdataset
  59. (as necessary here), start a ``save`` from the superdataset and have the
  60. ``-d/--dataset`` option point to its root:
  61. .. code-block:: console
  62. $ # in the root of the superds
  63. $ datalad save -d . -m "update subdataset"
  64. If you are in the superdataset, and you want to save an unsaved modification
  65. in a subdataset to the *subdataset's* history, let ``-d/--dataset`` point to
  66. the subdataset:
  67. .. code-block:: console
  68. $ # in the superds
  69. $ datalad save -d path/to/subds -m "modified XY"
  70. The recursive option allows you to save any content underneath the specified
  71. directory, and recurse into any potential subdatasets:
  72. .. code-block:: console
  73. $ datalad save . --recursive
  74. Let's check which subproject commit is now recorded in the superdataset:
  75. .. runrecord:: _examples/DL-101-132-104
  76. :language: console
  77. :workdir: dl-101/DataLad-101/
  78. :emphasize-lines: 14
  79. $ git log -p -n 1
  80. As you can see in the log entry, the subproject commit changed from the
  81. first commit hash in the subdataset history to the most recent one. With this
  82. change, therefore, your superdataset tracks the most recent version of
  83. the ``midterm_project`` dataset, and your dataset's status is clean again.
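Whether the recorded commit and the subdataset's current commit agree can also be checked directly with Git. The sketch below again uses a throwaway sandbox with the hypothetical repository names ``super`` and ``sub``, and registers the nested repository with a bare ``git add``, so it is an illustration of the mechanism rather than a replay of DataLad-101. Inside ``DataLad-101``, the analogous comparison would presumably be ``git rev-parse HEAD:midterm_project`` against ``git -C midterm_project rev-parse HEAD``.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; "super" and "sub" are hypothetical names.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Outer repository with a nested repository recorded at its first commit.
git init -q super
git init -q super/sub
cd super/sub
echo "first" > file.txt
git add file.txt
git commit -q -m "initial state of sub"
first_commit=$(git rev-parse HEAD)
cd ..
git add sub 2>/dev/null   # suppress Git's hint about embedded repositories
git commit -q -m "register sub"

# The nested repository moves on; the outer record does not follow.
cd sub
echo "second" >> file.txt
git commit -q -a -m "change inside sub"
cd ..
recorded_before=$(git rev-parse HEAD:sub)

# Saving in the outer repository updates the recorded commit.
git add sub
git commit -q -m "record new state of sub"
recorded_after=$(git rev-parse HEAD:sub)
current=$(git -C sub rev-parse HEAD)

echo "recorded before save: $recorded_before"
echo "recorded after save:  $recorded_after"
echo "current in sub:       $current"
```

Before the save, the recorded hash is still the first commit of the nested repository. Afterwards it matches the nested repository's most recent commit, and the outer repository's status is clean.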
  84. This point in DataLad-101 is a convenient moment to dive a bit deeper
  85. into the functions of the :dlcmd:`status` command. If you are
  86. interested in this, check out the :ref:`dedicated Findoutmore <fom-status>`.
  87. .. index::
  88. pair: status; DataLad command
  89. pair: check dataset for modification; with DataLad
  90. .. find-out-more:: More on 'datalad status'
  91. :name: fom-status
  92. :float:
  93. First of all, let's start with a quick overview of the different content *types*
  94. and content *states* various :dlcmd:`status` commands in the course
  95. of DataLad-101 have shown up to this point.
  96. You have seen the following *content types*:
  97. - ``file``, e.g., ``notes.txt``: any file (or symlink that is a placeholder to an annexed file)
  98. - ``directory``, e.g., ``books``: any directory that does not qualify for the ``dataset`` type
  99. - ``symlink``, e.g., the ``.jpg`` that was manually unlocked in section :ref:`run3`:
  100. any symlink that is not used as a placeholder for an annexed file
  101. - ``dataset``, e.g., the ``midterm_project``: any top-level dataset, or any subdataset
  102. that is properly registered in the superdataset
  103. And you have seen the following *content states*: ``modified`` and ``untracked``.
  104. The section :ref:`file system` will show you many instances of ``deleted`` content
  105. state as well.
  106. But beyond understanding the report of :dlcmd:`status`, there is also
  107. additional functionality:
  108. :dlcmd:`status` can handle status reports for a whole hierarchy
  109. of datasets, and it can report on a subset of the content across any number of
  110. datasets in this hierarchy by providing selected paths. This is useful as soon
  111. as datasets become more complex and contain subdatasets with changing contents.
  112. When performed without any arguments, :dlcmd:`status` will report
  113. the state of the current dataset. However, you can specify a path to any
  114. sub- or superdataset with the ``--dataset`` option.
  115. In order to demonstrate this a bit better, we will make sure that not only the
  116. state of the subdataset *within* the superdataset is modified, but also that the
  117. subdataset contains a modification. For this, let's add an empty text file into
  118. the ``midterm_project`` subdataset:
  119. .. runrecord:: _examples/DL-101-132-105
  120. :language: console
  121. :workdir: dl-101/DataLad-101
  122. $ touch midterm_project/an_empty_file
  123. If you are in the root of ``DataLad-101``, but interested in the status
  124. *within* the subdataset, simply provide a path (relative to your current location)
  125. to the command:
  126. .. runrecord:: _examples/DL-101-132-106
  127. :language: console
  128. :workdir: dl-101/DataLad-101
  129. $ datalad status midterm_project
  130. Alternatively, to achieve the same, specify the superdataset as the ``--dataset``
  131. and provide a path to the subdataset *with a trailing path separator* like
  132. this:
  133. .. runrecord:: _examples/DL-101-132-107
  134. :language: console
  135. :workdir: dl-101/DataLad-101
  136. $ datalad status -d . midterm_project/
  137. Note that both of these commands return only the ``untracked`` file and
  138. not the ``modified`` subdataset because we're explicitly querying only the
  139. subdataset for its status.
  140. If, however, as done outside of this Find-out-more, you want to know about
  141. the subdataset record in the superdataset without causing a status query for
  142. the state *within* the subdataset itself, you can also provide an explicit
  143. path to the dataset (without a trailing path separator). This can be used
  144. to specify a specific subdataset in the case of a dataset with many subdatasets:
  145. .. runrecord:: _examples/DL-101-132-108
  146. :language: console
  147. :workdir: dl-101/DataLad-101
  148. $ datalad status -d . midterm_project
  149. But if you are interested in both the state within the subdataset, and
  150. the state of the subdataset within the superdataset, you can combine the
  151. two paths:
  152. .. runrecord:: _examples/DL-101-132-109
  153. :language: console
  154. :workdir: dl-101/DataLad-101
  155. $ datalad status -d . midterm_project midterm_project/
  156. Finally, if these subtle differences in the paths are not easy to memorize,
  157. the ``-r/--recursive`` option will also report both status aspects:
  158. .. runrecord:: _examples/DL-101-132-110
  159. :language: console
  160. :workdir: dl-101/DataLad-101
  161. $ datalad status --recursive
  162. Importantly, the regular output from a :dlcmd:`status` command on the command line is "condensed" to the most important information by a tailored result renderer.
  163. You can, however, also get the unfiltered full output of ``status`` by switching the ``-f``/``--output-format`` from ``tailored`` (the default) to ``json`` or, for the same information as ``json`` but better readability, ``json_pp``:
  164. .. runrecord:: _examples/DL-101-132-111
  165. :language: console
  166. :workdir: dl-101/DataLad-101
  167. $ datalad -f json_pp status -d . midterm_project
  168. This still was not all of the available functionality of the
  169. :dlcmd:`status` command. You could, for example, adjust whether and
  170. how untracked dataset content should be reported with the ``--untracked``
  171. option, or get additional information from annexed content with the ``--annex``
  172. option (especially powerful when combined with ``-f json_pp``). To get a complete overview of what you could do, check out the technical
  173. documentation of :dlcmd:`status` `here <https://docs.datalad.org/en/latest/generated/man/datalad-status.html>`_.
  174. Before we leave this Find-out-more, let's undo the modification of the subdataset
  175. by removing the untracked file:
  176. .. runrecord:: _examples/DL-101-132-112
  177. :language: console
  178. :workdir: dl-101/DataLad-101
  179. $ rm midterm_project/an_empty_file
  180. $ datalad status --recursive
  181. .. only:: adminmode
  182. Add a tag at the section end.
  183. .. runrecord:: _examples/DL-101-132-113
  184. :language: console
  185. :workdir: dl-101/DataLad-101
  186. $ git branch sct_more_on_dataset_nesting