- .. index::
- pair: dataset nesting; DataLad concept
- .. _nesting2:
- More on dataset nesting
- ^^^^^^^^^^^^^^^^^^^^^^^
- You may have noticed how working in the subdataset felt as if you were
- working in an independent dataset -- there was no information or influence at
- all from the top-level ``DataLad-101`` superdataset, and you built up a
- completely stand-alone history:
- .. runrecord:: _examples/DL-101-132-101
- :language: console
- :workdir: dl-101/DataLad-101/midterm_project
- $ git log --oneline
- In principle, this is not news to you. From section :ref:`nesting` and the
- YODA principles you already know that nesting allows for a modular reuse of
- any other DataLad dataset, and that this reuse is possible and simple
- precisely because all of the information is kept within a (sub)dataset.
- What is new now, however, is that you applied changes to the dataset. While
- you already explored the looks and feels of the ``longnow`` subdataset in
- previous sections, you now *modified* the contents of the ``midterm_project``
- subdataset.
- How does this influence the superdataset, and what does this look like in the
- superdataset's history? You know from section :ref:`nesting` that the
- superdataset only stores the *state* of the subdataset. Upon creation of the
- dataset, the very first, initial state of the subdataset was thus recorded in
- the superdataset. But now, after you finished your project, your subdataset
- has evolved. Let's ask the superdataset what it thinks about this.
- .. runrecord:: _examples/DL-101-132-102
- :language: console
- :workdir: dl-101/DataLad-101/midterm_project
- $ # move into the superdataset
- $ cd ../
- $ datalad status
- From the superdataset's perspective, the subdataset appears as being
- "modified". Note how it is not individual files that show up as "modified", but
- indeed the complete subdataset as a single entity.
- What this shows you is that the modifications of the subdataset you performed are not
- automatically recorded in the superdataset. This makes sense: after all, it
- should be up to you to decide whether you want to record something or not.
- But it is worth repeating: If you modify a subdataset, you will need to save
- this *in the superdataset* in order to have a clean superdataset status.
- Let's save the modification of the subdataset into the history of the
- superdataset. For this, to avoid confusion, you can specify explicitly to
- which dataset you want to save a modification. ``-d .`` specifies the current
- dataset, i.e., ``DataLad-101``, as the dataset to save to:
- .. runrecord:: _examples/DL-101-132-103
- :language: console
- :workdir: dl-101/DataLad-101/
- $ datalad save -d . -m "finished my midterm project" midterm_project
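If you are curious what happens underneath, the behavior can be reproduced with plain Git, because DataLad registers subdatasets as Git submodules. The following is only a minimal sketch under that assumption, not what DataLad executes internally. The names ``super``, ``sub``, and ``report.txt`` are hypothetical stand-ins for ``DataLad-101``, ``midterm_project``, and its content, and everything happens in a throwaway temporary directory.

```shell
#!/bin/sh
# Plain-Git sketch of "modified subdataset, then save in the superdataset".
# All repository and file names are hypothetical stand-ins.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

workdir=$(mktemp -d)
cd "$workdir"

# A parent repository with a nested repository registered in place
git init -q super
cd super
git init -q sub
echo "first draft" > sub/report.txt
git -C sub add report.txt
git -C sub commit -q -m "start project"
git submodule add ./sub sub
git commit -q -m "register subproject"

# New work happens inside the nested repository only
echo "final version" > sub/report.txt
git -C sub commit -q -a -m "finish project"

# The parent reports the nested repository as a single modified entry
status_before=$(git status --porcelain)
echo "before recording: '$status_before'"

# Recording the new state in the parent makes its status clean again
git add sub
git commit -q -m "record finished subproject"
status_after=$(git status --porcelain)
echo "after recording:  '$status_after'"
```

Note that the parent never lists ``report.txt`` itself. It only notices that the nested repository as a whole has moved on, which mirrors the ``datalad status`` report above.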
- .. index::
- pair: save modification in nested dataset; with DataLad
- .. find-out-more:: More on how 'datalad save' can operate on nested datasets
- In a superdataset with subdatasets, :dlcmd:`save` by default
- tries to figure out on its own which dataset's history a modification should
- be written to. However, being very explicit in the command call and telling
- DataLad where to save which modifications can reduce confusion, and it makes
- some specific operations possible.
- If you want to save the current state of the subdataset into the superdataset
- (as necessary here), start a ``save`` from the superdataset and have the
- ``-d/--dataset`` option point to its root:
- .. code-block:: console
- $ # in the root of the superds
- $ datalad save -d . -m "update subdataset"
- If you are in the superdataset, and you want to save an unsaved modification
- in a subdataset to the *subdataset's* history, let ``-d/--dataset`` point to
- the subdataset:
- .. code-block:: console
- $ # in the superds
- $ datalad save -d path/to/subds -m "modified XY"
- The recursive option allows you to save any content underneath the specified
- directory, and recurse into any potential subdatasets:
- .. code-block:: console
- $ datalad save . --recursive
- Let's check which subproject commit is now recorded in the superdataset:
- .. runrecord:: _examples/DL-101-132-104
- :language: console
- :workdir: dl-101/DataLad-101/
- :emphasize-lines: 14
- $ git log -p -n 1
- As you can see in the log entry, the subproject commit changed from the
- first commit hash in the subdataset history to the most recent one. With this
- change, therefore, your superdataset tracks the most recent version of
- the ``midterm_project`` dataset, and your dataset's status is clean again.
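The same bookkeeping can be inspected with plain Git. The sketch below assumes, as above, that a subdataset is a Git submodule, and uses the hypothetical names ``super``, ``sub``, and ``report.txt`` in a temporary directory. It compares the commit recorded in the parent before and after the update with the commits in the nested repository's own history.

```shell
#!/bin/sh
# Plain-Git sketch: which nested-repository commit does the parent record?
# All repository and file names are hypothetical stand-ins.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

workdir=$(mktemp -d)
cd "$workdir"

# Parent repository with a nested repository registered in place
git init -q super
cd super
git init -q sub
echo "first draft" > sub/report.txt
git -C sub add report.txt
git -C sub commit -q -m "start project"
first_commit=$(git -C sub rev-parse HEAD)
git submodule add ./sub sub
git commit -q -m "register subproject"

# Extend the nested history, then record the new state in the parent
echo "final version" > sub/report.txt
git -C sub commit -q -a -m "finish project"
latest_commit=$(git -C sub rev-parse HEAD)
git add sub
git commit -q -m "record finished subproject"

# The parent stores exactly one commit hash per recorded state
recorded_before=$(git rev-parse HEAD~1:sub)
recorded_now=$(git rev-parse HEAD:sub)
echo "previously recorded: $recorded_before"
echo "now recorded:        $recorded_now"

# The patch of the last parent commit shows the "Subproject commit" change
git --no-pager log -p -n 1 -- sub
```

The two ``git rev-parse`` calls read the recorded hash directly from the parent's commits, so they offer a quick way to check which subdataset version a given superdataset state points to.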
- This point in DataLad-101 is a convenient moment to dive a bit deeper
- into the functions of the :dlcmd:`status` command. If you are
- interested in this, check out the :ref:`dedicated Findoutmore <fom-status>`.
- .. index::
- pair: status; DataLad command
- pair: check dataset for modification; with DataLad
- .. find-out-more:: More on 'datalad status'
- :name: fom-status
- :float:
- First of all, let's start with a quick overview of the different content *types*
- and content *states* that the various :dlcmd:`status` commands in the course
- of DataLad-101 have shown up to this point.
- You have seen the following *content types*:
- - ``file``, e.g., ``notes.txt``: any file (or symlink that is a placeholder to an annexed file)
- - ``directory``, e.g., ``books``: any directory that does not qualify for the ``dataset`` type
- - ``symlink``, e.g., the ``.jpg`` that was manually unlocked in section :ref:`run3`:
- any symlink that is not used as a placeholder for an annexed file
- - ``dataset``, e.g., the ``midterm_project``: any top-level dataset, or any subdataset
- that is properly registered in the superdataset
- And you have seen the following *content states*: ``modified`` and ``untracked``.
- The section :ref:`file system` will show you many instances of ``deleted`` content
- state as well.
- But beyond understanding the report of :dlcmd:`status`, there is also
- additional functionality:
- :dlcmd:`status` can handle status reports for a whole hierarchy
- of datasets, and it can report on a subset of the content across any number of
- datasets in this hierarchy by providing selected paths. This is useful as soon
- as datasets become more complex and contain subdatasets with changing contents.
- When performed without any arguments, :dlcmd:`status` will report
- the state of the current dataset. However, you can specify a path to any
- sub- or superdataset with the ``--dataset`` option.
- In order to demonstrate this a bit better, we will make sure that not only the
- state of the subdataset *within* the superdataset is modified, but also that the
- subdataset contains a modification. For this, let's add an empty text file into
- the ``midterm_project`` subdataset:
- .. runrecord:: _examples/DL-101-132-105
- :language: console
- :workdir: dl-101/DataLad-101
- $ touch midterm_project/an_empty_file
- If you are in the root of ``DataLad-101``, but interested in the status
- *within* the subdataset, simply provide a path (relative to your current location)
- to the command:
- .. runrecord:: _examples/DL-101-132-106
- :language: console
- :workdir: dl-101/DataLad-101
- $ datalad status midterm_project
- Alternatively, to achieve the same, specify the superdataset as the ``--dataset``
- and provide a path to the subdataset *with a trailing path separator* like
- this:
- .. runrecord:: _examples/DL-101-132-107
- :language: console
- :workdir: dl-101/DataLad-101
- $ datalad status -d . midterm_project/
- Note that both of these commands return only the ``untracked`` file and
- not the ``modified`` subdataset because we're explicitly querying only the
- subdataset for its status.
- If, however, you want to know about the subdataset record in the superdataset
- (as done outside of this Find-out-more) without causing a status query for
- the state *within* the subdataset itself, you can also provide an explicit
- path to the dataset (without a trailing path separator). This can be used
- to select a specific subdataset in a dataset with many subdatasets:
- .. runrecord:: _examples/DL-101-132-108
- :language: console
- :workdir: dl-101/DataLad-101
- $ datalad status -d . midterm_project
- But if you are interested in both the state within the subdataset, and
- the state of the subdataset within the superdataset, you can combine the
- two paths:
- .. runrecord:: _examples/DL-101-132-109
- :language: console
- :workdir: dl-101/DataLad-101
- $ datalad status -d . midterm_project midterm_project/
- Finally, if these subtle differences in the paths are not easy to memorize,
- the ``-r/--recursive`` option will also report both status aspects:
- .. runrecord:: _examples/DL-101-132-110
- :language: console
- :workdir: dl-101/DataLad-101
- $ datalad status --recursive
- Importantly, the regular output from a :dlcmd:`status` command on the command line is "condensed" to the most important information by a tailored result renderer.
- You can, however, also get the unfiltered, full output of ``status`` by switching the ``-f``/``--output-format`` from ``tailored`` (the default) to ``json`` or, for the same information as ``json`` but better readability, ``json_pp``:
- .. runrecord:: _examples/DL-101-132-111
- :language: console
- :workdir: dl-101/DataLad-101
- $ datalad -f json_pp status -d . midterm_project
- This still was not all of the available functionality of the
- :dlcmd:`status` command. You could, for example, adjust whether and
- how untracked dataset content should be reported with the ``--untracked``
- option, or get additional information from annexed content with the ``--annex``
- option (especially powerful when combined with ``-f json_pp``). To get a complete overview of what you could do, check out the technical
- documentation of :dlcmd:`status` `here <https://docs.datalad.org/en/latest/generated/man/datalad-status.html>`_.
- Before we leave this Find-out-more, let's undo the modification of the subdataset
- by removing the untracked file:
- .. runrecord:: _examples/DL-101-132-112
- :language: console
- :workdir: dl-101/DataLad-101
- $ rm midterm_project/an_empty_file
- $ datalad status --recursive
- .. only:: adminmode
- Add a tag at the section end.
- .. runrecord:: _examples/DL-101-132-113
- :language: console
- :workdir: dl-101/DataLad-101
- $ git branch sct_more_on_dataset_nesting