- .. _sibling:
- Networking
- ----------
- To get the hang of the basics of sharing a dataset,
- you shared your ``DataLad-101`` dataset with your
- room mate on a common, local file system. Your lucky
- room mate now has your notes and can thus try to catch
- up to still pass the course.
- Moreover, though, he can also integrate all other notes
- or changes you make to your dataset, and stay up to date.
- This is because a DataLad dataset makes updating shared
- data a matter of a single :dlcmd:`update --how merge` command.
- But why does this need to be a one-way street? "I want to
- provide helpful information for you as well!", says your
- room mate. "How could you get any insightful notes that
- I make in my dataset, or maybe the results of our upcoming
- mid-term project? It's a bit unfair that I can get your work,
- but you cannot get mine."
- .. index::
- pair: register file with URL in dataset; with DataLad
- Consider, for example, that your room mate might have googled about DataLad
- a bit. In the depths of the web, he might have found useful additional information, such as
- a script on `dataset nesting <https://raw.githubusercontent.com/datalad/datalad.org/7e8e39b1/content/asciicast/seamless_nested_repos.sh>`_.
- Because he found this very helpful in understanding dataset
- nesting concepts, he decided to download it from GitHub, and saved it in the ``code/`` directory.
- He does it using the DataLad command :dlcmd:`download-url`
- that you experienced in section :ref:`createDS` already: This command will
- download a file just like ``wget``, but it can also take a commit message
- and will save the download right to the history of the dataset that you specify,
- while recording its origin as provenance information.
- Navigate into your dataset copy in ``mock_user/DataLad-101``,
- and run the following command:
- .. runrecord:: _examples/DL-101-121-101
- :language: console
- :workdir: dl-101/DataLad-101
- :notes: Let's make changes in the copy of the original ds
- :cast: 04_collaboration
- $ # navigate into the installed copy
- $ cd ../mock_user/DataLad-101
- $ # download the shell script and save it in your code/ directory
- $ datalad download-url \
- -d . \
- -m "Include nesting demo from datalad website" \
- -O code/nested_repos.sh \
- https://raw.githubusercontent.com/datalad/datalad.org/7e8e39b1/content/asciicast/seamless_nested_repos.sh
- Run a quick ``datalad status``:
- .. runrecord:: _examples/DL-101-121-102
- :language: console
- :workdir: dl-101/mock_user/DataLad-101
- :notes: the download url command takes care of saving contents for you
- :cast: 04_collaboration
- $ datalad status
- Nice, the :dlcmd:`download-url` command saved this download
- right into the history, and :dlcmd:`status` does not report
- unsaved modifications! We'll show an excerpt of the last commit
- here [#f1]_:
- .. runrecord:: _examples/DL-101-121-103
- :language: console
- :workdir: dl-101/mock_user/DataLad-101
- :lines: 1-13
- :notes: the ds copy has a change the original ds does not have:
- :cast: 04_collaboration
- $ git log -n 1 -p
- Suddenly, your room mate has a file change that you do not have.
- His dataset evolved.
- So how do we link back from the copy of the dataset to its
- origin, such that your room mate's changes can be included in
- your dataset? How do we let the original dataset "know" about
- this copy your room mate has?
- Do we need to install the installed dataset of our room mate
- as a copy again?
- No, luckily, it's simpler and less convoluted. What we have to
- do is to *register* a DataLad :term:`sibling`: A reference to our room mate's
- dataset in our own, original dataset.
- .. index::
- pair: sibling; DataLad concept
- .. gitusernote:: Remote siblings
- Git repositories can configure clones of a dataset as *remotes* in
- order to fetch, pull, or push from and to them. A :dlcmd:`sibling`
- is the equivalent of a git clone that is configured as a remote.
- Let's see how this is done.
- .. index::
- pair: siblings; DataLad command
- pair: register sibling in dataset; with DataLad
- First of all, navigate back into the original dataset.
- In the original dataset, "add" a "sibling" by using
- the :dlcmd:`siblings` command.
- The command takes the base command,
- :dlcmd:`siblings`, an action, in this case ``add``, a path to the
- root of the dataset ``-d .``, a name for the sibling, ``-s/--name roommate``,
- and a URL or path to the sibling, ``--url ../mock_user/DataLad-101``.
- This registers your room mate's ``DataLad-101`` as a "sibling" (we will call it
- "roommate") to your own ``DataLad-101`` dataset.
- .. runrecord:: _examples/DL-101-121-104
- :language: console
- :workdir: dl-101/mock_user/DataLad-101
- :notes: To allow updates from copy to original we have to configure the copy as a sibling of the original
- :cast: 04_collaboration
- $ cd ../../DataLad-101
- $ # add a sibling
- $ datalad siblings add -d . \
- --name roommate --url ../mock_user/DataLad-101
- There are a few confusing parts about this command: For one, do not be surprised
- about the ``--url`` argument -- it's called "URL" but it can be a path as well.
- Also, do not forget to give a name to your dataset's sibling. Without the ``-s``/
- ``--name`` argument the command will fail. The reason behind this is that the default
- name of a sibling if no name is given will be the host name of the specified URL,
- but as you provide a path and not a URL, there is no host name to take as a default.
- As you can see in the command output, the addition of a :term:`sibling` succeeded:
- ``roommate(+)[../mock_user/DataLad-101]`` means that your room mate's dataset
- is now known to your own dataset as "roommate".
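In plain Git terms, registering a sibling roughly corresponds to adding a remote. The following is a minimal sketch of that idea, not what DataLad runs internally. It assumes only that ``git`` is installed; the sandbox directory, the repository names ``original`` and ``copy``, and the Git identity are made up for illustration.

```shell
#!/bin/sh
# Sketch: a clone registered as a named remote (hypothetical sandbox paths).
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"

# An "original" repository with a single commit
git init -q original
git -C original config user.name "Example User"
git -C original config user.email "user@example.com"
echo "first note" > original/notes.txt
git -C original add notes.txt
git -C original commit -q -m "Add notes"

# A clone standing in for the room mate's copy
git clone -q original copy

# Make the clone known to the original under the name "roommate"
cd original
git remote add roommate ../copy

# List the configured remotes together with their locations
git remote -v
```

As with ``--url`` above, the location given to ``git remote add`` can be a path rather than a URL.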
- .. index::
- pair: list dataset siblings; with DataLad
- .. runrecord:: _examples/DL-101-121-105
- :language: console
- :workdir: dl-101/DataLad-101
- :notes: we can check which siblings the dataset has
- :cast: 04_collaboration
- $ datalad siblings
- This command will list all known siblings of the dataset. You can see your
- room mate's dataset in the resulting list under the name "roommate" that you gave it.
- .. index::
- pair: remove dataset sibling; with DataLad
- .. find-out-more:: What if I mistyped the name or want to remove the sibling?
- You can remove a sibling using :dlcmd:`siblings remove -s roommate`.
- The fact that the ``DataLad-101`` dataset now has a sibling means that we
- can also :dlcmd:`update` this repository. Awesome!
- Your room mate previously ran a :dlcmd:`update --how merge` in the section
- :ref:`update`. This got him
- changes *he knew you made* into a dataset that *he so far did not change*.
- This meant that nothing unexpected would happen with the
- :dlcmd:`update --how merge`.
- But consider the current case: Your room mate made changes to his
- dataset, but you do not necessarily know which. You also made
- changes to your dataset in the meantime, and added a note on
- :dlcmd:`update`.
- How would you know that his changes and
- your changes are not in conflict with each other?
- This scenario is where a plain :dlcmd:`update` becomes useful.
- If you run a plain :dlcmd:`update` (which uses the default option ``--how fetch``), DataLad will query the sibling
- for changes, and store those changes in a safe place in your own
- dataset, *but it will not yet integrate them into your dataset*.
- This gives you a chance to see whether you actually want to have the
- changes your room mate made.
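The idea of fetching without integrating can be tried out in a throwaway sandbox with plain Git. This is only an approximation of what a plain ``datalad update`` does, under the assumption that ``git`` is installed; all paths, file contents, and identities below are invented for the sketch.

```shell
#!/bin/sh
# Sketch: fetch changes from a copy without merging them (hypothetical paths).
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"

# Original repository with one commit
git init -q original
git -C original config user.name "Example User"
git -C original config user.email "user@example.com"
echo "first note" > original/notes.txt
git -C original add notes.txt
git -C original commit -q -m "Add notes"

# The copy gains a file that the original does not have
git clone -q original copy
git -C copy config user.name "Example Roommate"
git -C copy config user.email "roommate@example.com"
mkdir copy/code
echo "echo nesting demo" > copy/code/nested_repos.sh
git -C copy add code/nested_repos.sh
git -C copy commit -q -m "Include nesting demo"

# Branch name shared by both repositories (depends on the local Git setup)
branch=$(git -C original symbolic-ref --short HEAD)

# Fetch from the copy, but do not merge
cd original
git remote add roommate ../copy
git fetch -q roommate

# The working tree is untouched ...
ls
# ... while the fetched branch already lists the new file
git ls-tree -r --name-only "roommate/$branch"
```

After the fetch, the new script is only visible when you ask Git about the fetched branch; your own files stay exactly as they were.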
- .. index::
- pair: update dataset from particular sibling; with DataLad
- Let's see how it's done. First, run a plain :dlcmd:`update` without
- the ``--how merge`` option.
- .. runrecord:: _examples/DL-101-121-106
- :language: console
- :workdir: dl-101/DataLad-101
- :notes: now we can update. Problem: how do we know whether we want the changes? --> plain datalad update
- :cast: 04_collaboration
- $ datalad update -s roommate
- Note that we supplied the sibling's name with the ``-s``/``--sibling`` option.
- This is good practice, and allows you to be precise in where you want to get
- updates from. Because there is only one other known location, it would also
- have worked without this specification (just as a bare
- :dlcmd:`update --how merge` worked for your room mate).
- This plain :dlcmd:`update` "fetched" updates from
- the dataset. The changes however, are not yet visible -- the script that
- he added is not yet in your ``code/`` directory:
- .. runrecord:: _examples/DL-101-121-107
- :language: console
- :workdir: dl-101/DataLad-101
- :notes: no file changes there yet, but where are they?
- :cast: 04_collaboration
- $ ls code/
- So where is the file? It is in a different *branch* of your dataset.
- If you do not use :term:`Git`, the concept of a :term:`branch` can be a big
- source of confusion. There will be sections later in this book that will
- elaborate a bit more what branches are, and how to work with them, but
- for now envision branches as a bunch of drawers on your desk.
- The paperwork that you have in front of you right on your desk is your
- dataset as you currently see it.
- These drawers instead hold documents that you are in principle working on,
- just not now -- maybe different versions of paperwork you currently have in
- front of you, or maybe other files than the ones currently in front of you
- on your desk.
- Imagine that a :dlcmd:`update` created a small drawer, placed all of
- the changed or added files from the sibling inside, and put it on your
- desk. You can now take a look into that drawer to see whether you want
- to have the changes right in front of you.
- The drawer is a branch, and it is usually called ``remotes/origin/main``; here, because the sibling is named "roommate", it is ``remotes/roommate/main``.
- To look inside of it you can :gitcmd:`checkout BRANCHNAME`, or you can
- do a ``diff`` between the branch (your drawer) and the dataset as it
- is currently in front of you (your desk). We will do the latter, and leave
- the former for a different lecture:
- .. index::
- pair: corresponding branch; in adjusted mode
- pair: show dataset modification for particular path; on Windows with DataLad
- pair: diff; DataLad command
- .. windows-wit:: Please use 'datalad diff --from main --to remotes/roommate/main'
- .. include:: topic/adjustedmode-diff-remote.rst
- .. runrecord:: _examples/DL-101-121-108
- :language: console
- :workdir: dl-101/DataLad-101
- :notes: on a different branch: remotes/roommate/main. Do a git remote -v here
- :cast: 04_collaboration
- $ datalad diff --to remotes/roommate/main
- This shows us that there is an additional file, and it also shows us
- that there is a difference in ``notes.txt``! Let's ask
- :gitcmd:`diff` to show us the differences in detail (note that it is a shortened excerpt, cut in the middle to reduce its length):
- .. index::
- pair: corresponding branch; in adjusted mode
- pair: show dataset modification; on Windows with Git
- pair: diff; DataLad command
- .. windows-wit:: Please use 'git diff main..remotes/roommate/main'
- .. include:: topic/adjustedmode-gitdiff-remote.rst
- .. runrecord:: _examples/DL-101-121-109
- :language: console
- :workdir: dl-101/DataLad-101
- :notes: also git diff
- :lines: 1-18, 67-78
- :cast: 04_collaboration
- $ git diff remotes/roommate/main
- Let's digress into what is shown here.
- We are comparing the current state of your dataset against
- the current state of your room mate's dataset. Everything marked with
- a ``-`` is a change that your room mate has, but not you: This is the
- script that he downloaded!
- Everything that is marked with a ``+`` is a change that you have,
- but not your room mate: It is the additional note on :dlcmd:`update`
- you made in your own dataset in the previous section.
- Cool! So now that you know what the changes are that your room mate
- made, you can safely :dlcmd:`update --how merge` them to integrate
- them into your dataset. In technical terms you will
- "*merge the branch remotes/roommate/main into main*".
- But the details of this will be stated in a standalone section later.
- Note that the fact that your room mate does not have the note
- on :dlcmd:`update` does not influence your note. It will not
- get deleted by the merge. You do not set your dataset to the state
- of your room mate's dataset, but you incorporate all changes he made
- -- which is only the addition of the script.
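That a merge adds the other side's changes without discarding your own can be checked in a small sandbox with plain Git. The sketch below is an assumption-laden stand-in for the DataLad workflow: it needs only ``git``, and every path, file content, and identity in it is hypothetical.

```shell
#!/bin/sh
# Sketch: merging a fetched branch keeps changes from both sides.
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"

# Original repository with one commit
git init -q original
git -C original config user.name "Example User"
git -C original config user.email "user@example.com"
echo "first note" > original/notes.txt
git -C original add notes.txt
git -C original commit -q -m "Add notes"

# The copy adds a script ...
git clone -q original copy
git -C copy config user.name "Example Roommate"
git -C copy config user.email "roommate@example.com"
mkdir copy/code
echo "echo nesting demo" > copy/code/nested_repos.sh
git -C copy add code/nested_repos.sh
git -C copy commit -q -m "Include nesting demo"

# ... while the original gains a note of its own
echo "note on update" >> original/notes.txt
git -C original commit -q -a -m "Add note on update"

# Branch name shared by both repositories (depends on the local Git setup)
branch=$(git -C original symbolic-ref --short HEAD)

# Fetch, then merge the fetched branch into the current one
cd original
git remote add roommate ../copy
git fetch -q roommate
git merge -q -m "Merge changes from roommate" "roommate/$branch"

# Both the script from the copy and your own note are present
ls code
cat notes.txt
git log --oneline
```

The history then holds four commits: the shared starting point, one commit from each side, and the merge commit that joins them.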
- .. runrecord:: _examples/DL-101-121-110
- :language: console
- :workdir: dl-101/DataLad-101
- :notes: now we can safely merge
- :cast: 04_collaboration
- $ datalad update --how merge -s roommate
- The exciting question is now whether your room mate's change is now
- also part of your own dataset. Let's list the contents of the ``code/``
- directory and also peek into the history:
- .. runrecord:: _examples/DL-101-121-111
- :language: console
- :workdir: dl-101/DataLad-101
- :notes: check for the updated files... they are there!
- :cast: 04_collaboration
- $ ls code/
- .. runrecord:: _examples/DL-101-121-112
- :language: console
- :lines: 1-6
- :emphasize-lines: 2, 4
- :workdir: dl-101/DataLad-101
- :notes: and here is the summary in the log
- :cast: 04_collaboration
- $ git log --oneline
- Woohoo! Here it is: The script now also exists in your own dataset.
- You can see the commit that your room mate made when he saved the script,
- and you can also see a commit that records how you ``merged`` your
- room mate's dataset changes into your own dataset. The commit message of this
- latter commit for now might contain many words yet unknown to you if you
- do not use Git, but a later section will get into the details of what
- ":term:`merge`", ":term:`branch`", "refs",
- or ":term:`main`" mean.
- For now, you are happy to have the changes your room mate made available.
- This is how it should be! You helped him, and he helps you. Awesome!
- There actually is a wonderful word for it: *Collaboration*.
- Thus, without noticing, you have successfully collaborated for the first
- time using DataLad datasets.
- Create a note about this, and save it.
- .. runrecord:: _examples/DL-101-121-113
- :language: console
- :workdir: dl-101/DataLad-101
- :notes: write a note
- :cast: 04_collaboration
- $ cat << EOT >> notes.txt
- To update from a dataset with a shared history, you need to add this
- dataset as a sibling to your dataset. "Adding a sibling" means
- providing DataLad with info about the location of a dataset, and a
- name for it.
- Afterwards, a "datalad update --how merge -s name" will integrate the
- changes made to the sibling into the dataset. A safe step in between
- is to do a "datalad update -s name" and checkout the changes with
- "git/datalad diff" to remotes/origin/main
- EOT
- $ datalad save -m "Add note on adding siblings"
- .. rubric:: Footnotes
- .. [#f1] As this example, simplistically, created a "pretend" room mate by only changing directories, not user accounts, the recorded Git identity of your "room mate" will, of course, be the same as yours.
- .. only:: adminmode
- Add a tag at the section end.
- .. runrecord:: _examples/DL-101-121-114
- :language: console
- :workdir: dl-101/DataLad-101
- $ git branch sct_networking