  1. .. _sibling:
  2. Networking
  3. ----------
  4. To get the hang of the basics of sharing a dataset,
  5. you shared your ``DataLad-101`` dataset with your
  6. room mate on a common, local file system. Your lucky
  7. room mate now has your notes and can thus try to catch
  8. up to still pass the course.
  9. Moreover, he can also integrate all other notes
  10. or changes you make to your dataset, and stay up to date.
  11. This is because a DataLad dataset makes updating shared
  12. data a matter of a single :dlcmd:`update --how merge` command.
  13. But why does this need to be a one-way street? "I want to
  14. provide helpful information for you as well!", says your
  15. room mate. "How could you get any insightful notes that
  16. I make in my dataset, or maybe the results of our upcoming
  17. mid-term project? It's a bit unfair that I can get your work,
  18. but you cannot get mine."
  19. .. index::
  20. pair: register file with URL in dataset; with DataLad
  21. Consider, for example, that your room mate might have googled about DataLad
  22. a bit. In the depths of the web, he might have found useful additional information, such as
  23. a script on `dataset nesting <https://raw.githubusercontent.com/datalad/datalad.org/7e8e39b1/content/asciicast/seamless_nested_repos.sh>`_.
  24. Because he found this very helpful in understanding dataset
  25. nesting concepts, he decided to download it from GitHub, and saved it in the ``code/`` directory.
  26. He does it using the DataLad command :dlcmd:`download-url`
  27. that you experienced in section :ref:`createDS` already: This command will
  28. download a file just as ``wget`` does, but it can also take a commit message
  29. and will save the download right to the history of the dataset that you specify,
  30. while recording its origin as provenance information.
  31. Navigate into your dataset copy in ``mock_user/DataLad-101``,
  32. and run the following command:
  33. .. runrecord:: _examples/DL-101-121-101
  34. :language: console
  35. :workdir: dl-101/DataLad-101
  36. :notes: Let's make changes in the copy of the original ds
  37. :cast: 04_collaboration
  38. $ # navigate into the installed copy
  39. $ cd ../mock_user/DataLad-101
  40. $ # download the shell script and save it in your code/ directory
  41. $ datalad download-url \
  42. -d . \
  43. -m "Include nesting demo from datalad website" \
  44. -O code/nested_repos.sh \
  45. https://raw.githubusercontent.com/datalad/datalad.org/7e8e39b1/content/asciicast/seamless_nested_repos.sh
  46. Run a quick ``datalad status``:
  47. .. runrecord:: _examples/DL-101-121-102
  48. :language: console
  49. :workdir: dl-101/mock_user/DataLad-101
  50. :notes: the download url command takes care of saving contents for you
  51. :cast: 04_collaboration
  52. $ datalad status
  53. Nice, the :dlcmd:`download-url` command saved this download
  54. right into the history, and :dlcmd:`status` does not report
  55. unsaved modifications! We'll show an excerpt of the last commit
  56. here [#f1]_:
  57. .. runrecord:: _examples/DL-101-121-103
  58. :language: console
  59. :workdir: dl-101/mock_user/DataLad-101
  60. :lines: 1-13
  61. :notes: the ds copy has a change the original ds does not have:
  62. :cast: 04_collaboration
  63. $ git log -n 1 -p
  64. Suddenly, your room mate has a file change that you do not have.
  65. His dataset evolved.
  66. So how do we link back from the copy of the dataset to its
  67. origin, such that your room mate's changes can be included in
  68. your dataset? How do we let the original dataset "know" about
  69. this copy your room mate has?
  70. Do we need to install the installed dataset of our room mate
  71. as a copy again?
  72. No, luckily, it's simpler and less convoluted. What we have to
  73. do is to *register* a DataLad :term:`sibling`: A reference to our room mate's
  74. dataset in our own, original dataset.
  75. .. index::
  76. pair: sibling; DataLad concept
  77. .. gitusernote:: Remote siblings
  78. Git repositories can configure clones of a dataset as *remotes* in
  79. order to fetch, pull, or push from and to them. A :dlcmd:`sibling`
  80. is the equivalent of a git clone that is configured as a remote.
  81. Let's see how this is done.
  82. .. index::
  83. pair: siblings; DataLad command
  84. pair: register sibling in dataset; with DataLad
  85. First of all, navigate back into the original dataset.
  86. In the original dataset, "add" a "sibling" by using
  87. the :dlcmd:`siblings` command.
  88. The command takes the base command,
  89. :dlcmd:`siblings`, an action, in this case ``add``, a path to the
  90. root of the dataset ``-d .``, a name for the sibling, ``-s/--name roommate``,
  91. and a URL or path to the sibling, ``--url ../mock_user/DataLad-101``.
  92. This registers your room mate's ``DataLad-101`` as a "sibling" (we will call it
  93. "roommate") to your own ``DataLad-101`` dataset.
  94. .. runrecord:: _examples/DL-101-121-104
  95. :language: console
  96. :workdir: dl-101/mock_user/DataLad-101
  97. :notes: To allow updates from copy to original we have to configure the copy as a sibling of the original
  98. :cast: 04_collaboration
  99. $ cd ../../DataLad-101
  100. $ # add a sibling
  101. $ datalad siblings add -d . \
  102. --name roommate --url ../mock_user/DataLad-101
  103. There are a few confusing parts about this command: For one, do not be surprised
  104. about the ``--url`` argument -- it's called "URL" but it can be a path as well.
  105. Also, do not forget to give a name to your dataset's sibling. Without the ``-s``/
  106. ``--name`` argument the command will fail. The reason behind this is that the default
  107. name of a sibling if no name is given will be the host name of the specified URL,
  108. but as you provide a path and not a URL, there is no host name to take as a default.
  109. As you can see in the command output, the addition of a :term:`sibling` succeeded:
  110. ``roommate(+)[../mock_user/DataLad-101]`` means that your room mate's dataset
  111. is now known to your own dataset as "roommate".
  112. .. index::
  113. pair: list dataset siblings; with DataLad
  114. .. runrecord:: _examples/DL-101-121-105
  115. :language: console
  116. :workdir: dl-101/DataLad-101
  117. :notes: we can check which siblings the dataset has
  118. :cast: 04_collaboration
  119. $ datalad siblings
  120. This command will list all known siblings of the dataset. Your room mate's
  121. dataset appears in the resulting list under the name "roommate" that you gave it.
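
For readers who know Git, registering a sibling can be pictured with plain Git commands. The following is only a sketch under assumptions: it works in a throwaway directory, the repository names ``original`` and ``copy`` are made up for illustration, and it shows the underlying idea of a Git remote rather than what :dlcmd:`siblings` runs internally.

.. code-block:: bash

   set -eu
   # Throwaway playground; all paths and names here are hypothetical.
   tmp=$(mktemp -d)
   export GIT_AUTHOR_NAME=me GIT_AUTHOR_EMAIL=me@example.com
   export GIT_COMMITTER_NAME=me GIT_COMMITTER_EMAIL=me@example.com

   # An "original" repository with a single commit on a branch called main
   git init -q "$tmp/original"
   git -C "$tmp/original" symbolic-ref HEAD refs/heads/main
   git -C "$tmp/original" commit -q --allow-empty -m "Initial commit"

   # A clone of it, standing in for the room mate's copy
   git clone -q "$tmp/original" "$tmp/copy"

   # Register the copy in the original under the name "roommate"
   git -C "$tmp/original" remote add roommate "$tmp/copy"

   # List the names of all registered remotes
   remotes=$(git -C "$tmp/original" remote)
   echo "$remotes"

As with ``--url`` above, the location given to ``git remote add`` can be a path instead of a URL.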
  122. .. index::
  123. pair: remove dataset sibling; with DataLad
  124. .. find-out-more:: What if I mistyped the name or want to remove the sibling?
  125. You can remove a sibling using :dlcmd:`siblings remove -s roommate`, and then add it again under the correct name.
  126. The fact that the ``DataLad-101`` dataset now has a sibling means that we
  127. can also :dlcmd:`update` this repository. Awesome!
  128. Your room mate previously ran a :dlcmd:`update --how merge` in the section
  129. :ref:`update`. This got him
  130. changes *he knew you made* into a dataset that *he so far did not change*.
  131. This meant that nothing unexpected would happen with the
  132. :dlcmd:`update --how merge`.
  133. But consider the current case: Your room mate made changes to his
  134. dataset, but you do not necessarily know which. You also made
  135. changes to your dataset in the meantime, and added a note on
  136. :dlcmd:`update`.
  137. How would you know that his changes and
  138. your changes are not in conflict with each other?
  139. This scenario is where a plain :dlcmd:`update` becomes useful.
  140. If you run a plain :dlcmd:`update` (which uses the default option ``--how fetch``), DataLad will query the sibling
  141. for changes, and store those changes in a safe place in your own
  142. dataset, *but it will not yet integrate them into your dataset*.
  143. This gives you a chance to see whether you actually want to have the
  144. changes your room mate made.
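
The difference between fetching and integrating changes can be sketched with plain Git. This is a hedged illustration, not a description of DataLad's internals: the directory names, file names, and commit messages are invented, and a plain ``git fetch`` stands in for the plain :dlcmd:`update`.

.. code-block:: bash

   set -eu
   # Throwaway playground; all paths and names here are hypothetical.
   tmp=$(mktemp -d)
   export GIT_AUTHOR_NAME=me GIT_AUTHOR_EMAIL=me@example.com
   export GIT_COMMITTER_NAME=me GIT_COMMITTER_EMAIL=me@example.com

   # The original repository with one file
   git init -q "$tmp/original"
   git -C "$tmp/original" symbolic-ref HEAD refs/heads/main
   echo "my notes" > "$tmp/original/notes.txt"
   git -C "$tmp/original" add notes.txt
   git -C "$tmp/original" commit -q -m "Add notes"

   # The copy gains a file that the original does not have
   git clone -q "$tmp/original" "$tmp/copy"
   echo "echo demo" > "$tmp/copy/script.sh"
   git -C "$tmp/copy" add script.sh
   git -C "$tmp/copy" commit -q -m "Add script"

   # Fetch from the copy: changes are stored, but not integrated
   git -C "$tmp/original" remote add roommate "$tmp/copy"
   git -C "$tmp/original" fetch -q roommate

   # The working directory of the original is unchanged ...
   ls "$tmp/original"
   # ... while the fetched branch already lists the new file
   git -C "$tmp/original" ls-tree --name-only roommate/main

The fetched state lives under the name ``roommate/main`` until it is merged, which leaves time to inspect it first.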
  145. .. index::
  146. pair: update dataset from particular sibling; with DataLad
  147. Let's see how it's done. First, run a plain :dlcmd:`update` without
  148. the ``--how merge`` option.
  149. .. runrecord:: _examples/DL-101-121-106
  150. :language: console
  151. :workdir: dl-101/DataLad-101
  152. :notes: now we can update. Problem: how do we know whether we want the changes? --> plain datalad update
  153. :cast: 04_collaboration
  154. $ datalad update -s roommate
  155. Note that we supplied the sibling's name with the ``-s``/``--sibling`` option.
  156. This is good practice, and allows you to be precise in where you want to get
  157. updates from. Because there is only one other known location, the command would
  158. also have worked without this specification (just as a bare
  159. :dlcmd:`update --how merge` worked for your room mate).
  160. This plain :dlcmd:`update` "fetched" updates from
  161. the dataset. The changes, however, are not yet visible -- the script that
  162. he added is not yet in your ``code/`` directory:
  163. .. runrecord:: _examples/DL-101-121-107
  164. :language: console
  165. :workdir: dl-101/DataLad-101
  166. :notes: no file changes there yet, but where are they?
  167. :cast: 04_collaboration
  168. $ ls code/
  169. So where is the file? It is in a different *branch* of your dataset.
  170. If you do not use :term:`Git`, the concept of a :term:`branch` can be a big
  171. source of confusion. There will be sections later in this book that will
  172. elaborate a bit more what branches are, and how to work with them, but
  173. for now envision branches as a bunch of drawers on your desk.
  174. The paperwork that you have in front of you right on your desk is your
  175. dataset as you currently see it.
  176. These drawers instead hold documents that you are in principle working on,
  177. just not now -- maybe different versions of paperwork you currently have in
  178. front of you, or maybe other files than the ones currently in front of you
  179. on your desk.
  180. Imagine that a :dlcmd:`update` created a small drawer, placed all of
  181. the changed or added files from the sibling inside, and put it on your
  182. desk. You can now take a look into that drawer to see whether you want
  183. to have the changes right in front of you.
  184. The drawer is a branch. Such a branch is usually called ``remotes/origin/main``; here, because the sibling is named "roommate", it is ``remotes/roommate/main``.
  185. To look inside of it you can :gitcmd:`checkout BRANCHNAME`, or you can
  186. do a ``diff`` between the branch (your drawer) and the dataset as it
  187. is currently in front of you (your desk). We will do the latter, and leave
  188. the former for a different lecture:
  189. .. index::
  190. pair: corresponding branch; in adjusted mode
  191. pair: show dataset modification for particular path; on Windows with DataLad
  192. pair: diff; DataLad command
  193. .. windows-wit:: Please use 'datalad diff --from main --to remotes/roommate/main'
  194. .. include:: topic/adjustedmode-diff-remote.rst
  195. .. runrecord:: _examples/DL-101-121-108
  196. :language: console
  197. :workdir: dl-101/DataLad-101
  198. :notes: on a different branch: remotes/roommate/main. Do a git remote -v here
  199. :cast: 04_collaboration
  200. $ datalad diff --to remotes/roommate/main
  201. This shows us that there is an additional file, and it also shows us
  202. that there is a difference in ``notes.txt``! Let's ask
  203. :gitcmd:`diff` to show us the differences in detail (note that it is a shortened excerpt, cut in the middle to reduce its length):
  204. .. index::
  205. pair: corresponding branch; in adjusted mode
  206. pair: show dataset modification; on Windows with Git
  207. pair: diff; DataLad command
  208. .. windows-wit:: Please use 'git diff main..remotes/roommate/main'
  209. .. include:: topic/adjustedmode-gitdiff-remote.rst
  210. .. runrecord:: _examples/DL-101-121-109
  211. :language: console
  212. :workdir: dl-101/DataLad-101
  213. :notes: also git diff
  214. :lines: 1-18, 67-78
  215. :cast: 04_collaboration
  216. $ git diff remotes/roommate/main
  217. Let's digress into what is shown here.
  218. We are comparing the current state of your dataset against
  219. the current state of your room mate's dataset. Everything marked with
  220. a ``-`` is a change that your room mate has, but not you: This is the
  221. script that he downloaded!
  222. Everything that is marked with a ``+`` is a change that you have,
  223. but not your room mate: It is the additional note on :dlcmd:`update`
  224. you made in your own dataset in the previous section.
  225. Cool! So now that you know what the changes are that your room mate
  226. made, you can safely :dlcmd:`update --how merge` them to integrate
  227. them into your dataset. In technical terms you will
  228. "*merge the branch remotes/roommate/main into main*".
  229. But the details of this will be stated in a standalone section later.
  230. Note that the fact that your room mate does not have the note
  231. on :dlcmd:`update` does not influence your note. It will not
  232. get deleted by the merge. You do not set your dataset to the state
  233. of your room mate's dataset, but you incorporate all changes he made
  234. -- which is only the addition of the script.
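
That a merge combines both histories instead of overwriting one with the other can be tried out with plain Git. The sketch below rests on assumptions: all directory names, file contents, and commit messages are invented, and ``git merge`` stands in for the merge step of :dlcmd:`update --how merge`, which may do more than this in a real dataset.

.. code-block:: bash

   set -eu
   # Throwaway playground; all paths and names here are hypothetical.
   tmp=$(mktemp -d)
   export GIT_AUTHOR_NAME=me GIT_AUTHOR_EMAIL=me@example.com
   export GIT_COMMITTER_NAME=me GIT_COMMITTER_EMAIL=me@example.com

   git init -q "$tmp/original"
   git -C "$tmp/original" symbolic-ref HEAD refs/heads/main
   echo "first note" > "$tmp/original/notes.txt"
   git -C "$tmp/original" add notes.txt
   git -C "$tmp/original" commit -q -m "Add notes"

   # The room mate's copy adds a script ...
   git clone -q "$tmp/original" "$tmp/copy"
   echo "echo demo" > "$tmp/copy/script.sh"
   git -C "$tmp/copy" add script.sh
   git -C "$tmp/copy" commit -q -m "Add script"

   # ... while the original independently gains another note
   echo "note on update" >> "$tmp/original/notes.txt"
   git -C "$tmp/original" commit -q -a -m "Add note on update"

   # Fetch the copy's changes, then merge them into main
   git -C "$tmp/original" remote add roommate "$tmp/copy"
   git -C "$tmp/original" fetch -q roommate
   git -C "$tmp/original" merge -q -m "Merge changes from roommate" roommate/main

   # Both the new script and the additional note are present afterwards
   ls "$tmp/original"
   cat "$tmp/original/notes.txt"

Because the two sides changed different files, Git can combine them without a conflict.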
  235. .. runrecord:: _examples/DL-101-121-110
  236. :language: console
  237. :workdir: dl-101/DataLad-101
  238. :notes: now we can safely merge
  239. :cast: 04_collaboration
  240. $ datalad update --how merge -s roommate
  241. The exciting question is whether your room mate's change is now
  242. also part of your own dataset. Let's list the contents of the ``code/``
  243. directory and also peek into the history:
  244. .. runrecord:: _examples/DL-101-121-111
  245. :language: console
  246. :workdir: dl-101/DataLad-101
  247. :notes: check for the updated files... they are there!
  248. :cast: 04_collaboration
  249. $ ls code/
  250. .. runrecord:: _examples/DL-101-121-112
  251. :language: console
  252. :lines: 1-6
  253. :emphasize-lines: 2, 4
  254. :workdir: dl-101/DataLad-101
  255. :notes: and here is the summary in the log
  256. :cast: 04_collaboration
  257. $ git log --oneline
  258. Woohoo! Here it is: The script now also exists in your own dataset.
  259. You can see the commit that your room mate made when he saved the script,
  260. and you can also see a commit that records how you ``merged`` your
  261. room mate's dataset changes into your own dataset. The commit message of this
  262. latter commit for now might contain many words yet unknown to you if you
  263. do not use Git, but a later section will get into the details of what
  264. the meaning of ":term:`merge`", ":term:`branch`", "refs"
  265. or ":term:`main`" is.
  266. For now, you are happy to have the changes your room mate made available.
  267. This is how it should be! You helped him, and he helps you. Awesome!
  268. There actually is a wonderful word for it: *Collaboration*.
  269. Thus, without noticing, you have successfully collaborated for the first
  270. time using DataLad datasets.
  271. Create a note about this, and save it.
  272. .. runrecord:: _examples/DL-101-121-113
  273. :language: console
  274. :workdir: dl-101/DataLad-101
  275. :notes: write a note
  276. :cast: 04_collaboration
  277. $ cat << EOT >> notes.txt
  278. To update from a dataset with a shared history, you need to add this
  279. dataset as a sibling to your dataset. "Adding a sibling" means
  280. providing DataLad with info about the location of a dataset, and a
  281. name for it.
  282. Afterwards, a "datalad update --how merge -s name" will integrate the
  283. changes made to the sibling into the dataset. A safe step in between
  284. is to do a "datalad update -s name" and checkout the changes with
  285. "git/datalad diff" to remotes/origin/main
  286. EOT
  287. $ datalad save -m "Add note on adding siblings"
  288. .. rubric:: Footnotes
  289. .. [#f1] As this example, simplistically, created a "pretend" room mate by only changing directories, not user accounts, the recorded Git identity of your "room mate" will, of course, be the same as yours.
  290. .. only:: adminmode
  291. Add a tag at the section end.
  292. .. runrecord:: _examples/DL-101-121-114
  293. :language: console
  294. :workdir: dl-101/DataLad-101
  295. $ git branch sct_networking