  1. .. _gitignore:
  2. How to hide content from DataLad
  3. --------------------------------
  4. You have progressed quite far in the DataLad-101 course,
  5. and by now, you have gotten a good overview of the basics
  6. and *not-so-basic-anymore*\s of DataLad.
  7. You know how to add, modify, and save files, even completely
  8. reproducibly, and how to share your work with others.
  9. By now, the :dlcmd:`save` command is probably
  10. the most frequently used command in this dataset.
  11. This means that you have seen some of its peculiarities.
  12. The most striking one is that, by default, it
  13. saves the complete dataset's status if you do not provide
  14. a path to a changed file. This results in all content
  15. that is either modified or untracked being saved in a single
  16. commit.
  17. As a general recommendation, keep your DataLad
  18. dataset clean. This supports a structured way of working, prevents
  19. clutter, and nicely records provenance inside your dataset.
  20. If you have content in your dataset that has been untracked for 9 months,
  21. it will be hard to remember where this content came from, whether it
  22. is relevant, and if it is relevant, for what. Adding content to your
  23. dataset will thus usually not do harm -- certainly not for your
  24. dataset.
  25. However, there may be valid reasons to keep content out of
  26. DataLad's version control and tracking. Maybe you hide your secret
  27. ``my-little-pony-themesongs/`` collection within ``Deathmetal/``
  28. and do not want a record of this in your history or the directory
  29. being shared together with the rest of the dataset. Who knows?
  30. We would not judge in any way.
  31. In principle, you already know a few
  32. tricks on how to be "messy" and have untracked files.
  33. For :dlcmd:`save`, you know that precise file paths allow
  34. you to save only those modifications you want to record.
  35. For :dlcmd:`run`, you know that one
  36. can specify the ``--explicit`` option
  37. to only save those modifications that are specified in the ``--output``
  38. argument.
  39. Beyond these tricks, there are two ways to leave *untracked* content unaffected
  40. by a :dlcmd:`save`. One is the ``-u/--updated`` option of
  41. :dlcmd:`save`::
  42. $ datalad save -m "my commit message here" -u
  43. This will only save dataset modifications to previously tracked
  44. paths. If ``my-little-pony-themesongs/`` is not yet tracked,
  45. a ``datalad save -u`` will leave it untouched, and its existence
  46. or content is not written to the history of your dataset.
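The effect of saving only previously tracked paths can be tried out safely in a throwaway repository. The following is a minimal sketch that uses plain Git as a stand-in: it assumes that ``git add -u`` is a reasonable analog of what ``datalad save -u`` selects, and it does not run DataLad itself. The file names ``notes.txt`` and ``untracked.txt`` and the commit identity are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch with plain Git (assumption: "git add -u" mirrors the selection
# that "datalad save -u" makes; DataLad itself is not used here).
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Track one file. The identity values are placeholders for this throwaway repo.
echo "v1" > notes.txt
git add notes.txt
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "add notes"

# Modify the tracked file and create a new, untracked one.
echo "v2" >> notes.txt
echo "secret" > untracked.txt

# Stage modifications to already tracked paths only.
git add -u

# The tracked file is staged, the new file is still reported as untracked.
status_after=$(git status --porcelain)
printf '%s\n' "$status_after"
```

In the printed status, the modified file appears as staged, while the new file keeps its ``??`` (untracked) marker.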
  47. A second way of hiding content from DataLad is a ``.gitignore``
  48. file. As the name suggests, it is a :term:`Git` related solution,
  49. but it works just as well for DataLad.
  50. A ``.gitignore`` file is a file that specifies which files should
  51. be *ignored* by the version control tool.
  52. To use a ``.gitignore`` file, simply create a file with this
  53. name in the root of your dataset (be mindful: remember the leading ``.``!).
  54. You can use one of `thousands of publicly shared examples <https://github.com/github/gitignore>`_,
  55. or create your own.
  56. To specify dataset content to be git-ignored, you can either write
  57. a full file name, e.g. ``playlists/my-little-pony-themesongs/Friendship-is-magic.mp3``
  58. into this file, or paths or patterns that make use of globbing, such as
  59. ``playlists/my-little-pony-themesongs/*``. The :find-out-more:`on general rules for patterns in .gitignore files <fom-gitignore>` contains a helpful overview. Afterwards,
  60. you just need to save the file once to your dataset so that it is version controlled.
  61. If you have new content you do not want to track, you can add
  62. new paths or patterns to the file, and save these modifications.
  63. Let's try this with a very basic example: Let's git-ignore all content in
  64. a ``tmp/`` directory in the ``DataLad-101`` dataset:
  65. .. runrecord:: _examples/DL-101-179-101
  66. :workdir: dl-101/DataLad-101
  67. :language: console
  68. $ cat << EOT > .gitignore
  69. tmp/*
  70. EOT
  71. .. runrecord:: _examples/DL-101-179-102
  72. :workdir: dl-101/DataLad-101
  73. :language: console
  74. $ datalad status
  75. .. runrecord:: _examples/DL-101-179-103
  76. :workdir: dl-101/DataLad-101
  77. :language: console
  78. $ datalad save -m "add something to ignore" .gitignore
  79. This ``.gitignore`` file is very minimalistic, but it's sufficient to show
  80. how it works. If you now create a ``tmp/`` directory, all of its contents will be
  81. ignored by your dataset's version control. Let's do so, and add a file into it
  82. that we do not (yet?) want to save to the dataset's history.
  83. .. runrecord:: _examples/DL-101-179-104
  84. :workdir: dl-101/DataLad-101
  85. :language: console
  86. $ mkdir tmp
  87. $ echo "this is just noise" > tmp/a_random_ignored_file
  88. .. runrecord:: _examples/DL-101-179-105
  89. :workdir: dl-101/DataLad-101
  90. :language: console
  91. $ datalad status
  92. As expected, the file does not show up as untracked -- it is being
  93. ignored! Therefore, a ``.gitignore`` file can give you a space inside of
  94. your dataset to be messy, if you want to be.
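If you are ever unsure why a file does or does not show up, Git can report which rule is responsible. The sketch below rebuilds the ``tmp/*`` example in a temporary directory with plain Git (an assumption made so that it runs without DataLad) and queries it with ``git check-ignore``. The file ``notes.txt`` is a hypothetical addition for contrast.

```shell
#!/bin/sh
# Sketch: ask Git which .gitignore rule applies to a path.
set -eu

ds=$(mktemp -d)
cd "$ds"
git init -q .

# Same rule and file as in the example above.
printf 'tmp/*\n' > .gitignore
mkdir tmp
echo "this is just noise" > tmp/a_random_ignored_file
echo "keep me" > notes.txt   # hypothetical file that no rule matches

# -v reports the source file, line number, and pattern that matched.
match=$(git check-ignore -v tmp/a_random_ignored_file)
printf '%s\n' "$match"

# For a path that no pattern matches, check-ignore exits with status 1.
if git check-ignore -q notes.txt; then
    notes_ignored=yes
else
    notes_ignored=no
fi
echo "notes.txt ignored: $notes_ignored"
```

The verbose output names ``.gitignore``, the line of the rule, and the pattern, which helps once a file contains more than a handful of rules.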
  95. .. find-out-more:: Rules for .gitignore files
  96. :name: fom-gitignore
  97. Here are some general rules for the patterns you can put into a ``.gitignore``
  98. file, taken from the book `Pro Git <https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository#_ignoring>`_ :
  99. - Blank lines or lines starting with ``#`` are ignored
  100. - Standard :term:`globbing` patterns work. The line
  101. .. code-block:: bash
  102. *.[oa]
  103. lets all files ending in ``.o`` or ``.a`` be ignored. Importantly, these patterns
  104. will be applied recursively through your dataset, so that a file matching this
  105. rule will be ignored, even if it is in a subdirectory of your dataset. If you
  106. want to ignore specific files in the directory your ``.gitignore`` file lies in,
  107. but not any subdirectories, start the pattern with a forward slash (``/``), as
  108. in ``/TODO``.
  109. - To specify directories, you can end patterns with a forward slash (``/``), for
  110. example ``build/``.
  111. - You can negate a pattern by starting it with an exclamation point (``!``), such
  112. as ``!lib.a``. This would track the file ``lib.a``, even if you would be ignoring
  113. all other files with ``.a`` extension.
  114. The manpage of ``gitignore`` has an extensive and well-explained overview.
  115. To read it, simply type ``man gitignore`` into your terminal.
  116. You can have a single ``.gitignore`` file in the root of your dataset,
  117. and its rules apply recursively to the entire hierarchy of the dataset (but not
  118. subdatasets!). Alternatively, you can have additional ``.gitignore`` files in
  119. subdirectories of your dataset. The rules in these nested ``.gitignore`` files only
  120. apply to the files under the directory where they are located.
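The pattern rules listed above can be tried out in a scratch repository. This is a minimal sketch using plain Git. All file names (``foo.a``, ``lib.a``, ``TODO``, ``sub/``) are hypothetical and only serve to exercise recursive matching, negation, and anchoring with a leading slash.

```shell
#!/bin/sh
# Sketch: recursive globbing, negation, and anchoring in one .gitignore.
set -eu

ds=$(mktemp -d)
cd "$ds"
git init -q .

cat > .gitignore << 'EOT'
# ignore all files ending in .a, in any directory
*.a
# but keep tracking lib.a
!lib.a
# ignore TODO only in the directory of this .gitignore file
/TODO
EOT

mkdir sub
touch foo.a lib.a TODO sub/TODO sub/bar.a

# List every file Git still considers untracked, i.e. not ignored.
visible=$(git status --porcelain --untracked-files=all)
printf '%s\n' "$visible"
```

Only ``.gitignore``, ``lib.a``, and ``sub/TODO`` should remain visible: ``foo.a`` and ``sub/bar.a`` match ``*.a`` recursively, and the top-level ``TODO`` matches the anchored pattern.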
  121. .. importantnote:: Implications of git-ignored outputs for rerunning
  122. Note one caveat: If a command creates an output that is git-ignored
  123. (e.g. anything inside of ``tmp/`` in our dataset), a subsequent command
  124. that requires it as an undisclosed input will only succeed if both
  125. commands are run in succession. The second command will fail if rerun on its own,
  126. however.
  127. .. find-out-more:: Globally ignoring files
  128. It is not only possible to define files or patterns for files to ignore inside
  129. of individual datasets, but also to set global specifications to have every
  130. single dataset you own ignore certain files or file types.
  131. This can be useful, for example, for unwanted files that your operating system
  132. or certain software creates, such as `lock files <https://fileinfo.com/extension/lock>`_,
  133. `.swp files <https://www.networkworld.com/article/2931534/what-are-unix-swap-swp-files.html>`_,
  134. `.DS_Store files <https://en.wikipedia.org/wiki/.DS_Store>`_,
  135. `Thumbs.DB <https://en.wikipedia.org/wiki/Windows_thumbnail_cache#Thumbs.db>`_,
  136. or others.
  137. To set rules to ignore files for all of your datasets, you need to create a
  138. *global* ``.gitignore`` file. The only difference between a repository-specific
  139. and a global ``.gitignore`` file is its location on your file
  140. system. You can put it either in its default location ``~/.config/git/ignore``
  141. (you may need to create the ``~/.config/git`` directory first),
  142. or place it into any other location and point Git to it. If you create a
  143. file at ``~/.gitignore_global`` and run
  144. .. code-block:: bash
  145. $ git config --global core.excludesfile ~/.gitignore_global
  146. Git -- and consequently DataLad -- will not bother you about any of the files
  147. or file types you have specified. The following snippet adds a typical
  148. collection of files that are commonly ignored across platforms, and should work on Unix-like systems (like macOS and Linux distributions).
  149. .. code-block:: bash
  150. $ touch ~/.gitignore_global
  151. $ for f in .DS_Store ._.DS_Store '*.swp' Thumbs.db ehthumbs.db; do \
  152. echo "$f" >> ~/.gitignore_global; done
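To try out such a list before changing your global configuration, you can pass it to a single Git command. The sketch below is a hedged illustration: it writes the same patterns to a temporary file instead of ``~/.gitignore_global``, and uses Git's ``-c`` option to set ``core.excludesFile`` for one invocation only. The path ``notes.txt.swp`` is a hypothetical file name and does not need to exist.

```shell
#!/bin/sh
# Sketch: test a global ignore list without touching ~/.gitconfig.
set -eu

# Temporary stand-in for ~/.gitignore_global.
ignore_file=$(mktemp)
for f in .DS_Store ._.DS_Store '*.swp' Thumbs.db ehthumbs.db; do
    echo "$f" >> "$ignore_file"
done

repo=$(mktemp -d)
cd "$repo"
git init -q .

# -c applies the setting to this one command; nothing is written to any config.
swp_match=$(git -c core.excludesFile="$ignore_file" \
    check-ignore -v notes.txt.swp)
printf '%s\n' "$swp_match"
```

The verbose output points to the temporary file and the ``*.swp`` rule on its third line. Once the list behaves as intended, the ``git config --global`` call shown above makes it permanent.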
  153. .. only:: adminmode
  154. Add a tag at the section end.
  155. .. runrecord:: _examples/DL-101-179-106
  156. :language: console
  157. :workdir: dl-101/DataLad-101
  158. $ git branch sct_hide_content
  159. As this is currently the last section in the book, I'll add siblings to the
  160. published showroom datasets to it here:
  161. .. runrecord:: _examples/DL-101-179-107
  162. :language: console
  163. :workdir: dl-101/DataLad-101
  164. $ datalad siblings add -d . --name public --url https://github.com/datalad-handbook/DataLad-101.git
  165. .. runrecord:: _examples/DL-101-179-108
  166. :language: console
  167. :workdir: dl-101/DataLad-101/midterm_project
  168. $ datalad siblings add -d . --name public --url https://github.com/datalad-handbook/midterm_project.git
  169. .. runrecord:: _examples/DL-101-179-109
  170. :language: console
  171. :workdir: dl-101/DataLad-101
  172. $ git config -f .gitmodules --replace-all submodule.midterm_project.url https://github.com/datalad-handbook/midterm_project
  173. $ datalad save -m "SERVICE COMMIT - IGNORE. This commit only serves to appropriately reference the subdataset in the public showroom dataset"
  174. This makes it possible to automatically push all section branches (not accidentally synced or adjusted annex branches) with
  175. git push. Note: requires git push; datalad publish cannot handle this at the moment (see https://github.com/datalad/datalad/issues/4006)
  176. .. runrecord:: _examples/DL-101-179-110
  177. :language: console
  178. :workdir: dl-101/DataLad-101
  179. $ git config --local remote.public.push 'refs/heads/sct*'
  180. $ git config --local --add remote.public.push 'refs/heads/main'