- .. _gitignore:
- How to hide content from DataLad
- --------------------------------
- You have progressed quite far in the DataLad-101 course,
- and by now, you have gotten a good overview of the basics
- and *not-so-basic-anymore*\s of DataLad.
- You know how to add, modify, and save files, even completely
- reproducibly, and how to share your work with others.
- By now, the :dlcmd:`save` command is probably
- the most often used command in this dataset.
- This means that you have seen some of its peculiarities.
- The most striking one is that, by default, it
- saves the complete dataset's status if you do not provide
- a path to a changed file. All content
- that is either modified or untracked then ends up in a single
- commit.
- In principle, a general recommendation may be to keep your DataLad
- dataset clean. This supports a structured way of working and prevents
- clutter, and it also nicely records provenance inside your dataset.
- If you have content in your dataset that has been untracked for 9 months
- it will be hard to remember where this content came from, whether it
- is relevant, and if it is relevant, for what. Adding content to your
- dataset will thus usually not do harm -- certainly not for your
- dataset.
- However, there may be valid reasons to keep content out of
- DataLad's version control and tracking. Maybe you hide your secret
- ``my-little-pony-themesongs/`` collection within ``Deathmetal/``
- and do not want a record of this in your history or the directory
- being shared together with the rest of the dataset. Who knows?
- We would not judge in any way.
- In principle, you already know a few
- tricks on how to be "messy" and have untracked files.
- For :dlcmd:`save`, you know that precise file paths allow
- you to save only those modifications you want to record.
- For :dlcmd:`run` you know that one
- can specify the ``--explicit`` option
- to only save those modifications that are specified in the ``--output``
- argument.
- Beyond these tricks, there are two ways to leave *untracked* content unaffected
- by a :dlcmd:`save`. One is the ``-u/--updated`` option of
- :dlcmd:`save`. The call::
- $ datalad save -m "my commit message here" --updated
- will only save dataset modifications to previously tracked
- paths. If ``my-little-pony-themesongs/`` is not yet tracked,
- a ``datalad save -u`` will leave it untouched, and its existence
- or content is not written to the history of your dataset.
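- The effect of ``-u`` can be illustrated with plain Git, whose ``git add -u`` likewise stages only already tracked files. The following is a minimal sketch in a throwaway repository, not a description of how DataLad implements the option; the file names ``notes.txt`` and ``secret/song.txt`` and the commit identity are made up for illustration.

```shell
# Work in a temporary repository so no real dataset is touched
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # hypothetical identity
git config user.name "Example User"

# One tracked file with a saved state
echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "add notes"

# Modify the tracked file and create untracked content next to it
echo "version 2" >> notes.txt
mkdir secret
echo "la la la" > secret/song.txt

# Stage and commit only modifications to already tracked files
git add -u
git commit -q -m "update tracked content only"

# The untracked directory is still reported, and was never committed
status=$(git status --porcelain)
echo "$status"
```

- Only the untracked ``secret/`` directory should remain in the status report; the change to ``notes.txt`` is committed.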
- A second way of hiding content from DataLad is a ``.gitignore``
- file. As the name suggests, it is a :term:`Git` related solution,
- but it works just as well for DataLad.
- A ``.gitignore`` file is a file that specifies which files should
- be *ignored* by the version control tool.
- To use a ``.gitignore`` file, simply create a file with this
- name in the root of your dataset (be mindful: remember the leading ``.``!).
- You can use one of `thousands of publicly shared examples <https://github.com/github/gitignore>`_,
- or create your own one.
- To specify dataset content to be git-ignored, you can either write
- a full file name, e.g. ``playlists/my-little-pony-themesongs/Friendship-is-magic.mp3``
- into this file, or paths or patterns that make use of globbing, such as
- ``playlists/my-little-pony-themesongs/*``. The :find-out-more:`on general rules for patterns in .gitignore files <fom-gitignore>` contains a helpful overview. Afterwards,
- you just need to save the file once to your dataset so that it is version controlled.
- If you have new content you do not want to track, you can add
- new paths or patterns to the file, and save these modifications.
- Let's try this with a very basic example: Let's git-ignore all content in
- a ``tmp/`` directory in the ``DataLad-101`` dataset:
- .. runrecord:: _examples/DL-101-179-101
- :workdir: dl-101/DataLad-101
- :language: console
- $ cat << EOT > .gitignore
- tmp/*
- EOT
- .. runrecord:: _examples/DL-101-179-102
- :workdir: dl-101/DataLad-101
- :language: console
- $ datalad status
- .. runrecord:: _examples/DL-101-179-103
- :workdir: dl-101/DataLad-101
- :language: console
- $ datalad save -m "add something to ignore" .gitignore
- This ``.gitignore`` file is very minimalistic, but it's sufficient to show
- how it works. If you now create a ``tmp/`` directory, all of its contents will be
- ignored by your dataset's version control. Let's do so, and add a file into it
- that we do not (yet?) want to save to the dataset's history.
- .. runrecord:: _examples/DL-101-179-104
- :workdir: dl-101/DataLad-101
- :language: console
- $ mkdir tmp
- $ echo "this is just noise" > tmp/a_random_ignored_file
- .. runrecord:: _examples/DL-101-179-105
- :workdir: dl-101/DataLad-101
- :language: console
- $ datalad status
- As expected, the file does not show up as untracked -- it is being
- ignored! Therefore, a ``.gitignore`` file can give you a space inside of
- your dataset to be messy, if you want to be.
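- If you are unsure whether a path is ignored, and by which rule, Git's ``git check-ignore`` command can tell you. Below is a minimal sketch, run in a throwaway repository rather than in ``DataLad-101``; ``notes.txt`` is a hypothetical file that no rule matches.

```shell
# Throwaway repository with the same ignore rule as above
repo=$(mktemp -d)
cd "$repo"
git init -q
printf 'tmp/*\n' > .gitignore
mkdir tmp
echo "this is just noise" > tmp/a_random_ignored_file
echo "keep me" > notes.txt

# -v prints the file, line number, and pattern responsible for the match
match=$(git check-ignore -v tmp/a_random_ignored_file)
echo "$match"

# For a path that no rule matches, check-ignore exits with a non-zero status
if git check-ignore -q notes.txt; then
    notes_state="ignored"
else
    notes_state="not ignored"
fi
echo "notes.txt is $notes_state"
```

- The first report names ``.gitignore``, line 1, and the pattern ``tmp/*`` as the reason the file is ignored.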
- .. find-out-more:: Rules for .gitignore files
- :name: fom-gitignore
- Here are some general rules for the patterns you can put into a ``.gitignore``
- file, taken from the book `Pro Git <https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository#_ignoring>`_ :
- - Blank lines or lines starting with ``#`` are ignored
- - Standard :term:`globbing` patterns work. The line
- .. code-block:: bash
- *.[oa]
- lets all files ending in ``.o`` or ``.a`` be ignored. Importantly, these patterns
- will be applied recursively through your dataset, so that a file matching this
- rule will be ignored, even if it is in a subdirectory of your dataset. If you
- want to ignore specific files in the directory your ``.gitignore`` file lies in,
- but not any subdirectories, start the pattern with a forward slash (``/``), as
- in ``/TODO``.
- - To specify directories, you can end patterns with a forward slash (``/``), for
- example ``build/``.
- - You can negate a pattern by starting it with an exclamation point (``!``), such
- as ``!lib.a``. This would track the file ``lib.a``, even if you are ignoring
- all other files with the ``.a`` extension.
- The manpage of ``gitignore`` has an extensive and well-explained overview.
- To read it, simply type ``man gitignore`` into your terminal.
- You can have a single ``.gitignore`` file in the root of your dataset,
- and its rules apply recursively to the entire hierarchy of the dataset (but not
- subdatasets!). Alternatively, you can have additional ``.gitignore`` files in
- subdirectories of your dataset. The rules in these nested ``.gitignore`` files only
- apply to the files under the directory where they are located.
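- The pattern rules listed above can be tried out in a scratch repository. The sketch below is only an illustration: all file and directory names are hypothetical, and it uses ``git ls-files --others --exclude-standard`` to list the untracked files that are *not* ignored.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# One pattern per rule from the list above
cat << 'EOT' > .gitignore
# object files and archives, at any depth
*.[oa]
# only the TODO file next to this .gitignore
/TODO
# any directory called build
build/
# keep this one archive despite the *.[oa] rule
!lib.a
EOT

# Hypothetical content to test the rules against
mkdir sub build
touch main.o lib.a sub/util.a TODO sub/TODO build/log.txt README

# List untracked files that are not ignored (sorted for stable output)
visible=$(git ls-files --others --exclude-standard | LC_ALL=C sort)
echo "$visible"
```

- With these rules, ``main.o``, ``sub/util.a``, the top-level ``TODO``, and everything under ``build/`` are hidden, while ``lib.a`` and ``sub/TODO`` remain visible.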
- .. importantnote:: Implications of git-ignored outputs for rerunning
- Note one caveat: If a command creates an output that is git-ignored
- (e.g., anything inside of ``tmp/`` in our dataset), a subsequent command
- that requires it as an undisclosed input will only succeed if both
- commands are run in succession. The second command will fail if rerun on its own,
- however.
- .. find-out-more:: Globally ignoring files
- It is not only possible to define files or patterns for files to ignore inside
- of individual datasets, but also to set global specifications so that every
- single dataset you own ignores certain files or file types.
- This can be useful, for example, for unwanted files that your operating system
- or certain software creates, such as `lock files <https://fileinfo.com/extension/lock>`_,
- `.swp files <https://www.networkworld.com/article/2931534/what-are-unix-swap-swp-files.html>`_,
- `.DS_Store files <https://en.wikipedia.org/wiki/.DS_Store>`_,
- `Thumbs.DB <https://en.wikipedia.org/wiki/Windows_thumbnail_cache#Thumbs.db>`_,
- or others.
- To set rules to ignore files for all of your datasets, you need to create a
- *global* ``.gitignore`` file. The only difference between a repository-specific
- and a global ``.gitignore`` file is its location on your file
- system. You can put it either in its default location ``~/.config/git/ignore``
- (you may need to create the ``~/.config/git`` directory first),
- or place it into any other location and point Git to it. If you create a
- file at ``~/.gitignore_global`` and run
- .. code-block:: bash
- $ git config --global core.excludesfile ~/.gitignore_global
- Git -- and consequently DataLad -- will not bother you about any of the files
- or file types you have specified. The following snippet writes a typical
- cross-platform collection of ignore rules to this file, and should work on Unix-like systems (like macOS and Linux distributions).
- .. code-block:: bash
- $ touch ~/.gitignore_global
- $ for f in .DS_Store ._.DS_Store '*.swp' Thumbs.db ehthumbs.db; do \
- echo "$f" >> ~/.gitignore_global; done
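- To check that a global ignore file takes effect without touching your real configuration, the setup can be rehearsed in a sandbox. This is a minimal sketch under the assumption that pointing ``HOME`` to a temporary directory is acceptable for the test; the ``repo`` directory and ``notes.txt`` are hypothetical names.

```shell
# Use a temporary HOME so the real ~/.gitconfig stays untouched
sandbox=$(mktemp -d)
export HOME="$sandbox"
unset XDG_CONFIG_HOME

# Write a small global ignore file and point Git to it
printf '%s\n' .DS_Store '*.swp' Thumbs.db > "$HOME/.gitignore_global"
git config --global core.excludesfile "$HOME/.gitignore_global"

# Any repository created now applies these rules
mkdir "$sandbox/repo"
cd "$sandbox/repo"
git init -q
touch .DS_Store notes.txt.swp notes.txt

# Only files that no global rule matches are reported as untracked
visible=$(git ls-files --others --exclude-standard)
echo "$visible"
```

- Only ``notes.txt`` should be listed, even though the repository itself contains no ``.gitignore`` file.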
- .. only:: adminmode
- Add a tag at the section end.
- .. runrecord:: _examples/DL-101-179-106
- :language: console
- :workdir: dl-101/DataLad-101
- $ git branch sct_hide_content
- As this is currently the last section in the book, I'll add siblings to the
- published showroom datasets to it here:
- .. runrecord:: _examples/DL-101-179-107
- :language: console
- :workdir: dl-101/DataLad-101
- $ datalad siblings add -d . --name public --url https://github.com/datalad-handbook/DataLad-101.git
- .. runrecord:: _examples/DL-101-179-108
- :language: console
- :workdir: dl-101/DataLad-101/midterm_project
- $ datalad siblings add -d . --name public --url https://github.com/datalad-handbook/midterm_project.git
- .. runrecord:: _examples/DL-101-179-109
- :language: console
- :workdir: dl-101/DataLad-101
- $ git config -f .gitmodules --replace-all submodule.midterm_project.url https://github.com/datalad-handbook/midterm_project
- $ datalad save -m "SERVICE COMMIT - IGNORE. This commit only serves to appropriately reference the subdataset in the public showroom dataset"
- This makes it possible to automatically push all section branches (not accidentally synced or adjusted annex branches) with
- git push. Note: requires git push; datalad publish cannot handle this at the moment (see https://github.com/datalad/datalad/issues/4006)
- .. runrecord:: _examples/DL-101-179-110
- :language: console
- :workdir: dl-101/DataLad-101
- $ git config --local remote.public.push 'refs/heads/sct*'
- $ git config --local --add remote.public.push 'refs/heads/main'