- .. _text2git:
- Data safety
- -----------
- Later in the day, after seeing and solving so many DataLad error messages,
- you fall into bed, tired. Just as you are about to fall asleep, a thought crosses your mind:
- "I now know that tracked content in a dataset is protected by :term:`git-annex`.
- Whenever tracked contents are ``saved``, they get locked and should not be
- modifiable. But... what about the notes that I have been taking since the first day?
- Should I not need to unlock them before I can modify them? And also the script!
- I was able to modify this despite giving it to DataLad to track, with
- no permission denied errors whatsoever! How does that work?"
- This night, though, your question stays unanswered and you fall into a restless
- sleep filled with bad dreams about "permission denied" errors. The next day you are
- the first student in your lecturer's office hours.
- "Oh, you are really attentive. This is a great question!" our lecturer starts
- to explain.
- .. figure:: ../artwork/src/teacher.svg
- :width: 50%
- .. index:: ! dataset procedure; text2git
- Do you remember that we created the ``DataLad-101`` dataset with a
- specific configuration template? It was the ``-c text2git`` option we
- provided in the beginning of :ref:`createDS`. It is because of this configuration
- that we can modify ``notes.txt`` without unlocking its content first.
- The second commit message in our dataset's history summarizes this (outputs are shortened):
- .. runrecord:: _examples/DL-101-114-101
- :language: console
- :workdir: dl-101
- :emphasize-lines: 3
- :lines: 1-10
- :realcommand: cd DataLad-101 && git log --reverse --oneline
- :notes: Confusing: Why could we modify the tsv file without unlocking? The reason is in the dataset configuration with text2git
- :cast: 03_git_annex_basics
- $ git log --reverse --oneline
- Instead of giving text files such as your notes or your script
- to git-annex, the dataset stores them in :term:`Git`.
- But what does it mean if files are in Git instead of git-annex?
- Well, procedurally it means that everything that is stored in git-annex is
- content-locked, and everything that is stored in Git is not. You can modify
- content stored in Git straight away, without unlocking it first.
- .. _fig-gitvsannex:
- .. figure:: ../artwork/src/git_vs_gitannex.svg
- :alt: A simplified illustration of content lock in files managed by git-annex.
- :width: 50%
- A simplified overview of the tools that manage data in your dataset.
- That's easy enough, and illustrated in :numref:`fig-gitvsannex`.
- "So, first of all: If we hadn't provided the ``-c text2git`` argument, text files
- would get content-locked, too?" "Yes, indeed. However, there are also ways to
- later change how file content is handled based on its type or size. This can be specified
- in the ``.gitattributes`` file, using ``annex.largefiles`` options.
- But there will be a lecture on that [#f1]_."
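- As a rough illustration, a ``.gitattributes`` rule along the following lines tells git-annex to annex only non-empty binary files, which leaves text files in Git. This is a sketch of the kind of rule the ``text2git`` configuration sets up; the exact content of the file in your own dataset is an assumption here and may differ.
- .. code-block:: none
-
-    # illustrative rule; check your dataset's own .gitattributes
-    * annex.largefiles=((mimeencoding=binary)and(largerthan=0))
-
- The leading ``*`` applies the rule to all paths, and any file that does not match the expression is committed to Git directly instead of being annexed.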
- "Okay, well, second: Isn't it much easier to just not bother with locking and
- unlocking, and have everything 'stored in Git'? Even if :dlcmd:`run` takes care
- of unlocking content, I do not see the point of git-annex", you continue.
- Here it gets tricky. To begin with the most important and most straightforward fact:
- It is not practical to store
- large files in Git, because Git would very quickly run into severe performance
- issues. Hosting sites for projects using Git, such as :term:`GitHub` or :term:`GitLab`,
- also do not allow files larger than a few dozen MB.
- For now, we have solved the mystery of why text files can be modified
- without unlocking, which settles one small part
- of the vast number of questions that have piled up in our curious
- minds. Essentially, git-annex protects your data from accidental modifications
- and thus keeps it safe. :dlcmd:`run` commands mitigate any technical
- complexity of this completely if ``-o/--output`` is specified properly, and
- :dlcmd:`unlock` commands can be used to unlock content "by hand" if
- modifications are performed outside of a :dlcmd:`run`.
- .. index::
- pair: adjusted mode; git-annex concept
- But here comes the second, tricky part: There are ways to get rid of locking and
- unlocking within git-annex, using so-called :term:`adjusted branch`\es.
- This functionality is dependent on the git-annex version one has installed, the git-annex version of the repository, and a use-case dependent comparison of the pros and cons.
- On Windows systems, this *adjusted mode* is even the *only* mode of operation.
- In later sections we will see how to use this feature.
- The next lecture, in any case, will guide us deeper into git-annex, and improve our understanding a little further.
- .. rubric:: Footnotes
- .. [#f1] If you cannot wait to read about ``.gitattributes`` and other
- configuration files, jump ahead to chapter :ref:`chapter_config`,
- starting with section :ref:`config`.