  1. .. _history:
  2. Back and forth in time
  3. ----------------------
  4. Almost everyone has inadvertently deleted or overwritten files at some point with
  5. a hasty operation that caused data loss, or at least trouble
  6. re-obtaining or restoring data.
  7. With DataLad, no mistakes are forever: One powerful feature of datasets
  8. is the ability to revert data to a previous state and thus view earlier content or
  9. correct mistakes. As long as the content was version controlled (i.e., tracked),
  10. it is possible to look at previous states of the data, or revert changes --
  11. even years after they happened -- thanks to the underlying version control
  12. system :term:`Git`.
  13. .. figure:: ../artwork/src/versioncontrol.svg
  14. :width: 70%
  15. To get a glimpse into how to work with the history of a dataset, today's lecture
  16. has an external Git-expert as a guest lecturer.
  17. "I do not have enough time to go through all the details in only
  18. one lecture. But I'll give you the basics, and an idea of what is possible.
  19. Always remember: Just google what you need. You will find thousands of helpful tutorials
  20. or questions on `Stack Overflow <https://stackoverflow.com>`_ right away.
  21. Even experts will *constantly* seek help to find out which Git command to
  22. use, and how to use it.", he reassures with a wink.
  23. The basis of working with the history is to *look at it* with tools such
  24. as :term:`tig`, :term:`gitk`, or simply the :gitcmd:`log` command.
  25. The most important information in an entry (commit) in the history is
  26. the :term:`shasum` (or hash) associated with it.
  27. This hash is how dataset modifications in the history are identified,
  28. and with this hash you can communicate with DataLad or :term:`Git` about these
  29. modifications or version states [#f1]_.
  30. Here is an excerpt from the ``DataLad-101`` history to show a
  31. few abbreviated hashes of the 15 most recent commits [#f2]_:
  32. .. runrecord:: _examples/DL-101-137-101
  33. :workdir: dl-101/DataLad-101
  34. :language: console
  35. $ git log -15 --oneline
  36. "I'll let you people direct this lecture", the guest lecturer proposes.
  37. "You tell me what you would be interested in doing, and I'll show you how it's
  38. done. For the rest of the lecture, call me Google!"
  39. Fixing (empty) commit messages
  40. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  41. From the back of the lecture hall comes a question you are really glad
  42. someone asked: "It has happened to me that I accidentally did a
  43. :dlcmd:`save` and forgot to specify the commit message,
  44. how can I fix this?".
  45. The room nods in agreement -- apparently, others have run into this
  46. premature slip of the ``Enter`` key as well.
  47. Let's demonstrate a simple example. First, let's create some random files.
  48. Do this right in your dataset.
  49. .. runrecord:: _examples/DL-101-137-102
  50. :language: console
  51. :workdir: dl-101/DataLad-101
  52. $ cat << EOT > Gitjoke1.txt
  53. Git knows what you did last summer!
  54. EOT
  55. $ cat << EOT > Gitjoke2.txt
  56. Knock knock. Who's there? Git.
  57. Git-who?
  58. Sorry, 'who' is not a git command - did you mean 'show'?
  59. EOT
  60. $ cat << EOT > Gitjoke3.txt
  61. In Soviet Russia, git commits YOU!
  62. EOT
  63. This will generate three new files in your dataset. Run a
  64. :dlcmd:`status` to verify this:
  65. .. runrecord:: _examples/DL-101-137-103
  66. :language: console
  67. :workdir: dl-101/DataLad-101
  68. $ datalad status
  69. And now:
  70. .. runrecord:: _examples/DL-101-137-104
  71. :language: console
  72. :workdir: dl-101/DataLad-101
  73. $ datalad save
  74. Whooops! A :dlcmd:`save` without a
  75. commit message that saved all of the files.
  76. .. runrecord:: _examples/DL-101-137-105
  77. :language: console
  78. :workdir: dl-101/DataLad-101
  79. :emphasize-lines: 6
  80. $ git log -p -1
  81. As expected, all of the modifications present prior to the
  82. command are saved into the most recent commit, and the commit
  83. message DataLad provides by default, ``[DATALAD] Recorded changes``,
  84. is not very helpful.
  85. Changing the commit message of the most recent commit can be done with
  86. the command :gitcmd:`commit --amend`. Running this command will open
  87. an editor (the default, as configured in Git), and allow you
  88. to change the commit message. Make sure to read the :ref:`find-out-more on changing commits other than the most recent one <fom-rebase1>` in case you want to improve the commit messages of more commits than only the latest.
  89. Try running the :gitcmd:`commit --amend` command right now and give
  90. the commit a new commit message (you can just delete the one created by
  91. DataLad in the editor)!
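
If you would rather not work in an editor, ``git commit --amend`` also accepts the new message directly via ``-m``. The following is a minimal sketch in a throwaway repository; the directory, file name, user identity, and commit messages are made up for illustration and are not part of ``DataLad-101``:

.. code-block:: bash

   # Hypothetical scratch repository, so that DataLad-101 stays untouched
   repo=$(mktemp -d)
   cd "$repo"
   git init -q
   git config user.name "Example User"        # placeholder identity
   git config user.email "user@example.com"

   echo "Git knows what you did last summer!" > joke.txt
   git add joke.txt
   git commit -q -m "placeholder message"     # stands in for an unhelpful message

   # Replace the message of the most recent commit without opening an editor
   git commit -q --amend -m "Add a Git joke"
   git log -1 --format=%s

Note that amending replaces the most recent commit with a new one that has a different hash, so the caution about rewriting shared history applies here as well.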
  92. .. index::
  93. pair: save --amend; DataLad command
  94. pair: add changes to previous commit; with DataLad
  95. pair: change the last commit message; with DataLad
  96. .. gitusernote:: 'git commit --amend' versus 'datalad save --amend'
  97. Similar to ``git commit``, ``datalad save`` also has an ``--amend`` option.
  98. Like its Git equivalent, it can be used to record changes not in a new, separate commit, but integrate them with the previously saved state.
  99. Though this has not been the use case for ``git commit --amend`` here, experienced Git users will be accustomed to using ``git commit --amend`` to achieve something similar in their Git workflows.
  100. In contrast to ``git commit --amend``, ``datalad save --amend`` will not open up an interactive editor to potentially change a commit message (unless the configuration ``datalad.save.no-message`` is set to ``interactive``), but a new commit message can be supplied with the ``-m``/``--message`` option.
  101. .. index::
  102. pair: change historical commit messages; with Git
  103. pair: rebase; Git command
  104. pair: rewrite history; with Git
  105. .. find-out-more:: Changing the commit messages of not-the-most-recent commits
  106. :name: fom-rebase1
  107. :float:
  108. The :gitcmd:`commit --amend` command will let you
  109. rewrite the commit message of the most recent commit. If you
  110. however need to rewrite commit messages of older commits, you
  111. can do so during a so-called "interactive rebase". The command
  112. for this is
  113. .. code-block:: console
  114. $ git rebase -i HEAD~N
  115. where ``N`` specifies how far back you want to rewrite commits.
  116. ``git rebase -i HEAD~3``, for example, lets you apply changes to
  117. any number of commit messages within the last three commits.
  118. Be aware that an interactive rebase lets you *rewrite* history.
  119. This can lead to confusion or worse if the history you are rewriting
  120. is shared with others, e.g., in a collaborative project. Be also aware
  121. that rewriting history that is *pushed*/*published* (e.g., to GitHub)
  122. will require a force-push!
  123. Running this command gives you a list of the N most recent commits
  124. in your text editor (which may be :term:`vim`!), sorted with
  125. the most recent commit on the bottom.
  126. This is what it may look like:
  127. .. code-block:: bash
  128. pick 8503f26 Add note on adding siblings
  129. pick 23f0a52 add note on configurations and git config
  130. pick c42cba4 add note on DataLad's procedures
  131. # Rebase b259ce8..c42cba4 onto b259ce8 (3 commands)
  132. #
  133. # Commands:
  134. # p, pick <commit> = use commit
  135. # r, reword <commit> = use commit, but edit the commit message
  136. # e, edit <commit> = use commit, but stop for amending
  137. # s, squash <commit> = use commit, but meld into previous commit
  138. # f, fixup <commit> = like "squash", but discard this commit's log message
  139. # x, exec <command> = run command (the rest of the line) using shell
  140. # b, break = stop here (continue rebase later with 'git rebase --continue')
  141. # d, drop <commit> = remove commit
  142. # l, label <label> = label current HEAD with a name
  143. An interactive rebase allows you to apply various modifying actions to any
  144. number of commits in the list. Below the list are descriptions of these
  145. different actions. Among them is "reword", which lets you "edit the commit
  146. message". To apply this action and reword the top-most commit message in this list
  147. (``8503f26 Add note on adding siblings``, three commits back in the history),
  148. exchange the word ``pick`` in the beginning of the line with the word
  149. ``reword`` or simply ``r`` like this:
  150. .. code-block:: bash
  151. r 8503f26 Add note on adding siblings
  152. If you want to reword more than one commit message, exchange several
  153. ``pick``\s. Any commit with the word ``pick`` at the beginning of the line will
  154. be kept as is. Once you are done, save and close the editor. This will
  155. sequentially open up a new editor for each commit you want to reword. In
  156. it, you will be able to change the commit message. Save to proceed to
  157. the next commit message until the rebase is complete.
  158. But be careful not to delete any lines in the above editor view --
  159. **An interactive rebase can be dangerous, and if you remove a line, this commit will be lost!**
  160. .. index::
  161. pair: stop content tracking; with Git
  162. Untracking accidentally saved contents (tracked in Git)
  163. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  164. The next question comes from the front:
  165. "It happened that I forgot to give a path to the :dlcmd:`save`
  166. command when I wanted to only start tracking a very specific file.
  167. Other times I just didn't remember that
  168. additional, untracked files existed in the dataset and saved unaware of
  169. those. I know that it is good practice to only save
  170. those changes together that belong together, so is there a way to
  171. disentangle an accidental :dlcmd:`save` again?"
  172. Let's say instead of saving *all three* previously untracked Git jokes
  173. you intended to save *only one* of those files. What we
  174. want to achieve is to keep all of the files and their contents
  175. in the dataset, but get them out of the history into an
  176. *untracked* state again, and save them *individually* afterwards.
  177. .. importantnote:: Untracking is different for Git versus git-annex!
  178. Note that this is a case with *text files* (stored in Git)! For
  179. accidental annexing of files, please make sure to check out
  180. the next paragraph!
  181. This is a task for the :gitcmd:`reset` command. It essentially allows you to
  182. undo commits by resetting the history of a dataset to an earlier version.
  183. :gitcmd:`reset` comes with several *modes* that determine its
  184. exact behavior, but the relevant one for this aim is ``--mixed`` [#f3]_.
  185. Specifying the command:
  186. .. code-block:: console
  187. $ git reset --mixed COMMIT
  188. will preserve all changes made to files since the specified
  189. commit in the dataset but remove them from the dataset's history.
  190. This means all commits *since* ``COMMIT`` (but *not including* ``COMMIT``)
  191. will not be in your history anymore, and their changes become "untracked files" or
  192. "unsaved changes" instead. In other words, the modifications
  193. you made in these commits that are "undone" will still be present
  194. in your dataset -- just not written to the history anymore. Let's
  195. try this to get a feel for it.
  196. The COMMIT in the command can either be a hash or a reference
  197. with the HEAD pointer.
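
As a minimal sketch of the two ways to name ``COMMIT``, the following throwaway repository (hypothetical directory, file names, and user identity) resolves ``HEAD~1`` to its full hash with ``git rev-parse`` and then resets to it. Either form could be passed to ``git reset``:

.. code-block:: bash

   # Hypothetical scratch repository, so that DataLad-101 stays untouched
   repo=$(mktemp -d)
   cd "$repo"
   git init -q
   git config user.name "Example User"        # placeholder identity
   git config user.email "user@example.com"

   echo "first" > a.txt
   git add a.txt
   git commit -q -m "add a.txt"
   echo "second" > b.txt
   git add b.txt
   git commit -q -m "add b.txt"

   # HEAD~1 and the hash it resolves to name the same commit
   target=$(git rev-parse HEAD~1)
   git reset -q --mixed "$target"   # same effect as: git reset --mixed HEAD~1

   git log --oneline                # only "add a.txt" is left in the history
   git status --short               # b.txt is still present, but untracked
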
  198. .. index::
  199. pair: branch; Git concept
  200. pair: HEAD; Git concept
  201. .. find-out-more:: Git terminology: branches and HEADs?
  202. A Git repository (and thus any DataLad dataset) is built up as a tree of
  203. commits. A *branch* is a named pointer (reference) to a commit, and allows you
  204. to isolate developments. The default branch is called ``main``. ``HEAD`` is
  205. a pointer to the branch you are currently on, and thus to the last commit
  206. in the given branch.
  207. .. image:: ../artwork/src/git_branch_HEAD.png
  208. :width: 50%
  209. Using ``HEAD``, you can identify the most recent commit, or count backwards
  210. starting from the most recent commit. ``HEAD~1`` is the parent of the most
  211. recent commit, i.e., one commit back (``f30ab`` in the figure above). Apart from
  212. the notation ``HEAD~N``, there is also ``HEAD^N``, which selects the N-th parent of a commit and is
  213. `less frequently used and of importance primarily in the case of merge
  214. commits <https://stackoverflow.com/q/2221658/10068927>`__.
  215. Let's stay with the hash, and reset to the commit prior to saving the Git jokes.
  216. First, find out the shasum, and afterwards, reset to it.
  217. .. runrecord:: _examples/DL-101-137-106
  218. :language: console
  219. :workdir: dl-101/DataLad-101
  220. $ git log -n 3 --oneline
  221. .. runrecord:: _examples/DL-101-137-107
  222. :language: console
  223. :workdir: dl-101/DataLad-101
  224. :realcommand: echo "$ git reset --mixed $(git rev-parse HEAD~1)" && git reset --mixed $(git rev-parse HEAD~1)
  225. Let's see what has happened. First, let's check the history:
  226. .. runrecord:: _examples/DL-101-137-108
  227. :language: console
  228. :workdir: dl-101/DataLad-101
  229. $ git log -n 2 --oneline
  230. As you can see, the commit in which the jokes were tracked
  231. is not in the history anymore! Go on to see what :dlcmd:`status`
  232. reports:
  233. .. runrecord:: _examples/DL-101-137-109
  234. :workdir: dl-101/DataLad-101
  235. :language: console
  236. $ datalad status
  237. Nice, the files are present, and untracked again. Do they still contain
  238. their content? We will read all of them with :shcmd:`cat`:
  239. .. runrecord:: _examples/DL-101-137-110
  240. :workdir: dl-101/DataLad-101
  241. :language: console
  242. $ cat Gitjoke*
  243. Great. Now we can go ahead and save only the file we intended
  244. to track:
  245. .. runrecord:: _examples/DL-101-137-111
  246. :workdir: dl-101/DataLad-101
  247. :language: console
  248. $ datalad save -m "save my favorite Git joke" Gitjoke2.txt
  249. Finally, let's check how the history looks afterwards:
  250. .. runrecord:: _examples/DL-101-137-112
  251. :workdir: dl-101/DataLad-101
  252. :language: console
  253. $ git log -2
  254. Wow! You have rewritten history [#f4]_!
  255. .. index::
  256. pair: stop content tracking; with git-annex
  257. Untracking accidentally saved contents (stored in git-annex)
  258. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  259. The previous :gitcmd:`reset` undid the tracking of *text* files.
  260. However, those files are stored in Git, and thus their content
  261. is also stored in Git. Files that are annexed, however, have
  262. their content stored in git-annex, and what is stored in the history is not
  263. the file itself, but a symlink pointing to the location of the file
  264. content in the dataset's annex. This has consequences for
  265. a :gitcmd:`reset` command: Reverting a save of a file that is
  266. annexed would revert the save of the symlink into Git, but it will
  267. not revert the *annexing* of the file.
  268. Thus, what will be left in the dataset is an untracked symlink.
  269. To undo an accidental save that annexed a file, the annexed file
  270. has to be "unlocked" first with a :dlcmd:`unlock` command.
  271. We will simulate such a situation by creating a PDF file that
  272. gets annexed with an accidental :dlcmd:`save`:
  273. .. runrecord:: _examples/DL-101-137-113
  274. :language: console
  275. :workdir: dl-101/DataLad-101
  276. $ # create an empty pdf file
  277. $ convert xc:none -page Letter apdffile.pdf
  278. $ # accidentally save it
  279. $ datalad save
  280. This accidental :dlcmd:`save` has thus added not only text files
  281. stored in Git, but also a PDF file to the history of the dataset.
  282. As an :shcmd:`ls -l` reveals, the PDF file has been annexed and is
  283. thus a :term:`symlink`:
  284. .. runrecord:: _examples/DL-101-137-114
  285. :language: console
  286. :realcommand: ls -l --time-style=long-iso apdffile.pdf
  287. :workdir: dl-101/DataLad-101
  288. $ ls -l apdffile.pdf
  289. Prior to resetting, the PDF file has to be unannexed.
  290. To unannex files, i.e., get the contents out of the object tree,
  291. the :dlcmd:`unlock` command is relevant:
  292. .. runrecord:: _examples/DL-101-137-115
  293. :language: console
  294. :workdir: dl-101/DataLad-101
  295. $ datalad unlock apdffile.pdf
  296. The file is now no longer symlinked:
  297. .. runrecord:: _examples/DL-101-137-116
  298. :language: console
  299. :realcommand: ls -l --time-style=long-iso apdffile.pdf
  300. :workdir: dl-101/DataLad-101
  301. $ ls -l apdffile.pdf
  302. Finally, :gitcmd:`reset --mixed` can be used to revert the
  303. accidental :dlcmd:`save`. Again, find out the shasum first, and
  304. afterwards, reset to it.
  305. .. runrecord:: _examples/DL-101-137-117
  306. :language: console
  307. :workdir: dl-101/DataLad-101
  308. $ git log -n 3 --oneline
  309. .. runrecord:: _examples/DL-101-137-118
  310. :language: console
  311. :workdir: dl-101/DataLad-101
  312. :realcommand: echo "$ git reset --mixed $(git rev-parse HEAD~1)" && git reset --mixed $(git rev-parse HEAD~1)
  313. To see what has happened, let's check the history:
  314. .. runrecord:: _examples/DL-101-137-119
  315. :language: console
  316. :workdir: dl-101/DataLad-101
  317. $ git log -n 2 --oneline
  318. ... and also the status of the dataset:
  319. .. runrecord:: _examples/DL-101-137-120
  320. :language: console
  321. :workdir: dl-101/DataLad-101
  322. $ datalad status
  323. The accidental save has been undone, and the file is present
  324. as untracked content again.
  325. As before, this action has not been recorded in your history.
  326. Viewing previous versions of files and datasets
  327. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  328. The next question is truly magical: How does one *see*
  329. data as it was at a previous state in history?
  330. This magic trick can be performed with the :gitcmd:`checkout` command.
  331. It is a very heavily used command for various tasks, but among
  332. many it can send you back in time to view the state of a dataset
  333. at the time of a specific commit.
  334. Let's say you want to find out which notes you took in the first
  335. few chapters of the handbook. Find a commit :term:`shasum` in your history
  336. to specify the point in time you want to go back to:
  337. .. runrecord:: _examples/DL-101-137-121
  338. :language: console
  339. :workdir: dl-101/DataLad-101
  340. $ git log -n 16 --oneline
  341. Let's go 15 commits back in time:
  342. .. runrecord:: _examples/DL-101-137-122
  343. :language: console
  344. :workdir: dl-101/DataLad-101
  345. :realcommand: echo "$ git checkout $(git rev-parse HEAD~15)" && git checkout $(git rev-parse HEAD~15)
  346. How did your ``notes.txt`` file look at this point?
  347. .. runrecord:: _examples/DL-101-137-123
  348. :language: console
  349. :workdir: dl-101/DataLad-101
  350. $ tail notes.txt
  351. Neat, isn't it? By checking out a commit shasum you can explore a previous
  352. state of a dataset's history. And this does not only apply to simple text
  353. files, but every type of file in your dataset, regardless of size.
  354. The checkout command however led to something that Git calls a "detached HEAD state".
  355. While this sounds scary, a :gitcmd:`checkout main` will bring you
  356. back into the most recent version of your dataset and get you out of the
  357. "detached HEAD state":
  358. .. runrecord:: _examples/DL-101-137-124
  359. :language: console
  360. :workdir: dl-101/DataLad-101
  361. $ git checkout main
  362. Note one very important thing: The previously untracked files are still
  363. there.
  364. .. runrecord:: _examples/DL-101-137-125
  365. :language: console
  366. :workdir: dl-101/DataLad-101
  367. $ datalad status
  368. The contents of ``notes.txt`` will now be the most recent version again:
  369. .. runrecord:: _examples/DL-101-137-126
  370. :language: console
  371. :workdir: dl-101/DataLad-101
  372. $ tail notes.txt
  373. ... Wow! You traveled back and forth in time!
  374. But an even more magical way to see the contents of files in previous
  375. versions is Git's :gitcmd:`cat-file` command: Among many other things, it lets
  376. you read a file's contents as of any point in time in the history, without a
  377. prior :gitcmd:`checkout` (note that the output is shortened for brevity and shows only the last few lines of the file):
  378. .. runrecord:: _examples/DL-101-137-127
  379. :language: console
  380. :workdir: dl-101/DataLad-101
  381. :lines: 2, 48-
  382. :realcommand: echo "$ git cat-file --textconv $(git rev-parse HEAD~15):notes.txt" && git cat-file --textconv $(git rev-parse HEAD~15):notes.txt
  383. .. index::
  384. pair: cat-file; Git command
  385. The cat-file command is very versatile, and
  386. `its documentation <https://git-scm.com/docs/git-cat-file>`_ will list all
  387. of its functionality. To use it to see the contents of a file at a previous
  388. state as done above, this is what the general structure looks like:
  389. .. code-block:: console
  390. $ git cat-file --textconv SHASUM:<path/to/file>
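
A minimal sketch of this pattern in a throwaway repository follows. The directory, file contents, and user identity are made up for illustration, and ``HEAD~1`` is used in place of an explicit shasum:

.. code-block:: bash

   # Hypothetical scratch repository, so that DataLad-101 stays untouched
   repo=$(mktemp -d)
   cd "$repo"
   git init -q
   git config user.name "Example User"        # placeholder identity
   git config user.email "user@example.com"

   echo "first note" > notes.txt
   git add notes.txt
   git commit -q -m "add first note"
   echo "second note" >> notes.txt
   git commit -q -am "add second note"

   # Read notes.txt as it was one commit ago; the working tree is not changed
   old=$(git cat-file --textconv HEAD~1:notes.txt)
   echo "$old"

   # The file on disk still holds the most recent version with both notes
   cat notes.txt

Because nothing is checked out, this approach does not lead to a "detached HEAD state".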
  391. Undoing latest modifications of files
  392. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  393. Previously, we saw how to remove files from a dataset's history that
  394. were accidentally saved and thus tracked for the first time.
  395. How does one undo a *modification* to a tracked file?
  396. Let's modify the saved ``Gitjoke2.txt``:
  397. .. runrecord:: _examples/DL-101-137-128
  398. :language: console
  399. :workdir: dl-101/DataLad-101
  400. $ echo "this is by far my favorite joke!" >> Gitjoke2.txt
  401. .. runrecord:: _examples/DL-101-137-129
  402. :language: console
  403. :workdir: dl-101/DataLad-101
  404. $ cat Gitjoke2.txt
  405. .. runrecord:: _examples/DL-101-137-130
  406. :language: console
  407. :workdir: dl-101/DataLad-101
  408. $ datalad status
  409. .. runrecord:: _examples/DL-101-137-131
  410. :language: console
  411. :workdir: dl-101/DataLad-101
  412. $ datalad save -m "add joke evaluation to joke" Gitjoke2.txt
  413. How could this modification to ``Gitjoke2.txt`` be undone?
  414. With the :gitcmd:`reset` command again. If you want to
  415. "unsave" the modification but keep it in the file, use
  416. :gitcmd:`reset --mixed` as before. However, if you want to
  417. get rid of the modifications entirely, use the option ``--hard``
  418. instead of ``--mixed``:
  419. .. runrecord:: _examples/DL-101-137-132
  420. :language: console
  421. :workdir: dl-101/DataLad-101
  422. $ git log -n 2 --oneline
  423. .. runrecord:: _examples/DL-101-137-133
  424. :language: console
  425. :workdir: dl-101/DataLad-101
  426. :realcommand: echo "$ git reset --hard $(git rev-parse HEAD~1)" && git reset --hard $(git rev-parse HEAD~1)
  427. .. runrecord:: _examples/DL-101-137-134
  428. :language: console
  429. :workdir: dl-101/DataLad-101
  430. $ cat Gitjoke2.txt
  431. The change has been undone completely. This method will work with
  432. files stored in Git and annexed files.
  433. Note that this operation only restores this one file, because the commit that
  434. was undone only contained modifications to this one file. This is a
  435. demonstration of one of the reasons why one should strive for commits to
  436. represent meaningful logical units of change -- if necessary, they can be
  437. undone easily.
  438. Undoing past modifications of files
  439. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  440. What :gitcmd:`reset` did was to undo commits from
  441. the most recent version of your dataset. How
  442. would one undo a change that happened a while ago, though,
  443. with important changes being added afterwards that you want
  444. to keep?
  445. Let's save a bad modification to ``Gitjoke2.txt``,
  446. but also a modification to ``notes.txt``:
  447. .. runrecord:: _examples/DL-101-137-140
  448. :language: console
  449. :workdir: dl-101/DataLad-101
  450. $ echo "bad modification" >> Gitjoke2.txt
  451. .. runrecord:: _examples/DL-101-137-141
  452. :language: console
  453. :workdir: dl-101/DataLad-101
  454. $ datalad save -m "did a bad modification" Gitjoke2.txt
  455. .. runrecord:: _examples/DL-101-137-142
  456. :language: console
  457. :workdir: dl-101/DataLad-101
  458. $ cat << EOT >> notes.txt
  459. Git has many handy tools to go back and forth in time and work with the
  460. history of datasets. Among many other things you can rewrite commit
  461. messages, undo changes, or look at previous versions of datasets.
  462. A superb resource to find out more about this and practice such Git
  463. operations is this chapter in the Pro-git book:
  464. https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History
  465. EOT
  466. .. runrecord:: _examples/DL-101-137-143
  467. :language: console
  468. :workdir: dl-101/DataLad-101
  469. $ datalad save -m "add note on helpful git resource" notes.txt
  470. The objective is to remove the first, "bad" modification, but
  471. keep the more recent modification of ``notes.txt``. A :gitcmd:`reset`
  472. command is not convenient, because resetting would need to reset
  473. the most recent, "good" modification as well.
  474. One way to accomplish it is with an *interactive rebase*, using the
  475. :gitcmd:`rebase -i` command [#f5]_. Experienced Git-users will know
  476. under which situations and how to perform such an interactive rebase.
  477. However, outlining an interactive rebase here in the handbook could lead to
  478. problems for readers without (much) Git experience: An interactive rebase,
  479. even if performed successfully, can lead to many problems if it is applied with
  480. too little experience, for example, in any collaborative real-world project.
  481. .. index::
  482. pair: revert; Git command
  483. Instead, we demonstrate a different, less intrusive way to revert one or more
  484. changes at any point in the history of a dataset: the :gitcmd:`revert`
  485. command.
  486. Instead of *rewriting* the history, it will add an additional commit in which
  487. the changes of an unwanted commit are reverted.
  488. The command looks like this:
  489. .. code-block:: console
  490. $ git revert SHASUM
  491. where ``SHASUM`` specifies the commit hash of the modification that should
  492. be reverted.
  493. .. index::
  494. pair: revert multiple commit; with Git
  495. .. find-out-more:: Reverting more than a single commit
  496. You can also specify a range of commits like this:
  497. .. code-block:: console
  498. $ git revert OLDER_SHASUM..NEWERSHASUM
  499. This command will revert all commits starting with the one after
  500. ``OLDER_SHASUM`` (i.e. **not including** this commit) until and **including**
  501. the one specified with ``NEWERSHASUM``.
  502. For each reverted commit, one new commit will be added to the history that
  503. reverts it. Thus, if you revert a range of three commits, there will be three
  504. reversal commits. If you however want the reversal of a range of commits
  505. saved in a single commit, supply the ``--no-commit`` option as in
  506. .. code-block:: console
  507. $ git revert --no-commit OLDER_SHASUM..NEWERSHASUM
  508. After running this command, run a single ``git commit`` to conclude the
  509. reversal and save it in a single commit.
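
The range reversal with ``--no-commit`` can be sketched in a throwaway repository as follows. The directory, file contents, commit messages, and user identity are assumptions made for illustration, and ``HEAD~2..HEAD`` stands in for ``OLDER_SHASUM..NEWERSHASUM``:

.. code-block:: bash

   # Hypothetical scratch repository, so that DataLad-101 stays untouched
   repo=$(mktemp -d)
   cd "$repo"
   git init -q
   git config user.name "Example User"        # placeholder identity
   git config user.email "user@example.com"

   echo "base" > file.txt
   git add file.txt
   git commit -q -m "add file"
   echo "change 1" >> file.txt
   git commit -q -am "first unwanted change"
   echo "change 2" >> file.txt
   git commit -q -am "second unwanted change"

   # Revert the two most recent commits, but only stage the result
   git revert --no-commit HEAD~2..HEAD
   # Conclude the reversal in one single commit
   git commit -q -m "Revert the two unwanted changes"

   cat file.txt          # the file is back to its initial content
   git log --oneline     # three original commits plus one reversal commit
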
  510. Let's see what it looks like:
  511. .. runrecord:: _examples/DL-101-137-144
  512. :language: console
  513. :workdir: dl-101/DataLad-101
  514. :realcommand: echo "$ git revert $(git rev-parse HEAD~1)" && git revert $(git rev-parse HEAD~1)
  515. This is the state of the file in which we reverted a modification:
  516. .. runrecord:: _examples/DL-101-137-145
  517. :language: console
  518. :workdir: dl-101/DataLad-101
  519. $ cat Gitjoke2.txt
  520. It does not contain the bad modification anymore. And this is what happened in
  521. the history of the dataset:
  522. .. runrecord:: _examples/DL-101-137-146
  523. :language: console
  524. :workdir: dl-101/DataLad-101
  525. :emphasize-lines: 6-8, 20
  526. $ git log -n 3
  527. The commit that introduced the bad modification is still present, but it
  528. transparently gets undone with the most recent commit. At the same time, the
  529. good modification of ``notes.txt`` was not influenced in any way. The
  530. :gitcmd:`revert` command is thus a transparent and safe way of undoing past
  531. changes. Note though that this command can only be used efficiently if the
  532. commits in your dataset's history are meaningful, independent units -- having
  533. several unrelated modifications in a single commit may make an easy solution
  534. with :gitcmd:`revert` impossible and instead require a complex
  535. :gitcmd:`checkout`, :gitcmd:`revert`, or :gitcmd:`rebase` operation.
  536. Finally, let's take a look at the state of the dataset after this operation:
  537. .. runrecord:: _examples/DL-101-137-147
  538. :language: console
  539. :workdir: dl-101/DataLad-101
  540. $ datalad status
  541. As you can see, unsurprisingly, the :gitcmd:`revert` command had no
  542. effect on anything but the specified commit, and previously untracked
  543. files are still present.
  544. .. index::
  545. pair: resolve merge conflict; with Git
  546. Oh no! I'm in a merge conflict!
  547. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  548. When working with the history of a dataset, especially when rewriting
  549. the history with an interactive rebase or when reverting commits, it is
  550. possible to run into so-called *merge conflicts*.
Merge conflicts happen when Git needs assistance in deciding
which changes to keep and which to apply. It will require
you to edit the file the merge conflict is happening in with
a text editor, but such merge conflicts are not nearly as scary as
they may seem the first few times you solve them.

This section is not a guide on how to solve merge conflicts, but a broad
overview of the necessary steps, and a pointer to a more comprehensive guide.
  558. - The first thing to do if you end up in a merge conflict is
  559. to read the instructions Git is giving you -- they are a useful guide.
  560. - Also, it is reassuring to remember that you can always get out of
  561. a merge conflict by aborting the operation that led to it (e.g.,
  562. ``git rebase --abort``).
- To actually solve a merge conflict, you will have to edit files: In the
  documents the merge conflict applies to, Git marks the sections it needs
  help with using markers that consist of ``>``, ``<``, and ``=``
  signs and commit shasums or branch names.
  567. There will be two marked parts, and you have to delete the one you do not
  568. want to keep, as well as all markers.
  569. - Afterwards, run ``git add <path/to/file>`` and finally a ``git commit``.
  570. GitHub has an `excellent resource on how to deal with merge conflicts <https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/addressing-merge-conflicts/resolving-a-merge-conflict-using-the-command-line>`_.
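
The steps above can be sketched in a throwaway repository. The script below is
only an illustration under assumptions: the branch name ``feature``, the file
``joke.txt``, and its contents are invented, and the conflict is produced with
:gitcmd:`merge` rather than a rebase or revert because that is the shortest way
to trigger one. In practice you would resolve the conflict in a text editor;
here the file is simply overwritten with the version to keep.

.. code-block:: bash

   # Hypothetical scratch repository; nothing here touches a real dataset.
   demo=$(mktemp -d)
   cd "$demo"
   git init -q
   git config user.name "Demo User"
   git config user.email "demo@example.com"

   echo "original joke" > joke.txt
   git add joke.txt
   git commit -q -m "add joke"
   base=$(git rev-parse --abbrev-ref HEAD)   # name of the initial branch

   # Change the same line differently on two branches.
   git checkout -q -b feature
   echo "joke, feature version" > joke.txt
   git commit -q -am "edit joke on feature"
   git checkout -q "$base"
   echo "joke, base version" > joke.txt
   git commit -q -am "edit joke on base branch"

   # The merge stops with a conflict; Git prints instructions.
   git merge feature || echo "merge stopped with a conflict"
   cat joke.txt          # shows the <<<<<<<, =======, >>>>>>> markers

   # Escape hatch: abort and return to the state before the merge.
   git merge --abort

   # Try again and resolve this time: keep one version, drop all markers.
   git merge feature || true
   echo "joke, base version" > joke.txt
   git add joke.txt
   git commit -q -m "merge feature, keep base version of joke"

After the final commit, ``joke.txt`` contains only the chosen line, and the
history records a regular merge commit with two parents.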
  571. Summary
  572. ^^^^^^^
  573. This guest lecture has given you a glimpse into how to work with the
  574. history of your DataLad datasets.
  575. To conclude this section, let's remove all untracked contents from
the dataset. This can be done with :gitcmd:`clean`: The command
:gitcmd:`clean -f` wipes your dataset clean and removes all untracked
files.
**Careful! This is not revertible, and content lost with this command cannot be recovered!**
  580. If you want to be extra sure, run :gitcmd:`clean -fn` beforehand -- this will
  581. give you a list of the files that would be deleted.
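
A minimal sketch of this dry-run-first habit, using a throwaway repository
with invented file names (``tracked.txt`` and ``untracked.txt`` are
assumptions for illustration, not files in ``DataLad-101``):

.. code-block:: bash

   # Hypothetical scratch repository; nothing here touches a real dataset.
   demo=$(mktemp -d)
   cd "$demo"
   git init -q
   git config user.name "Demo User"
   git config user.email "demo@example.com"

   echo "keep me" > tracked.txt
   git add tracked.txt
   git commit -q -m "add tracked file"
   echo "scratch" > untracked.txt    # never added, so Git does not track it

   # Dry run: lists what would be deleted, but deletes nothing.
   git clean -fn                     # prints: Would remove untracked.txt
   ls                                # both files are still there

   # The real thing: untracked files are deleted for good.
   git clean -f                      # prints: Removing untracked.txt
   ls                                # only tracked.txt remains

Note that without the additional ``-d`` option, :gitcmd:`clean -f` leaves
untracked directories in place.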
  582. .. runrecord:: _examples/DL-101-137-148
  583. :language: console
  584. :workdir: dl-101/DataLad-101
  585. $ git clean -f
Afterwards, :dlcmd:`status` returns nothing, indicating a
clean dataset state with no untracked files or modifications.
  588. .. runrecord:: _examples/DL-101-137-149
  589. :language: console
  590. :workdir: dl-101/DataLad-101
  591. $ datalad status
  592. Finally, if you want, apply your new knowledge about reverting commits
  593. to remove the ``Gitjoke2.txt`` file.
  594. .. only:: adminmode
  595. Add a tag at the section end.
  596. .. runrecord:: _examples/DL-101-137-160
  597. :language: console
  598. :workdir: dl-101/DataLad-101
  599. $ git branch sct_back_and_forth_in_time
  600. .. rubric:: Footnotes
  601. .. [#f1] For example, the :dlcmd:`rerun` command introduced in section
  602. :ref:`run2` takes such a hash as an argument, and re-executes
  603. the ``datalad run`` or ``datalad rerun`` :term:`run record` associated with
  604. this hash. Likewise, the :gitcmd:`diff` command can work with commit hashes.
.. [#f2] There are other ways to reference commits in the history of a dataset,
   for example, relative notations such as ``HEAD~2`` (the grandparent of the
   most recent commit), ``HEAD^2`` (the second parent of a merge commit), or
   ``HEAD@{2}`` (where ``HEAD`` pointed two moves ago, according to the reflog).
   However, using hashes to reference commits is a very fail-safe method and
   saves you from accidentally miscounting.
.. [#f3] The option ``--mixed`` is the default mode for a :gitcmd:`reset`
   command; omitting it (i.e., running just ``git reset``) leads to the
   same behavior. It is stated explicitly in this book to make the mode
   clear, though.
.. [#f4] Note though that rewriting history can be dangerous, and you should
   be aware of what you are doing. For example, rewriting parts of the
   dataset's history that have already been published (e.g., to a GitHub
   repository) or that other people have copies of is not advised.
.. [#f5] If you need to rebase interactively, please consult further documentation
   and tutorials. It is out of the scope of this handbook to be a complete
   guide on rebasing, and not all interactive rebasing operations are
   complication-free. However, you can always undo mistakes that occur
   during rebasing with the help of the `reflog <https://git-scm.com/docs/git-reflog>`_.