  1. .. index::
  2. pair: rerun; DataLad command
  3. .. _run2:
  4. DataLad, rerun!
  5. ----------------
  6. So far, you created a ``.tsv`` file of all
  7. speakers and talk titles in the ``longnow/`` podcasts subdataset.
  8. Let's actually take a look into this file now:
  9. .. runrecord:: _examples/DL-101-109-101
  10. :language: console
  11. :workdir: dl-101/DataLad-101
  12. :lines: 1-3,5-7
  13. :append: -✂--✂-
  14. :notes: The script produced a simple list of podcast titles. Let's take a look into our output file. What's cool is that it was created in a way that the code and output are linked:
  15. :cast: 02_reproducible_execution
  16. $ less recordings/podcasts.tsv
  17. Not too bad, and certainly good enough for the podcast night people.
  18. What's been cool about creating this file is that it was created with
  19. a script within a :dlcmd:`run` command. Thanks to :dlcmd:`run`,
  20. the output file ``podcasts.tsv`` is associated with the script that
  21. generated it.
  22. Upon reviewing the list you realized that you made a mistake, though: you only
  23. listed the talks in the SALT series (the
  24. ``Long_Now__Seminars_About_Long_term_Thinking/`` directory), but not
  25. in the ``Long_Now__Conversations_at_The_Interval/`` directory.
  26. Let's fix this in the script. Replace the contents of ``code/list_titles.sh``
  27. with the following, fixed script:
  28. .. windows-wit:: Here's a script adjustment for Windows users
  29. .. include:: topic/globscript2-windows.rst
  30. .. runrecord:: _examples/DL-101-109-102
  31. :language: console
  32. :workdir: dl-101/DataLad-101
  33. :emphasize-lines: 2
  34. :notes: Dang, we made a mistake in our script: we only listed a part of the podcasts! Let's fix the script:
  35. :cast: 02_reproducible_execution
  36. $ cat << EOT >| code/list_titles.sh
  37. for i in recordings/longnow/Long_Now*/*.mp3; do
  38. # get the filename
  39. base=\$(basename "\$i");
  40. # strip the extension
  41. base=\${base%.mp3};
  42. printf "\${base%%__*}\t" | tr '_' '-';
  43. # name and title without underscores
  44. printf "\${base#*__}\n" | tr '_' ' ';
  45. done
  46. EOT
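The parameter expansions in this script may be easier to follow on a single name. The following is a minimal sketch, assuming a file name that follows a ``date__speaker__title.mp3`` pattern; the example name and the helper ``split_name`` are made up for illustration and are not part of the script above. Note that the backslashes in the heredoc above only keep ``$`` from being expanded while the file is written, so they do not appear here.

```shell
# Hypothetical helper: split one file name the same way the loop body does.
split_name() {
    base=$(basename "$1")                              # drop the directory part
    base=${base%.mp3}                                  # strip the extension
    date=$(printf '%s' "${base%%__*}" | tr '_' '-')    # everything before the first "__"
    rest=$(printf '%s' "${base#*__}" | tr '_' ' ')     # everything after the first "__"
    printf '%s\t%s\n' "$date" "$rest"
}

# Assumed example name, chosen only to show the pattern
split_name "recordings/longnow/Some_Series/2003_11_15__Brian_Eno__The_Long_Now.mp3"
```

``%%__*`` removes the longest suffix that starts with ``__``, which leaves the date, while ``#*__`` removes the shortest prefix that ends with ``__``, which leaves speaker and title. The sketch passes the name to ``printf`` as an argument rather than as the format string, so a ``%`` in a file name would not be interpreted.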
  47. Because the script is now modified, save the modifications to the dataset.
  48. We can use the shorthand "BF" to denote "Bug fix" in the commit message.
  49. .. runrecord:: _examples/DL-101-109-103
  50. :language: console
  51. :workdir: dl-101/DataLad-101
  52. :cast: 02_reproducible_execution
  53. $ datalad status
  54. .. runrecord:: _examples/DL-101-109-104
  55. :language: console
  56. :workdir: dl-101/DataLad-101
  57. :cast: 02_reproducible_execution
  58. $ datalad save -m "BF: list both directories content" \
  59. code/list_titles.sh
  60. What we *could* do is run the same :dlcmd:`run` command as before to recreate
  61. the file, but now with all of the contents:
  62. .. code-block:: console
  63. $ # do not execute this!
  64. $ datalad run -m "create a list of podcast titles" \
  65. "bash code/list_titles.sh > recordings/podcasts.tsv"
  66. However, imagine a situation in which the command is longer than this,
  67. or in which many months have passed since the first execution. It would not be easy to remember
  68. the command, nor would it be very convenient to copy it from the ``run record``.
  69. Luckily, a fellow student remembered the DataLad way of re-executing
  70. a ``run`` command, and he's eager to show it to you.
  71. "In order to re-execute a :dlcmd:`run` command,
  72. find the commit and use its :term:`shasum` (or a :term:`tag`, or anything else that Git
  73. understands) as an argument for the
  74. :dlcmd:`rerun` command! That's it!",
  75. he says happily.
  76. So you go ahead and find the commit :term:`shasum` in your history:
  77. .. runrecord:: _examples/DL-101-109-105
  78. :language: console
  79. :workdir: dl-101/DataLad-101
  80. :lines: 1-12
  81. :emphasize-lines: 8
  82. :notes: We could execute the same command as before. However, we can also let DataLad take care of it, and use the datalad rerun command.
  83. :cast: 02_reproducible_execution
  84. $ git log -n 2
  85. Take that shasum and paste it after :dlcmd:`rerun`
  86. (the first 6-8 characters of the shasum would be sufficient;
  87. here we are using all of them).
  88. .. runrecord:: _examples/DL-101-109-106
  89. :language: console
  90. :workdir: dl-101/DataLad-101
  91. :realcommand: echo "$ datalad rerun $(git rev-parse HEAD~1)" && datalad rerun $(git rev-parse HEAD~1)
  92. :notes: We'll find the shasum of the run commit and plug it into rerun
  93. :cast: 02_reproducible_execution
  94. Now DataLad has made use of the ``run record``, and
  95. re-executed the original command based on the information in it.
  96. Because we updated the script, the output ``podcasts.tsv``
  97. has changed and now contains the podcast
  98. titles of both subdirectories.
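To make "the information in it" more concrete, here is a minimal sketch of pulling the recorded command out of a commit message. It assumes the layout in which the run record is a JSON block between two marker lines; the sample message, including its placeholder ``dsid``, is hypothetical and not copied from this dataset.

```shell
# Hypothetical commit message in the assumed run record layout
message=$(cat << 'EOT'
[DATALAD RUNCMD] create a list of podcast titles

=== Do not change lines below ===
{
 "chain": [],
 "cmd": "bash code/list_titles.sh > recordings/podcasts.tsv",
 "dsid": "00000000-0000-0000-0000-000000000000",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
EOT
)

# Keep only the lines between the two marker lines
record=$(printf '%s\n' "$message" | awk '
    $0 == "^^^ Do not change lines above ^^^" { inside = 0 }
    inside { print }
    $0 == "=== Do not change lines below ===" { inside = 1 }
')

# Pull out the "cmd" value; a real JSON parser would be more robust
cmd=$(printf '%s\n' "$record" | sed -n 's/^ *"cmd": "\(.*\)",$/\1/p')
echo "$cmd"
```

The ``sed`` expression is a shortcut that only works for a simple, single-line ``cmd`` entry as in this sample.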
  99. You've probably already guessed it, but the easiest way
  100. to check whether a :dlcmd:`rerun`
  101. has changed the desired output file is
  102. to check whether the rerun command appears in the dataset's history.
  103. If a :dlcmd:`rerun` does not add or change any content in the dataset,
  104. it will not be recorded in the history:
  105. .. runrecord:: _examples/DL-101-109-107
  106. :language: console
  107. :workdir: dl-101/DataLad-101
  108. :notes: how does a rerun look in the history?
  109. :cast: 02_reproducible_execution
  110. $ git log -n 1
  111. In the dataset's history,
  112. we can see that a new :dlcmd:`run` was recorded. This action is
  113. committed by DataLad under the original commit message of the ``run``
  114. command, and looks just like the previous :dlcmd:`run` commit.
  115. .. index::
  116. pair: diff; DataLad command
  117. Two cool tools that go beyond the :gitcmd:`log`
  118. are the :dlcmd:`diff` and :gitcmd:`diff` commands.
  119. Both commands can report differences between two states of
  120. a dataset. Thus, you can get an overview of what changed between two commits.
  121. Both commands have a similar, but not identical structure: :dlcmd:`diff`
  122. compares one state (a commit specified with ``-f``/``--from``,
  123. by default the latest change)
  124. and another state from the dataset's history (a commit specified with
  125. ``-t``/``--to``). Let's do a :dlcmd:`diff` between the current state
  126. of the dataset and the previous commit (called "``HEAD~1``" in Git terminology [#f1]_):
  127. .. index::
  128. pair: show dataset modification; on Windows with DataLad
  129. pair: diff; DataLad command
  130. pair: corresponding branch; in adjusted mode
  131. .. windows-wit:: please use 'datalad diff --from main --to HEAD~1'
  132. .. include:: topic/adjustedmode-diff.rst
  133. .. index::
  134. pair: diff; Git command
  135. pair: show dataset modification; with DataLad
  136. .. runrecord:: _examples/DL-101-109-108
  137. :language: console
  138. :workdir: dl-101/DataLad-101
  139. :notes: The datalad diff command can help us find out what changed between the last two commands:
  140. :cast: 02_reproducible_execution
  141. $ datalad diff --to HEAD~1
  142. .. index::
  143. pair: diff; Git command
  144. pair: show dataset modification; with Git
  145. This indeed shows the output file as "modified". However, we do not know
  146. what exactly changed. This is a task for :gitcmd:`diff` (get out of the
  147. diff view by pressing ``q``):
  148. .. runrecord:: _examples/DL-101-109-109
  149. :language: console
  150. :workdir: dl-101/DataLad-101
  151. :notes: The git diff command has even more insights:
  152. :cast: 02_reproducible_execution
  153. :lines: 1-20
  154. $ git diff HEAD~1
  155. This output actually shows the precise changes between the contents created
  156. with the first version of the script and those created with the bug-fixed script.
  157. All of the lines that were added because the second directory
  158. is now queried as well are shown in the ``diff``, preceded by a ``+``.
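If you only want to know how many lines were added, the ``+`` markers can be counted. The following is a minimal sketch under stated assumptions: plain ``diff -u`` stands in for ``git diff`` so that it runs outside of a dataset, and the two small files are hypothetical stand-ins for the old and new versions of an output file.

```shell
workdir=$(mktemp -d)
# Hypothetical stand-ins for two versions of an output file
printf 'a\tfirst talk\nb\tsecond talk\n' > "$workdir/old.tsv"
printf 'a\tfirst talk\nb\tsecond talk\nc\tthird talk\nd\tfourth talk\n' > "$workdir/new.tsv"

# diff exits with status 1 when the files differ, so that is not treated as an error
changes=$(diff -u "$workdir/old.tsv" "$workdir/new.tsv" || true)

# Added lines start with a single "+"; the "+++" header line is excluded.
# Simplification: assumes added lines are non-empty and do not themselves begin with "+".
added=$(printf '%s\n' "$changes" | grep -c '^+[^+]')
echo "$added lines added"
```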
  159. Quickly create a note about these two helpful commands in ``notes.txt``:
  160. .. runrecord:: _examples/DL-101-109-110
  161. :language: console
  162. :workdir: dl-101/DataLad-101
  163. :notes: Let's make a note about this.
  164. :cast: 02_reproducible_execution
  165. $ cat << EOT >> notes.txt
  166. There are two useful functions to display changes between two
  167. states of a dataset: "datalad diff -f/--from COMMIT -t/--to COMMIT"
  168. and "git diff COMMIT COMMIT", where COMMIT is a shasum of a commit
  169. in the history.
  170. EOT
  171. Finally, save this note.
  172. .. runrecord:: _examples/DL-101-109-111
  173. :language: console
  174. :workdir: dl-101/DataLad-101
  175. :cast: 02_reproducible_execution
  176. $ datalad save -m "add note datalad and git diff"
  177. Note that :dlcmd:`rerun` can re-execute the run records of both a :dlcmd:`run`
  178. and a :dlcmd:`rerun` command,
  179. but it cannot re-execute any other type of DataLad command in your history,
  180. such as a :dlcmd:`save` on results or outputs after you executed a script.
  181. Therefore, make it a
  182. habit to record the execution of scripts by plugging them into :dlcmd:`run`.
  183. This very basic example of a :dlcmd:`run` is as simple as it can get, but it
  184. is already
  185. convenient from a memory-load perspective: Now you do not need to
  186. remember the commands or scripts involved in creating an output. DataLad kept track
  187. of what you did, and you can instruct it to "``rerun``" it.
  188. Also, incidentally, we have generated :term:`provenance` information. It is
  189. now recorded in the history of the dataset how the output ``podcasts.tsv`` came
  190. into existence. And we can interact with and use this provenance information with
  191. other tools, not only through the machine-readable ``run record``.
  192. For example, to find out who (or what) created or modified a file,
  193. give the file path to :gitcmd:`log` (prefixed by ``--``):
  194. .. index::
  195. pair: show history for particular paths; on Windows with Git
  196. pair: log; Git command
  197. pair: corresponding branch; in adjusted mode
  198. .. windows-wit:: use 'git log main -- recordings/podcasts.tsv'
  199. .. include:: topic/adjustedmode-log-path.rst
  200. .. index::
  201. pair: show history for particular paths; with Git
  202. .. runrecord:: _examples/DL-101-109-112
  203. :language: console
  204. :workdir: dl-101/DataLad-101
  205. :notes: An amazing thing is that DataLad captured all of the provenance of the output file, and we can use Git tools to find out about it
  206. :cast: 02_reproducible_execution
  207. $ git log -- recordings/podcasts.tsv
  208. Neat, isn't it?
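To see the path filter of ``git log`` in isolation, here is a minimal sketch in a throwaway Git repository. It assumes only that ``git`` is installed; the file names, commit messages, and the identity are made up for illustration and have nothing to do with ``DataLad-101``.

```shell
# Throwaway repository so that no real dataset is touched
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
# Identity set only for this sketch
git config user.name "Example"
git config user.email "example@example.com"

echo "first" > a.txt
git add a.txt && git commit -q -m "add a.txt"
echo "other" > b.txt
git add b.txt && git commit -q -m "add b.txt"
echo "second" >> a.txt
git add a.txt && git commit -q -m "change a.txt"

# Only commits that touched a.txt are listed, newest first
history=$(git log --format=%s -- a.txt)
printf '%s\n' "$history"
```

The commit that only added ``b.txt`` does not show up, which is why a path-limited log is a quick way to trace how a single file came about.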
  209. Still, this :dlcmd:`run` was very simple.
  210. The next section will demonstrate how :dlcmd:`run` becomes handy in
  211. more complex standard use cases: situations with *locked* contents.
  212. But prior to that, make a note about :dlcmd:`run` and :dlcmd:`rerun` in your
  213. ``notes.txt`` file.
  214. .. runrecord:: _examples/DL-101-109-113
  215. :language: console
  216. :workdir: dl-101/DataLad-101
  217. :notes: Another final note on run and rerun
  218. :cast: 02_reproducible_execution
  219. $ cat << EOT >> notes.txt
  220. The datalad run command can record the impact a script or command has
  221. on a Dataset. In its simplest form, datalad run only takes a commit
  222. message and the command that should be executed.
  223. Any datalad run command can be re-executed by using its commit shasum
  224. as an argument in datalad rerun CHECKSUM. DataLad will take
  225. information from the run record of the original commit, and re-execute
  226. it. If no changes happen with a rerun, the command will not be written
  227. to history. Note: you can also rerun a datalad rerun command!
  228. EOT
  229. Finally, save this note.
  230. .. runrecord:: _examples/DL-101-109-114
  231. :language: console
  232. :workdir: dl-101/DataLad-101
  233. :notes: Another final note on run and rerun
  234. :cast: 02_reproducible_execution
  235. $ datalad save -m "add note on basic datalad run and datalad rerun"
  236. .. only:: adminmode
  237. Add a tag at the section end.
  238. .. runrecord:: _examples/DL-101-109-115
  239. :language: console
  240. :workdir: dl-101/DataLad-101
  241. $ git branch sct_datalad_rerun
  242. .. rubric:: Footnotes
  243. .. [#f1] The section :ref:`history` will elaborate more on common :term:`Git` commands
  244. and terminology.