.. index:: ! 2-003
   pair: result hooks; DataLad concept
.. _2-003:
.. _hooks:

DataLad's result hooks
^^^^^^^^^^^^^^^^^^^^^^

If you are particularly keen on automating tasks in your datasets, you may be
interested in running DataLad commands automatically as soon as previous
commands have been executed and have resulted in particular outcomes or states.
For example, you may want to automatically :dlcmd:`unlock` all dataset contents
right after an installation in one go. However, you'd also want to make sure that
the :dlcmd:`install` command was *successful* before attempting an
:dlcmd:`unlock`. Therefore, you would like to automatically
run the :dlcmd:`unlock .` command right after the :dlcmd:`install`
command, *but only* if the previous :dlcmd:`install` command was successful.

Such automation allows for flexible and yet automatic responses to the results
of DataLad commands, and can be done with DataLad's *result hooks*.
Generally speaking, `hooks <https://en.wikipedia.org/wiki/Hooking>`__ intercept
function calls or events and allow you to extend the functionality of a program.
DataLad's result hooks are calls to other DataLad commands that are made after a
command produced a specified result -- such as a successful install.

To understand how hooks can be used and defined, we have to briefly mention
DataLad's *command result evaluations*. Whenever a DataLad
command is executed, an internal evaluation generates a *report* on the status
and result of the command. To get a glimpse into such an evaluation, you can call
any DataLad command with the ``datalad`` option
``-f/--output-format <default, json, json_pp, tailored, '<template>'>`` to
return the command result evaluations with a specific formatting. Here is what this
can look like for a :dlcmd:`create`::

   $ datalad -f json_pp create somedataset
   [INFO ] Creating a new annex repo at /tmp/somedataset
   {
     "action": "create",
     "path": "/tmp/somedataset",
     "refds": null,
     "status": "ok",
     "type": "dataset"
   }

Internally, DataLad uses these evaluations for final result
rendering, error detection, and logging. However, by using hooks, you can
utilize these evaluations for your own purposes and "hook" in more commands
whenever an evaluation fulfills your criteria.

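Because the report shown above is plain JSON, it can be inspected with standard
tooling. The following is a minimal sketch that parses the record from the
:dlcmd:`create` example with Python's ``json`` module. The variable names are
illustrative, and the report text is copied from the example rather than
produced by running DataLad.

```python
import json

# The result record from the `datalad -f json_pp create somedataset` example,
# pasted here as text instead of being captured from a real command call.
report = """
{
  "action": "create",
  "path": "/tmp/somedataset",
  "refds": null,
  "status": "ok",
  "type": "dataset"
}
"""

# After parsing, the record is an ordinary dictionary; JSON null becomes None.
result = json.loads(report)

for key, value in sorted(result.items()):
    print(f"{key}: {value!r}")
```

Each key of the resulting dictionary is a candidate for a matching criterion.
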
To be able to specify matching criteria, you need to be aware of the potential
criteria you can match against. The evaluation report is a dictionary with
``key:value`` pairs. :numref:`table-result-keyvalues` provides an overview of
some of the available keys and their possible values.

.. tabularcolumns:: \Y{.33}\Y{.66}
.. list-table:: Common result keys and their values. This is only a selection of
   available key-value pairs. The actual set of possible key-value pairs is
   potentially unlimited, as any third-party extension could introduce new keys,
   for example. If in doubt, use the ``-f/--output-format`` option with the
   command of your choice to explore what your matching criteria may look like.
   :name: table-result-keyvalues
   :widths: 50 100
   :header-rows: 1

   * - Key name
     - Values
   * - ``action``
     - ``get``, ``install``, ``drop``, ``status``, ... (any command's name)
   * - ``type``
     - ``file``, ``dataset``, ``symlink``, ``directory``
   * - ``status``
     - ``ok``, ``notneeded``, ``impossible``, ``error``
   * - ``path``
     - The path the previous command operated on

These key-value pairs provide the basis to define matching rules that -- once met --
can trigger the execution of custom hooks.
To define a hook based on certain command results, two configuration variables
need to be set:

.. index::
   single: configuration item; datalad.result-hook.<name>.match-json
   single: configuration item; datalad.result-hook.<name>.call-json

.. code-block:: bash

   datalad.result-hook.<name>.match-json

and

.. code-block:: bash

   datalad.result-hook.<name>.call-json

Here is what you need to know about these variables:

- The ``<name>`` part of the configurations is the same for both variables, and can be
  an arbitrarily [#f2]_ chosen name that serves as an identifier for the hook you are
  defining.
- The first configuration variable, ``datalad.result-hook.<name>.match-json``, defines
  the requirements that a result evaluation needs to match in order to trigger the hook.
- The second configuration variable, ``datalad.result-hook.<name>.call-json``, defines
  what the hook execution comprises. It can be any DataLad command of your choice.

And here is how to set the values for these variables:

- When set via the :gitcmd:`config` command, the value for
  ``datalad.result-hook.<name>.match-json`` needs to be specified as
  a JSON-encoded dictionary with any number of keys, such as

  .. code-block:: bash

     {"type": "file", "action": "get", "status": "notneeded"}

  This translates to: "Match a ``notneeded`` result after :dlcmd:`get` of a file."
  If the values of all keys specified in this dictionary match the values of the
  same keys in the result evaluation, the hook is executed. Apart from ``==``
  evaluations, ``in``, ``not in``, and ``!=`` are supported. To make use of such
  operations, the test value needs to be wrapped into a list, with the first item
  being the operation, and the second item the test value, such as

  .. code-block:: bash

     {"type": ["in", ["file", "directory"]], "action": "get", "status": "notneeded"}

  This translates to: "Match a ``notneeded`` result after :dlcmd:`get` of a file or directory."
  Another example is

  .. code-block:: bash

     {"type":"dataset","action":"install","status":["eq", "ok"]}

  which translates to: "Match a successful installation of a dataset".

- The value for ``datalad.result-hook.<name>.call-json`` names the DataLad
  command in its Python notation (for example, ``run_procedure`` rather than
  ``run-procedure``), and the command's options -- when set via the
  :gitcmd:`config` command -- are specified as a JSON-encoded dictionary
  with keyword arguments. Conveniently, a number of string substitutions are
  supported: a ``dsarg`` argument expands to the ``dataset`` given to the initial
  command the hook operates on, and any key from the result evaluation can be
  expanded to the respective value in the result dictionary. Curly braces need to
  be escaped by doubling them.
  This is not the easiest specification there is, but it's also not as hard as it
  may sound. Here is what this could look like for a :dlcmd:`unlock`::

     $ unlock {{"dataset": "{dsarg}", "path": "{path}"}}

  This translates to "unlock the path the previous command operated on, in the
  dataset the previous command operated on". Another example is this run command::

     $ run {{"cmd": "cp ~/Templates/standard-readme.txt {path}/README", "dataset": "{dsarg}", "explicit": true}}

  This translates to "execute a run command in the dataset the previous command operated
  on. In this run command, copy a README template file from ``~/Templates/standard-readme.txt``
  and place it into the newly created dataset." A final example is this::

     $ run_procedure {{"dataset":"{path}","spec":"cfg_metadatatypes bids"}}

  This hook will run the procedure ``cfg_metadatatypes`` with the argument ``bids``
  and thus set the standard metadata extractor to be ``bids``.

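The matching rules for ``match-json`` described above can be sketched in a few
lines of Python. This is an illustration only, not DataLad's actual
implementation: the function name ``matches`` is hypothetical, and the operator
labels ``neq`` and ``nin`` (for ``!=`` and ``not in``) are assumptions, as the
examples above only show ``eq`` and ``in``.

```python
def matches(match_spec, result):
    """Return True if every key in match_spec is satisfied by result."""
    # Operator labels; "neq" and "nin" are assumed names for != and not in.
    operators = {
        "eq": lambda actual, expected: actual == expected,
        "neq": lambda actual, expected: actual != expected,
        "in": lambda actual, expected: actual in expected,
        "nin": lambda actual, expected: actual not in expected,
    }
    for key, test in match_spec.items():
        # A two-item list whose first item is an operator label selects that
        # operator; any other value is treated as a plain equality test.
        if (
            isinstance(test, list)
            and len(test) == 2
            and isinstance(test[0], str)
            and test[0] in operators
        ):
            op, expected = test
        else:
            op, expected = "eq", test
        # A key that is missing from the result is compared as None.
        if not operators[op](result.get(key), expected):
            return False
    return True


# An illustrative result record, shaped like the reports shown earlier.
result = {"action": "get", "path": "/tmp/somedataset/file.txt",
          "status": "notneeded", "type": "file"}

file_or_dir = {"type": ["in", ["file", "directory"]],
               "action": "get", "status": "notneeded"}
installed = {"type": "dataset", "action": "install", "status": ["eq", "ok"]}

print(matches(file_or_dir, result))
print(matches(installed, result))
```

The first specification matches the record, the second does not, because
neither ``type`` nor ``action`` agree with the record.
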
As these variables are configuration variables, they can be set via
:gitcmd:`config` -- either for the dataset (``--local``), or the
user (``--global``) [#f3]_::

   $ git config --global --add datalad.result-hook.readme.call-json 'run {{"cmd":"cp ~/Templates/standard-readme.txt {path}/README", "outputs":["{path}/README"], "dataset":"{path}","explicit":true}}'
   $ git config --global --add datalad.result-hook.readme.match-json '{"type": "dataset","action":"create","status":"ok"}'

Here is what this writes to the ``~/.gitconfig`` file::

   [datalad "result-hook.readme"]
     call-json = run {{\"cmd\":\"cp ~/Templates/standard-readme.txt {path}/README\", \"outputs\":[\"{path}/READ>
     match-json = {\"type\": \"dataset\",\"action\":\"create\",\"status\":\"ok\"}

Note how characters such as quotation marks are automatically escaped with
backslashes. If you want to set the variables "by hand" with an editor instead
of using :gitcmd:`config`, take care to escape them as well.

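If you do edit the file by hand, the escaping can be reproduced mechanically.
The helper below is a hypothetical sketch, not part of Git or DataLad, and it
only covers backslashes and double quotes. Other special characters, such as
line breaks, are not handled.

```python
def escape_gitconfig_value(value):
    """Backslash-escape backslashes and double quotes in a config value."""
    # Backslashes are handled first, so that the backslashes added in front
    # of the quotes are not doubled afterwards.
    return value.replace("\\", "\\\\").replace('"', '\\"')


match_json = '{"type": "dataset","action":"create","status":"ok"}'
print("match-json = " + escape_gitconfig_value(match_json))
```

The printed line has the same form as the ``match-json`` entry in the
``~/.gitconfig`` excerpt above.
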
Given this configuration in the global ``~/.gitconfig`` file, the
"``readme``" hook would be executed whenever you successfully create a new dataset
with :dlcmd:`create`. The "``readme``" hook would then automatically copy a
file, ``~/Templates/standard-readme.txt`` (this could be a standard README template
you defined), into the new dataset.

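To see what the "``readme``" hook's call specification turns into for a concrete
result, the substitution can be tried out in Python. This sketch assumes that
placeholders are filled in the way Python's ``str.format`` does it, which is
consistent with the rule that literal curly braces must be doubled. The helper
name ``expand_call`` is hypothetical, and the result record is the one from the
:dlcmd:`create` example earlier in this section.

```python
import json


def expand_call(call_spec, result, dsarg=""):
    """Split a call-json value into a command name and keyword arguments."""
    # The command name is everything up to the first space.
    command, _, arg_template = call_spec.partition(" ")
    # Doubled braces collapse to literal braces, while {dsarg} and any
    # result key such as {path} are replaced by their values.
    kwargs_json = arg_template.format(dsarg=dsarg, **result)
    return command, json.loads(kwargs_json)


call_spec = (
    'run {{"cmd":"cp ~/Templates/standard-readme.txt {path}/README", '
    '"outputs":["{path}/README"], "dataset":"{path}","explicit":true}}'
)
result = {"action": "create", "path": "/tmp/somedataset",
          "refds": None, "status": "ok", "type": "dataset"}

command, kwargs = expand_call(call_spec, result)
print(command)
print(kwargs["cmd"])
print(kwargs["dataset"])
```

After the expansion, the keyword arguments are an ordinary dictionary. Note that
this simple approach would produce invalid JSON if a substituted value, such as
a path, contained double quotes or backslashes.
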
.. rubric:: Footnotes

.. [#f2] It only needs to be compatible with :gitcmd:`config`. This means that
         it, for example, should not contain any dots (``.``).

.. [#f3] To re-read about the :gitcmd:`config` command and other configurations
         of DataLad and its underlying tools, go back to the chapter on Configurations,
         starting with :ref:`config`.
         **Note that hooks are only read from Git's config files, not .datalad/config!**
         Else, this would pose a severe security risk, as it would allow installed datasets to
         alter DataLad commands to perform arbitrary executions on a system.