.. _help:

How to get help
---------------

All DataLad errors or problems you encounter during ``DataLad-101`` are intentional
and serve illustrative purposes. But what if you run into any DataLad errors
outside of this course?
Fortunately, the syllabus has a whole section on that, and on
one lazy, warm summer afternoon you flip through it.

.. figure:: ../artwork/src/reading.svg
   :width: 50%

You realize that you already know the most important things:
The first piece of advice on how to get help is
"Read the error message".
The second piece of advice is
"I'm not kidding: Read the error message".
The third piece of advice, finally, says
"Honestly, read the f***ing error message".

Help yourself
^^^^^^^^^^^^^

If you run into a DataLad problem and you have followed the three
steps above, but the error message
`does not give you a clue on how to proceed <https://xkcd.com/1833>`_,
the first things you should do are to

#. find out which *version* of DataLad you use
#. read the *help page* of the command that failed

The first step is important because a command may fail simply due to
an incompatible DataLad version. To use this book and follow along,
for example, your DataLad version should be ``datalad-0.18`` or higher.
To find out which version you are using, run

.. runrecord:: _examples/DL-101-135-101
   :language: console
   :workdir: dl-101/DataLad-101

   $ datalad --version
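
If you want a script to check this requirement before it runs any DataLad commands, the following is a minimal sketch.
It assumes a ``sort`` implementation that supports version sorting with ``-V`` (GNU coreutils and recent macOS versions do), and that ``datalad --version`` prints the program name followed by the version number.
The function name ``version_ge`` is made up for this example.

.. code-block:: bash

   # Hypothetical helper: succeeds if version $1 is greater than or equal to version $2.
   # Assumes that "sort -V" (version sort) is available.
   version_ge() {
       [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
   }

   # Only query DataLad if it is installed; "datalad --version" is assumed
   # to print something like "datalad 0.18.3", so the last field is the version.
   if command -v datalad >/dev/null 2>&1; then
       installed="$(datalad --version | awk '{print $NF}')"
       if version_ge "$installed" 0.18; then
           echo "DataLad $installed is recent enough for this book"
       else
           echo "DataLad $installed is older than 0.18, consider upgrading"
       fi
   fi

Sorting the two versions and checking which one comes first avoids having to split version strings into their numeric components by hand.
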

.. index::
   pair: wtf; DataLad command
   pair: get system information; with DataLad

If you want a comprehensive overview of your full setup,
:dlcmd:`wtf` [#f1]_ is the command to turn to. Running this command will
generate a report about the DataLad installation and configuration.
The output below shows an excerpt.

.. runrecord:: _examples/DL-101-135-102
   :language: console
   :workdir: dl-101/DataLad-101
   :linereplace:
      ,PATH: /tmp/.*,PATH: REDACTED,
   :lines: 1-10

   $ datalad wtf

This lengthy output reports all information that might
be relevant, from DataLad to :term:`git-annex` or Python
up to your operating system.

The second step, finding and reading the help page of the command
in question, is important because it shows you how the
failed command is meant to be used. Are arguments specified correctly?
Does the help list any caveats?
There are multiple ways to find help on DataLad commands.
You could turn to the `documentation <https://docs.datalad.org>`_.
Alternatively, to get help right inside the terminal,
run any command with the ``-h``/``--help`` option (shown
as an excerpt here):

.. runrecord:: _examples/DL-101-135-103
   :language: console
   :workdir: dl-101/DataLad-101
   :lines: 1-16,83-92,101-112
   :append: -✂--✂-

   $ datalad get --help

This is the help page of :dlcmd:`get`: the same content you would find in the documentation, but in your terminal (here heavily trimmed to show only the main components).
It contains a command description, a list
of all available options with a short explanation of each, and
example commands. The two *arguments* sections provide a comprehensive
list of command arguments with details on their possible values and
requirements. A first thing to check is whether your command call
specified all of the required arguments.

An additional source of information is the `PsyInf knowledge base
<https://knowledge-base.psychoinformatics.de>`_. It contains a curated
collection of solutions and workarounds that have not yet made it into other
documentation.

Asking questions (right)
^^^^^^^^^^^^^^^^^^^^^^^^

If nothing you do on your own helps to solve the problem,
consider asking others. Check out `neurostars <https://neurostars.org>`_
and search for your problem: chances are that
`somebody has already encountered the same error <https://xkcd.com/979>`_
and fixed it. If not, just ask a new question with a ``datalad`` tag.

Make sure your question is as informative as it can be for others.
Include:

- *context* -- what did you want to do and why?
- the *problem* -- post the error message, and provide the
  steps necessary to reproduce it. Do not shorten the error message, unless it contains sensitive information.
- *technical details* -- what version of DataLad are you using, what version
  of git-annex, and which git-annex repository type, what is your operating
  system and -- if applicable -- Python version? :dlcmd:`wtf` is your friend
  to find all of this information.

.. index:: debugging

Debugging like a DataLad-developer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you have read a command's help from start to end, checked all software versions twice, even `asked colleagues to reproduce your problem (unsuccessfully) <https://xkcd.com/2083>`_, and you still don't have any clue what is going on, then welcome to the debugging section!

.. figure:: ../artwork/src/debug.svg
   :width: 50%

   It's not as bad as this

It is not always straightforward to see *why* a particular DataLad command has failed.
Operations with DataLad can be quite complicated and can involve different forms of authentication, different file systems, interactions with the environment, configurations, other software, and *much* more, so the number of possible sources for the problem at hand may feel infinite.
On top of that, the resulting error message may not reveal the underlying cause, because the error message of whichever process failed is not necessarily propagated into the final result report.
In situations where there is no obvious reason for a command to fail, it can be helpful, either for yourself or to gather information to include in :term:`GitHub` issues, to start `debugging <https://xkcd.com/1722>`_, or *logging at a higher granularity* than the default.
This gives you more insight into the actions DataLad and its underlying tools take and where *exactly* they fail, and even lets you play around with the program in the state in which it failed.
:term:`Debugging` and :term:`logging` are not as complex as these terms may sound if you have never consciously debugged.
Procedurally, it can be as easy as adding an additional flag to a command call, and cognitively, it can be as easy as engaging your visual system in a visual search task for the color red or the word "error", or reading more DataLad output than you are used to.
We will start with the general concepts, and then collect concrete debugging strategies for different problems.

.. _logging:

Logging
"""""""

In order to gain more insight into the steps performed by a program and capture as many details as possible for troubleshooting an error, you can turn to :term:`logging`.
Logging simply means that DataLad and its underlying tools tell you what they are doing.
This information can be coarse, such as a mere ``[INFO] Downloading <some_url> into <some_target>``, or fine-grained, such as ``[DEBUG] Resolved dataset for status reporting: <dataset>``.
The :term:`log level` in brackets at the beginning of the line indicates how detailed a message is.
Note that :term:`logging` is not a sealed book: it happens automatically during the execution of any DataLad command.
While reading the handbook, you have already seen a lot of log messages.
Anything printed to your terminal preceded by ``[INFO]``, for example, is a log message (in this case, on the ``info`` level).
When you are *consciously* logging, you simply set the log level to the desired amount of information, or increase the verbosity until the output gives you a hint of what went wrong.
Likewise, adjusting the log level also works the other way around, and lets you *decrease* the amount of information you receive in your terminal.

.. index::
   pair: log level; DataLad concept

.. find-out-more:: Log levels

   Log levels provide the means to adjust how much information you want. They are described in human-readable terms and ordered by the severity of the failures or problems reported.
   You can choose from the following log levels:

   - **critical**: Only catastrophes are reported. Currently, there is nothing inside of DataLad that would log at this level, so setting the log level to *critical* will result in getting no details at all, not even about errors or failures.
   - **error**: With this log level you will receive reports on any errors that occurred within the program during command execution.
   - **warning**: At this log level, the command execution will report on unusual situations and anything that *might* be a problem, in addition to everything from the *error* log level.
   - **info**: This log level includes reports by the program that indicate normal behavior and serve to keep you up to date about the current state of things, in addition to warning and error logging messages.
   - **debug**: This log level is very useful to troubleshoot a problem, and results in DataLad telling you *a lot* about what it is doing.

   Instead of log level names, you can also adjust the amount of information with numerical granularity: provide an integer between 1 and 50, with lower values denoting more debugging information.
   Raising the log level (e.g., to ``error`` or ``40``) will decrease the amount of information and output you will receive, while lowering it (e.g., to ``debug`` or ``10``) will increase it.

Setting a log level can be done with an :term:`environment variable`, a configuration, or the ``-l``/``--log-level`` flag placed directly after the main :shcmd:`datalad` command.
To get extensive information on what :dlcmd:`status` does under the hood, your command could look like this (its output is shortened):

.. runrecord:: _examples/DL-101-135-105
   :language: console
   :workdir: dl-101/DataLad-101
   :lines: 1,6, 67-

   $ datalad --log-level debug status
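
The level names and the numbers mentioned above line up with the standard levels of Python's ``logging`` module (``debug`` 10, ``info`` 20, ``warning`` 30, ``error`` 40, ``critical`` 50).
The small helper below is a hypothetical convenience function, not part of DataLad, that translates a level name into its number.
Assuming DataLad accepts numeric values as described above, ``datalad -l 10 status`` and ``datalad -l debug status`` should therefore request the same amount of detail.

.. code-block:: bash

   # Hypothetical helper: map a log level name to the numeric value of the
   # corresponding standard level in Python's logging module.
   log_level_number() {
       case "$1" in
           critical) echo 50 ;;
           error)    echo 40 ;;
           warning)  echo 30 ;;
           info)     echo 20 ;;
           debug)    echo 10 ;;
           *)        echo "unknown log level: $1" >&2; return 1 ;;
       esac
   }

   # Print the name/number pairs, from least to most verbose
   for level in critical error warning info debug; do
       printf '%-8s %s\n' "$level" "$(log_level_number "$level")"
   done

With this helper, ``datalad -l "$(log_level_number debug)" status`` would be another way of spelling the command above.
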

.. index::
   single: configuration item; datalad.log.level
   pair: configure verbosity of command output; with DataLad

.. find-out-more:: ... and how does it look when using environment variables or configurations?

   The log level can also be set (for different scopes) using the ``datalad.log.level`` configuration variable, or the corresponding environment variable ``DATALAD_LOG_LEVEL``.
   To set the log level for a single command, for example, set it in front of the command:

   .. code-block:: console

      $ DATALAD_LOG_LEVEL=debug datalad status

   And to set the log level for the rest of the shell session, export it:

   .. code-block:: console

      $ export DATALAD_LOG_LEVEL=debug
      $ datalad status
      $ ...

   You can find out a bit more about environment variables :ref:`in the Findoutmore on environment variables <fom-envvar>`.

   The configuration variable can be used to set the log level on a user (global) or system-wide level with the :gitcmd:`config` command:

   .. code-block:: console

      $ git config --global datalad.log.level debug
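
A log level that you export or store with :gitcmd:`config` stays in effect until you remove it again.
The following sketch shows how to undo both variants.
To avoid touching your real configuration, it writes to a throwaway file via ``git config --file`` instead of ``--global``; for your actual user configuration you would run ``git config --global --unset datalad.log.level``.

.. code-block:: bash

   # A temporary file stands in for ~/.gitconfig in this sketch
   cfg="$(mktemp)"

   # Set the log level the same way "git config --global" would ...
   git config --file "$cfg" datalad.log.level debug
   git config --file "$cfg" --get datalad.log.level    # prints: debug

   # ... and remove the setting again
   git config --file "$cfg" --unset datalad.log.level

   # An exported environment variable is removed from the current shell session with unset
   export DATALAD_LOG_LEVEL=debug
   unset DATALAD_LOG_LEVEL
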

The ``debug`` output is extensive and detailed, but it shows precisely the sequence of commands and arguments that are run prior to a failure or crash, and all additional information reported at the log levels ``info`` or ``debug`` can be very helpful to find out what is wrong.
Even if the vast amount of detail in ``debug`` output appears overwhelming, it helps you find out at which point an execution stalls, check whether the arguments, commands, or datasets reported in the debug output are what you expect them to be, and forward all of this information into any GitHub issue you create.

Finally, besides logging with a DataLad command, it can sometimes be useful to turn to :term:`git-annex` or :term:`Git` for logging.
For failing :dlcmd:`get` calls, it may be useful to retry the retrieval using :gitannexcmd:`get -d -v <file>`, where ``-d`` (debug) and ``-v`` (verbose) increase the amount of detail about the command execution and failure.
In rare cases where you suspect something might be wrong with Git, setting the environment variables ``GIT_TRACE`` and ``GIT_TRACE_SETUP`` to ``2`` prior to running a Git command will give you debugging output.
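
To try this out without risking anything, the sketch below runs :gitcmd:`status` in a freshly created, empty repository with both tracing variables set.
With the value ``2``, Git writes its trace messages to standard error, which the sketch saves in a temporary file.
The same prefix works in front of any other Git command.

.. code-block:: bash

   # A throwaway repository for the demonstration
   repo="$(mktemp -d)"
   git init -q "$repo"

   # With GIT_TRACE and GIT_TRACE_SETUP set to 2, Git writes trace messages to stderr,
   # which is redirected into a file here
   trace="$(mktemp)"
   GIT_TRACE=2 GIT_TRACE_SETUP=2 git -C "$repo" status 2> "$trace"

   # Trace lines are marked with "trace:", e.g. the built-in command that was run
   grep 'trace:' "$trace"
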

.. _debug:

Debugging
"""""""""

If the additional level of detail provided by logging messages is not enough, you can go further with actual :term:`debugging`.
For this, add the ``--dbg`` or ``--idbg`` flag to the main :shcmd:`datalad` command, as in ``datalad --dbg status``.
When something unexpectedly crashes, ``--dbg`` drops you into the `Python debugger <https://docs.python.org/3/library/pdb.html>`_, and ``--idbg`` into the `IPython debugger <https://ipython.org>`_.
This allows you to debug the program right when it fails, inspect available variables and their values, or step back and forth through the source code.
Note that debugging experience is not a prerequisite for using DataLad -- although it is `an exciting life skill <https://www.monkeyuser.com/2017/step-by-step-debugging>`_.
`The official Python docs <https://docs.python.org/3/library/pdb.html#debugger-commands>`_ provide a good overview of the available debugger commands if you are interested in learning more about this.

Debugging: A concrete example
"""""""""""""""""""""""""""""

It is common for :dlcmd:`get` errors to originate in :term:`git-annex`, the software used by DataLad to transfer data. Here are a few suggestions to debug them:

- Take a deep breath, or preferably a walk in a nice park :)
- Check that you are using a recent version of git-annex

  - ``git-annex version`` returns the version of git-annex on the first line of its output, and it is also reported in the output of :dlcmd:`wtf`.
  - The version number contains the release date of the version in use. For instance, git-annex version ``8.20200330-g971791563`` was released on 30 March 2020.
  - If the version that you are using is older than a few months, consider updating using the instructions in :ref:`install`.

- Try to download the file using ``git-annex get -v -d <file_name>``. If this does not succeed, the DataLad command is unlikely to succeed either. The options ``-d``/``--debug`` and ``-v``/``--verbose`` provide as much detail in error messages as possible.
- Read the output of :term:`git-annex`, identify the error, breathe again, and solve the issue!
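
Because the release date is part of the version string, you can also extract it programmatically.
The sketch below uses ``sed`` to turn a version such as ``8.20200330-g971791563`` into ``2020-03-30``.
The function name ``annex_release_date`` is made up for this example, and the sketch assumes that the first line of ``git annex version`` has the form ``git-annex version: <version>``.

.. code-block:: bash

   # Hypothetical helper: print the release date (YYYY-MM-DD) encoded in a
   # git-annex version string of the form <major>.<YYYYMMDD>[<suffix>]
   annex_release_date() {
       printf '%s\n' "$1" |
           sed -n 's/^[0-9]*\.\([0-9]\{4\}\)\([0-9]\{2\}\)\([0-9]\{2\}\).*/\1-\2-\3/p'
   }

   annex_release_date 8.20200330-g971791563    # prints: 2020-03-30

   # Apply it to the installed git-annex, if there is one
   if command -v git-annex >/dev/null 2>&1; then
       version="$(git annex version | head -n 1 | sed 's/^git-annex version: //')"
       echo "git-annex $version was released on $(annex_release_date "$version")"
   fi

Strings that do not match the expected pattern produce no output, so an empty result is a hint that the version format differs from the assumption.
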

Common warnings and errors
^^^^^^^^^^^^^^^^^^^^^^^^^^

A lot of output you will see while working with DataLad originates from warnings
or errors by DataLad, git-annex, or Git.
Some of these outputs can be wordy and not trivial to comprehend -- and even if
everything works, some warnings can be hard to understand.
The following section lists some common warnings and errors and
attempts to explain them. If you encounter warnings or errors that you would
like to see explained in this book, please let us know by
`filing an issue <https://github.com/datalad-handbook/book/issues/new>`_.

Output produced by Git
""""""""""""""""""""""

**Unset Git identity**

If you have not configured your Git identity, you will
see warnings like this when running any DataLad command:

.. code-block:: console

   [WARNING] It is highly recommended to configure git first (set both user.name and user.email) before using DataLad.

To set your Git identity, go back to section :ref:`installconfig`.

**Rejected pushes**

One error you can run into when publishing dataset contents is that your
:dlcmd:`push` to a sibling is rejected.
One example is this:

.. code-block:: console

   $ datalad push --to public
   [ERROR ] refs/heads/main->public:refs/heads/main [rejected] (non-fast-forward) [publish(/home/me/dl-101/DataLad-101)]

This example is an attempt to push a local dataset to its sibling on GitHub. The
push is rejected because it is a ``non-fast-forward`` merge situation. Usually,
this means that the sibling contains changes that your local dataset does not yet
know about. You can fix it by first updating from the sibling with
:dlcmd:`update --merge`, and then pushing again.
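
The mechanics behind this rejection are those of Git itself.
The following sketch reproduces the situation with plain Git commands in temporary directories; all names, such as the bare repository standing in for the ``public`` sibling, are made up for the example.
The push is rejected while the sibling has a commit that the local repository lacks, and succeeds after fetching and merging, which is roughly what :dlcmd:`update --merge` takes care of in a dataset.

.. code-block:: bash

   # Identity for the demo commits, so the sketch does not depend on your Git configuration
   export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
   export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

   tmp="$(mktemp -d)"
   git init -q --bare "$tmp/public.git"    # stands in for the "public" sibling

   # Local repository with one commit, published to "public"
   git init -q "$tmp/mine"
   cd "$tmp/mine" || exit 1
   git symbolic-ref HEAD refs/heads/main
   git commit -q --allow-empty -m "first commit"
   git remote add public "$tmp/public.git"
   git push -q public main

   # Someone else publishes a change to "public" ...
   git clone -q -b main "$tmp/public.git" "$tmp/other"
   git -C "$tmp/other" commit -q --allow-empty -m "their change"
   git -C "$tmp/other" push -q origin main

   # ... while you commit locally: the histories have diverged
   git commit -q --allow-empty -m "my change"
   git push -q public main 2>/dev/null || echo "push rejected (non-fast-forward)"

   # Integrating the sibling's changes first makes the push a fast-forward again
   git fetch -q public
   git merge -q --no-edit public/main
   git push -q public main && echo "push succeeded after merging"
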

.. _nonbarepush:

Here is a different push rejection:

.. code-block:: console

   $ datalad push --to roommate
   publish(ok): . (dataset) [refs/heads/git-annex->roommate:refs/heads/git-annex 023a541..59a6f8d]
   [ERROR ] refs/heads/main->roommate:refs/heads/main [remote rejected] (branch is currently checked out) [publish(/home/me/dl-101/DataLad-101)]
   publish(error): . (dataset) [refs/heads/main->roommate:refs/heads/main [remote rejected] (branch is currently checked out)]
   action summary:
     publish (error: 1, ok: 1)

As you can see, the :term:`git-annex branch` was pushed successfully, but updating
the ``main`` branch was rejected: ``[remote rejected] (branch is currently checked out) [publish(/home/me/dl-101/DataLad-101)]``.
In this particular case, this is because it was an attempt to push from ``DataLad-101``
to the ``roommate`` sibling that was created in chapter :ref:`chapter_collaboration`.
This is a special case of pushing, because it is, in technical terms, a push
to a non-bare repository. Unlike :term:`bare Git repositories`, non-bare
repositories cannot always be pushed to. To fix this, you can either
check out a different branch in the ``roommate`` sibling, or push to a branch
that is not checked out there. Alternatively, you can configure ``roommate`` to receive the push with
Git's ``receive.denyCurrentBranch`` configuration key. By default, this configuration
is set to ``refuse``. Setting it to ``updateInstead``
with ``git config receive.denyCurrentBranch updateInstead`` will allow updating
the checked out branch. See the
`git config man page entry <https://git-scm.com/docs/git-config#Documentation/git-config.txt-receivedenyCurrentBranch>`_
on ``receive.denyCurrentBranch`` for more.
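
The following sketch demonstrates this behavior with plain Git in temporary directories; ``roommate`` and ``mine`` are made-up stand-ins for the two datasets.
The first push into the checked-out ``main`` branch is refused, and the same push is accepted once ``receive.denyCurrentBranch`` is set to ``updateInstead`` in the receiving repository.
According to the Git documentation, ``updateInstead`` also requires that the receiving working tree has no uncommitted changes.

.. code-block:: bash

   # Identity for the demo commits
   export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
   export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

   tmp="$(mktemp -d)"

   # A non-bare repository with "main" checked out, like the roommate's dataset
   git init -q "$tmp/roommate"
   git -C "$tmp/roommate" symbolic-ref HEAD refs/heads/main
   git -C "$tmp/roommate" commit -q --allow-empty -m "initial commit"

   # Your clone of it, with a new local commit
   git clone -q "$tmp/roommate" "$tmp/mine"
   git -C "$tmp/mine" commit -q --allow-empty -m "my change"

   # With the default setting (refuse), updating the checked-out branch is rejected
   git -C "$tmp/mine" push -q origin main 2>/dev/null ||
       echo "push rejected: branch is currently checked out"

   # Allow the push and let Git update the roommate's (clean) working tree
   git -C "$tmp/roommate" config receive.denyCurrentBranch updateInstead
   git -C "$tmp/mine" push -q origin main && echo "push accepted"
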

**Detached HEADs**

One message that you may encounter during the installation of a dataset is:

.. code-block:: console

   [INFO ] Submodule HEAD got detached. Resetting branch main to point to 046713bb. Original location was 47e53498

Even though "detached HEAD" sounds slightly worrisome, this message is merely informational
and does not require an action from your side. It is related to
`Git submodules <https://git-scm.com/book/en/v2/Git-Tools-Submodules>`_ (the underlying
Git concept for subdatasets), and informs you about the state in which a
subdataset is saved in the superdataset you have just cloned.

Output produced by git-annex
""""""""""""""""""""""""""""

**Unusable annexes**

Upon installation of a dataset, you may see:

.. code-block:: console

   [INFO ] Remote origin not usable by git-annex; setting annex-ignore
   [INFO ] This could be a problem with the git-annex installation on the
   remote. Please make sure that git-annex-shell is available in PATH when you
   ssh into the remote. Once you have fixed the git-annex installation,
   run: git annex enableremote origin

This message lets you know that git-annex will not attempt to download
content from the :term:`remote` "origin". There can be
many reasons for this, but as long as there are other remotes you can access the
data from, you are fine.

A similar warning message may appear when adding a sibling that is a pure Git
:term:`remote`, such as a repository on GitHub:

.. code-block:: console

   [INFO ] Failed to enable annex remote github, could be a pure git or not
   accessible
   [WARNING] Failed to determine if github carries annex. Remote was marked by
   annex as annex-ignore. Edit .git/config to reset if you think that was done
   by mistake due to absent connection etc

These messages indicate that the sibling ``github`` does not carry an annex.
Thus, annexed file contents cannot be pushed to this sibling. This happens
if the sibling indeed does not have an annex (which would be true, for example,
for siblings on :term:`GitHub`, :term:`GitLab`, :term:`Bitbucket`, ..., and
would not require any further action or worry), or
if the remote could not be reached, e.g., due to a missing internet
connection (in which case you could set this remote's ``annex-ignore`` key in
``.git/config`` to ``false``).
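
The ``annex-ignore`` key is stored per remote, as ``remote.<name>.annex-ignore``, so instead of editing ``.git/config`` by hand you could also use :gitcmd:`config`.
The sketch below sets up a throwaway repository with a hypothetical ``github`` remote (the URL is a placeholder), marks it the way git-annex does after a failed check, and then resets the key.

.. code-block:: bash

   # Throwaway repository with a hypothetical "github" remote
   repo="$(mktemp -d)"
   git init -q "$repo"
   git -C "$repo" remote add github https://github.com/example/example.git

   # This mimics what git-annex records when it decides to ignore the remote
   git -C "$repo" config remote.github.annex-ignore true

   # Reset the key once the remote is reachable again
   git -C "$repo" config remote.github.annex-ignore false
   git -C "$repo" config --get remote.github.annex-ignore    # prints: false

In a real dataset, running ``git config remote.github.annex-ignore false`` from within the dataset would have the same effect.
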

Speaking of remotes that are not available, this will probably be one of the most
commonly occurring git-annex errors to see -- failing :dlcmd:`get` calls
because remotes are not available:

Other errors
^^^^^^^^^^^^

Sometimes, the registered URLs of subdatasets are :term:`SSH` instead of :term:`https` addresses, for example ``git@github.com:datalad-datasets/longnow-podcasts.git`` instead of ``https://github.com/datalad-datasets/longnow-podcasts.git``.
If you do not have an SSH key configured for the required service (e.g., GitHub or a server), installing the subdataset or getting its contents fails, with messages starting similar to this:

.. code-block:: console

   [INFO ] Cloning https://github.com/psychoinformatics-de/paper-remodnav.git/remodnav [2 other candidates] into '/home/.../remodnav'
   Permission denied (publickey).

If you encounter these errors, make sure to create and/or upload an SSH key as necessary (see section :ref:`Gin` for an example), or change the URL into an HTTPS URL.
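
As an alternative to editing individual URLs, Git can rewrite URLs on the fly with its ``url.<base>.insteadOf`` setting.
The sketch below checks the effect with ``git ls-remote --get-url``, which prints the rewritten URL without contacting the server.
Whether this rewriting covers every clone that DataLad performs is an assumption here; it applies to operations that go through Git.

.. code-block:: bash

   # SSH-style URL from the example above
   url="git@github.com:datalad-datasets/longnow-podcasts.git"

   # Apply the rewrite for this single command only (-c), and show the resulting URL
   git -c url."https://github.com/".insteadOf="git@github.com:" ls-remote --get-url "$url"
   # prints: https://github.com/datalad-datasets/longnow-podcasts.git

To make the rewrite permanent for your user, you could run ``git config --global url."https://github.com/".insteadOf "git@github.com:"``.
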

**git-annex as the default branch on GitHub**

If you publish a dataset to :term:`GitHub`, but the resulting repository seems to consist of cryptic directories instead of your actual file names and directories, GitHub may have made the :term:`git-annex branch` the default.

.. figure:: ../artwork/src/defaultgitannex_light.png

Typically, you can fix this by changing the default branch in the repository settings of the GitHub web interface.

**Windows adds whitespace line-endings to unchanged files**

The type of line ending (a typically invisible character that indicates a line break) differs between operating systems.
While Linux and macOS use a *line feed* (LF), Windows uses *carriage return* + *line feed* (CRLF).
When you only collaborate across computers with the same type of operating system, this is a very boring fun fact at most.
But if Windows and non-Windows users collaborate, or if you are working with files across different operating systems, the different type of line ending that Windows uses may show up as unintended modifications on other systems.
In most cases, this is prevented by a default cross-platform compatible line-ending configuration on Windows that is set during installation:

.. figure:: ../artwork/src/crlf.png

To fix this behavior outside of the installation process and standardize line endings across operating systems, Windows users are advised to set the configuration ``core.autocrlf`` to ``true`` with ``git config --global core.autocrlf true``.
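
To see what this setting does, the sketch below enables it in a throwaway repository, using a repository-local ``git config`` rather than ``--global`` so that your own configuration stays untouched.
It then adds a file with Windows line endings and asks Git via ``git ls-files --eol`` which line endings ended up in the index (``i/``) and in the working tree (``w/``).

.. code-block:: bash

   repo="$(mktemp -d)"
   git init -q "$repo"
   cd "$repo" || exit 1

   # Repository-local equivalent of: git config --global core.autocrlf true
   git config core.autocrlf true

   # A file with CRLF line endings, as a Windows editor would write it
   printf 'first line\r\nsecond line\r\n' > notes.txt
   git add notes.txt

   # The index (i/) stores LF, while the working tree (w/) keeps CRLF
   git ls-files --eol notes.txt

Because the repository only ever stores LF, collaborators on Linux or macOS see no spurious line-ending changes, while the Windows working tree keeps its CRLF endings.
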

.. rubric:: Footnotes

.. [#f1] ``wtf`` in :dlcmd:`wtf` could stand for many things. "Why the Face?",
   "Wow, that's fantastic!", "What's this for?", "What to fix", "What the FAQ",
   "Where's the fire?", "Wipe the floor", "Welcome to fun",
   "Waste Treatment Facility", "What's this foolishness", "What the fruitcake", ...
   Pick a translation of your choice and make running this command more joyful.