.. _summary_containers:

Summary
-------

The last two sections have, first of all, extended your knowledge of dataset nesting:

- When subdatasets are created or installed, they are registered to the superdataset
  in their current version state (as identified by their most recent commit's hash).
  For a freshly created subdataset, the most recent commit is at the same time its
  first commit.

- Once the subdataset evolves, the superdataset recognizes this as a ``modification``
  of the subdataset's version state. If you want to record this, you need to
  :dlcmd:`save` it in the superdataset:

  .. code-block:: console

     $ datalad save -m "a short summary of changes in subds" <path to subds>

But beyond nesting concepts, they have also extended your knowledge of
reproducible analyses with :dlcmd:`run`, and you have experienced
for yourself why and how software containers can go hand in hand with DataLad:

- A software container encapsulates a complete software environment, independent
  of the environment of the computer it runs on. This allows you to create or
  use secluded software and also share it together with your analysis to ensure
  computational reproducibility. The DataLad extension
  `datalad containers <https://docs.datalad.org/projects/container>`_
  makes this possible.

- The command :dlcmd:`containers-add` registers a :term:`container image` from a path or
  URL to your dataset.

- If you use :dlcmd:`containers-run` instead of :dlcmd:`run`,
  you can reproducibly execute a command of your choice *within* the software
  environment.

- A :dlcmd:`rerun` of a commit produced with :dlcmd:`containers-run`
  will re-execute the command in the same software environment.

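The container workflow summarized in the list above can be sketched as follows.
This is a minimal sketch, not a complete recipe: the container name
``mycontainer`` is hypothetical, and every value in angle brackets is a
placeholder you need to fill in for your own dataset.

.. code-block:: console

   $ # register an image under a name of your choice ("mycontainer" is hypothetical)
   $ datalad containers-add mycontainer --url <path or URL to image>
   $ # list the containers registered in the dataset
   $ datalad containers-list
   $ # execute a command inside the registered software environment
   $ datalad containers-run -m "run analysis in container" \
       --container-name mycontainer \
       --input <input files> \
       --output <output files> \
       "<command>"
   $ # re-execute the run record stored in that commit, in the same environment
   $ datalad rerun <commit hash>

Registering the image under a name keeps the later ``containers-run`` call
short and lets the run record refer to the registered image.
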
.. index::
   pair: hub; Docker

Now what can I do with it?
^^^^^^^^^^^^^^^^^^^^^^^^^^

For one, you will not be surprised if you ever see a subdataset being shown as
``modified`` by :dlcmd:`status`: You now know that if a subdataset
evolves, its most recent state needs to be explicitly saved to the superdataset's
history.

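The check-and-save cycle described above can be sketched as follows. This is a
minimal sketch that assumes you are in the root of a superdataset containing a
subdataset at the hypothetical path ``subds``; the commit message is only an
example.

.. code-block:: console

   $ # an evolved subdataset is reported as modified
   $ datalad status
   $ # record the subdataset's most recent state in the superdataset
   $ datalad save -m "Update subds to its most recent state" subds
   $ # the subdataset should no longer be reported as modified
   $ datalad status
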
On a different matter, you are now able to capture and share analysis provenance that
includes the relevant software environment. This makes your analysis
projects not only automatically reproducible, but automatically *computationally* reproducible:
you can make sure that your analyses run on any computer with Singularity,
regardless of the software environment on this computer. Even if you are unsure how you can wrap up an
environment into a software :term:`container image` at this point, you could make use of
hundreds of publicly available images on `Singularity-Hub <https://singularity-hub.org>`_ and
`Docker-Hub <https://hub.docker.com>`_.

With this, you have also gotten a first glimpse into an extension of DataLad: a
Python module you can install with Python package managers such as ``pip`` that
extends DataLad's functionality.
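
As a brief illustration of such an installation, the container extension used
in the previous sections is distributed as the Python package
``datalad-container``. The sketch below assumes that ``pip`` belongs to the
same Python environment in which DataLad itself is installed.

.. code-block:: console

   $ pip install datalad-container
   $ # the extension's commands should now be available
   $ datalad containers-add --help
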