.. _summary_yoda:

Summary
-------

The YODA principles are a small set of guidelines that can make a huge
difference for the reproducibility, comprehensibility, and transparency
of a data analysis project. By applying them in your own midterm analysis
project, you have experienced their immediate benefits.
You also noticed that these standards are not complex -- quite the opposite,
they are very intuitive.
They structure the essential components of a data analysis project --
data, code, potentially computational environments, and lastly the results --
in a modular and practical way, and use basic principles and commands
of DataLad that you are already familiar with.

There are many advantages to this organization of contents.

- Having input data as independent dataset(s) that are not modified (only
  consumed) by an analysis allows for modular reuse of pure data datasets,
  and does not conflate the data of an analysis with the results or the code.
  You have experienced this with the ``iris_data`` subdataset.
- Keeping code within an independent, version-controlled directory, but as a part
  of the analysis dataset, makes sharing code easy and transparent, and helps
  to keep directories neat and organized. Moreover,
  with the data as subdatasets, data and code can be automatically shared together.
  By complying with this principle, you were able to submit both code and data
  in a single superdataset.
- Keeping an analysis dataset fully self-contained with relative instead of
  absolute paths in scripts is critical to ensure that an analysis reproduces
  easily on a different computer.
- DataLad's Python API makes all of DataLad's functionality available in
  Python, either as standalone functions that are exposed via ``datalad.api``,
  or as methods of the ``Dataset`` class.
  This provides an alternative to the command line, and it also opens up the
  possibility of running DataLad commands directly inside scripts.
- Including the computational environment in an analysis dataset encapsulates
  software and software versions. This prevents re-computation failures
  (or sudden differences in the results) once software is updated, as well as
  software conflicts arising on machines other than the one the analysis was
  originally conducted on. You have not yet experienced how to do this
  first-hand, but you will in a later section.
- Having all of these components as part of a DataLad dataset allows version
  controlling all pieces within the analysis regardless of their size, and
  generates provenance for everything, especially if you make use of the tools
  that DataLad provides. This way, anyone can understand and even reproduce
  your analysis without much prior knowledge of your project.
- The yoda procedure is a good starting point on which to build your next
  data analysis project.

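
The points about relative paths and the Python API can be combined in a short
sketch. It is only an illustration under stated assumptions: the script is
assumed to live in a ``code/`` directory directly below the dataset root, and
the helper names (``dataset_root``, ``input_file``, ``fetch_input``) as well as
the file name ``input/iris.csv`` are hypothetical. ``fetch_input`` additionally
assumes that DataLad is installed and relies on ``Dataset`` methods
interpreting relative paths against the dataset root.

```python
from pathlib import Path


def dataset_root(script_path):
    """Return the dataset root, assuming the script lives in <root>/code/."""
    # <root>/code/script.py -> parent is <root>/code, its parent is <root>
    return Path(script_path).resolve().parent.parent


def input_file(root, relative):
    """Build a path inside the dataset from a path relative to its root."""
    relative = Path(relative)
    if relative.is_absolute():
        # Absolute paths tie the analysis to one machine, so refuse them.
        raise ValueError(f"expected a relative path, got {relative}")
    return Path(root) / relative


def fetch_input(root, relative):
    """Retrieve file content via DataLad's Python API, then return its path."""
    # Imported here so the path helpers above work without DataLad installed.
    import datalad.api as dl

    ds = dl.Dataset(str(root))  # the Dataset class is exposed via datalad.api
    ds.get(str(relative))       # method counterpart of `datalad get`
    return input_file(root, relative)
```

Inside a script stored under ``code/``, one might call
``fetch_input(dataset_root(__file__), "input/iris.csv")``. Because every path is
derived from the script's own location, the same call should work wherever the
dataset is cloned.
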
  46. Now what can I do with it?
  47. ^^^^^^^^^^^^^^^^^^^^^^^^^^
Using the tools that DataLad provides, you are able to make the most out of
your data analysis project. The YODA principles are a guide to accompany
you on your path to reproducibility and provenance-tracking.

What should have become clear in this section is that you are already
equipped with enough DataLad tools and knowledge that complying with these
standards felt completely natural and effortless in your midterm analysis
project.