- .. _summary_yoda:
- Summary
- -------
- The YODA principles are a small set of guidelines that can make a huge
- difference towards reproducibility, comprehensibility, and transparency
- in a data analysis project. By applying them in your own midterm analysis
- project, you have experienced their immediate benefits.
- You also noticed that these standards are not complex -- quite the opposite,
- they are very intuitive.
- They structure essential components of a data analysis project --
- data, code, potentially computational environments, and lastly also the results --
- in a modular and practical way, and use basic principles and commands
- of DataLad you are already familiar with.
- There are many advantages to this organization of contents.
- - Having input data as independent dataset(s) that are not influenced (only
- consumed) by an analysis allows for a modular reuse of pure data datasets,
- and does not conflate the data of an analysis with the results or the code.
- You have experienced this with the ``iris_data`` subdataset.
- - Keeping code within an independent, version-controlled directory, but as a part
- of the analysis dataset, makes sharing code easy and transparent, and helps
- to keep directories neat and organized. Moreover,
- with the data as subdatasets, data and code can be automatically shared together.
- By complying with this principle, you were able to submit both code and data
- in a single superdataset.
- - Keeping an analysis dataset fully self-contained with relative instead of
- absolute paths in scripts is critical to ensure that an analysis reproduces
- easily on a different computer.
- - DataLad's Python API makes all of DataLad's functionality available in
- Python, either as standalone functions that are exposed via ``datalad.api``,
- or as methods of the ``Dataset`` class.
- This provides an alternative to the command line, but it also opens up the
- possibility of performing DataLad commands directly inside of scripts.
- - Including the computational environment in an analysis dataset encapsulates
- software and software versions. This prevents re-computation failures
- (or sudden differences in the results) once software is updated, as well as
- software conflicts on machines other than the one the analysis was
- originally conducted on. You have not yet
- experienced how to do this first-hand, but you will in a later section.
- - Having all of these components as part of a DataLad dataset lets you version
- control every piece of the analysis regardless of its size, and
- generates provenance for everything, especially if you make use of the tools
- that DataLad provides. This way, anyone can understand and even reproduce
- your analysis without much knowledge about your project.
- - The yoda procedure is a good starting point on which to build your next
- data analysis project.
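To make two of the points above concrete, here is a minimal sketch rather than a definitive implementation. It assumes that analysis scripts live in a ``code/`` directory directly below the dataset root. The helper names (``dataset_root``, ``dataset_path``, ``record_analysis``) and the example file ``input/iris.csv`` are hypothetical and chosen for illustration only. The first two helpers build paths relative to the dataset instead of hard-coding absolute ones. The third shows how a ``Dataset`` method from DataLad's Python API could be called from inside a script.

```python
from pathlib import Path


def dataset_root(script_path):
    """Return the dataset root for a script.

    Assumption: the script is stored in <dataset root>/code/.
    """
    return Path(script_path).resolve().parent.parent


def dataset_path(script_path, *relative_parts):
    """Build a path inside the dataset from parts relative to its root."""
    return dataset_root(script_path).joinpath(*relative_parts)


def record_analysis(root, cmd, inputs, outputs, message):
    """Run a command with provenance capture via DataLad's Python API.

    DataLad is imported lazily, so the path helpers above also work on
    machines where it is not installed.
    """
    import datalad.api as dl

    # Method form of the API: operate on a Dataset instance.
    ds = dl.Dataset(str(root))
    return ds.run(cmd=cmd, inputs=inputs, outputs=outputs, message=message)
```

Inside a script, ``dataset_path(__file__, "input", "iris.csv")`` would then point to the same file within the dataset no matter where the dataset was cloned. This is the property that relative paths are meant to guarantee.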
- Now what can I do with it?
- ^^^^^^^^^^^^^^^^^^^^^^^^^^
- Using the tools that DataLad provides, you are able to make the most out of
- your data analysis project. The YODA principles are a guide to accompany
- you on your path to reproducibility and provenance-tracking.
- What should have become clear in this section is that you are already
- equipped with enough DataLad tools and knowledge that complying with these
- standards felt completely natural and effortless in your midterm analysis
- project.