.. _gobigsummary:

Summary
-------

If you want to go big, DataLad is a suitable tool and, if used correctly, can
overcome shortcomings of Git and git-annex. Scaling up takes some thought,
though, and in some instances compromise.

- The general mechanism that allows scaling up is nesting datasets. This process
  can be done by hand or programmatically. Recursive operations ease working
  across a hierarchy of datasets and create a monorepo-like experience.

- Beware of accidentally placing too many (even small) files into Git's version
  control in a single dataset!
  ``.gitignore`` files can keep irrelevant files out of version control, the
  ``--explicit`` option of :dlcmd:`run` may be helpful, and
  custom largefile rules in ``.gitattributes`` may be necessary to override
  dataset configurations such as ``text2git``.

- Don't consider only the limits of version control software, but also the
  limits of your file system. Too many files in single directories can become
  problematic even without version control.

- If things go wrong, it's not all lost. There are ways to clean up your dataset
  if it ever gets clogged, although they are the software equivalent of a
  blowtorch and should be handled with care.

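
To get a feeling for where a directory tree might need to be split into
subdatasets, it can help to count files per directory before saving anything.
The following is a minimal sketch using only the Python standard library. It is
not part of DataLad; the function names and the default threshold of 10,000
files are assumptions made for illustration, not limits documented by Git,
git-annex, or any file system.

```python
import os
from collections import Counter


def count_files_per_directory(root):
    """Map each directory below ``root`` to the number of files directly in it.

    Hypothetical helper for illustration; not part of DataLad.
    """
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Git's internal directory
        dirnames[:] = [d for d in dirnames if d != ".git"]
        # Symlinks to files (such as annexed files) are listed in ``filenames``
        counts[os.path.relpath(dirpath, root)] = len(filenames)
    return counts


def crowded_directories(root, threshold=10_000):
    """Return (directory, file count) pairs above ``threshold``, largest first.

    The default threshold is an arbitrary, assumed value; pick one that
    matches your own file system and workflow.
    """
    counts = count_files_per_directory(root)
    return [(d, n) for d, n in counts.most_common() if n > threshold]
```

Calling ``crowded_directories(".")`` in the root of a directory tree lists the
directories that exceed the chosen threshold. Such directories are candidates
for restructuring or for becoming subdatasets of their own.
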
Now what can I do with it?
^^^^^^^^^^^^^^^^^^^^^^^^^^

Go big, if you want to. :ref:`Distribute 80TB of files <usecase_HCP_dataset>`
or `more <https://github.com/datalad/datalad-ukbiobank>`_, or version control
large analyses while keeping the performance loss of your version control
tools to a minimum.