.. _gitlfs:

Walk-through: Git LFS as a special remote on GitHub
---------------------------------------------------

Some repository hosting services provide for-pay support for large files, and can thus be used as special remotes as well.
GitHub and GitLab, for example, support `Git Large File Storage <https://github.com/git-lfs/git-lfs>`_ (Git LFS) for managing data files using Git.
A free GitHub subscription allows up to `1GB of free storage and up to 1GB of bandwidth monthly <https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-storage-and-bandwidth-usage>`_.
As such, it might be sufficient for some use cases, and could be configured
quite easily.

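
Whether a dataset fits into such a quota can be estimated before publishing. The following is a minimal sketch and not part of DataLad or git-annex: the helper names ``payload_bytes`` and ``fits_quota`` are hypothetical, and the default limit of 1000000000 bytes is an assumption standing in for the 1GB figure above. It follows symlinks so that locally present annexed files are counted, and it skips the ``.git`` directory.

```shell
#!/bin/sh
# Hypothetical helper: sum the sizes (in bytes) of all files under a directory.
# -L follows symlinks, so annexed files whose content is present are counted;
# the .git directory itself is pruned to avoid counting content twice.
payload_bytes() {
    find -L "$1" -name .git -prune -o -type f -exec wc -c {} + |
        awk '$2 != "total" { sum += $1 } END { print sum + 0 }'
}

# Hypothetical helper: succeed if the payload is within the limit.
# The default limit (1000000000 bytes) is an assumption, not an official value.
fits_quota() {
    [ "$(payload_bytes "$1")" -le "${2:-1000000000}" ]
}
```

For example, ``fits_quota /tmp/test-github-lfs`` succeeds if the file content present in the example dataset stays under the assumed limit. Content that has not been retrieved locally is not counted.
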
In order to store annexed dataset contents on GitHub, we first need to create a repository on GitHub:

.. code-block:: console

   $ datalad create-sibling-github test-github-lfs --access-protocol ssh
   .: github(-) [git@github.com:yarikoptic/test-github-lfs.git (git)]
   'git@github.com:yarikoptic/test-github-lfs.git' configured as sibling 'github' for <Dataset path=/tmp/test-github-lfs>

and then initialize a :term:`special remote` of type ``git-lfs``, pointing to the same GitHub repository:

.. code-block:: console

   $ git annex initremote github-lfs type=git-lfs url=git@github.com:yarikoptic/test-github-lfs autoenable=true encryption=none embedcreds=no

If you would like to compress data in Git LFS, you need to take a detour via
encryption during :gitannexcmd:`initremote` -- this has compression as a
convenient side effect. Here is an example initialization:

.. code-block:: console

   $ git annex initremote --force github-lfs type=git-lfs url=git@github.com:yarikoptic/test-github-lfs autoenable=true encryption=shared

With this single step it becomes possible to transfer contents to GitHub:

.. code-block:: console

   $ git annex copy --to=github-lfs file.dat
   copy file.dat (to github-lfs...)
   ok
   (recording state in git...)

and the entire dataset to the same GitHub repository:

.. code-block:: console

   $ datalad push --to=github
   [INFO ] Publishing <Dataset path=/tmp/test-github-lfs> to github
   publish(ok): . (dataset) [pushed to github: ['[new branch]', '[new branch]']]

Alternatively, to make publication even easier for you, the dataset provider, you can establish a :term:`publication dependency` such that a :dlcmd:`push` performs the data transfer to ``git-lfs`` automatically:

.. code-block:: console

   $ datalad siblings configure -s github --publish-depends github-lfs
   $ # afterwards, only datalad push is needed to publish dataset contents and history
   $ datalad push --to github

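
The publishing steps of this walk-through can be collected into a single shell function. This is a sketch under assumptions: ``run`` and ``setup_lfs_publishing`` are hypothetical helper names, the sibling name ``github`` and the special remote name ``github-lfs`` are the example values used above, and no encryption is configured. Setting ``DRY_RUN`` prints the commands instead of executing them, which allows a check before anything is created on GitHub.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the commands shown in this walk-through.
# Usage: setup_lfs_publishing REPO_NAME LFS_URL
setup_lfs_publishing() {
    repo_name=$1
    lfs_url=$2
    # 1. create the GitHub repository and register it as sibling 'github'
    run datalad create-sibling-github "$repo_name" --access-protocol ssh &&
    # 2. initialize the git-lfs special remote pointing to the same repository
    run git annex initremote github-lfs type=git-lfs "url=$lfs_url" \
        autoenable=true encryption=none embedcreds=no &&
    # 3. let a push to 'github' transfer annexed data to 'github-lfs' first
    run datalad siblings configure -s github --publish-depends github-lfs &&
    # 4. publish dataset history and contents
    run datalad push --to github
}
```

With the example values, a dry run would be ``DRY_RUN=1 setup_lfs_publishing test-github-lfs git@github.com:yarikoptic/test-github-lfs``, to be run from within the dataset.
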
Consumers of your dataset should be able to retrieve files right after cloning the dataset without a ``siblings enable`` command, as shown in section :ref:`dropbox`, because of the ``autoenable=true`` configuration for the special remote.

.. index::
   pair: drop (LFS); with DataLad

.. importantnote:: No drop from LFS

   Unfortunately, it is impossible to :dlcmd:`drop` contents from Git LFS:
   `help.github.com/en/github/managing-large-files <https://docs.github.com/en/repositories/working-with-files/managing-large-files/removing-files-from-git-large-file-storage#git-lfs-objects-in-your-repository>`_