.. _gitlfs:

Walk-through: Git LFS as a special remote on GitHub
---------------------------------------------------

Some repository hosting services provide for-pay support for large files, and can thus be used as special remotes as well.
GitHub and GitLab, for example, support `Git Large File Storage <https://github.com/git-lfs/git-lfs>`_ (Git LFS) for managing data files using Git.
A free GitHub subscription allows up to `1GB of free storage and up to 1GB of bandwidth monthly <https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-storage-and-bandwidth-usage>`_.
As such, it might be sufficient for some use cases, and could be configured quite easily.

In order to store annexed dataset contents on GitHub, we first need to create a repository on GitHub:

.. code-block:: console

   $ datalad create-sibling-github test-github-lfs --access-protocol ssh
   .: github(-) [git@github.com:yarikoptic/test-github-lfs.git (git)]
   'git@github.com:yarikoptic/test-github-lfs.git' configured as sibling 'github' for <Dataset path=/tmp/test-github-lfs>

and then initialize a :term:`special remote` of type ``git-lfs``, pointing to the same GitHub repository:

.. code-block:: console

   $ git annex initremote github-lfs type=git-lfs url=git@github.com:yarikoptic/test-github-lfs autoenable=true encryption=none embedcreds=no

If you would like to compress data in Git LFS, you need to take a detour via encryption during :gitannexcmd:`initremote` -- this has compression as a convenient side effect.
Here is an example initialization:

.. code-block:: console

   $ git annex initremote --force github-lfs type=git-lfs url=git@github.com:yarikoptic/test-github-lfs autoenable=true encryption=shared

With this single step it becomes possible to transfer contents to GitHub:

.. code-block:: console

   $ git annex copy --to=github-lfs file.dat
   copy file.dat (to github-lfs...)
   ok
   (recording state in git...)

and the entire dataset to the same GitHub repository:

.. code-block:: console

   $ datalad push --to=github
   [INFO ] Publishing <Dataset path=/tmp/test-github-lfs> to github
   publish(ok): . (dataset) [pushed to github: ['[new branch]', '[new branch]']]

Alternatively, to make publication even easier for you, the dataset provider, you can establish a :term:`publication dependency` such that a :dlcmd:`push` performs the data transfer to ``git-lfs`` automatically:

.. code-block:: console

   $ datalad siblings configure -s github --publish-depends github-lfs
   $ # afterwards, only datalad push is needed to publish dataset contents and history
   $ datalad push --to github

Consumers of your dataset should be able to retrieve files right after cloning the dataset without a ``siblings enable`` command, as shown in section :ref:`dropbox`, because of the ``autoenable=true`` configuration for the special remote.
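
As a minimal sketch of the consumer side (reusing the example repository URL and the illustrative ``file.dat`` from above; exact output will vary), retrieval could look like this:

.. code-block:: console

   $ datalad clone git@github.com:yarikoptic/test-github-lfs.git
   $ cd test-github-lfs
   $ datalad get file.dat

Because the ``git-lfs`` special remote was initialized with ``autoenable=true``, it is enabled automatically during the clone, so :dlcmd:`get` can fetch annexed content without any further setup.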

.. index::
   pair: drop (LFS); with DataLad

.. importantnote:: No drop from LFS

   Unfortunately, it is impossible to :dlcmd:`drop` contents from Git LFS:
   `help.github.com/en/github/managing-large-files <https://docs.github.com/en/repositories/working-with-files/managing-large-files/removing-files-from-git-large-file-storage#git-lfs-objects-in-your-repository>`_