123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564 |
- .. _FAQ:
- Frequently asked questions
- --------------------------
- This section answers frequently asked questions about high-level DataLad
- concepts or commands. If you have a question you want to see answered in here,
- `please create an issue <https://github.com/datalad-handbook/book/issues/new>`_
- or a `pull request <https://handbook.datalad.org/contributing.html>`_.
- For a series of specialized command snippets for various use cases, please see
- section :ref:`gists`.
- What is Git?
- ^^^^^^^^^^^^
- Git is a free and open source distributed version control system. In a
- directory that is initialized as a Git repository, it can track small-sized
- files and the modifications done to them.
- Git thinks of its data like a *series of snapshots* -- it basically takes a
- picture of what all files look like whenever a modification in the repository
- is saved. It is a powerful and yet small and fast tool with many features such
- as *branching and merging* for independent development, *checksumming* of
- contents for integrity, and *easy collaborative workflows* thanks to its
- distributed nature.
- DataLad uses Git underneath the hood. Every DataLad dataset is a Git
- repository, and you can use any Git command within a DataLad dataset. Based
- on the configurations in ``.gitattributes``, file content can be version
- controlled by Git or managed by git-annex, based on path pattern, file types,
- or file size. The section :ref:`config2` details how these configurations work.
- `This chapter <https://git-scm.com/book/en/v2/Getting-Started-What-is-Git%3F>`_
- gives a comprehensive overview on what Git is.
- Where is Git's "staging area" in DataLad datasets?
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- As mentioned in :ref:`populate`, a local version control workflow with
- DataLad "skips" the staging area (that is typical for Git workflows) from the
- user's point of view.
- What is git-annex?
- ^^^^^^^^^^^^^^^^^^
- git-annex (`https://git-annex.branchable.com/ <https://git-annex.branchable.com>`_)
- is a distributed file synchronization system written by Joey Hess. It can
- share and synchronize large files independent from a commercial service or a
- central server. It does so by managing all file *content* in a separate
- directory (the *annex*, *object tree*, or *key-value-store* in ``.git/annex/objects/``),
- and placing only file names and
- metadata into version control by Git. Among many other features, git-annex
- can ensure sufficient amounts of file copies to prevent accidental data loss and
- enables a variety of data transfer mechanisms.
- DataLad uses git-annex underneath the hood for file content tracking and
- transport logistics. git-annex offers an astonishing range of functionality
- that DataLad tries to expose in full. That being said, any DataLad dataset
- (with the exception of datasets configured to be pure Git repositories) is
- fully compatible with git-annex -- you can use any git-annex command inside a
- DataLad dataset.
- The chapter :ref:`chapter_gitannex` can give you more insights into how git-annex
- takes care of your data. git-annex's `website <https://git-annex.branchable.com>`_
- can give you a complete walk-through and detailed technical background
- information.
- What does DataLad add to Git and git-annex?
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- DataLad sits on top of Git and git-annex and tries to integrate and expose
- their functionality fully. While DataLad thus is a "thin layer" on top of
- these tools and tries to minimize the use of unique/idiosyncratic functionality,
- it also tries to simplify working with repositories and adds a range of useful concepts
- and functions:
- - Both Git and git-annex are made to work with a single repository at a time.
- For example, while nesting pure Git repositories is possible via Git
- submodules (that DataLad also uses internally), *cleaning up* after
- placing a random file somewhere into this repository hierarchy can be very
- painful. A key advantage that DataLad brings to the table is that it makes
- the boundaries between repositories vanish from a user's point
- of view. Most core commands have a ``--recursive`` option that will discover
- and traverse any subdatasets and do-the-right-thing.
- Whereas git and git-annex would require the caller to first cd to the target
- repository, DataLad figures out which repository the given paths belong to and
- then works within that repository.
- :dlcmd:`save . --recursive` will solve the subdataset problem above,
- for example, no matter what was changed/added, no matter where in a tree
- of subdatasets.
- - DataLad provides users with the ability to act on "virtual" file paths. If
- software needs data files that are carried in a subdataset (in Git terms:
- submodule) for a computation or test, a ``datalad get`` will discover if
- there are any subdatasets to install at a particular version to eventually
- provide the file content.
- - DataLad adds metadata facilities for metadata extraction in various flavors,
- and can store extracted and aggregated metadata under ``.datalad/metadata``.
- .. todo::
- more here.
- Does DataLad host my data?
- ^^^^^^^^^^^^^^^^^^^^^^^^^^
- No, DataLad manages your data, but it does not host it. When publishing a
- dataset with annexed data, you will need to find a place that the large file
- content can be stored in -- this could be a web server, a cloud service such
- as `Dropbox <https://www.dropbox.com>`_, an S3 bucket, or many other storage
- solutions -- and set up a publication dependency on this location.
- This gives you all the freedom to decide where your data lives, and who can
- have access to it. Once this set up is complete, publishing and accessing a
- published dataset and its data are as easy as if it would lie on your own
- machine.
- You can find a typical workflow in the chapter :ref:`chapter_thirdparty`.
- How does GitHub relate to DataLad?
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- DataLad can make good use of GitHub, if you have figured out storage for your
- large files otherwise. You can make DataLad publish file content to one location
- and afterwards automatically push an update to GitHub, such that
- users can install directly from GitHub and seemingly also obtain large file
- content from GitHub. GitHub is also capable of resolving submodule/subdataset
- links to other GitHub repos, which makes for a nice UI.
- Does DataLad scale to large dataset sizes?
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- In general, yes. The largest dataset managed by DataLad at this point is the `Human Connectome Project <http://www.humanconnectomeproject.org>`_ data, encompassing 80 Terabytes of data in 15 million files, and larger projects (up to 500TB) are currently actively worked on.
- The chapter :ref:`chapter_gobig` is a guide to "beyond-household-quantity datasets".
- What is the difference between a superdataset, a subdataset, and a dataset?
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Conceptually and technically, there is no difference between a dataset, a
- subdataset, or a superdataset. The only aspect that makes a dataset a sub- or
- superdataset is whether it is *registered* in another dataset (by means of an entry in the
- ``.gitmodules``, automatically performed upon an appropriate ``datalad
- install -d`` or ``datalad create -d`` command) or contains registered datasets.
- How can I convert/import/transform an existing Git or git-annex repository into a DataLad dataset?
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- You can transform any existing Git or git-annex repository of yours into a
- DataLad dataset by running:
- .. code-block:: console
- $ datalad create -f
- inside of it. Afterwards, you may want to tweak settings in ``.gitattributes``
- according to your needs (see sections :ref:`config` and :ref:`config2` for
- additional insights on this).
- The chapter :ref:`chapter_retro` guides you through transitioning an existing project into DataLad.
- How can I convert an existing DataLad dataset with annexed data back to a plain Git repository?
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- If you decide to stop using git-annex or DataLad, or if you want to turn an annex repo back into a Git repo, you can do so with the git-annex uninit command.
- The section :ref:`uninit` contains more details.
- How can I cite DataLad?
- ^^^^^^^^^^^^^^^^^^^^^^^
- Please cite the official paper on DataLad:
- Halchenko et al., (2021). DataLad: distributed system for joint management of code, data, and their relationship. Journal of Open Source Software, 6(63), 3262, `https://doi.org/10.21105/joss.03262 <https://doi.org/10.21105/joss.03262>`_.
- .. _dataset_textblock:
- How can I help others get started with a shared dataset?
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- If you want to share your dataset with users that are not already familiar with
- DataLad, it is helpful to include some information on how to interact with
- DataLad datasets in your dataset's ``README`` (or similar) file.
- If you do not want to invent a description yourself, you can run
- :dlcmd:`add-readme` in your dataset, and have one added automatically.
- .. only:: html
- Below, we provide a standard text block that you can use (and adapt as you
- wish) for such purposes.
- .. find-out-more:: Textblock in .rst format:
- .. code-block:: rst
- DataLad datasets and how to use them
- ------------------------------------
- This repository is a `DataLad <https://www.datalad.org>`__ dataset. It provides
- fine-grained data access down to the level of individual files, and allows for
- tracking future updates. In order to use this repository for data retrieval,
- `DataLad <https://www.datalad.org>`_ is required.
- It is a free and open source command line tool, available for all
- major operating systems, and builds up on Git and `git-annex
- <https://git-annex.branchable.com>`__ to allow sharing, synchronizing, and
- version controlling collections of large files. You can find information on
- how to install DataLad at `handbook.datalad.org/intro/installation.html
- <https://handbook.datalad.org/intro/installation.html>`_.
- Get the dataset
- ^^^^^^^^^^^^^^^
- A DataLad dataset can be ``cloned`` by running:
- .. code-block:: bash
- datalad clone <url>
- Once a dataset is cloned, it is a light-weight directory on your local machine.
- At this point, it contains only small metadata and information on the
- identity of the files in the dataset, but not actual *content* of the
- (sometimes large) data files.
- Retrieve dataset content
- ^^^^^^^^^^^^^^^^^^^^^^^^
- After cloning a dataset, you can retrieve file contents by running:
- .. code-block:: bash
- datalad get <path/to/directory/or/file>
- This command will trigger a download of the files, directories, or
- subdatasets you have specified.
- DataLad datasets can contain other datasets, so called *subdatasets*. If you
- clone the top-level dataset, subdatasets do not yet contain metadata and
- information on the identity of files, but appear to be empty directories. In
- order to retrieve file availability metadata in subdatasets, run:
- .. code-block:: bash
- datalad get -n <path/to/subdataset>
- Afterwards, you can browse the retrieved metadata to find out about
- subdataset contents, and retrieve individual files with ``datalad get``. If you
- use ``datalad get <path/to/subdataset>``, all contents of the subdataset will
- be downloaded at once.
- Stay up-to-date
- ^^^^^^^^^^^^^^^
- DataLad datasets can be updated. The command ``datalad update`` will *fetch*
- updates and store them on a different branch (by default
- ``remotes/origin/main``). Running
- .. code-block:: bash
- datalad update --merge
- will *pull* available updates and integrate them in one go.
- Find out what has been done
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^
- DataLad datasets contain their history in the ``git log``.
- By running ``git log`` (or a tool that displays Git history) in the dataset or on
- specific files, you can find out what has been done to the dataset or to individual files
- by whom, and when.
- More information
- ^^^^^^^^^^^^^^^^
- More information on DataLad and how to use it can be found in the DataLad Handbook at
- `handbook.datalad.org <https://handbook.datalad.org>`_. The
- chapter "DataLad datasets" can help you to familiarize yourself with the
- concept of a dataset.
- .. find-out-more:: Textblock in markdown format
- .. code-block:: md
- [![made-with-datalad](https://www.datalad.org/badges/made_with.svg)](https://datalad.org)
- ## DataLad datasets and how to use them
- This repository is a [DataLad](https://www.datalad.org/) dataset. It provides
- fine-grained data access down to the level of individual files, and allows for
- tracking future updates. In order to use this repository for data retrieval,
- [DataLad](https://www.datalad.org/) is required. It is a free and
- open source command line tool, available for all major operating
- systems, and builds up on Git and [git-annex](https://git-annex.branchable.com/)
- to allow sharing, synchronizing, and version controlling collections of
- large files. You can find information on how to install DataLad at
- [handbook.datalad.org/intro/installation.html](https://handbook.datalad.org/intro/installation.html).
- ### Get the dataset
- A DataLad dataset can be `cloned` by running
- ```
- datalad clone <url>
- ```
- Once a dataset is cloned, it is a light-weight directory on your local machine.
- At this point, it contains only small metadata and information on the
- identity of the files in the dataset, but not actual *content* of the
- (sometimes large) data files.
- ### Retrieve dataset content
- After cloning a dataset, you can retrieve file contents by running
- ```
- datalad get <path/to/directory/or/file>
- ```
- This command will trigger a download of the files, directories, or
- subdatasets you have specified.
- DataLad datasets can contain other datasets, so called *subdatasets*.
- If you clone the top-level dataset, subdatasets do not yet contain
- metadata and information on the identity of files, but appear to be
- empty directories. In order to retrieve file availability metadata in
- subdatasets, run
- ```
- datalad get -n <path/to/subdataset>
- ```
- Afterwards, you can browse the retrieved metadata to find out about
- subdataset contents, and retrieve individual files with `datalad get`.
- If you use `datalad get <path/to/subdataset>`, all contents of the
- subdataset will be downloaded at once.
- ### Stay up-to-date
- DataLad datasets can be updated. The command `datalad update` will
- *fetch* updates and store them on a different branch (by default
- `remotes/origin/main`). Running
- ```
- datalad update --merge
- ```
- will *pull* available updates and integrate them in one go.
- ### Find out what has been done
- DataLad datasets contain their history in the `git log`.
- By running `git log` (or a tool that displays Git history) in the dataset or on
- specific files, you can find out what has been done to the dataset or to individual files
- by whom, and when.
- ### More information
- More information on DataLad and how to use it can be found in the DataLad Handbook at
- [handbook.datalad.org](https://handbook.datalad.org/index.html). The chapter
- "DataLad datasets" can help you to familiarize yourself with the concept of a dataset.
- .. find-out-more:: Textblock without formatting
- .. code-block:: md
- DataLad datasets and how to use them
- This repository is a DataLad (https://www.datalad.org) dataset. It provides
- fine-grained data access down to the level of individual files, and allows
- for tracking future updates. In order to use this repository for data
- retrieval, DataLad is required. It is a free and open source command line
- tool, available for all major operating systems, and builds up on Git and
- git-annex (https://git-annex.branchable.com) to allow sharing,
- synchronizing, and version controlling collections of large files. You can
- find information on how to install DataLad at
- https://handbook.datalad.org/intro/installation.html.
- Get the dataset
- A DataLad dataset can be "cloned" by running 'datalad clone <url>'.
- Once a dataset is cloned, it is a light-weight directory on your local
- machine.
- At this point, it contains only small metadata and information on the
- identity of the files in the dataset, but not actual *content* of the
- (sometimes large) data files.
- Retrieve dataset content
- After cloning a dataset, you can retrieve file contents by running
- 'datalad get <path/to/directory/or/file>'
- This command will trigger a download of the files, directories, or
- subdatasets you have specified.
- DataLad datasets can contain other datasets, so called "subdatasets".
- If you clone the top-level dataset, subdatasets do not yet contain
- metadata and information on the identity of files, but appear to be
- empty directories. In order to retrieve file availability metadata in
- subdatasets, run 'datalad get -n <path/to/subdataset>'
- Afterwards, you can browse the retrieved metadata to find out about
- subdataset contents, and retrieve individual files with 'datalad get'.
- If you use 'datalad get <path/to/subdataset>', all contents of the
- subdataset will be downloaded at once.
- Stay up-to-date
- DataLad datasets can be updated. The command 'datalad update' will
- "fetch" updates and store them on a different branch (by default
- 'remotes/origin/main'). Running 'datalad update --merge' will "pull"
- available updates and integrate them in one go.
- Find out what has been done
- DataLad datasets contain their history in the Git log.
- By running 'git log' (or a tool that displays Git history) in the dataset or on
- specific files, you can find out what has been done to the dataset or to individual files
- by whom, and when.
- More information
- More information on DataLad and how to use it can be found in the DataLad Handbook at
- https://handbook.datalad.org/index.html. The chapter "DataLad datasets"
- can help you to familiarize yourself with the concept of a dataset.
- What is the difference between DataLad, Git LFS, and Flywheel?
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- `Flywheel <https://flywheel.io>`_ is an informatics platform for biomedical
- research and collaboration.
- `Git Large File Storage <https://github.com/git-lfs/git-lfs>`_ (Git LFS) is a
- command line tool that extends Git with the ability to manage large files. In
- that it appears similar to git-annex.
- .. todo::
- this.
- A more elaborate delineation from related solutions can be found in the DataLad
- `developer documentation <https://docs.datalad.org/related.html>`_.
- What is the difference between DataLad and DVC?
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- `DVC <https://dvc.org>`_ is a version control system for machine learning projects.
- We have compared the two tools in a dedicated handbook section, :ref:`dvc`.
- DataLad version-controls my large files -- great. But how much is saved in total?
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- .. todo::
- this.
- .. _copydata:
- How can I copy data out of a DataLad dataset?
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Moving or copying data out of a DataLad dataset is always possible and works in
- many cases just like in any regular directory. The only
- caveat exists in the case of annexed data: If file content is managed with
- git-annex and stored in the :term:`object-tree`, what *appears* to be the
- file in the dataset is merely a symlink (please read section :ref:`symlink`
- for details). Moving or copying this symlink will not yield the
- intended result -- instead you will have a broken symlink outside of your
- dataset.
- When using the terminal command ``cp`` [#f1]_, it is sufficient to use the
- ``-L``/``--dereference`` option. This will follow symbolic links, and make
- sure that content gets moved instead of symlinks.
- Remember that if you are copying some annexed content out of a dataset without
- unlocking it first, you will only have "read" :term:`permissions` on the files you have just
- copied. Therefore you can :
- - either unlock the files before copying them out,
- - or copy them and then use the command ``chmod`` to be able to edit the file.
- .. code-block:: console
- $ # this will give you 'write' permission on the file
- $ chmod +w filename
-
- If you are not familiar with how the ``chmod`` works (or if you forgot - let's be honest we
- all google it sometimes), this is `a nice tutorial <https://bids.github.io/2015-06-04-berkeley/shell/07-perm.html>`_ .
- With tools other than ``cp`` (e.g., graphical file managers), to copy or move
- annexed content, make sure it is *unlocked* first:
- After a :dlcmd:`unlock` copying and moving contents will work fine.
- A subsequent :dlcmd:`save` in the dataset will annex the content
- again.
- Is there Python 2 support for DataLad?
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- No, Python 2 support has been dropped in
- `September 2019 <https://github.com/datalad/datalad/pull/3629>`_.
- Is there a graphical user interface for DataLad?
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Yes, a dedicated :term:`DataLad extension`, ``datalad-gooey``, provides a graphical user interface for DataLad.
- You can read more about it in the section :ref:`gooey`.
- How does DataLad interface with OpenNeuro?
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- `OpenNeuro <https://openneuro.org>`_ is a free and open platform for sharing MRI,
- MEG, EEG, iEEG, and ECoG data. It publishes hosted data as DataLad datasets on
- :term:`GitHub`. The entire collection can be found at
- `github.com/OpenNeuroDatasets <https://github.com/OpenNeuroDatasets>`_. You can
- obtain the datasets just as any other DataLad datasets with :dlcmd:`clone`
- or :dlcmd:`install`.
- There is more info about this in the :ref:`OpenNeuro Quickstart Guide <openneuro>`.
- .. _bidsvalidator:
- BIDS validator issues in datasets with missing file content
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- As outlined in section :ref:`symlink`, all unretrieved files in datasets are broken symlinks.
- This is desired, and not a problem per se, but some tools, among them the `BIDS validator <https://github.com/bids-standard/bids-validator>`_, can be confused by this.
- Should you attempt to validate a dataset in which all or some file contents are missing, for example after cloning a dataset or after dropping file contents, the validator may fail to report on the validity of the complete dataset or the specific unretrieved files.
- If you aim for a complete validation of your dataset, re-do the validation after retrieving all necessary file contents.
- If you only aim to validate file names and structure, invoke the bids validator with the additional flags ``--ignoreNiftiHeaders`` and ``--ignoreSymlinks``.
- .. _gitannexbranch:
- What is the git-annex branch?
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- If your DataLad dataset contains an annex, there is also a ``git-annex`` :term:`branch`
- that is created, used, and maintained solely by :term:`git-annex`. It is completely
- unconnected to any other branches in your dataset, and contains different types
- of log files.
- The contents of this branch are used for git-annex internal tracking of the
- dataset and its annexed contents. For example, git-annex stores information where
- file content can be retrieved from in a ``.log`` file for each object, and if the object
- was obtained from web-sources (e.g., with :dlcmd:`download-url`), a
- ``.log.web`` file stores the URL. Other files in this branch store information about
- the known remotes of the dataset and their description, if they have one.
- You can find out much more about the ``git-annex`` branch and its contents in the
- `documentation <https://git-annex.branchable.com/internals>`_.
- This branch, however, is managed by git-annex, and you should not tamper with it.
- .. _gitannexdefault:
- Help - Why does Github display my dataset with git-annex as the default branch?
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- If your dataset is represented on GitHub with cryptic directories instead of actual file names, GitHub probably declared the :term:`git-annex branch` to be your repositories "default branch".
- Here is an example:
- .. figure:: ../artwork/src/defaultgitannex_light.png
- This is related to GitHub's decision to make ``main`` `the default branch for newly created repositories <https://github.blog/changelog/2020-10-01-the-default-branch-for-newly-created-repositories-is-now-main>`_ -- datasets that do not have a ``main`` branch (but, for example, a ``master`` branch) may end up with a different branch being displayed on GitHub than intended.
- To fix this for present and/or future datasets, the default branch can be configured to a branch name of your choice on a repository- or organizational level `via GitHub's web-interface <https://github.blog/changelog/2020-08-26-set-the-default-branch-for-newly-created-repositories>`_.
- Alternatively, you can rename existing ``master`` branches into ``main`` using ``git branch -m master main`` (but beware of unforeseen consequences - your collaborators may try to ``update`` the ``master`` branch but fail, continuous integration workflows could still try to use ``master``, etc.).
- Lastly, you can initialize new datasets with ``main`` instead of ``master`` -- either with a global Git configuration [#f2]_ for ``init.defaultBranch`` (``git config --global init.defaultBranch main``), or by passing the ``--initial-branch <branchname>`` option via ``datalad create`` by appending ``--initial-branch main`` to the command (``datalad create mydataset --initial-branch main``) [#f3]_.
- .. rubric:: Footnotes
- .. [#f1] The absolutely amazing `Midnight Commander <https://github.com/MidnightCommander/mc>`_
- ``mc`` can also follow symlinks.
- .. [#f2] See the section :ref:`config` for more info on configurations
- .. [#f3] ``--initial-branch`` is not one of ``datalad create``'s parameters, but a parameter of a ``git init`` call. You can specify any of ``git init``'s parameters as the last arguments of ``datalad create`` (after the ``PATH``) and it will be passed to ``git init``.
|