.. _metalad:

Metadata-Management with MetaLad
--------------------------------

For many years, :term:`metadata`-related functionality was included in the DataLad core package.
A modernized approach, however, is now being developed in the `datalad-metalad extension <https://github.com/datalad/datalad-metalad>`_.

.. figure:: ../artwork/src/metadata.svg

MetaLad is a :term:`DataLad extension` that allows you to

* associate :term:`metadata` in any format with a dataset, a subdataset, or a file,
* extract metadata automatically from primary data or handle manually supplied metadata,
* transport metadata separately from primary data, and
* dump metadata and, for example, store it in a file or search through it with a tool of your choice.

The following section illustrates relevant concepts, commands, and workflows.

Primary Data versus Metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You might ask upfront: "What is 'metadata'?"
Very simply put: Metadata is data about data.
In principle, any kind of data could be metadata.
What makes it metadata is the fact that it is associated with some "primary" data, and usually describes the primary data in some way.
Consider two simple examples from the physical and the digital world: A library catalog contains metadata about the library's books, such as their *location*; and a file system stores the *creation time* of a file as well as the *user ID* of its creator.
The location, creation time, and creator ID are metadata, while the book in the library and the file on the file system are the primary data the metadata is associated with.

And what does metadata do for you?
Generally, metadata provides additional information about primary data.
This allows you to identify primary data with certain properties.
These properties could either be contained within the primary data and (automatically) extracted from it, such as digital photographs captured in a specific time frame at a specific GPS location, or assigned to primary data based on an external policy, such as the directory "Hiking in the alps 2019" on your phone.

Importantly, primary data can have virtually unlimited different metadata associated with it, depending on what is relevant in a given context.
Consider a publication in a medical field, and a few examples of metadata about it from this virtually unlimited metadata space:

1. The full text of the scanned PDF (manually created, or automatically extracted by `optical character recognition <https://en.wikipedia.org/wiki/Optical_character_recognition>`_)
2. Citation information, such as the geographic origin of citing papers or the type of media outlet reporting about it
3. Context information, e.g., publications based on similar data
4. Structural data about the underlying medical acquisitions (such as dataset containment, modification date, or hash), which can provide basic structural information even without access to the primary data
5. Special search indices, e.g., graph-based search indices, medical abbreviations
6. Anonymized information extracted from medical documents
7. Information about the used software, e.g., security assessments, `citation.cff <https://citation-file-format.github.io>`_

MetaLad's *extractor* concept
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the context of MetaLad, each one of those metadata examples above would be called a *schema*, and a process or tool deriving or generating a given schema would be called an *extractor*.
Different metadata schemas are useful in different contexts:
In the example above, citation metadata might come in handy when evaluating the impact of the scientific finding, whereas the publication's full text and special search indices could be used for automated meta-analyses.
To allow a variety of metadata use cases, MetaLad can use various metadata schemas simultaneously: if you want to, all schemas from the example above and many more could be created and managed in parallel in the same dataset.
To handle different schemas in parallel, MetaLad identifies them by a unique identifier of the extraction process that generated them.
For example, the automatically scanned full text might be identified with the extractor name ``OCR``, and the citation data could be identified as "`altmetric <https://en.wikipedia.org/wiki/Altmetric>`_".
But while the term "extractor" has a technical feel to it, an "extractor" can also be the manual process of annotating arbitrary information about a file: nothing prevents the metadata from manual medical annotations from being identified as ``Sam-tracing-brain-regions-by-hand``.
In addition to identifying schemas via extractor names, MetaLad and other :term:`DataLad extension`\s ship with specialized extractor tools to extract metadata of a certain schema.
Likewise, `anyone can build their own extractor to generate schemas of their choice <https://docs.datalad.org/projects/metalad/en/latest/user_guide/writing-extractors.html>`_.
But before we take a closer look into that, let's illustrate the metadata concepts and commands of MetaLad with a toy example.

Adding metadata with meta-add
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the context of DataLad datasets, metadata can either be associated with entire datasets or with individual files inside them.
Whether a piece of information is *dataset level* or *file level* metadata depends on the nature of the metadata and its envisioned use.
Metadata that describes dataset level properties could be the dataset owner, dataset authors, dataset licenses, or names of contained files [#f1]_, whereas metadata that describes file level properties could be file :term:`checksum`\s or file-specific information like the time stamp of a photograph.

.. gitusernote:: Metadata is stored in Git

   When MetaLad adds metadata to your datasets, it stores the metadata in :term:`Git` only.
   Thus, even a plain Git repository is sufficient to work with ``datalad-metalad``.
   However, the metadata is stored in an unusual and somewhat hidden place: inside the `Git object store <https://git-scm.com/book/en/v2/Git-Internals-Git-Objects>`_.
   If you're interested in the technical details, you can find a :ref:`Findoutmore <fom-metadataobjecttree>` a bit further down in this section.

Let's look at a concrete example.
We have a DataLad dataset ``cozy-screensavers`` that contains a single PNG file called ``zen.png``.

.. runrecord:: _examples/DL-101-181-101
   :language: console
   :workdir: beyond_basics/meta

   $ datalad clone https://github.com/datalad-handbook/cozy-screensavers.git
   $ cd cozy-screensavers

.. runrecord:: _examples/DL-101-181-102
   :language: console
   :workdir: beyond_basics/meta/cozy-screensavers

   $ tree

.. runrecord:: _examples/DL-101-181-103
   :language: console
   :workdir: beyond_basics/meta/cozy-screensavers

   $ datalad get zen.png

Let's assume there is metadata stemming from an advanced AI called ``Picture2Words`` that is able to describe the content of images; in other words, this AI would be able to extract certain metadata from the file.
In this case, the AI describes the image as

.. code-block:: text

   "A lake with waterlilies in front of snow covered mountains"

We would like to add this description as metadata to the file ``./zen.png``, and will identify it with a name corresponding to its extractor, ``"Picture2Words"``.
In order to include metadata in a dataset, users need to provide a metadata entry to the :dlcmd:`meta-add` command.
This metadata entry has two major requirements: It needs to be supplied in a certain format, in particular as a JSON object [#f2]_, and it needs to include a set of required information in defined fields.
Let's take a look at the JSON object we could generate as a metadata entry for ``zen.png`` and identify the required fields::

   {
     "type": "file",
     "path": "zen.png",
     "dataset_id": "2d540a9d-2ef7-4b5f-8931-7c92f483f0c7",
     "dataset_version": "19f2d98d758116d099d467260a5a71082b2c6a29",
     "extractor_name": "Picture2Words",
     "extractor_version": "0.1",
     "extraction_parameter": {},
     "extraction_time": 1675113291.1464975,
     "agent_name": "Overworked CTO",
     "agent_email": "closetoburnout@randomtechconsultancy.com",
     "extracted_metadata": {
       "description": "A lake with waterlilies in front of snow covered mountains"
     }
   }

When adding file-level metadata to a dataset that contains the file, the metadata JSON object must contain:

* information about the level the metadata applies to (``type``, with ``file`` instead of ``dataset`` as a value),
* the file the metadata belongs to, given as a ``path``,
* the :term:`dataset ID` (``dataset_id``) and version (``dataset_version``),
* a joint identifier for the metadata extractor and schema, ``extractor_name`` (here, ``Picture2Words``), as well as details about the metadata extractor, like its version (``extractor_version``), its parameterization (``extraction_parameter``), and the date and time of extraction (``extraction_time``) in the form of a Unix time stamp [#f3]_,
* information about the agent supplying the metadata (``agent_name`` and ``agent_email``),
* and finally the metadata itself (``extracted_metadata``).
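
If you would rather generate such an entry with a script than write it by hand, the Python sketch below assembles a file-level entry containing exactly the fields listed above.
The helper name ``make_file_entry`` is hypothetical and not part of MetaLad; it sets ``extraction_time`` to the current Unix time stamp via ``time.time()``, and the example values are copied from the ``zen.png`` entry.
Whether the result is accepted is still up to the checks of :dlcmd:`meta-add`.

.. code-block:: python

   import json
   import time
   from typing import Any, Dict, Optional


   def make_file_entry(
       path: str,
       dataset_id: str,
       dataset_version: str,
       extractor_name: str,
       extractor_version: str,
       agent_name: str,
       agent_email: str,
       extracted_metadata: Dict[str, Any],
       extraction_parameter: Optional[Dict[str, Any]] = None,
   ) -> Dict[str, Any]:
       """Assemble a file-level metadata entry (hypothetical helper, not MetaLad API)."""
       return {
           "type": "file",  # file-level metadata
           "path": path,  # the file the metadata belongs to
           "dataset_id": dataset_id,
           "dataset_version": dataset_version,
           "extractor_name": extractor_name,
           "extractor_version": extractor_version,
           "extraction_parameter": extraction_parameter or {},
           "extraction_time": time.time(),  # Unix time stamp (seconds since the epoch)
           "agent_name": agent_name,
           "agent_email": agent_email,
           "extracted_metadata": extracted_metadata,
       }


   if __name__ == "__main__":
       entry = make_file_entry(
           path="zen.png",
           # use the ID and version of your own dataset here
           dataset_id="2d540a9d-2ef7-4b5f-8931-7c92f483f0c7",
           dataset_version="19f2d98d758116d099d467260a5a71082b2c6a29",
           extractor_name="Picture2Words",
           extractor_version="0.1",
           agent_name="Overworked CTO",
           agent_email="closetoburnout@randomtechconsultancy.com",
           extracted_metadata={
               "description": "A lake with waterlilies in front of snow covered mountains"
           },
       )
       # Print the entry as JSON, e.g., to redirect it into metadata-zen.json
       print(json.dumps(entry, indent=2))

Saved as a script (for example, a hypothetical ``make_entry.py``), its output could be redirected into ``metadata-zen.json`` and then passed to ``meta-add``.
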

While certain extractors can generate metadata entries automatically, and one could write scripts wrapping extraction tools to generate them, we can also create such a JSON object manually, for example in an editor.
A valid metadata entry can then be read into ``meta-add`` either from the command line or from standard input (:term:`stdin`).
For example, we can save the metadata entry above as ``metadata-zen.json``:

.. runrecord:: _examples/DL-101-181-104
   :language: console
   :workdir: beyond_basics/meta/cozy-screensavers

   $ cat << EOT > metadata-zen.json
   {
     "type": "file",
     "path": "zen.png",
     "dataset_id": "2d540a9d-2ef7-4b5f-8931-7c92f483f0c7",
     "dataset_version": "19f2d98d758116d099d467260a5a71082b2c6a29",
     "extractor_name": "Picture2Words",
     "extractor_version": "0.1",
     "extraction_parameter": {},
     "extraction_time": 1675113291.1464975,
     "agent_name": "Overworked CTO",
     "agent_email": "closetoburnout@randomtechconsultancy.com",
     "extracted_metadata": {
       "description": "A lake with waterlilies in front of snow covered mountains"
     }
   }
   EOT

Then, we redirect the content of the file into the :dlcmd:`meta-add` command (the ``-`` stands for standard input).
The following call adds the metadata entry to the current dataset, ``cozy-screensavers``:

.. runrecord:: _examples/DL-101-181-105
   :language: console
   :workdir: beyond_basics/meta/cozy-screensavers

   $ datalad meta-add -d . - < metadata-zen.json

.. index::
   single: configuration item; datalad.dataset.id

.. find-out-more:: meta-add validity checks

   When adding metadata for the first time, it's not uncommon to run into errors.
   It's quite easy, for example, to miss a comma or quotation mark when creating a JSON object by hand.
   But there are also some internal checks that might be surprising.
   If you want to add the metadata above to your own dataset, make sure to adjust the ``dataset_id`` to the ID of your own dataset, found via the command ``datalad configuration get datalad.dataset.id`` (otherwise you'll see an error [#f4]_), and likewise the ``dataset_version``.
   And in case you supply the ``extraction_time`` as "this morning at 8AM" instead of a time stamp, the command will be unhappy as well.
   In case an error occurs, make sure to read the error message, and turn to the command's ``--help`` for insights about requirements you might have missed.
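
Some of these mistakes can be caught before calling ``meta-add``.
The following Python sketch checks an entry for invalid JSON, missing fields from the list of required fields, a non-numeric ``extraction_time``, and, optionally, a ``dataset_id`` that differs from the ID reported by ``datalad configuration get datalad.dataset.id``.
The helper ``check_entry`` and its messages are hypothetical; it does not reproduce MetaLad's own validation, which may check more or check differently.

.. code-block:: python

   import json
   import numbers
   import sys
   from typing import List, Optional

   # Fields required by the list above ("path" is checked separately for file-level entries)
   REQUIRED_FIELDS = (
       "type", "dataset_id", "dataset_version", "extractor_name",
       "extractor_version", "extraction_parameter", "extraction_time",
       "agent_name", "agent_email", "extracted_metadata",
   )


   def check_entry(text: str, expected_dataset_id: Optional[str] = None) -> List[str]:
       """Return a list of problems found in a JSON metadata entry (empty if none found)."""
       try:
           entry = json.loads(text)
       except json.JSONDecodeError as exc:
           # e.g., a missing comma or quotation mark
           return [f"invalid JSON: {exc}"]
       if not isinstance(entry, dict):
           return ["the metadata entry must be a JSON object"]

       problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in entry]
       if entry.get("type") == "file" and "path" not in entry:
           problems.append("file-level metadata needs a 'path'")
       if "extraction_time" in entry:
           stamp = entry["extraction_time"]
           # JSON true/false become bool, which Python counts as a number, so exclude it
           if isinstance(stamp, bool) or not isinstance(stamp, numbers.Real):
               problems.append("extraction_time must be a Unix time stamp (a number)")
       if expected_dataset_id is not None and entry.get("dataset_id") != expected_dataset_id:
           problems.append("dataset_id does not match the target dataset")
       return problems


   if __name__ == "__main__" and len(sys.argv) in (2, 3):
       # usage: python check_entry.py ENTRY.json [EXPECTED_DATASET_ID]
       with open(sys.argv[1], encoding="utf-8") as handle:
           found = check_entry(handle.read(), sys.argv[2] if len(sys.argv) == 3 else None)
       print("\n".join(found) or "no problems found")

An empty result only means that this sketch found nothing; :dlcmd:`meta-add` remains the authority on whether an entry is valid.
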

After the metadata has been added, you can view it via the command :dlcmd:`meta-dump`.
The simplest form of this command is ``meta-dump -r``, which shows all metadata stored in the dataset in the current directory.
To get more specific metadata records, you can give a dataset-file-path pattern to ``meta-dump``, much like an argument to ``ls``, that identifies the dataset, its version, and a file within the dataset.
The dataset part and the file part are separated by ``:``.
The following line dumps all metadata for ``zen.png`` in the current dataset (``.``):

.. runrecord:: _examples/DL-101-181-106
   :language: console
   :workdir: beyond_basics/meta/cozy-screensavers

   $ datalad meta-dump -d . .:zen.png

The output can also be pretty-printed with DataLad's ``json_pp`` output format:

.. runrecord:: _examples/DL-101-181-107
   :language: console
   :workdir: beyond_basics/meta/cozy-screensavers

   $ datalad -f json_pp meta-dump -d . .:zen.png
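
The pattern ``.:zen.png`` consists of a dataset part (``.``) and a file part (``zen.png``).
According to the grammar MetaLad uses for these patterns, ``[DATASET_PATH] ["@" VERSION-DIGITS] [":" [LOCAL_PATH]]``, a version can also be given after an ``@``.
The Python sketch below only illustrates how a pattern decomposes under this grammar.
``parse_pattern`` is a hypothetical helper, not MetaLad's parser, and it assumes that version digits are hexadecimal (as in Git commit hashes) and that dataset paths contain neither ``@`` nor ``:``.

.. code-block:: python

   import re
   from typing import NamedTuple, Optional

   # [DATASET_PATH] ["@" VERSION-DIGITS] [":" [LOCAL_PATH]]
   _PATTERN = re.compile(
       r"(?P<dataset_path>[^@:]*)"         # optional dataset path (assumed: no "@" or ":")
       r"(?:@(?P<version>[0-9a-fA-F]+))?"  # optional "@" plus version digits (assumed: hex)
       r"(?::(?P<local_path>.*))?"         # optional ":" plus optional path inside the dataset
   )


   class PathPattern(NamedTuple):
       dataset_path: str
       version: Optional[str]
       local_path: Optional[str]


   def parse_pattern(text: str) -> PathPattern:
       """Split a dataset-file-path pattern into its three optional parts."""
       match = _PATTERN.fullmatch(text)
       if match is None:
           raise ValueError(f"not a valid pattern: {text!r}")
       return PathPattern(match["dataset_path"], match["version"], match["local_path"])


   if __name__ == "__main__":
       print(parse_pattern(".:zen.png"))
       # PathPattern(dataset_path='.', version=None, local_path='zen.png')
       print(parse_pattern("./study-100"))
       # PathPattern(dataset_path='./study-100', version=None, local_path=None)
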

.. find-out-more:: More complex metadata-dumps

   TODO: add complex Dataset-file-path-pattern examples, e.g., with UUIDs, versions, etc

Using existing extractors to add metadata
"""""""""""""""""""""""""""""""""""""""""

If writing JSON objects by hand sounds cumbersome, it indeed is.
To automate metadata extraction or generation, MetaLad can use extractors to do the job.
A few built-in extractors are already shipped with it, for example ``annex`` (reporting on information :term:`git-annex` provides about datasets or files), or ``studyminimeta`` (a `metadata schema for archived studies <https://github.com/christian-monch/datalad-metalad/blob/nf-archived_study_metadata/tools/archive_metadata_validator/docs/source/archived-study-metadata-handbook.rst>`_).
Once an extractor of choice is found, the :dlcmd:`meta-extract` command can do its job.
The example below uses the ``metalad_core`` extractor and pipes the result into ``jq`` for readability:

.. runrecord:: _examples/DL-101-181-108
   :language: console
   :workdir: beyond_basics/meta/cozy-screensavers

   $ datalad meta-extract -d . metalad_core | jq

The extracted metadata can then either be saved into a file as before, or directly :term:`pipe`'d into :dlcmd:`meta-add`.

Creating your own extractor
"""""""""""""""""""""""""""

The MetaLad docs have a dedicated user guide that walks you through the process of creating your own extractor.
Have a look at `docs.datalad.org/projects/metalad/user_guide/writing-extractors.html <https://docs.datalad.org/projects/metalad/en/latest/user_guide/writing-extractors.html>`_.

Distributing and Getting Metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Once metadata has been added to a dataset, it can be distributed and retrieved.
Instead of creating and adding metadata yourself, you could download fitting pre-existing metadata.
Similarly, instead of repeating a ``meta-add`` process for the same files across hierarchies of datasets, metadata added to one dataset can be exported into other datasets.
Regardless of whether it is a distribution or a retrieval process, though, an export with MetaLad only concerns the *metadata*, and never the primary data.

Download Metadata from a remote repository
""""""""""""""""""""""""""""""""""""""""""

Let's start by creating a place where someone else's metadata could live.

.. runrecord:: _examples/DL-101-181-110
   :language: console
   :workdir: beyond_basics/meta

   $ datalad create metadata-assimilation
   $ cd metadata-assimilation

Because MetaLad stores metadata in :term:`Git`'s object store, we use Git to directly fetch metadata from a remote repository, such as this demo on :term:`GitHub`: ``https://github.com/christian-monch/metadata-test.git``.
Since metadata added by MetaLad is not transported automatically but needs to be requested specifically, the command to retrieve it looks unfamiliar to non-Git users: It names the precise location of the :term:`ref`\s that contain the metadata with the *refspec* ``refs/datalad/*:refs/datalad/*``, which fetches all refs underneath ``refs/datalad/`` from the remote and stores them under the same names locally.

.. runrecord:: _examples/DL-101-181-111
   :language: console
   :workdir: beyond_basics/meta/metadata-assimilation

   $ git fetch \
       "https://github.com/christian-monch/metadata-test.git" \
       "refs/datalad/*:refs/datalad/*"
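
A refspec of the form ``source:destination`` maps ref names on the remote (``source``) to local ref names (``destination``), with ``*`` matching the rest of a name.
The Python sketch below models only this mapping rule with a hypothetical helper ``map_ref``; it ignores other refspec features Git supports.
It shows why a default fetch refspec such as ``+refs/heads/*:refs/remotes/origin/*`` (see the ``.git/config`` example in the footnotes) never picks up the refs underneath ``refs/datalad/``.

.. code-block:: python

   from typing import Optional


   def map_ref(refspec: str, ref: str) -> Optional[str]:
       """Return the local name a remote ref is stored under, or None if the refspec does not match."""
       # A leading "+" only permits non-fast-forward updates; it does not change the mapping
       source, destination = refspec.lstrip("+").split(":", 1)
       if "*" not in source:
           # refspec without a wildcard: only an exact match counts
           return destination if ref == source else None
       prefix, suffix = source.split("*", 1)
       if len(ref) >= len(prefix) + len(suffix) and ref.startswith(prefix) and ref.endswith(suffix):
           # "*" matches any remainder of the name, including further "/" components
           matched = ref[len(prefix):len(ref) - len(suffix)]
           return destination.replace("*", matched, 1)
       return None


   if __name__ == "__main__":
       ref = "refs/datalad/dataset-tree-version-list"
       print(map_ref("refs/datalad/*:refs/datalad/*", ref))         # refs/datalad/dataset-tree-version-list
       print(map_ref("+refs/heads/*:refs/remotes/origin/*", ref))   # None
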

.. find-out-more:: Exactly where is metadata stored, and why?
   :name: fom-metadataobjecttree

   MetaLad employs an internal metadata model that makes the following properties possible:

   * Metadata has a version encoded, but isn't itself version controlled
   * Metadata should not be transported if not explicitly requested
   * It should be possible to retrieve only parts of the overall metadata tree, e.g., certain sub-nodes

   To fulfill this, metadata is stored in Git's internal object store as `blobs <https://git-scm.com/book/en/v2/Git-Internals-Git-Objects>`_, and Git :term:`ref`\s are used to point to these blobs.
   So that they are not transported automatically, these refs are organized in a directory that isn't fetched or pushed by default, but can be transported by explicitly fetching or pushing it: ``.git/refs/datalad`` [#f5]_.
   After fetching these refs, they can be found in the ``metadata-assimilation`` dataset:

   .. runrecord:: _examples/DL-101-181-113
      :language: console
      :workdir: beyond_basics/meta/metadata-assimilation

      $ tree .git/refs

   Just like other Git :term:`ref`\s, these refs are files that contain the identifier (hash) of a Git object, such as a commit, a tree, or a blob.
   Using Git commands such as ``git show`` and ``git ls-tree``, we can follow them:

   .. runrecord:: _examples/DL-101-181-114
      :language: console
      :workdir: beyond_basics/meta/metadata-assimilation

      $ cat .git/refs/datalad/dataset-tree-version-list

   .. runrecord:: _examples/DL-101-181-115
      :language: console
      :workdir: beyond_basics/meta/metadata-assimilation
      :realcommand: echo "$ git show $(cat .git/refs/datalad/dataset-tree-version-list) | jq" && git show $(cat .git/refs/datalad/dataset-tree-version-list) | jq

   .. runrecord:: _examples/DL-101-181-116
      :language: console
      :workdir: beyond_basics/meta/metadata-assimilation
      :realcommand: echo "$ git ls-tree $(git show $(cat .git/refs/datalad/dataset-tree-version-list) | jq | grep "location" | awk '{gsub(/"/, "", $2); print $2}') " && git ls-tree $(git show $(cat .git/refs/datalad/dataset-tree-version-list) | jq | grep "location" | awk '{gsub(/"/, "", $2); print $2}')

   The identifier ``study-100`` in a line such as ``040000 tree d1ad9bfa56f5aa25a1d28caf13db719b9e710d28 study-100`` is the ``dataset_path`` value of a given metadata entry.
   Commands such as ``meta-dump`` can use it to, e.g., only report on metadata for certain datasets, following the pattern

   .. code-block::

      [DATASET_PATH] ["@" VERSION-DIGITS] [":" [LOCAL_PATH]]

   e.g., ``./study-100``.
   While this is not a workflow a user would ever need to perform, this exploration might have given you some insight into the inner workings of the commands and MetaLad's internal storage model.

The metadata is now locally available in the Git repository of the ``metadata-assimilation`` dataset.
You can verify this by issuing the command ``datalad meta-dump -r``, which lists all metadata from all ``dataset_path``\s in the repository.
Can you guess what type of metadata it contains [#f6]_?

.. runrecord:: _examples/DL-101-181-112
   :language: console
   :workdir: beyond_basics/meta/metadata-assimilation

   $ datalad meta-dump -r

Note, too, that :dlcmd:`meta-dump` can also be a source of metadata for :dlcmd:`meta-add`.
While metadata can be provided manually or by running :term:`extractor`\s as outlined so far, it can also be provided by any other means that creates correct metadata records, and :dlcmd:`meta-dump` is one of them.
For example, you could copy the complete metadata from ``dataset_0`` to ``dataset_1`` by dumping it from the first dataset and piping it into the second::

   $ datalad meta-dump -d dataset_0 -r | \
       datalad meta-add -d dataset_1 --json-lines -
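
Because the records travel through the pipe as text, you could also place a filter between the two commands, for example to copy only the records of a single extractor.
The Python sketch below assumes that ``meta-dump`` writes one JSON record per line (as the ``--json-lines`` option of ``meta-add`` suggests) and that each record carries the ``extractor_name`` field; the script name ``filter_records.py`` is hypothetical.

.. code-block:: python

   import json
   import sys
   from typing import Iterable, Iterator


   def select_records(lines: Iterable[str], extractor_name: str) -> Iterator[str]:
       """Yield the JSON-lines records that were produced by `extractor_name`."""
       for line in lines:
           if not line.strip():
               continue  # ignore empty lines
           record = json.loads(line)
           if record.get("extractor_name") == extractor_name:
               # Serialize each record on a single line, so the output is valid JSON lines again
               yield json.dumps(record)


   if __name__ == "__main__" and len(sys.argv) == 2:
       # usage: ... | python filter_records.py EXTRACTOR_NAME | ...
       for selected in select_records(sys.stdin, sys.argv[1]):
           print(selected)

In the pipe above, it would sit between the two commands, as in ``datalad meta-dump -d dataset_0 -r | python filter_records.py Picture2Words | datalad meta-add -d dataset_1 --json-lines -``.
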

Publish metadata to a Git-Repository
""""""""""""""""""""""""""""""""""""

You can also push your metadata to a remote :term:`sibling` (if you have write :term:`permissions`).
This, too, uses a Git command to push only specific :term:`ref`\s.
Assuming you are in the directory that contains the Git repository with your metadata, you can push your metadata to a remote Git repository ``<your repository>``::

   $ git push "<your repository>" "refs/datalad/*:refs/datalad/*"

You will notice that this command pushes no primary data to ``<your repository>``, because only the refs underneath ``refs/datalad`` are transferred.
That allows you to publish metadata without publishing the primary data at the same time.

Querying metadata
^^^^^^^^^^^^^^^^^

As the metadata is in a highly structured form, and could correspond to agreed-upon or established schemas, queries through such metadata can use flexible tooling and don't need to rely on DataLad.
One popular choice for working with JSON data, for example, is the JSON command line processor `jq <https://stedolan.github.io/jq>`_.
In conjunction with Unix :term:`pipe`\s, one can assemble powerful queries in a single line.
The (cropped) query below, for example, lists all unique family names of the authors in the institute's scientific project metadata in ``metadata-assimilation``:

.. runrecord:: _examples/DL-101-181-119
   :language: console
   :workdir: beyond_basics/meta/metadata-assimilation
   :lines: 1, 6-20

   $ datalad meta-dump -r | jq '.extracted_metadata["@graph"][3]["@list"][].familyName' | sort | uniq
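
The same query can be expressed in a general-purpose language if ``jq`` is not available.
The Python sketch below follows the jq path ``.extracted_metadata["@graph"][3]["@list"][].familyName`` and then sorts and de-duplicates the names like ``sort | uniq``.
Like jq, it reads the ``meta-dump`` output as a stream of JSON values, so it does not depend on one record per line.
The helper names and the ``dump.json`` file are hypothetical, the index ``3`` is specific to this particular metadata, and Python sorts by code point, which can differ from the locale-dependent order of ``sort``.

.. code-block:: python

   import json
   import sys
   from typing import Iterator, List


   def iter_json_values(text: str) -> Iterator[object]:
       """Yield each value from a stream of concatenated JSON values, as jq reads its input."""
       decoder = json.JSONDecoder()
       position = 0
       while True:
           # skip whitespace (including newlines) between values
           while position < len(text) and text[position].isspace():
               position += 1
           if position == len(text):
               return
           value, position = decoder.raw_decode(text, position)
           yield value


   def family_names(text: str) -> List[str]:
       """Mirror the jq filter above, followed by `sort | uniq`."""
       names = set()
       for record in iter_json_values(text):
           # .extracted_metadata["@graph"][3]["@list"][]
           for person in record["extracted_metadata"]["@graph"][3]["@list"]:
               name = person.get("familyName")
               if name is not None:  # jq would print "null" for entries without a family name
                   names.add(name)
       return sorted(names)


   if __name__ == "__main__" and len(sys.argv) == 2:
       # usage: datalad meta-dump -r > dump.json; python family_names.py dump.json
       with open(sys.argv[1], encoding="utf-8") as handle:
           print("\n".join(family_names(handle.read())))
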

Querying metadata remotely
""""""""""""""""""""""""""

You do not have to download metadata to dump it.
It is also possible to specify a Git repository, and let MetaLad read only the metadata it requires to fulfill your request.
For example, in order to only retrieve metadata from a metadata entry that has the ``dataset_path`` value ``study-100``, you can simply run:

.. runrecord:: _examples/DL-101-181-120
   :language: console
   :workdir: beyond_basics/meta

   $ datalad meta-dump \
       -d https://github.com/christian-monch/metadata-test.git \
       ./study-100

As the output shows, this command only downloaded enough data from the remote repository to dump all metadata in the specified dataset tree path.
If you want to query all metadata in the repository remotely, you could issue the following command:

.. runrecord:: _examples/DL-101-181-121
   :language: console
   :workdir: beyond_basics/meta

   $ datalad meta-dump \
       -d https://github.com/christian-monch/metadata-test.git -r

This will take a lot longer than the previous command, because MetaLad has to fetch more items from the remote repository.
If you use the remote ``meta-dump`` option deliberately, you can quickly examine small subsets of very large metadata repositories.

Using metadata
^^^^^^^^^^^^^^

Now that we know all about metadata and how it is handled by MetaLad, here's a final note on its utility:
Metadata, especially when it originates from different sources and gets harmonized to a single schema, provides a powerful opportunity to aid data discoverability.
Good use cases for metadata are therefore search or browsing interfaces, and databases such as data portals and graph query databases.
MetaLad-extracted metadata can be used in workflows to generate such interfaces, and a concrete example is the :ref:`DataLad Catalog <catalog>`, which the next section introduces.
So, to aid the discoverability of data, one could add metadata to DataLad datasets, extract metadata with MetaLad and multiple extractors, translate the extracted metadata to the catalog schema, and submit it to ``datalad-catalog`` in order to generate catalog entries, which can all be browsed in a user-friendly, web-based interface.
Intrigued? Read on to the next section for more information.

Installation
^^^^^^^^^^^^

MetaLad is a stand-alone Python package, and can be installed using

.. code-block:: bash

   pip install datalad-metalad

As with DataLad and other Python packages, you might want to do the installation in a :term:`virtual environment`.

.. rubric:: Footnotes

.. [#f1] It may seem like an unnecessary duplicated effort to record the names of contained files or certain file properties as metadata in a dataset already containing these files. However, metadata can be very useful whenever the primary data can't be shared, for example due to its large size or sensitive nature, allowing consumers to, for example, derive anonymized information, aggregate data with search queries, or develop code and submit it to the data holders to be run on their behalf.

.. [#f2] `JSON <https://en.wikipedia.org/wiki/JSON>`_ is a language-independent, open, and lightweight data interchange format. Data is represented as human-readable text, organized in key-value pairs (e.g., ``"name": "Bob"``) or arrays, and is thus easily readable by both humans and machines. A *JSON object* is a collection of key-value pairs. It is enclosed in curly brackets, and individual pairs in the object are separated by commas.

.. [#f3] A Unix timestamp is widely used in computing and measures time as the number of seconds passed since January 1st, 1970, 00:00:00 UTC. The timestamp in the example metadata entry (``1675113291.1464975``) translates to January 30th, 2023, 21:14:51.146497 UTC, as the code snippet below shows. Lots of software tools can generate timestamps for you, for example Python's `time <https://docs.python.org/3/library/time.html>`_ module (``time.time()``) or the command ``date +%s`` in a command line on Unix systems.

   >>> from datetime import datetime, timezone
   >>> datetime.fromtimestamp(1675113291.1464975, tz=timezone.utc)
   datetime.datetime(2023, 1, 30, 21, 14, 51, 146497, tzinfo=datetime.timezone.utc)

.. [#f4] Alternatively, provide the switch ``-i`` to ``meta-add``, which tells it to only warn about ID mismatches instead of erroring out.

.. [#f5] By contrast, other refs, such as the branches underneath ``.git/refs/heads``, are transported by default. What is fetched from a remote is configured per remote with a ``fetch`` refspec in the repository's ``.git/config`` file. In the example below, the remote's branches (``refs/heads/*``) are fetched into the local remote-tracking refs underneath ``refs/remotes/origin/``, while ``refs/datalad/*`` is not included:

   .. code-block:: bash

      $ cat .git/config
      [core]
          repositoryformatversion = 0
          filemode = true
          bare = false
          logallrefupdates = true
          editor = vim
      [remote "origin"]
          url = git@github.com:my-user/my-dataset.git
          fetch = +refs/heads/*:refs/remotes/origin/*

.. [#f6] The answer is: minimal information about archived scientific projects of a research institute. While some personal information has been obfuscated, you can still figure out which information is associated with each entry, such as the project name, its authors, or associated publications.