  1. .. _catalog:
  2. DataCat - a shiny front-end for your dataset
  3. --------------------------------------------
  4. .. importantnote:: Dependency note
  5. This section depends on ``datalad-catalog`` version ``1.1.1`` or later.
  6. If you're looking for ways to showcase your datasets, look no further than the `datalad-catalog extension <https://docs.datalad.org/projects/catalog>`_.
  7. This extension takes your favorite datasets and metadata, and generates a static website from them.
  8. .. figure:: ../artwork/src/catalog/datalad_catalog.svg
  9. For quick access to more resources, have a look at:
  10. - The `live demo catalog <https://datalad.github.io/datalad-catalog>`_
  11. - A 3-minute `explainer video <https://youtu.be/4GERwj49KFc>`_
  12. - The `datalad-catalog documentation <https://docs.datalad.org/projects/catalog>`_
  13. - The `source repository <https://github.com/datalad/datalad-catalog>`_ for an up-to-date overview of functionality
  14. - Below, a step-by-step tutorial to generate your own catalog
  15. Why DataCat?
  16. ^^^^^^^^^^^^
  17. Working collaboratively with large and distributed datasets poses particular challenges for FAIR data access, browsing, and usage:
  18. - the administrative burden of keeping track of different versions of the data, who contributed what, where/how to gain access,
  19. and representing this information centrally and accessibly can be significant
  20. - data privacy regulations might restrict data from being shared or accessed across multi-national sites
  21. - costs of centrally maintained infrastructure for data hosting and web-portal type browsing could be prohibitive
  22. Such challenges impede the many possible gains obtainable from distributed data sharing and access.
  23. Decisions might even be made to forego FAIR principles in favor of saving time, effort and money,
  24. leading to the view that these efforts have seemingly contradicting outcomes.
  25. *DataLad Catalog helps counter this* apparent contradiction by focusing on interoperability with structured, linked, and machine-readable :term:`metadata`.
  26. .. figure:: ../artwork/src/catalog/datalad_catalog_metadata.svg
  27. Metadata about datasets, their file content, and their links to other datasets can be used to create abstract representations
  28. of datasets that are separate from the actual data content. This means that data content can be stored securely while metadata
  29. can be shared and operated on widely, thus improving decentralization and FAIRness.
  30. How does it work?
  31. ^^^^^^^^^^^^^^^^^
  32. DataLad Catalog can receive commands to ``create`` a new catalog, ``add`` and ``remove`` metadata entries to/from an existing catalog, ``serve``
  33. an existing catalog locally, and more. Metadata can be provided to DataLad Catalog from any number of arbitrary metadata sources,
  34. as an aggregated set or as individual items/objects. DataLad Catalog has a dedicated schema (using the `JSON Schema <https://json-schema.org>`_ vocabulary)
  35. against which incoming metadata items are validated. This schema allows for standard metadata fields as one would expect for datasets of any kind
  36. (such as ``name``, ``doi``, ``url``, ``description``, ``license``, ``authors``, and more), as well as fields that support identification, versioning,
  37. dataset context and linkage, and file tree specification.
  38. The process of generating a catalog, after metadata entry validation, involves:
  39. 1. aggregation of the provided metadata into the catalog file tree, and
  40. 2. generating the assets required to render the user interface in a browser.
  41. .. figure:: ../artwork/src/catalog/datalad_catalog_howitworks.svg
  42. The output is a set of structured metadata files, as well as a `Vue.js <https://vuejs.org>`_-based browser interface that understands how to render
  43. this metadata in the browser. What is left for the user is to host this content on their platform of choice and to serve it for the world to see!
  44. The DataLad-based workflow
  45. ^^^^^^^^^^^^^^^^^^^^^^^^^^
  46. The DataLad ecosystem provides a complete set of free and open source tools that, together, provide full control over dataset/file access
  47. and distribution, version control, provenance tracking, metadata addition/extraction/aggregation, and catalog generation.
  48. .. figure:: ../artwork/src/catalog/datalad_catalog_pipeline.svg
  49. - DataLad itself can be used for decentralized management of data as lightweight, portable and extensible representations.
  50. - DataLad MetaLad can extract structured high- and low-level metadata and associate it with these datasets or with individual files.
  51. - And at the end of the workflow, DataLad Catalog can turn the structured metadata into a user-friendly data browser.
  52. .. importantnote:: DataLad Catalog also operates independently
  53. Since it provides its own schema in a standard vocabulary,
  54. any metadata that conforms to this schema can be submitted
  55. to the tool in order to generate a catalog. Metadata items
  56. do not necessarily have to be derived from DataLad datasets,
  57. and the metadata extraction does not have to be conducted via
  58. DataLad MetaLad. Even so, the provided set of tools can be
  59. particularly powerful when used together in a distributed
  60. (meta)data management pipeline.
  61. Step-by-Step
  62. ^^^^^^^^^^^^
  63. Installing DataLad Catalog
  64. """"""""""""""""""""""""""
  65. Let's dive into it and create our own catalog! We'll start by creating and activating a new and empty virtual environment:
  66. .. code-block:: bash
  67. $ python -m venv my_catalog_env
  68. $ source my_catalog_env/bin/activate
  69. Then we can install ``datalad-catalog`` with ``pip``. This process will also install ``datalad`` and other dependencies:
  70. .. code-block:: bash
  71. $ pip install datalad-catalog
  72. After that, you can check the installation by running the ``datalad catalog`` command with the ``--help`` flag:
  73. .. runrecord:: _examples/DL-101-182-101
  74. :language: console
  75. :workdir: catalog
  76. :lines: 1-8
  77. :cast: catalog_basics
  78. :notes: Let's test the installation and look at the help information
  79. $ datalad catalog --help
  80. At this stage, you might be wondering why the catalog command is preceded by ``datalad``, as in ``datalad catalog``.
  81. The reason is that DataLad Catalog is an extension of DataLad.
  82. DataLad itself is installed as a dependency of DataLad Catalog, and it provides the base functionality
  83. that the catalog generation process builds on.
  84. The main catalog functionality
  85. """"""""""""""""""""""""""""""
  86. As you likely saw in the ``--help`` information, DataLad Catalog has several main commands to support
  87. the process of catalog generation. These include ``catalog-``:
  88. - ``create``: create a new catalog
  89. - ``add``: add metadata entries to a catalog
  90. - ``remove``: remove metadata entries from a catalog
  91. - ``serve``: serve the catalog locally on an http server for testing purposes
  92. - ``validate``: validate metadata according to the catalog schema
  93. - ``set``: set catalog properties, such as the dataset that will be displayed as the catalog's ``home`` page
  94. - ``get``: get catalog properties, such as the catalog's configuration
  95. - ``translate``: translate a metalad-extracted metadata item from a particular source structure into the catalog schema
  96. - ``workflow``: run a multi-step workflow for recursive metadata extraction, translating metadata to the catalog schema, and adding the translated metadata to a new catalog
  97. Creating a new catalog
  98. """"""""""""""""""""""
  99. With the ``catalog-create`` command, you can create a new catalog. Let's try it out!
  100. .. runrecord:: _examples/DL-101-182-102
  101. :language: console
  102. :workdir: catalog
  103. :cast: catalog_basics
  104. :notes: Let's create a new catalog
  105. $ datalad catalog-create --catalog data-cat
  106. The catalog ``create(ok)`` result shows that the catalog was successfully created at the specified location (``./data-cat``),
  107. which was passed to the command with the ``-c/--catalog`` flag.
  108. Now we can inspect the catalog's content with the ``tree`` command:
  109. .. runrecord:: _examples/DL-101-182-103
  110. :language: console
  111. :workdir: catalog
  112. :cast: catalog_basics
  113. :notes: We can inspect the catalog's content with the tree command
  114. $ tree -L 1 data-cat
  115. As you can see, the catalog's root directory contains subdirectories for:
  116. - ``artwork``: images that make the catalog pretty
  117. - ``assets``: mainly the JavaScript and CSS code that underlie the user interface of the catalog
  118. - ``metadata``: this is where metadata content for any datasets and files rendered by the catalog will be contained
  119. - ``schema``: a copy of the schema files that metadata entries of this catalog conform to
  120. - ``templates``: HTML templates used for rendering different views of the catalog
  121. It also contains an ``index.html`` file, which is the main catalog HTML content that will be served to users in their browsers,
  122. as well as a ``config.json`` file, which contains default and user-specified configuration settings for the catalog rendering.
  123. These directories and files are all populated into their respective locations by the ``datalad catalog-create`` command.
  124. Next, let's have a look at the catalog that we just created.
  125. Rendering a catalog locally
  126. """""""""""""""""""""""""""
  127. Since the catalog contains HTML, JavaScript, and CSS that can be viewed in any common browser
  128. (Google Chrome, Safari, Mozilla Firefox, etc.), this content needs to be served.
  129. With the ``catalog-serve`` command, you can serve the content of a catalog locally via an :term:`HTTP` server:
  130. .. code-block:: bash
  131. $ datalad catalog-serve --catalog data-cat
  132. If you navigate to the data-cat location (a URL is provided in the ``serve`` command output, typically ``http://localhost:8000/``),
  133. the catalog should be rendered. You should see the 404 page, since there is no metadata in the catalog yet.
  134. (Don't worry, that will change soon!)
  135. .. figure:: ../artwork/src/catalog/catalog_step_404.png
  136. To stop the serving process, you can hit CTRL+C in your shell environment.
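For intuition, serving a catalog amounts to serving a directory of static files over HTTP. The following is a minimal sketch using only the Python standard library. It is not how ``catalog-serve`` is implemented, and the function name ``make_catalog_server`` is made up for this illustration.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_catalog_server(catalog_dir, host="127.0.0.1", port=8000):
    """Return an HTTP server that serves the static files under catalog_dir.

    Hypothetical helper for illustration; not part of datalad-catalog.
    """
    # Bind the request handler to the catalog directory instead of the
    # current working directory.
    handler = functools.partial(
        SimpleHTTPRequestHandler, directory=str(catalog_dir)
    )
    return ThreadingHTTPServer((host, port), handler)
```

Calling ``make_catalog_server("data-cat").serve_forever()`` would then serve the catalog created above until you interrupt it with CTRL+C.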
  137. Adding catalog metadata
  138. """""""""""""""""""""""
  139. The catalog is, of course, only as useful as the metadata that is contained within it.
  140. So let's add some! This can easily be done with the ``catalog-add`` command and ``-m/--metadata`` flag:
  141. .. code-block:: bash
  142. $ datalad catalog-add --catalog <path-to-catalog> --metadata <path-to-metadata>
  143. DataLad Catalog accepts metadata input in multiple formats, including:
  144. - a path to a file (typically with extension ``.json``, ``.jsonl``, or ``.txt``) containing JSON lines,
  145. where each line is a single, correctly formatted, JSON object.
  146. - JSON lines from STDIN
  147. - a JSON serialized string
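To make the JSON lines format concrete, here is a small sketch that parses such a file with the Python standard library: one JSON object per line. The helper name ``read_json_lines`` is an assumption made for this example and is not part of DataLad Catalog.

```python
import json


def read_json_lines(path):
    """Parse a JSON Lines file into a list of dicts, one per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between objects
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as error:
                # Report which line is malformed instead of failing silently.
                raise ValueError(
                    f"line {line_number} is not valid JSON: {error}"
                ) from error
    return records
```

A check like this can help locate a malformed line before the file is handed to ``datalad catalog-add``.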
  148. Before we add metadata to our ``data-cat`` catalog, we'll first introduce a few important concepts and tools.
  149. The Catalog schema
  150. """"""""""""""""""
  151. Each JSON object provided to the Catalog in the metadata file should be structured according to the Catalog schema,
  152. which is based on JSON Schema: a vocabulary that allows you to annotate and validate JSON documents.
  153. The implication is that you will have to format your metadata objects to conform to this standard.
  154. At the core of this standard are the concepts of a dataset and a file, which shouldn't be surprising
  155. to anyone working with data: we have a set of files organized in some kind of hierarchy, and sets of
  156. files are often delineated from other sets of files - here we call this delineation a *dataset*.
  157. There are a few core specifications of metadata objects within the context of the Catalog schema:
  158. - A metadata object can only be about a dataset or a file (its ``type``).
  159. - Each metadata object has multiple "key/value"-pairs that describe it.
  160. For example, an object of type ``dataset`` might have a ``name`` (key) equal
  161. to ``my_test_dataset`` (value), and a ``keywords`` field equal to the list
  162. ``["quick", "brown", "fox"]`` (value).
  163. An object of type ``file`` might have a ``format`` (key) equal to ``JSON`` (value).
  164. - Each metadata object should have a way to identify its related dataset.
  165. For an object of type ``dataset``, this will be the ``dataset_id`` and ``dataset_version``
  166. of the actual dataset. For an object of type ``file``, this will be the ``dataset_id``
  167. and ``dataset_version`` of its parent dataset (i.e. the dataset which the file forms part of).
  168. - Each metadata object of type ``file`` should have a ``path`` key for which the value
  169. specifies exactly where the file is located relative to the root directory of its parent dataset.
  170. - Datasets can have subdatasets.
  171. The Catalog schema specifies exactly which fields are required and which data types
  172. are accepted for each key/value-pair. For an improved understanding of the Catalog schema,
  173. you can inspect the `JSON documents here <https://github.com/datalad/datalad-catalog/tree/main/datalad_catalog/catalog/schema>`_ (``jsonschema_*``).
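The core specifications listed above can be expressed as a few plain checks. The sketch below is only an illustration of those rules in Python. It is not the Catalog schema, it covers far fewer constraints, and the function name ``check_core_rules`` is hypothetical. The ``catalog-validate`` command remains the authoritative check.

```python
def check_core_rules(item):
    """Return a list of problems found in a single metadata object.

    Illustrative only: mirrors the core rules described in the text,
    not the full Catalog schema.
    """
    problems = []
    item_type = item.get("type")
    # A metadata object can only be about a dataset or a file.
    if item_type not in ("dataset", "file"):
        problems.append("type must be 'dataset' or 'file'")
    # Every object must identify its related dataset.
    for key in ("dataset_id", "dataset_version"):
        if not item.get(key):
            problems.append(f"missing {key}")
    # Files additionally need a path relative to the dataset root.
    if item_type == "file" and not item.get("path"):
        problems.append("file objects need a path relative to the dataset root")
    return problems
```

An empty list means that the object passes these basic checks. Any further requirements, such as accepted data types per key, are defined by the schema files themselves.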
  174. Sample metadata
  175. """""""""""""""
  176. Let's look at a toy example of metadata that adheres to the Catalog schema.
  177. First a dataset:
  178. .. code-block::
  179. {
  180. "type": "dataset",
  181. "dataset_id":"5df8eb3a-95c5-11ea-b4b9-a0369f287950",
  182. "dataset_version":"dae38cf901995aace0dde5346515a0134f919523",
  183. "name": "My toy dataset",
  184. "short_name": "My toy dataset",
  185. "description": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus nec justo tellus. Nunc sagittis eleifend magna, eu blandit arcu tincidunt eu. Mauris pharetra justo nec volutpat euismod. Curabitur bibendum vitae nunc a pharetra. Donec non rhoncus risus, ac consequat purus. Pellentesque ultricies ut enim non luctus. Sed viverra dolor enim, sed blandit lorem interdum sit amet. Aenean tincidunt et dolor sit amet tincidunt. Vivamus in sollicitudin ligula. Curabitur volutpat sapien erat, eget consectetur mauris dapibus a. Phasellus fringilla justo ligula, et fringilla tortor ullamcorper id. Praesent tristique lacus purus, eu convallis quam vestibulum eget. Donec ullamcorper mi neque, vel tincidunt augue porttitor vel.",
  186. "doi": "",
  187. "url": ["https://github.com/jsheunis/multi-echo-super"],
  188. "license": {
  189. "name": "CC BY 4.0",
  190. "url": "https://creativecommons.org/licenses/by/4.0/"
  191. },
  192. "authors": [
  193. {
  194. "givenName":"Stephan",
  195. "familyName":"Heunis",
  196. "familyName":"Heunis"
  197. ],
  198. "keywords": ["lorum", "ipsum", "foxes"],
  199. "funding": [
  200. {
  201. "name":"Stephans Bank Account",
  202. "identifier":"No. 42",
  203. "description":"Nothing to see here"
  204. }
  205. ],
  206. "metadata_sources": {
  207. "key_source_map": {},
  208. "sources": [
  209. {
  210. "source_name": "stephan_manual",
  211. "source_version": "1",
  212. "source_parameter": {},
  213. "source_time": 1652340647.0,
  214. "agent_name": "Stephan Heunis",
  215. "agent_email": ""
  216. }
  217. ]
  218. }
  219. }
  220. And then two files of the dataset:
  221. .. code-block::
  222. {
  223. "type": "file",
  224. "dataset_id": "5df8eb3a-95c5-11ea-b4b9-a0369f287950",
  225. "dataset_version": "dae38cf901995aace0dde5346515a0134f919523",
  226. "contentbytesize": 1403,
  227. "path": "README",
  228. "metadata_sources": {
  229. "key_source_map": {},
  230. "sources": [
  231. {
  232. "source_name": "stephan_manual",
  233. "source_version": "1",
  234. "source_parameter": {},
  235. "source_time": 1652340647.0,
  236. "agent_name": "Stephan Heunis",
  237. "agent_email": ""
  238. }
  239. ]
  240. }
  241. }
  242. {
  243. "type": "file",
  244. "dataset_id": "5df8eb3a-95c5-11ea-b4b9-a0369f287950",
  245. "dataset_version": "dae38cf901995aace0dde5346515a0134f919523",
  246. "contentbytesize": 15572,
  247. "path": "main_data/main_results.png",
  248. "metadata_sources": {
  249. "key_source_map": {},
  250. "sources": [
  251. {
  252. "source_name": "stephan_manual",
  253. "source_version": "1",
  254. "source_parameter": {},
  255. "source_time": 1652340647.0,
  256. "agent_name": "Stephan Heunis",
  257. "agent_email": ""
  258. }
  259. ]
  260. }
  261. }
  262. Validating your metadata
  263. """"""""""""""""""""""""
  264. For convenience during metadata setup and catalog generation, DataLad Catalog provides the ``catalog-validate``
  265. command, which lets you test whether your metadata conforms to the
  266. catalog schema before adding it. Let's test it on the toy metadata.
  267. First we'll put the metadata into a file, which is one of the input formats accepted
  268. when adding metadata to a catalog:
  269. .. runrecord:: _examples/DL-101-182-104
  270. :language: console
  271. :workdir: catalog
  272. :cast: catalog_basics
  273. :notes: Add metadata objects to a text file
  274. $ touch toy_metadata.jsonl
  275. $ echo '{ "type": "dataset", "dataset_id": "5df8eb3a-95c5-11ea-b4b9-a0369f287950", "dataset_version": "dae38cf901995aace0dde5346515a0134f919523", "name": "My toy dataset", "short_name": "My toy dataset", "description": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus nec justo tellus. Nunc sagittis eleifend magna, eu blandit arcu tincidunt eu. Mauris pharetra justo nec volutpat euismod. Curabitur bibendum vitae nunc a pharetra. Donec non rhoncus risus, ac consequat purus. Pellentesque ultricies ut enim non luctus. Sed viverra dolor enim, sed blandit lorem interdum sit amet. Aenean tincidunt et dolor sit amet tincidunt. Vivamus in sollicitudin ligula. Curabitur volutpat sapien erat, eget consectetur mauris dapibus a. Phasellus fringilla justo ligula, et fringilla tortor ullamcorper id. Praesent tristique lacus purus, eu convallis quam vestibulum eget. Donec ullamcorper mi neque, vel tincidunt augue porttitor vel.", "doi": "", "url": "https://github.com/jsheunis/multi-echo-super", "license": { "name": "CC BY 4.0", "url": "https://creativecommons.org/licenses/by/4.0/" }, "authors": [ { "givenName": "Stephan", "familyName": "Heunis"} ], "keywords": [ "lorum", "ipsum", "foxes" ], "funding": [ { "name": "Stephans Bank Account", "identifier": "No. 42", "description": "Nothing to see here" } ], "metadata_sources": { "key_source_map": {}, "sources": [ { "source_name": "stephan_manual", "source_version": "1", "source_parameter": {}, "source_time": 1652340647.0, "agent_name": "Stephan Heunis", "agent_email": "" } ] } }' >> toy_metadata.jsonl
  276. $ echo '{ "type": "file", "dataset_id": "5df8eb3a-95c5-11ea-b4b9-a0369f287950", "dataset_version": "dae38cf901995aace0dde5346515a0134f919523", "contentbytesize": 1403, "path": "README", "metadata_sources": { "key_source_map": {}, "sources": [ { "source_name": "stephan_manual", "source_version": "1", "source_parameter": {}, "source_time": 1652340647.0, "agent_name": "Stephan Heunis", "agent_email": "" } ] } }' >> toy_metadata.jsonl
  277. $ echo '{ "type": "file", "dataset_id": "5df8eb3a-95c5-11ea-b4b9-a0369f287950", "dataset_version": "dae38cf901995aace0dde5346515a0134f919523", "contentbytesize": 15572, "path": "main_data/main_results.png", "metadata_sources": { "key_source_map": {}, "sources": [ { "source_name": "stephan_manual", "source_version": "1", "source_parameter": {}, "source_time": 1652340647.0, "agent_name": "Stephan Heunis", "agent_email": "" } ] } }' >> toy_metadata.jsonl
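Long ``echo`` commands like these are easy to mistype. As an alternative, the same kind of file could be produced from Python, where ``json.dumps`` guarantees one well-formed object per line. The sketch below writes only the two file objects of the toy example. The helper names and the output file name ``toy_files.jsonl`` are assumptions made for this illustration.

```python
import json

DATASET_ID = "5df8eb3a-95c5-11ea-b4b9-a0369f287950"
DATASET_VERSION = "dae38cf901995aace0dde5346515a0134f919523"

# Provenance block copied from the toy example.
METADATA_SOURCES = {
    "key_source_map": {},
    "sources": [
        {
            "source_name": "stephan_manual",
            "source_version": "1",
            "source_parameter": {},
            "source_time": 1652340647.0,
            "agent_name": "Stephan Heunis",
            "agent_email": "",
        }
    ],
}


def file_record(path, size):
    """Build a file metadata object for the toy dataset (hypothetical helper)."""
    return {
        "type": "file",
        "dataset_id": DATASET_ID,
        "dataset_version": DATASET_VERSION,
        "contentbytesize": size,
        "path": path,
        "metadata_sources": METADATA_SOURCES,
    }


def write_json_lines(records, target):
    """Write one compact JSON object per line to target."""
    with open(target, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


TOY_FILES = [
    file_record("README", 1403),
    file_record("main_data/main_results.png", 15572),
]
```

Running ``write_json_lines(TOY_FILES, "toy_files.jsonl")`` would create a two-line file. The dataset object could be added to the list in the same way.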
  278. Then we can validate the metadata in this file:
  279. .. runrecord:: _examples/DL-101-182-105
  280. :language: console
  281. :workdir: catalog
  282. :cast: catalog_basics
  283. :notes: Validate metadata according to the catalog schema
  284. $ datalad catalog-validate --metadata toy_metadata.jsonl
  285. Great! This confirms that we have valid metadata :)
  286. Take note that this validator also runs internally whenever metadata is added to the catalog,
  287. so there is no specific need to run validation explicitly unless you want to.
  288. Adding metadata
  289. """""""""""""""
  290. Finally, we can add metadata!
  291. .. runrecord:: _examples/DL-101-182-106
  292. :language: console
  293. :workdir: catalog
  294. :cast: catalog_basics
  295. :notes: Add the toy metadata to the catalog
  296. $ datalad catalog-add --catalog data-cat --metadata toy_metadata.jsonl
  297. The ``catalog-add(ok)`` result indicates that our metadata was added successfully to the catalog.
  298. You can inspect this by looking at the content of the metadata directory inside the catalog:
  299. .. runrecord:: _examples/DL-101-182-107
  300. :language: console
  301. :workdir: catalog
  302. :cast: catalog_basics
  303. :notes: Inspect the content of the catalog's metadata directory
  304. $ tree data-cat/metadata
  305. Where previously the metadata directory contained nothing, it now has several subdirectories
  306. and two ``.json`` files. Note that the names of the first two levels of subdirectories correspond
  307. to the ``dataset_id`` and ``dataset_version``, respectively, of the dataset in the toy metadata
  308. that we added to the catalog. This supports the DataLad Catalog's ability to identify specific
  309. datasets and their files by ID and version in order to update the catalog easily (and, when it
  310. comes to decentralized contribution, without conflicts). The subdirectories further down the
  311. hierarchy, as well as the filenames, are just hashes of the path to the specific directory node
  312. relative to the parent dataset. Let's look at the content of these files:
  313. .. runrecord:: _examples/DL-101-182-108
  314. :language: console
  315. :workdir: catalog
  316. :lines: 1-7, 33-35, 47-57, 75-102
  317. :cast: catalog_basics
  318. :notes: Display the content of the metadata files in the catalog
  319. $ cat data-cat/metadata/5df8eb3a-95c5-11ea-b4b9-a0369f287950/dae38cf901995aace0dde5346515a0134f919523/449/268b13a1c869555f6c2f6e66d3923.json | jq .
  320. $ cat data-cat/metadata/5df8eb3a-95c5-11ea-b4b9-a0369f287950/dae38cf901995aace0dde5346515a0134f919523/578/b4ba64a67d1d99cbcf06d5d26e0f6.json | jq .
  321. As you can see, the content of these files is very similar to the original toy data, but slightly
  322. transformed. This transformation creates a structure that is easier for the associated browser
  323. application to read and render. Additionally, structuring data into metadata files that represent
  324. nodes in the dataset hierarchy (i.e. datasets or directories) allows the browser application to
  325. only access the data in those metadata files whenever the user selects the applicable node.
  326. This saves loading time, which makes the user experience more seamless.
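The layout of the metadata directory described in this section can be sketched in a few lines of Python. The file names shown earlier consist of a 3-character subdirectory plus a 29-character file name, i.e. 32 hexadecimal characters in total, which is the length of an MD5 digest. Whether MD5 is actually used, and which exact string is hashed, are assumptions in this sketch. The function name ``node_location`` is hypothetical as well, so do not expect it to reproduce the exact file names above.

```python
import hashlib
from pathlib import PurePosixPath


def node_location(dataset_id, dataset_version, hash_input):
    """Return a relative metadata path shaped like those in the tree output.

    Assumption: an MD5 digest of hash_input; the real hash function and
    its input are not specified in this section.
    """
    digest = hashlib.md5(hash_input.encode("utf-8")).hexdigest()  # 32 hex chars
    # The first 3 characters name a subdirectory, the remaining 29 the file.
    return PurePosixPath(
        dataset_id, dataset_version, digest[:3], f"{digest[3:]}.json"
    )
```

The point of such a scheme is that the location of a node's metadata file can be computed directly from its identifiers, without searching the catalog.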
  327. Viewing a particular dataset
  328. """"""""""""""""""""""""""""
  329. So, that was everything that happened behind the scenes during the ``datalad catalog-add`` procedure,
  330. but what does our updated catalog look like? Let's take a look. If you serve the catalog again
  331. and navigate to the localhost, you should see... no change?!
  332. The reason for this is that we didn't specify the details of the particular dataset that we want to view,
  333. and there is also no default specified for the catalog.
  334. If we want to view the specific dataset that we just added to the catalog, we can specify its
  335. ``dataset_id`` and ``dataset_version`` by appending them to the URL in the format::
  336. <catalog-url>/#/dataset/<dataset_id>/<dataset_version>
  337. This makes it possible to view any uniquely identifiable dataset by navigating to a unique URL.
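As a small illustration of this URL format, the following Python sketch assembles the address of a dataset view from its parts. The function name ``dataset_url`` is made up for this example.

```python
def dataset_url(catalog_url, dataset_id, dataset_version):
    """Build the address of a dataset view inside a served catalog."""
    # Strip a trailing slash so that the result never contains "//#".
    return f"{catalog_url.rstrip('/')}/#/dataset/{dataset_id}/{dataset_version}"
```

For a catalog served at ``http://localhost:8000/``, calling ``dataset_url("http://localhost:8000/", "5df8eb3a-95c5-11ea-b4b9-a0369f287950", "dae38cf901995aace0dde5346515a0134f919523")`` returns ``http://localhost:8000/#/dataset/5df8eb3a-95c5-11ea-b4b9-a0369f287950/dae38cf901995aace0dde5346515a0134f919523``.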
  338. Let's try it with our toy example. Navigate to the localhost (the 404 page should be displayed), append::
  339. /#/dataset/5df8eb3a-95c5-11ea-b4b9-a0369f287950/dae38cf901995aace0dde5346515a0134f919523
  340. to the end of the URL, and hit ENTER/RETURN. You should see something like this:
  341. .. figure:: ../artwork/src/catalog/catalog_step_dataset.png
  342. This is the dataset view, with the content tab (auto-)selected.
  343. This view displays all the main content related to the dataset that was provided by the metadata,
  344. and gives the user access to further functionality, such as downloading the dataset with DataLad,
  345. downloading the metadata, filtering subdatasets by keyword, browsing files, and viewing extended
  346. attributes such as funding information related to the dataset. Below are two more views,
  347. the first with the subdatasets tab selected, and the second with the funding tab selected.
  348. .. figure:: ../artwork/src/catalog/catalog_step_subdatasets.png
  349. .. figure:: ../artwork/src/catalog/catalog_step_funding.png
  350. Setting the catalog home page
  351. """""""""""""""""""""""""""""
  352. When one navigates to a specific catalog's root address, i.e. without a ``dataset_id`` and ``dataset_version``
  353. specified in the URL, the browser application checks if a home page is specified for the catalog. If not,
  354. it renders the 404 page.
  355. The specification of a home page could be useful for cases where the catalog,
  356. when navigated to, should always render the top-level list of available datasets
  357. in the catalog (provided by the metadata as subdatasets to the superdataset).
  358. Let's add our toy dataset as the catalog's home page, using the ``catalog-set`` command
  359. with the ``home`` property, and additionally specifying the dataset's ``dataset_id``
  360. (``-i/--dataset-id`` flag) and ``dataset_version`` (``-v/--dataset-version`` flag):
  361. .. runrecord:: _examples/DL-101-182-109
  362. :language: console
  363. :workdir: catalog
  364. :cast: catalog_basics
  365. :notes: Add a superdataset to the catalog
  366. $ datalad catalog-set --catalog data-cat --dataset-id 5df8eb3a-95c5-11ea-b4b9-a0369f287950 --dataset-version dae38cf901995aace0dde5346515a0134f919523 home
  367. The ``catalog-set(ok)`` result shows that the superdataset (the home page) was successfully set
  368. for the catalog, and you will now also be able to see an additional ``super.json`` file in the
  369. catalog metadata directory. The content of this file is a simple JSON object specifying the
  370. main dataset's ``dataset_id`` and ``dataset_version``:
  371. .. runrecord:: _examples/DL-101-182-110
  372. :language: console
  373. :workdir: catalog
  374. :cast: catalog_basics
  375. :notes: Display the content of super.json
  376. $ cat data-cat/metadata/super.json | jq .
  377. *Now*, when one navigates to the catalog's root address without a ``dataset_id`` and
  378. ``dataset_version`` specified in the URL, the browser application will find that a
  379. default dataset is indeed specified for the catalog, and it will navigate to that specific
  380. dataset view!
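The home page lookup described in this section can be mimicked in Python to make the logic explicit. This is only a sketch: the real lookup happens in the catalog's JavaScript browser application, the key names in ``super.json`` are assumed to be ``dataset_id`` and ``dataset_version``, and the function name ``home_route`` is hypothetical.

```python
import json
from pathlib import Path


def home_route(catalog_dir):
    """Return the dataset route for the catalog home page, or None if unset."""
    super_file = Path(catalog_dir) / "metadata" / "super.json"
    if not super_file.exists():
        return None  # no home page set: the 404 page would be shown
    home = json.loads(super_file.read_text(encoding="utf-8"))
    # Assumed key names; see the note above.
    return f"/#/dataset/{home['dataset_id']}/{home['dataset_version']}"
```

Before ``catalog-set`` has been run, ``home_route("data-cat")`` would return ``None`` under these assumptions. Afterwards it would return the route of the toy dataset.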
  381. Catalog configuration
  382. """""""""""""""""""""
  383. A useful feature of the catalog process is to be able to configure certain properties according
  384. to your preferences. This is done with the help of a config file (in either ``JSON`` or ``YAML`` format)
  385. and the ``-F/--config-file`` flag during catalog generation. DataLad Catalog provides a default
  386. config file with the following content:
  387. .. runrecord:: _examples/DL-101-182-111
  388. :language: console
  389. :workdir: catalog
  390. :cast: catalog_basics
  391. :notes: Display the content of the default config file
  392. $ cat data-cat/config.json | jq .
  393. If no config file is supplied to the ``catalog-create`` command, the default is used.
  394. Let's create a new toy catalog with a new config, specifying a new name, a new logo, and new colors for the links.
  395. This will be the content of the config file, in ``YAML`` format:
  396. .. runrecord:: _examples/DL-101-182-112
  397. :language: console
  398. :workdir: catalog
  399. :cast: catalog_basics
  400. :notes: Add a custom config file
  401. $ cat << EOT >> cat_config.yml
  402. # Catalog properties
  403. catalog_name: "Toy Catalog"
  404. # Styling
  405. logo_path: "datalad_logo_funky.svg" # path to logo
  406. link_color: "#32A287" # hex color code
  407. link_hover_color: "#A9FDAC" # hex color code
  408. # Handling multiple metadata sources
  409. property_sources:
  410.   dataset: {}
  411. EOT
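Since the config file may also be provided in ``JSON`` format, the same settings could be written from Python with the standard library. This is a sketch of an equivalent JSON config. The output file name ``cat_config.json`` and the helper name ``write_config`` are assumptions made for this example.

```python
import json

# Same settings as in the YAML config, expressed as a Python dict.
CONFIG = {
    "catalog_name": "Toy Catalog",
    "logo_path": "datalad_logo_funky.svg",  # path to logo
    "link_color": "#32A287",  # hex color code
    "link_hover_color": "#A9FDAC",  # hex color code
    "property_sources": {"dataset": {}},
}


def write_config(target="cat_config.json"):
    """Write the catalog config as indented JSON (hypothetical helper)."""
    with open(target, "w", encoding="utf-8") as handle:
        json.dump(CONFIG, handle, indent=2)
        handle.write("\n")
```

The resulting file would then be passed with the ``-F/--config-file`` flag in place of the YAML file.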
  412. We'll ensure that the new custom logo is available locally:
  413. .. runrecord:: _examples/DL-101-182-113
  414. :language: console
  415. :workdir: catalog
  416. :cast: catalog_basics
  417. :notes: Get the custom logo
  418. $ wget -q -O datalad_logo_funky.svg https://raw.githubusercontent.com/datalad/tutorials/5e5fc0a4/notebooks/catalog_tutorials/test_data/datalad_logo_funky.svg
  419. Now we can run all the necessary subcommands for the catalog generation process:
  420. .. runrecord:: _examples/DL-101-182-114
  421. :language: console
  422. :workdir: catalog
  423. :cast: catalog_basics
  424. :notes: Create a new catalog with custom config
  425. $ datalad catalog-create -c custom-cat -m toy_metadata.jsonl -F cat_config.yml
  426. $ datalad catalog-set -c custom-cat -i 5df8eb3a-95c5-11ea-b4b9-a0369f287950 -v dae38cf901995aace0dde5346515a0134f919523 home
  427. To test this, serve the new custom catalog and navigate to the localhost to view it.
  428. You should see the following:
  429. .. figure:: ../artwork/src/catalog/catalog_step_config.png
  430. Well done! You have just configured your catalog with a custom logo and color scheme!
  431. (apologies if you find the colors a bit loud :-P)
  432. The configuration will also come in handy when there are more advanced forms of metadata
  433. in a catalog, especially when multiple sources of metadata are available for the same dataset.
  434. In such cases, one might want to specify or prioritize how these multiple sources are displayed,
  435. and the catalog configuration allows for that via specification of the ``property_sources`` key.
  436. Find out more in the `dedicated documentation <https://docs.datalad.org/projects/catalog/en/latest/catalog_config.html>`_.
  437. And that's it!
  438. """"""""""""""
  439. *For now... :)*
  440. You now know how to install DataLad Catalog and how to employ its basic features in order to create
  441. and configure a browser-based catalog from structured metadata. Congrats!
  442. You might want to explore further to find out how to build more advanced metadata handling and
  443. catalog generation workflows, or to learn how to use additional features. If so, please visit
  444. `DataLad Catalog's user documentation <https://docs.datalad.org/projects/catalog/en/latest>`_.