Merge remote-tracking branch 'refs/remotes/origin/master'

Jan Grewe 2 years ago
parent
commit
81417998c7
1 changed files with 959 additions and 0 deletions
  1. 959 0
      day_1/tutorial_3.ipynb

+ 959 - 0
day_1/tutorial_3.ipynb

@@ -0,0 +1,959 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\n",
+    "### INCF Workshop \n",
+    "\n",
+    "# Integrated Storage and Management of Data & Metadata with NIX\n",
+    "\n",
+    "                                    The Neuroscience Information eXchange format\n",
+    "\n",
+    "                                    Jan Grewe1, Michael Sonntag2\n",
+    "\n",
+    "                                    1 Institute for Neurobiology\n",
+    "                                      Eberhard-Karls-Universität Tübingen\n",
+    "                                    \n",
+    "                                    2 Department Biologie II\n",
+    "                                      Ludwig-Maximilians-Universität München\n",
+    "\n",
+    "                                    30.08. - 01.09.2021\n",
+    "\n",
+    "\n",
+    "![G-Node-logo.png](./resources/G-Node-logo.png)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Data and Metadata (data annotation) - Tutorial 3"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### What are metadata and why are they needed?"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Metadata are data about data. As a non-scientific example: title and director of a movie are metadata.\n",
+    "\n",
+    "In science, metadata describe the conditions under which the raw data of an experimental study were acquired or analysed.\n",
+    "\n",
+    "Metadata can be anything that is related to an experiment or an analysis step:\n",
+    "- stimulus / protocols\n",
+    "- environmental factors e.g. temperature, gas or liquid concentrations, ...\n",
+    "- operational information e.g. experimenter, date, time, organism, strain, ...\n",
+    "- subject information e.g. animal strain, history, ...\n",
+    "- hardware/software used\n",
+    "- settings\n",
+    "\n",
+    "Traditionally, actively collected metadata are found in spreadsheets or lab books. Further metadata are found in raw data files, hardware information, code comments, etc.\n",
+    "\n",
+    "Organizing such metadata and keeping them accessible is not a trivial task; most laboratories have developed their own home-made solutions over time to keep track of their metadata. Collecting and organizing these metadata is a tough job in its own right, since experiments are diverse and may even change over time.\n",
+    "\n",
+    "Metadata is especially important when trying to make sense of data \n",
+    "- that you are not familiar with\n",
+    "- that you have not worked with for a while\n",
+    "\n",
+    "A hard problem in this respect is that most metadata are usually disconnected from the data they belong to; searching data and retrieving the corresponding metadata, or vice versa, is usually not trivial, especially after some time has passed.\n",
+    "\n",
+    "With NIX, metadata can be stored alongside the data they belong to, the process of collecting the metadata can be automated, and the results are machine readable and can be searched programmatically."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Data and data annotation in the same file"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The entities of the data model that were discussed so far carry just enough information to get a basic understanding of the stored data. Often much more information than that is required."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "NIX not only allows saving initial data and analysed data within the same file. It also allows creating structured annotations of the experiments that were conducted and connecting this information directly to the data."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Metadata in NIX files is stored in the [odML format](https://g-node.github.io/python-odml); odML is a hierarchically structured data format that provides grouping in nestable `Sections` and stores information in `Property`-`Value` pairs. `Sections` are the main structural elements, while `Properties` hold the actual metadata information."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## The odML data model in NIX\n",
+    "![](./resources/nix_odML_model_simplified.png)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "On a conceptual level, data and metadata in a NIX file live side by side in parallel trees. Entities of the data tree can be linked to Sections of the metadata tree, and the corresponding data can be retrieved when exploring the metadata tree."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "    ---------------- NIX File --------\n",
+    "    ├─ Section                  <--- ├─ Block\n",
+    "    |  ├─ Section                    |  ├─ DataArray\n",
+    "    |  |  └─ Property                |  ├─ DataArray\n",
+    "    |  └─ Section                    |  ├─ Tag\n",
+    "    |     └─ Property                |  └─ MultiTag\n",
+    "    └─ Section                  <--- └─ Block\n",
+    "       └─ Section               <---    ├─ DataArray\n",
+    "          ├─ Property                   ├─ DataArray\n",
+    "          ├─ Property                   └─ Group\n",
+    "          └─ Property                    \n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Storing metadata in NIX"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Metadata basics: creating section-property trees"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To introduce the usage of metadata functions in NIX, we'll keep it simple and abstract for now."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import nixio as nix\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Let's explore the metadata functions of NIX before going into more detail\n",
+    "# We will re-use this file throughout the following examples\n",
+    "f = nix.File.open(\"metadata.nix\", nix.FileMode.Overwrite)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[]\n"
+     ]
+    }
+   ],
+   "source": [
+    "# As expected there are no metadata in our current file yet.\n",
+    "print(f.sections)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Help on method create_section in module nixio.file:\n",
+      "\n",
+      "create_section(name, type_='undefined', oid=None) method of nixio.file.File instance\n",
+      "    Create a new metadata section inside the file.\n",
+      "    \n",
+      "    :param name: The name of the section to create.\n",
+      "    :type name: str\n",
+      "    :param type_: The type of the section.\n",
+      "    :type type_: str\n",
+      "    :param oid: object id, UUID string as specified in RFC 4122. If no id\n",
+      "                is provided, an id will be generated and assigned.\n",
+      "    :type oid: str\n",
+      "    \n",
+      "    :returns: The newly created section.\n",
+      "    :rtype: nixio.Section\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Let's check how we can create a new Section. Sections can be created from File and Section objects.\n",
+    "help(f.create_section)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[Section: {name = recording.20210405, type = raw.data.recording}]"
+      ]
+     },
+     "execution_count": 14,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# First we need to create a Section that can hold our annotations. We'll use abstract names and types for now.\n",
+    "sec = f.create_section(name=\"recording.20210405\", type_=\"raw.data.recording\")\n",
+    "\n",
+    "f.sections\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "metadata": {},
+   "outputs": [
+    {
+     "ename": "DuplicateName",
+     "evalue": "Duplicate name - names have to be unique for a given entity type & parent. (create_section)",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+      "\u001b[0;31mDuplicateName\u001b[0m                             Traceback (most recent call last)",
+      "\u001b[0;32m<ipython-input-15-f747f9038af9>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[0;31m# Like other NIX objects Section names on the same level have to be unique\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0msection\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcreate_section\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m\"recording.20210405\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtype_\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m\"raw.data.recording\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
+      "\u001b[0;32m~/Chaos/software/miniconda3/envs/work/lib/python3.9/site-packages/nixio/file.py\u001b[0m in \u001b[0;36mcreate_section\u001b[0;34m(self, name, type_, oid)\u001b[0m\n\u001b[1;32m    443\u001b[0m         \"\"\"\n\u001b[1;32m    444\u001b[0m         \u001b[0;32mif\u001b[0m \u001b[0mname\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msections\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 445\u001b[0;31m             \u001b[0;32mraise\u001b[0m \u001b[0mDuplicateName\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"create_section\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    446\u001b[0m         \u001b[0msec\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mSection\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcreate_new\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_metadata\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtype_\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0moid\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    447\u001b[0m         \u001b[0;32mreturn\u001b[0m \u001b[0msec\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+      "\u001b[0;31mDuplicateName\u001b[0m: Duplicate name - names have to be unique for a given entity type & parent. (create_section)"
+     ]
+    }
+   ],
+   "source": [
+    "# Like other NIX objects Section names on the same level have to be unique\n",
+    "section = f.create_section(name=\"recording.20210405\", type_=\"raw.data.recording\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[]"
+      ]
+     },
+     "execution_count": 16,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Sections can hold multiple further Sections as well as multiple Properties.\n",
+    "sec.sections\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[]"
+      ]
+     },
+     "execution_count": 17,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# The section currently does not contain any Properties.\n",
+    "sec.props"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# We want to add information about a subject that was used in the experiment.\n",
+    "sub_sec = sec.create_section(name=\"subject\", type_=\"raw.data.recording\")\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 19,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Help on method create_property in module nixio.section:\n",
+      "\n",
+      "create_property(name='', values_or_dtype=0, oid=None, copy_from=None, keep_copy_id=True) method of nixio.section.Section instance\n",
+      "    Add a new property to the section.\n",
+      "    \n",
+      "    :param name: The name of the property to create/copy.\n",
+      "    :type name: str\n",
+      "    :param values_or_dtype: The values of the property or a valid DataType.\n",
+      "    :type values_or_dtype: list of values or a nixio.DataType\n",
+      "    :param oid: object id, UUID string as specified in RFC 4122. If no id\n",
+      "                is provided, an id will be generated and assigned.\n",
+      "    :type oid: str\n",
+      "    :param copy_from: The Property to be copied, None in normal mode\n",
+      "    :type copy_from: nixio.Property\n",
+      "    :param keep_copy_id: Specify if the id should be copied in copy mode\n",
+      "    :type keep_copy_id: bool\n",
+      "    \n",
+      "    :returns: The newly created property.\n",
+      "    :rtype: nixio.Property\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Properties can be created from Section objects.\n",
+    "help(sub_sec.create_property)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 20,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# We'll add metadata about subjectID, subject species and subject age as Properties to the \"subject\" section.\n",
+    "_ = sub_sec.create_property(name=\"subjectID\", values_or_dtype=\"78376446-f096-47b9-8bfe-ce1eb43a48dc\")\n",
+    "\n",
+    "_ = sub_sec.create_property(name=\"species\", values_or_dtype=\"Mus Musculus\")\n",
+    "\n",
+    "# To fully describe metadata, properties support saving \"unit\" and \"uncertainty\" together with values.\n",
+    "prop = sub_sec.create_property(name=\"age\", values_or_dtype=\"4\")\n",
+    "\n",
+    "prop.unit = \"weeks\"\n"
+   ]
+  },
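+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The values above were all stored as strings. As a minimal sketch of numeric values combined with \"unit\" and \"uncertainty\", the following cell writes to a separate scratch file so that the example tree stays unchanged. The file name \"scratch.nix\", the section and the weight values are made up for illustration and are not part of the recording metadata."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Hypothetical scratch file and values, used only to illustrate numeric Properties\n",
+    "scratch = nix.File.open(\"scratch.nix\", nix.FileMode.Overwrite)\n",
+    "scratch_sec = scratch.create_section(name=\"subject\", type_=\"example.subject\")\n",
+    "\n",
+    "# A Property can hold a single value or a list of values of the same type\n",
+    "weight = scratch_sec.create_property(name=\"weight\", values_or_dtype=[21.5, 22.0])\n",
+    "weight.unit = \"g\"\n",
+    "weight.uncertainty = 0.5\n",
+    "\n",
+    "# Values are returned as a tuple; unit and uncertainty are plain attributes\n",
+    "print(weight.values, weight.unit, weight.uncertainty)\n",
+    "\n",
+    "scratch.close()\n"
+   ]
+  },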
+  {
+   "cell_type": "code",
+   "execution_count": 21,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[Section: {name = recording.20210405, type = raw.data.recording}]"
+      ]
+     },
+     "execution_count": 21,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Let's check what we have so far at the root of the file.\n",
+    "f.sections\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 22,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "File: name = metadata.nix\n",
+      "  recording.20210405 [raw.data.recording]\n",
+      "    subject [raw.data.recording]\n",
+      "        |- subjectID: ('78376446-f096-47b9-8bfe-ce1eb43a48dc',)\n",
+      "        |- species: ('Mus Musculus',)\n",
+      "        |- age: ('4',)weeks\n"
+     ]
+    }
+   ],
+   "source": [
+    "# File and Sections also support the \"pprint\" function to make it easier to get an overview\n",
+    "# of the contents of the metadata tree.\n",
+    "f.pprint()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 23,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[Property: {name = subjectID}, Property: {name = species}, Property: {name = age}]"
+      ]
+     },
+     "execution_count": 23,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# We access all Properties of the subsection containing subject-related information.\n",
+    "# Sections can be accessed via index or via name\n",
+    "f.sections[0].sections['subject'].props\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 24,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "subject [raw.data.recording]\n",
+      "    |- subjectID: ('78376446-f096-47b9-8bfe-ce1eb43a48dc',)\n",
+      "    |- species: ('Mus Musculus',)\n",
+      "    |- age: ('4',)weeks\n"
+     ]
+    }
+   ],
+   "source": [
+    "# We can again use the pprint function\n",
+    "f.sections[0].sections['subject'].pprint()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 25,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "f.close()"
+   ]
+  },
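+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Because the metadata tree is machine readable, it can also be searched. The following sketch reopens the file read-only and uses `find_sections` with a filter function to collect all Sections named \"subject\", wherever they sit in the tree. The filter shown here is only one possible choice; any function that takes a Section and returns True or False should work the same way."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "f = nix.File.open(\"metadata.nix\", nix.FileMode.ReadOnly)\n",
+    "\n",
+    "# find_sections walks the metadata tree and returns all Sections for which the filter returns True\n",
+    "subjects = f.find_sections(filtr=lambda s: s.name == \"subject\")\n",
+    "\n",
+    "# Print every Property of the Sections that were found\n",
+    "for s in subjects:\n",
+    "    for p in s.props:\n",
+    "        print(s.name, p.name, p.values, p.unit)\n",
+    "\n",
+    "f.close()\n"
+   ]
+  },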
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Connecting data and metadata"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 26,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "f = nix.File.open(\"metadata.nix\", nix.FileMode.ReadWrite)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 27,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# We'll add some minimal abstract data\n",
+    "\n",
+    "rec_block = f.create_block(name=\"project.recordings\", type_=\"example.raw.data\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 29,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "SampledDimension: {index = 1}"
+      ]
+     },
+     "execution_count": 29,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "example_data_01 = [2, 2, 2, 6, 6, 6, 6, 2, 2, 2]\n",
+    "da = rec_block.create_data_array(name=\"recording.20210405\", array_type=\"shift.data\", data=example_data_01,\n",
+    "                                 label=\"df/f\")\n",
+    "da.append_sampled_dimension(0.001, label=\"time\", unit=\"s\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 31,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "SampledDimension: {index = 1}"
+      ]
+     },
+     "execution_count": 31,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "example_data_02 = [2, 2, 2, 8, 8, 8, 8, 2, 2, 2]\n",
+    "da = rec_block.create_data_array(name=\"recording.20210505.01\", array_type=\"shift.data\", data=example_data_02,\n",
+    "                                 label=\"df/f\")\n",
+    "da.append_sampled_dimension(0.001, label=\"time\", unit=\"s\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 32,
+   "metadata": {},
+   "outputs": [],
+   "source": [
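+    "# We'll also create a Tag that will reference a specific region of the DataArrays.\n",
+    "# Position and extent are given in units of the tagged dimension (here: seconds), not as indices.\n",
+    "# The step in the example data starts at sample 3 and ends before sample 7; with a sampling\n",
+    "# interval of 0.001 s this corresponds to 0.003 s and 0.007 s.\n",
+    "stim_on = 0.003\n",
+    "stim_off = 0.007\n",
+    "# We create the tag on the same block as the DataArrays it should reference\n",
+    "stimulus_tag = rec_block.create_tag(\"stimulus.down.3\", \"stimulus.shift\", position=[stim_on])\n",
+    "stimulus_tag.extent = [stim_off - stim_on]\n",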
+    "\n",
+    "# We append the DataArrays of both experiments to the tag\n",
+    "stimulus_tag.references.append(f.blocks[\"project.recordings\"].data_arrays[\"recording.20210405\"])\n",
+    "stimulus_tag.references.append(f.blocks[\"project.recordings\"].data_arrays[\"recording.20210505.01\"])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 36,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Now we want to hook up the DataArrays and the Tag to more information - to the metadata we have defined before\n",
+    "\n",
+    "# We will only reference the appropriate metadata for recording 20210405, since we have not defined metadata \n",
+    "# for the second recording yet.\n",
+    "\n",
+    "# We'll set the metadata for both data array and tag\n",
+    "f.blocks[\"project.recordings\"].data_arrays[\"recording.20210405\"].metadata = f.sections[\"recording.20210405\"]\n",
+    "f.blocks[\"project.recordings\"].tags[\"stimulus.down.3\"].metadata = f.sections[\"recording.20210405\"]\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 37,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "recording.20210405 [raw.data.recording]\n",
+      "  subject [raw.data.recording]\n",
+      "      |- subjectID: ('78376446-f096-47b9-8bfe-ce1eb43a48dc',)\n",
+      "      |- species: ('Mus Musculus',)\n",
+      "      |- age: ('4',)weeks\n"
+     ]
+    }
+   ],
+   "source": [
+    "# We can now access the metadata from DataArray and Tag:\n",
+    "\n",
+    "f.blocks[\"project.recordings\"].data_arrays[\"recording.20210405\"].metadata.pprint()\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 38,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "recording.20210405 [raw.data.recording]\n",
+      "  subject [raw.data.recording]\n",
+      "      |- subjectID: ('78376446-f096-47b9-8bfe-ce1eb43a48dc',)\n",
+      "      |- species: ('Mus Musculus',)\n",
+      "      |- age: ('4',)weeks\n"
+     ]
+    }
+   ],
+   "source": [
+    "f.blocks[\"project.recordings\"].tags[\"stimulus.down.3\"].metadata.pprint()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 39,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# No metadata has been attached to the second recording yet, so this returns None\n",
+    "f.blocks[\"project.recordings\"].data_arrays[\"recording.20210505.01\"].metadata"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "f.close()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "So far we have seen how to create and store metadata in NIX files and how to connect them to actual data."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Automated handling of metadata"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Metadata can become quite complex and it can become tedious to create large trees over and over again. To this end, \"template\" sections can be created and re-used.\n",
+    "\n",
+    "As an example: when running an experiment, usually there are a couple of different stimulus protocols or one or two hardware setups, but the stimulus or the hardware itself does not change. When adding data to an existing NIX file, the hardware metadata can be pre-defined for these setups and attached to the specific experimental data once it is stored in the file."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# The file that will contain the templates for import\n",
+    "ft = nix.File.open(\"metadata_templates.nix\", nix.FileMode.Overwrite)\n",
+    "\n",
+    "# The current example file will contain the data and will import from the templates file\n",
+    "fi = nix.File.open(\"metadata.nix\", nix.FileMode.ReadWrite)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# We'll create \"template\" sections in the templates file and a \"sessions\" section in the data file.\n",
+    "# The templates file will contain re-usable metadata templates, while \"sessions\" will contain the metadata\n",
+    "# that is linked to the actual experimental data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# We will add basic templates simulating two microscope setups\n",
+    "# Create one section per microscope setup, each with slightly different metadata\n",
+    "sec_micro_A = ft.create_section(name=\"microscope_station_A\", type_=\"hardware.microscopes\")\n",
+    "_ = sec_micro_A.create_property(name=\"Manufacturer\", values_or_dtype=\"Company A\")\n",
+    "_ = sec_micro_A.create_property(name=\"Objective\", values_or_dtype=\"Pln Apo 40x/1.3 oil DIC II\")\n",
+    "_ = sec_micro_A.create_property(name=\"pE LED intensity\", values_or_dtype=\"20\")\n",
+    "\n",
+    "sec_micro_B = ft.create_section(name=\"microscope_station_B\", type_=\"hardware.microscopes\")\n",
+    "_ = sec_micro_B.create_property(name=\"Manufacturer\", values_or_dtype=\"Company B\")\n",
+    "_ = sec_micro_B.create_property(name=\"Objective\", values_or_dtype=\"Pln Apo 40x/1.3 oil DIC II\")\n",
+    "_ = sec_micro_B.create_property(name=\"pE LED intensity\", values_or_dtype=\"30\")\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "File: name = metadata_templates.nix\n",
+      "  microscope_station_A [hardware.microscopes]\n",
+      "      |- Manufacturer: ('Company A',)\n",
+      "      |- Objective: ('Pln Apo 40x/1.3 oil DIC II',)\n",
+      "      |- pE LED intensity: ('20',)\n",
+      "  microscope_station_B [hardware.microscopes]\n",
+      "      |- Manufacturer: ('Company B',)\n",
+      "      |- Objective: ('Pln Apo 40x/1.3 oil DIC II',)\n",
+      "      |- pE LED intensity: ('30',)\n"
+     ]
+    }
+   ],
+   "source": [
+    "# The templates file now contains two microscope setup templates\n",
+    "ft.pprint(max_depth=2)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# When running an experiment and adding new data to the NIX file, \n",
+    "# the appropriate, full template can be copied and added."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 19,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "Section: {name = microscope_station_A, type = hardware.microscopes}"
+      ]
+     },
+     "execution_count": 19,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Create a base section in the data file\n",
+    "sec_ses = fi.create_section(name=\"sessions\")\n",
+    "\n",
+    "# Experiments from three different days are added, and the setup used is documented using the templates:\n",
+    "sec_session01 = fi.sections[\"sessions\"].create_section(name=\"recording.20210505.01\", type_=\"raw-data.ca-imaging\")\n",
+    "\n",
+    "sec_setup_A = ft.sections[\"microscope_station_A\"]\n",
+    "sec_session01.copy_section(sec_setup_A)\n",
+    "\n",
+    "sec_session02 = fi.sections[\"sessions\"].create_section(name=\"recording.20210506.01\", type_=\"raw-data.ca-imaging\")\n",
+    "\n",
+    "sec_setup_B = ft.sections[\"microscope_station_B\"]\n",
+    "sec_session02.copy_section(sec_setup_B)\n",
+    "\n",
+    "sec_session03 = fi.sections[\"sessions\"].create_section(name=\"recording.20210507.01\", type_=\"raw-data.ca-imaging\")\n",
+    "\n",
+    "sec_setup_A = ft.sections[\"microscope_station_A\"]\n",
+    "sec_session03.copy_section(sec_setup_A)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 20,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "File: name = metadata_import.nix\n",
+      "  sessions [undefined]\n",
+      "    recording.20210505.01 [raw-data.ca-imaging]\n",
+      "      microscope_station_A [hardware.microscopes]\n",
+      "          |- Manufacturer: ('Company A',)\n",
+      "          |- Objective: ('Pln Apo 40x/1.3 oil DIC II',)\n",
+      "          |- pE LED intensity: ('20',)\n",
+      "    recording.20210506.01 [raw-data.ca-imaging]\n",
+      "      microscope_station_B [hardware.microscopes]\n",
+      "          |- Manufacturer: ('Company B',)\n",
+      "          |- Objective: ('Pln Apo 40x/1.3 oil DIC II',)\n",
+      "          |- pE LED intensity: ('30',)\n",
+      "    recording.20210507.01 [raw-data.ca-imaging]\n",
+      "      microscope_station_A [hardware.microscopes]\n",
+      "          |- Manufacturer: ('Company A',)\n",
+      "          |- Objective: ('Pln Apo 40x/1.3 oil DIC II',)\n",
+      "          |- pE LED intensity: ('20',)\n"
+     ]
+    }
+   ],
+   "source": [
+    "fi.pprint()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 21,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ft.close()\n",
+    "fi.close()"
+   ]
+  },
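+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The repeated create-and-copy steps above can be wrapped in a small function. The following is a minimal sketch: `add_session` is a hypothetical helper and not part of nixio, and the session name \"recording.20210508.01\" is made up for illustration. It assumes that the data file already contains the \"sessions\" section created above and that the templates file contains a section with the given setup name."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def add_session(data_file, template_file, session_name, setup_name):\n",
+    "    \"\"\"Create a session Section below \"sessions\" and copy a setup template into it.\"\"\"\n",
+    "    sessions = data_file.sections[\"sessions\"]\n",
+    "    session = sessions.create_section(name=session_name, type_=\"raw-data.ca-imaging\")\n",
+    "    session.copy_section(template_file.sections[setup_name])\n",
+    "    return session\n",
+    "\n",
+    "\n",
+    "# The templates are only read, the data file is modified\n",
+    "ft = nix.File.open(\"metadata_templates.nix\", nix.FileMode.ReadOnly)\n",
+    "fi = nix.File.open(\"metadata.nix\", nix.FileMode.ReadWrite)\n",
+    "\n",
+    "# Hypothetical fourth session, recorded on setup B\n",
+    "new_session = add_session(fi, ft, \"recording.20210508.01\", \"microscope_station_B\")\n",
+    "new_session.pprint()\n",
+    "\n",
+    "ft.close()\n",
+    "fi.close()\n"
+   ]
+  },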
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# We reopen the data file. The Section describing our experiment can be connected directly to the Tag\n",
+    "# that references the recorded data.\n",
+    "f = nix.File.open(\"metadata.nix\", nix.FileMode.ReadWrite)\n",
+    "\n",
+    "tag = f.blocks['project.recordings'].tags['stimulus.down.3']\n",
+    "tag.metadata = f.sections['recording.20210405']\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Now when we look at the data via the Tag we can directly access all metadata that has been attached to it.\n",
+    "# E.g. get information about the subject the experiment was conducted with.\n",
+    "tag.metadata.sections['subject'].props\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# We can also attach the same Section to the DataArray itself, e.g. when no Tags have been used.\n",
+    "init_data = f.blocks['project.recordings'].data_arrays['recording.20210405']\n",
+    "init_data.metadata = f.sections['recording.20210405']\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# And we can also find it in reverse: we can select a Section and find all data that are connected to it.\n",
+    "sec = f.sections['recording.20210405']\n",
+    "\n",
+    "# Either via connected DataArrays.\n",
+    "sec.referring_data_arrays\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Or via connected Tags.\n",
+    "sec.referring_tags\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# And finally we close our file.\n",
+    "f.close()\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Try it out\n",
+    "\n",
+    "Now we move on to an actual exercise.\n",
+    "\n",
+    "The public repository https://gin.g-node.org/RDMcourse2020/demo-lecture-07 contains a Jupyter notebook \"2020_RDM_course_nix_exercise.ipynb\".\n",
+    "\n",
+    "Start it either\n",
+    "- locally, if you can use Python; make sure all dependencies are installed.\n",
+    "- or use Binder if you cannot use Python locally. The repository is already set up for use with Binder. Check the last lecture if you are unsure how to start the notebook using Binder.\n",
+    "\n",
+    "This repository further contains a folder called \"excercise\". It contains calcium imaging data and rough metadata about the recordings.\n",
+    "\n",
+    "The exercise is to\n",
+    "- read through the README.md and briefly familiarize yourself with the project and the data.\n",
+    "- load the raw data into the notebook. Ideally transfer the \"obj_substracted\" column from the data files (column 3), but it can be any other column as well.\n",
+    "- the sampling interval given by the \"time_elapsed\" column is roughly 100 ms. You can use a SampledDimension with an interval of 100 and unit ms, which should be easier, or try to include the real times as a RangeDimension.\n",
+    "- create a new NIX file and put the raw data traces into NIX DataArrays including labels and units. Note that the signal is Fluorescence with unit AU (arbitrary unit).\n",
+    "- plot data from these DataArrays.\n",
+    "- read through the metadata, try to put useful metadata into a NIX Section/Property structure and connect it to the DataArrays. Examples would be\n",
+    "  - original file names of raw data files.\n",
+    "  - species.\n",
+    "  - recording equipment.\n",
+    "\n",
+    "- identify and specify a region of interest via the used shift paradigm with start and extent, and try to create a single MultiTag that connects all three DataArrays to this paradigm.\n",
+    "\n",
+    "Alternatively you can also take some of your own data and try to put it into a NIX file along with some of your metadata."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}