{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Research Data Management in Neuroscience\n", "\n", "# Introduction to working with NIX\n", "\n", " The Neuroscience Information eXchange format\n", "\n", " Michael Sonntag\n", "\n", " Department Biologie II\n", " Ludwig-Maximilians-Universität München\n", "\n", " Friday, 03 July, 2020\n", "\n", "\n", "![G-Node-logo.png](./resources/G-Node-logo.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For an online introduction to the usage of NIX please see \n", "- the [nixpy readthedocs page](https://nixpy.readthedocs.io) (Python implementation).\n", "- the [nixio readthedocs page](https://nixio.readthedocs.io) (C++ implementation)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook is set up to be used with Python 3(.6+). Also to properly run this notebook the following libraries need to be installed:\n", "- `pip install matplotlib`\n", "- `pip install numpy`\n", "- `pip install nixio==1.5.0b4`\n", "\n", "Note: nixio 1.5.0b4 is a beta release with many new exciting features of NIX. As of the time of the presentation (03.07.2020) these features have not made it into the main NIX release. If you are using this notebook at a later point in time, installing via `pip install nixio` should be enough." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Main requirements when storing data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. We want to be able to store n-dimensional data structures.\n", "2. The data structures must be self-explanatory; they must contain sufficient information to draw a basic plot of the data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![https://nixio.readthedocs.io/en/latest/_images/regular_sampled.png](./resources/nix_regular_sampled.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Considering the simple plot above, we can list all information that is needed in order to reproduce it.\n", "\n", "- the data (voltage measurements)\n", "- the y-axis labeling, i.e. label (voltage) and unit (mV)\n", "- the x-axis labeling, i.e. label (time) and unit (s)\n", "- the x-position for each data point\n", "- a title/legend\n", "\n", "In most cases it would be inefficient to store x-, and y-position for each data point. The voltage measurements have been done in regular (time) intervals. Thus, we rather need to store\n", "- the measured values\n", "- a definition of the x-axis consisting of an offset\n", "- the sampling interval\n", "- a label\n", "- and a unit.\n", "\n", "This is exactly the approach chosen in NIX. For each dimension of the data, a dimension descriptor must be given. NIX defines three (and a half) dimension descriptors:\n", "\n", "- SampledDimension: used if a dimension is sampled at regular intervals.\n", "- RangeDimension: used if a dimension is sampled at irregular intervals.\n", " - There is a special case of the RangeDimension, the AliasRangeDimension, which is used when e.g. event times are stored.\n", "- SetDimension: used for dimensions that represent categories rather than physical quantities.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Some data to store" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we can store any data we need to have it lying around somewhere. Lets re-create the example data for the figure we saw above and then see how we can store this data in a NIX file." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's create some example data; we are using numpy.arange to create it.\n", "# numpy.arange: Return evenly spaced values within a given interval.\n", "import numpy as np\n", "\n", "freq = 5.0\n", "samples = 1000\n", "sample_interval = 0.001\n", "time = np.arange(samples)\n", "voltage = np.sin(2 * np.pi * time * freq/samples)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's quickly check what the data we will store actually looks like.\n", "# The next line is Jupyter notebook specific and will allow us to see plots. It only works with Python 3.\n", "# Compare the Jupyter/Python magic method reference from the last session.\n", "%matplotlib notebook\n", "\n", "import matplotlib.pyplot as plot\n", "\n", "plot.plot(time*sample_interval, voltage)\n", "plot.xlabel('Time [s]')\n", "plot.ylabel('Voltage [mV]')\n", "\n", "plot.show()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is perfect data; we would like to keep it and store it in a file. So let's persist this wonderful data in a NIX file." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The DataArray" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `DataArray` is the central entity of the NIX data model. Like almost all other NIX entities, it requires a `name` and a `type`. Neither is restricted, but names must be unique within a `Block`. `type` information can be used to introduce semantic meaning and domain-specificity. Upon creation, a unique ID will be assigned to the `DataArray`.\n", "\n", "The `DataArray` stores the actual data together with `label` and `unit`. In addition, the `DataArray` needs a dimension descriptor for each dimension. The following code snippet shows how to create a `DataArray` and store data in it." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import nixio\n", "\n", "# First create a file we'll use to work with.\n", "\n", "# Files can be opened in FileMode \"ReadOnly\", \"ReadWrite\" and \"Overwrite\":\n", "# ReadOnly ... Opens an existing file for reading\n", "# ReadWrite ... Opens an existing file for editing or creates a new file\n", "# Overwrite ... Truncates and opens an existing file or creates a new file\n", "f = nixio.File.open('Tutorial.nix', nixio.FileMode.Overwrite)\n", "\n", "# Please note that NIX works on an open file and reads and writes directly from and to this file.\n", "# Always close the file using 'f.close()' when you are done.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The NIX data model\n", "![https://raw.githubusercontent.com/wiki/G-Node/nix/media/Nix_DataModel_v1.3.2a.png](./resources/nix_datamodel_v1.5.0.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see in the [NIX data model](https://github.com/G-Node/nix/wiki/Model-Definition), NIX files are hierarchically structured. Data is stored in `DataArrays`, and `DataArrays` are contained in `Blocks`. When we want to create a `DataArray`, we first need to create at least one `Block` that will contain the `DataArray`."
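] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since everything hangs off this `File` -> `Block` -> `DataArray` hierarchy, the whole data tree can be walked with plain loops. The next cell is a minimal sketch of such a walk; at this point our file is still empty, so it prints nothing yet." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Walk the hierarchy of the open file: File -> Blocks -> DataArrays.\n", "# The file is still empty, so nothing is printed yet; re-running this\n", "# cell after the cells below will show the growing tree.\n", "for blk in f.blocks:\n", "    print(\"Block: %s (%s)\" % (blk.name, blk.type))\n", "    for array in blk.data_arrays:\n", "        print(\"  DataArray: %s (%s)\" % (array.name, array.type))\n"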
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's check the Blocks we currently have defined in our file; it should be empty.\n", "f.blocks\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's see how we can create a Block in our file; we'll use the handy Python help function to get more information.\n", "help(f.create_block)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# \"name\" and \"type\" of a Block can be used to filter and \n", "# find our Blocks later on when the file contains more content.\n", "block = f.create_block(name=\"basic_examples\", type_=\"examples\")\n", "\n", "# Please note at this point that the 'name' of any NIX entity e.g. Blocks, DataArrays, etc. \n", "# has to be unique since it can be used to find and return this exact entity via the 'name'.\n", "# The 'type' can also be used to find entities, but it does not need to be unique. You can use \n", "# 'name' to uniquely identify a particular entity and use 'type' to find groups of related entities.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Great, we have created an empty Block.\n", "block\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# And this Block already resides within our file and is saved to disk.\n", "f.blocks\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we are finally set up to put our data into our file!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# First let's check how we can actually create a DataArray.\n", "help(block.create_data_array)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Now we create the DataArray with the voltage data created at the very beginning\n", "# and within the Block created above. We also add the appropriate labels immediately.\n", "da = block.create_data_array(name=\"data_regular\", array_type=\"sine\", data=voltage)\n", "da.label = \"voltage\"\n", "da.unit = \"mV\"\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Now we will also add the appropriate Dimension to this DataArray, so it can be correctly \n", "# interpreted for later plotting. We will look into the different Dimensions in a second.\n", "\n", "# Note that we should always add dimensions in the order x, y, z ... when thinking in plot terms.\n", "# This is necessary to later properly interpret data without knowing the actual structure of a DataArray.\n", "\n", "# First we check how to properly create the Dimension we need.\n", "help(da.append_sampled_dimension)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's add the Dimension for our x-axis to our DataArray.\n", "dim = da.append_sampled_dimension(sample_interval)\n", "dim.label = \"time\"\n", "dim.unit = \"s\"\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We also want to add a Dimension to our y-axis to make the DataArray consistent even \n", "# if we do not add any additional annotations. 
We will see what a SetDimension is later on.\n", "dim_set = da.append_set_dimension()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the example shown above, the NIX library will figure out the dimensionality of the data, the **shape** of the data and its **type**. The data type and the dimensionality (i.e. the number of dimensions) are fixed once a `DataArray` has been created. The actual size of a `DataArray` can be changed during the lifetime of the entity.\n", "\n", "In case more control is required, `DataArrays` can be created empty for later filling, e.g. during data acquisition." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Now let's see if we can access our data and do something useful with it, e.g. plot it.\n", "# First we'll fetch the DataArray from the file.\n", "plot_data = f.blocks['basic_examples'].data_arrays['data_regular']\n", "plot_data\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Now let's check some of its content.\n", "plot_data[:5]\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's also check the dimensionality of our data.\n", "plot_data.dimensions\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Since we stored just the sampling interval and not every single tick \n", "# we save quite a bit of space.\n", "dim = plot_data.dimensions[0]\n", "dim.sampling_interval\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Compare this to the original time array.\n", "time[:]\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's plot all data from the NIX file using only information provided by the file.\n", "y = plot_data[:]\n", "\n", "# The SampledDimension axis method applies its saved interval to a passed sample count\n", "# to recreate the original time array.\n", "x = plot_data.dimensions[0].axis(y.shape[0])\n", "\n", "plot.figure(figsize=(10,5))\n", "plot.plot(x, y, '-')\n", "plot.xlabel(\"%s [%s]\" % (dim.label, dim.unit))\n", "plot.ylabel(\"%s [%s]\" % (plot_data.label, plot_data.unit))\n", "plot.title(\"%s/%s\" % (plot_data.name, plot_data.type))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Saving irregularly sampled data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Phew, that was already nice. As you have seen, the example dealt with regularly sampled data. What do we do if we have data that is not regularly sampled? 
As mentioned at the beginning, NIX supports\n", "- regularly sampled data\n", "- irregularly sampled data\n", "- set (event) data\n", "- one-dimensional data that describes itself (via the AliasRangeDimension)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's create some irregularly sampled data and store it.\n", "duration = 1.0\n", "interval = 0.02\n", "\n", "# Create random time points.\n", "time_points = np.around(np.cumsum(np.random.poisson(interval*1000, int(1.5*duration/interval)))/1000., 3)\n", "time_points = time_points[time_points <= duration]\n", "print(\"Time points\\n%s\" % time_points)\n", "\n", "# Create example data values for every time point.\n", "data_points = np.sin(5 * np.arange(0, time_points[-1] * 2 * np.pi, 0.001))\n", "data_points = data_points[np.asarray(time_points / 0.001 * 2 * np.pi, dtype=int)]\n", "print(\"Data points\\n%s\" % data_points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Check the Block we want to save this data in.\n", "block\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create a DataArray with our data points.\n", "data_irr = block.create_data_array(name=\"data_irregular\", array_type=\"sine\", data=data_points)\n", "data_irr.label = \"voltage\"\n", "data_irr.unit = \"mV\"\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's add a RangeDimension as our x dimension and save the irregular time points along with it.\n", "dim = data_irr.append_range_dimension(time_points)\n", "dim.label = \"time\"\n", "dim.unit = \"s\"\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# And a SetDimension as our y dimension.\n", "dim_set = data_irr.append_set_dimension()\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's plot our data again.\n", "plot_data = f.blocks['basic_examples'].data_arrays['data_irregular']\n", "\n", "x_dim = plot_data.dimensions[0]\n", "x = list(x_dim.ticks)\n", "\n", "y = plot_data[:]\n", "\n", "plot.figure(figsize=(10,5))\n", "plot.plot(x, y, '-o')\n", "plot.xlabel(\"%s [%s]\" % (x_dim.label, x_dim.unit))\n", "plot.ylabel(\"%s [%s]\" % (plot_data.label, plot_data.unit))\n", "plot.title(\"%s/%s\" % (plot_data.name, plot_data.type))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Saving event data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Next we will store some basic set or \"event\" data.\n", "data_points = [281, 293, 271, 300, 285, 150]\n", "\n", "data_event = block.create_data_array(name=\"data_event\", array_type=\"event\", data=data_points)\n", "data_event.label = \"temperature\"\n", "data_event.unit = \"K\"\n", "\n", "# Add the x dimension.\n", "dim = data_event.append_set_dimension()\n", "dim.labels = [\"Response A\", \"Response B\", \"Response C\", \"Response D\", \"Response E\", \"Response F\"]\n", "\n", "# Add the y dimension.\n", "dim_set = data_event.append_set_dimension()\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# And let's see how we can plot this.\n", "plot_data = f.blocks['basic_examples'].data_arrays['data_event']\n", "\n", "x_dim = plot_data.dimensions[0]\n", "y_data = plot_data[:]\n", "# Create an index for plotting in y from the length of the saved data.\n", "index = np.arange(len(y_data))\n", "\n", "# Set up a bar plot\n",
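"# A bar plot suits categorical (SetDimension) data: one bar per label.\n",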
"plot.figure(figsize=(10,5))\n", "plot.bar(index, y_data)\n", "\n", "# Add labels and title\n", "plot.xticks(index, x_dim.labels)\n", "plot.ylabel(\"%s [%s]\" % (plot_data.label, plot_data.unit))\n", "plot.title(\"%s/%s\" % (plot_data.name, plot_data.type))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Multiple related signals in one DataArray - Multidimensional data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we know how to save two dimensional data in a `DataArray`. Due to the ability to add dimensions, NIX also supports multidimensional data and is able to properly describe it. As an example one could save 2D images including their different color channels into one `DataArray`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another use case would be to store different time series data together in one `DataArray`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Lets create data for two related time series and store them together.\n", "# ---- MOCK DATA; the code can be safely ignored --------\n", "freq = 5.0;\n", "samples = 1000\n", "sample_interval = 0.001\n", "time = np.arange(samples)\n", "voltage_trace_A = np.sin(2 * np.pi * time * freq/samples)\n", "voltage_trace_B = np.cos(2 * np.pi * time * freq/samples)\n", "\n", "# We use a numpy function that will stack both signal\n", "voltage_stacked = np.vstack((voltage_trace_A, voltage_trace_B))\n", "# ---- MOCK DATA end --------\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Lets create a new DataArray with our multi-dimensional data,\n", "data_related = block.create_data_array(name=\"data_multi_dimension\", array_type=\"multi\", data=voltage_stacked)\n", "data_related.label = \"voltage\"\n", "data_related.unit = \"mV\"\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# To properly describe the DataArray, we need to add two dimensions.\n", "\n", "# First we describe the depth of the stacked arrays.\n", "dim_set = data_related.append_set_dimension()\n", "\n", "# Take care to add the lables in the order the arrays were stacked above.\n", "dim_set.labels = [\"Trace_A\", \"Trace_B\"]\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Next we add the second dimension that is common to both stacked arrays of data and describes time.\n", "dim_sample = data_related.append_sampled_dimension(sample_interval)\n", "dim_sample.label = \"time\"\n", "dim_sample.unit = \"s\"\n", "\n", "# And finally add the y dimension.\n", "dim_set = data_related.append_set_dimension()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Lets harvest the fruits of our labour.\n", "plot_data = f.blocks['basic_examples'].data_arrays['data_multi_dimension']\n", "\n", "# Fetch the descriptive dimensions.\n", "dim_set = plot_data.dimensions[0]\n", "dim_sampled = plot_data.dimensions[1]\n", "\n", "# We need to know the dimension of the x-axis, so we compute the timepoints\n", "# from one of the stacked arrays and the SampledDimension interval.\n", "data_points_A = plot_data[0, :]\n", "time_points = dim_sampled.axis(data_points_A.shape[0])\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "plot.figure(figsize=(10,5))\n", "\n", "# Now we add as many plots as we have set dimensions\n", "for i, label in 
enumerate(dim_set.labels):\n", "    plot.plot(time_points, plot_data[i, :], label=label)\n", "\n", "plot.xlabel(\"%s [%s]\" % (dim_sampled.label, dim_sampled.unit))\n", "plot.ylabel(\"%s [%s]\" % (plot_data.label, plot_data.unit))\n", "plot.title(\"%s/%s\" % (plot_data.name, plot_data.type))\n", "plot.legend()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What we have seen so far:\n", "- we can save different `DataArrays` that belong to the same experiment in one file in a structured fashion.\n", "- we can describe and save different kinds of data to file.\n", "- we can add labels and units directly to the data.\n", "- we can save multidimensional data.\n", "- we can save a bit of space in the case of sampled data.\n", "- we can better understand the dimensionality of the stored data, since spelling out the kind of dimensions\n", "  makes the data easier to interpret." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Tagging and working with multiple analysis steps in the same file" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NIX provides additional features to save analyzed data alongside raw data in a meaningful way.\n", "- Tag regions of interest in a `DataArray`.\n", "- Use the same Tag to connect multiple related `DataArrays`, e.g. recordings from multi-electrode arrays.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `Tag` and the `MultiTag` features of NIX are designed to tag single or multiple points or regions of interest, respectively, and to link different sets of data in a meaningful way.\n", "\n", "One `Tag` can point to several `DataArrays` at once.\n", "\n", "The following figure illustrates how a `MultiTag` links two `DataArrays` to create a new construct." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![https://nixio.readthedocs.io/en/latest/_images/mtag_concept.png](./resources/nix_mtag_concept.png)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let us create a new Block to illustrate tagged data.\n", "block_tag = f.create_block(name=\"tag_examples\", type_=\"examples\")\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "f.blocks\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Referencing a single point or region in a DataArray" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To reference only a single point or region, we can use a NIX `Tag`; a minimal sketch follows below. The `Tag` is a simpler form of the `MultiTag`, which we will cover in a moment."
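] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is a minimal `Tag` sketch (all names in this cell are made up for illustration, and the cell assumes the nixio 1.5 `create_tag` API): a `Tag` stores a single `position` and an optional `extent`, both given in the units of the referenced data's dimensions." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A minimal Tag sketch with made-up names; safe to skip.\n", "# We create a small throwaway DataArray and tag one region in it:\n", "# the region starts at 'position' and spans 'extent', both in the\n", "# unit of the data's first dimension (seconds here).\n", "demo_data = block_tag.create_data_array(name=\"tag_demo\", array_type=\"example\", data=np.random.randn(100))\n", "demo_data.label = \"voltage\"\n", "demo_data.unit = \"mV\"\n", "demo_dim = demo_data.append_sampled_dimension(0.001)\n", "demo_dim.label = \"time\"\n", "demo_dim.unit = \"s\"\n", "\n", "tag = block_tag.create_tag(name=\"demo_region\", type_=\"roi\", position=[0.02])\n", "tag.extent = [0.05]  # the tagged region covers 0.02 s to 0.07 s\n", "tag.references.append(demo_data)\n"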
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We will create some more elaborate example data to make a point.\n", "# For this we need some equally elaborate code, which can be safely ignored.\n", "\n", "# ---- MOCK CODE AND DATA; the code can be safely ignored --------\n", "\n", "class LIF(object):\n", "    def __init__(self, stepsize=0.0001, offset=1.6, tau_m=0.025, tau_a=0.02, da=0.0, D=3.5):\n", "        self.stepsize = stepsize  # simulation stepsize [s]\n", "        self.offset = offset  # offset current [nA]\n", "        self.tau_m = tau_m  # membrane time constant [s]\n", "        self.tau_a = tau_a  # adaptation time constant [s]\n", "        self.da = da  # increment in adaptation current [nA]\n", "        self.D = D  # noise intensity\n", "        self.v_threshold = 1.0  # spiking threshold\n", "        self.v_reset = 0.0  # reset voltage after spiking\n", "        self.i_a = 0.0  # current adaptation current\n", "        self.v = self.v_reset  # current membrane voltage\n", "        self.t = 0.0  # current time [s]\n", "        self.membrane_voltage = []\n", "        self.spike_times = []\n", "\n", "    def _reset(self):\n", "        self.i_a = 0.0\n", "        self.v = self.v_reset\n", "        self.t = 0.0\n", "        self.membrane_voltage = []\n", "        self.spike_times = []\n", "\n", "    def _lif(self, stimulus, noise):\n", "        \"\"\"\n", "        Euler solution of the membrane equation with adaptation current and noise.\n", "        \"\"\"\n", "        self.i_a -= self.i_a - self.stepsize/self.tau_a * (self.i_a)\n", "        self.v += self.stepsize * (-self.v + stimulus + noise + self.offset - self.i_a)/self.tau_m\n", "        self.membrane_voltage.append(self.v)\n", "\n", "    def _next(self, stimulus):\n", "        \"\"\"\n", "        Workhorse which delegates to the Euler step and collects the spike times.\n", "        \"\"\"\n", "        noise = self.D * (float(np.random.randn() % 10000) - 5000.0)/10000\n", "        self._lif(stimulus, noise)\n", "        self.t += self.stepsize\n", "        if self.v > self.v_threshold and len(self.membrane_voltage) > 1:\n", "            self.v = self.v_reset\n", "            self.membrane_voltage[len(self.membrane_voltage)-1] = 2.0\n", "            self.spike_times.append(self.t)\n", "            self.i_a += self.da\n", "\n", "    def run_const_stim(self, steps, stimulus):\n", "        \"\"\"\n", "        LIF simulation with a constant stimulus.\n", "        \"\"\"\n", "        self._reset()\n", "        for i in range(steps):\n", "            self._next(stimulus)\n", "        time = np.arange(len(self.membrane_voltage))*self.stepsize\n", "        return time, np.array(self.membrane_voltage), np.array(self.spike_times)\n", "\n", "    def run_stimulus(self, stimulus):\n", "        \"\"\"\n", "        LIF simulation with a predefined stimulus trace.\n", "        \"\"\"\n", "        self._reset()\n", "        for s in stimulus:\n", "            self._next(s)\n", "        time = np.arange(len(self.membrane_voltage))*self.stepsize\n", "        return time, np.array(self.membrane_voltage), np.array(self.spike_times)\n", "\n", "    def __str__(self):\n", "        out = '\\n'.join([\"stepsize: \\t\" + str(self.stepsize),\n", "                         \"offset:\\t\\t\" + str(self.offset),\n", "                         \"tau_m:\\t\\t\" + str(self.tau_m),\n", "                         \"tau_a:\\t\\t\" + str(self.tau_a),\n", "                         \"da:\\t\\t\" + str(self.da),\n", "                         \"D:\\t\\t\" + str(self.D),\n", "                         \"v_threshold:\\t\" + str(self.v_threshold),\n", "                         \"v_reset:\\t\" + str(self.v_reset)])\n", "        return out\n", "\n", "    def __repr__(self):\n", "        return self.__str__()\n", "\n", "lif_model = LIF()\n", "time, voltage, spike_times = lif_model.run_const_stim(10000, 0.005)\n", "\n", "# ---- MOCK CODE AND DATA end --------\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This code has created\n", "# - some mock raw membrane voltage 
traces\n", "# - some spike times that represent the results of an analysis that has been run on the raw data.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This is what our time data looks like.\n", "time[:10]\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This is what our voltage data looks like.\n", "voltage[:10]\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Our assumption is that we analysed the voltage traces and identified the times where neurons were spiking.\n", "# The spike times are found in the third mock data set and look like this.\n", "spike_times\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# With the mock raw membrane voltage traces we now create a new DataArray on our Block.\n", "data = block_tag.create_data_array(name=\"membrane_voltage_A\", array_type=\"regular_sampled\", data=voltage)\n", "data.label = \"membrane_voltage\"\n", "data.unit = \"mV\"\n", "\n", "# As we are used to by now, we add the time dimension as a sampled dimension with the sample interval.\n", "dim = data.append_sampled_dimension(time[1] - time[0])\n", "dim.label = \"time\"\n", "dim.unit = \"s\"\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Now we want to store the data from our analysis step: the identified spike times.\n", "# We store them in a separate DataArray on the same Block, right next to our raw data.\n", "spike_data = block_tag.create_data_array(name=\"spike_times_A\", array_type=\"set\", data=spike_times)\n", "\n", "# The analysed data set requires the same dimensionality as the raw data set; otherwise it cannot be linked\n", "# via a Tag or MultiTag. We add two dimensions; they don't need to contain data, since it is \n", "# assumed that the analysed data maps onto the x-axis of the RAW DATA.\n", "spike_data.append_set_dimension()\n", "spike_data.append_set_dimension()\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We want to make sure that anyone using this file will know that \"spike_times_A\" \n", "# was derived from the DataArray \"membrane_voltage_A\".\n", "# We can do that by connecting them via a \"MultiTag\".\n", "\n", "# We first create the MultiTag on the same Block right next to our two DataArrays. Let's see how we can do that:\n", "help(block_tag.create_multi_tag)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We create the MultiTag using the derived spike data.\n", "multi_tag = block_tag.create_multi_tag(name=\"tag_A\", type_=\"spike_times\", positions=spike_data)\n", "\n", "# Now we hook the spike data up to the raw data via the MultiTag's 'references' attribute.\n", "multi_tag.references.append(data)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# And now we see how these two data sets can be plotted together.\n", "# To interpret and plot tagged data, we only need the Tag, \n", "# we do not even need to know the DataArrays themselves.\n", "plot_tag = f.blocks['tag_examples'].multi_tags['tag_A']\n", "\n", "# We fetch the raw data via the MultiTag. 
Note that \"plot_tag.references\"\n", "# returns a list, since a Tag could reference multiple DataArrays.\n", "init_data = plot_tag.references[0]\n", "\n", "# We fetch the spike times from the MultiTag as well.\n", "spike_times = plot_tag.positions\n", "\n", "# Now we prepare both raw and analysed data side by side for plotting.\n", "dim_sampled = init_data.dimensions[0]  # We again reconstruct the time axis from the first DataArray.\n", "time_points = dim_sampled.axis(init_data.shape[0])\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "# We prepare a plot with the raw data and a scatter plot with the analysed data on top of it.\n", "plot.figure(figsize=(10,5))\n", "plot.plot(time_points, init_data[:], label=init_data.name)\n", "plot.scatter(spike_times[:], np.ones(spike_times[:].shape)*np.max(init_data), color='red', label=spike_times.name)\n", "\n", "# And properly label everything with information from the DataArrays and Dimensions.\n", "plot.xlabel(\"%s [%s]\" % (dim_sampled.label, dim_sampled.unit))\n", "plot.ylabel(\"%s [%s]\" % (init_data.label, init_data.unit))\n", "plot.title(\"%s/%s\" % (init_data.name, init_data.type))\n", "plot.legend()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can extract information from multiple steps of analysis via `MultiTags` and are able to plot raw data and analysed data without having to know or directly access the `DataArrays` that contain the underlying data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data and data annotation in the same file" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NIX not only allows saving raw data and analysed data within the same file. It also allows creating structured annotations of the experiments that were conducted and connects this information directly to the data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Metadata in NIX files is stored in the [odML format](https://g-node.github.io/python-odml). It is saved in a \"metadata tree\" side by side with the actual \"data tree\" and can easily be connected to the data in the data tree." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "odML is a hierarchically structured data format that provides grouping in nestable `Sections` and stores information in `Property`-`Value` pairs. As a preview, the metadata tree we will build below has the following shape."
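] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(The names and values here are the ones created in the cells below.)\n", "\n", "- experiment_42 (Section, type: project_AB)\n", "  - subject (Section)\n", "    - subjectID: 78376446-f096-47b9-8bfe-ce1eb43a48dc (Property)\n", "    - species: Mus musculus (Property)\n", "    - age: 4 weeks (Property)"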
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The odML data model\n", "![](./resources/nix_odML_model_simplified.png)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let us annotate a DataArray of our last example.\n", "\n", "# As we can see, we have not stored any metadata in our current file yet.\n", "f.sections\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's check how we can create a new Section.\n", "help(f.create_section)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# First we need to create a Section that can hold our annotations.\n", "section = f.create_section(name=\"experiment_42\", type_=\"project_AB\")\n", "\n", "f.sections\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This Section can hold further Sections as well as Properties.\n", "section.sections\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "section.props" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's store additional information about the raw data of our MultiTag example.\n", "\n", "# We want to add information about the subject that was used in the experiment.\n", "sub_sec = section.create_section(name=\"subject\", type_=\"experiment_42\")\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's add some Properties to this Section.\n", "help(sub_sec.create_property)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We'll add information about subject ID, subject species and subject age.\n", "prop = sub_sec.create_property(name=\"subjectID\", values_or_dtype=\"78376446-f096-47b9-8bfe-ce1eb43a48dc\")\n", "prop = sub_sec.create_property(name=\"species\", values_or_dtype=\"Mus musculus\")\n", "prop = sub_sec.create_property(name=\"age\", values_or_dtype=\"4\")\n", "prop.unit = \"weeks\"\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's check what we have so far at the root of the file.\n", "f.sections\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We list all Sections that our main Section documenting the \"tag_examples\" holds.\n", "f.sections['experiment_42'].sections\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We access all Properties of the subsection containing subject-related information.\n", "f.sections['experiment_42'].sections['subject'].props\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We can now connect the Section describing our experiment directly to the MultiTag \n", "# that references both the raw as well as the analysed data.\n", "\n", "multi_tag = f.blocks['tag_examples'].multi_tags['tag_A']\n", "multi_tag.metadata = f.sections['experiment_42']\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Now when we look at the data via a MultiTag we can directly access all metadata that has been attached to it.\n", "# E.g. 
get information about the subject the experiment was conducted with.\n", "multi_tag.metadata.sections['subject'].props\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We can also attach the same Section to the raw DataArray itself, e.g. when no MultiTags have been used.\n", "init_data = f.blocks['tag_examples'].data_arrays['membrane_voltage_A']\n", "init_data.metadata = f.sections['experiment_42']\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# And we can also find it in reverse: we can select a Section and find all data that are connected to it.\n", "sec = f.sections['experiment_42']\n", "\n", "# Either via connected DataArrays.\n", "sec.referring_data_arrays\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Or via connected MultiTags.\n", "sec.referring_multi_tags\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# And finally we close our file.\n", "f.close()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Try it out\n", "\n", "Now we move on to an actual exercise.\n", "\n", "The public repository https://gin.g-node.org/RDMcourse2020/demo-lecture-07 contains a Jupyter notebook \"2020_RDM_course_nix_exercise.ipynb\".\n", "\n", "Start it either\n", "- locally, if you can use Python; make sure all dependencies are installed.\n", "- or via Binder, if you cannot use Python locally. The repository is already set up for the use with Binder. Check the last lecture if you are unsure how to start the notebook using Binder.\n", "\n", "This repository further contains a folder called \"excercise\". It contains calcium imaging data and rough metadata about the recordings.\n", "\n", "The exercise is to\n", "- read through the README.md and briefly familiarize yourself with the project and the data.\n", "- load the raw data into the notebook. Ideally transfer the \"obj_substracted\" column from the data files (column 3), but it can be any other column as well.\n", "- the sampling interval in the \"time_elapsed\" column is roughly 100 ms. You can either use a SampledDimension with an interval of 100 (which should be easier) or try to include the real times as a RangeDimension.\n", "- create a new NIX file and put the raw data traces into NIX DataArrays, including labels and units - note that the signal is fluorescence with unit AU (arbitrary unit).\n", "- plot data from these DataArrays.\n", "- read through the metadata, try to put useful metadata into a NIX Section/Property structure and connect it to the DataArrays. Examples would be\n", "  - original file names of raw data files.\n", "  - species.\n", "  - recording equipment.\n", "\n", "- identify and specify a region of interest (start and extent) via the shift paradigm that was used, and try to create a MultiTag connecting all three DataArrays via this same paradigm MultiTag.\n", "\n", "Alternatively you can also take some of your own data and try to put it into a NIX file along with some of your metadata." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.1" } }, "nbformat": 4, "nbformat_minor": 2 }