{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "5e9a73d4-ccdb-4971-b8cb-34ffaee3cac6",
   "metadata": {},
   "source": [
    "# **---  `DatasetEphy`: container of neurophysiological data ---**\n",
    "---\n",
    "In this tutorial we're mainly going to see how to define a container hosting the neurophysiological data of one or several subjects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6437bce7-384e-4779-b04d-6ca6a1b8be29",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "import numpy as np\n",
    "import xarray as xr\n",
    "import pandas as pd\n",
    "\n",
    "from mne import EpochsArray, create_info\n",
    "\n",
    "from frites.dataset import DatasetEphy\n",
    "\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f847437b-9ad9-4a48-85d6-2c4bb9aa605b",
   "metadata": {},
   "source": [
    "---\n",
    "# **--- ROOT PATH ---**\n",
    "\n",
    "<div class=\"alert alert-info\"><p>\n",
    "\n",
    "Define the path to where the data are located !\n",
    "</p></div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7f495bfb-ebcb-475f-97e9-d896eee27d9b",
   "metadata": {},
   "outputs": [],
   "source": "ROOT = '../dataset/'"
  },
  {
   "cell_type": "markdown",
   "id": "f0b2c43d-a26b-42f5-b246-fc9e2311457f",
   "metadata": {
    "tags": []
   },
   "source": [
    "---\n",
    "# **--- Structure of the `DatasetEphy` ---**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed6c707a-705c-46ec-ab92-ef3746e19839",
   "metadata": {},
   "source": [
    "## Signature of the `DatasetEphy`\n",
    "\n",
    "**`DatasetEphy(x, y='...', times='...', roi='...')`** where :\n",
    "---\n",
    "\n",
    "1. **`x` = The brain data**\n",
    "    * **[Description] :** list containing the brain data of one or multiple subjects / sessions\n",
    "    * **[Sizes] :** each element of the list is the epoched brain data. For example, for two subjects it would be : $[(n_{epochs}, n_{channels}, n_{times})_{subject_1}, (n_{epochs}, n_{channels}, n_{times})_{subject_2}]$\n",
    "2. **`y` = The external variable**\n",
    "    * **[Description] :** list containing the external variable (e.g. stimulus type, behavioral model, reaction time etc.) of one or multiple subjects / sessions\n",
    "    * **[Goal] :** the goal is then to ask if the variables `x` (brain data) and the external variable `y` shared information. Said differently, if there's differences in the brain activity according to the stimulus (\\~decoding) or if the brain data correlates with the behavioral model or reaction time (\\~regression)\n",
    "    * **[Sizes] :** each element of the list is the external variable of a single subject. For example, the reaction time for two subjects : $[RT_{subject_1}, RT_{subject_2}] = [(n_{epochs},)_{subject_1}, (n_{epochs},)_{subject_2}]$\n",
    "3. **`times` = The time vector**\n",
    "    * **[Description] :** a single time vector\n",
    "    * **[Alternatives] :** in fact it can be anything, like frequencies (e.g. PSD)\n",
    "4. **`roi` = spatial dimension**\n",
    "    * **[Description] :** list containing a spatial description of each subject (e.g. the name of the channels for i/M/EEG, the name of brain regions etc.)\n",
    "    * **[Sizes] :** each element of the list describes the channels / ROI of a single subject. For example, still for two subjects : $[ROI_{subject_1}, ROI_{subject_2}] = [(n_{channels},)_{subject_1}, (n_{channels},)_{subject_2}]$\n",
    "    \n",
    "## Good to know\n",
    "\n",
    "* It works for one / many subjects\n",
    "* It works for one / many sessions of a single subject (or animals like monkeys)\n",
    "* All subjects can have a different number of trials (or epochs) such as a different number of channels. The channels can also be located in different brain regions\n",
    "* However, the time vector **should be the same across all subjects or sessions**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "198995b3-3365-4884-819c-0eab0302b495",
   "metadata": {},
   "source": [
    "---\n",
    "# **0 - Functions**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7e600275-4fce-4022-90f0-8779c2fd534a",
   "metadata": {},
   "outputs": [],
   "source": [
    "###############################################################################\n",
    "###############################################################################\n",
    "#                 Load the data of a single subject\n",
    "###############################################################################\n",
    "###############################################################################\n",
    "\n",
    "def load_ss(subject_nb):\n",
    "    \"\"\"Load the data of a single subject.\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    subject_nb : int\n",
    "        Subject number [0, 12]\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    hga : xarray.DataArray\n",
    "        Xarray containing the high-gamma activity\n",
    "    anat : pandas.DataFrame\n",
    "        Table containing the anatomical informations\n",
    "    beh : pandas.DataFrame\n",
    "        Table containing the behavioral informations\n",
    "    \"\"\"\n",
    "    print(f\"Loading the data of subject #{subject_nb}\")\n",
    "\n",
    "    # load the high-gamma activity\n",
    "    file_hga = os.path.join(ROOT, 'hga', f'hga_s-{subject_nb}.nc')\n",
    "    hga = xr.load_dataarray(file_hga)\n",
    "\n",
    "    # load the name of the brain regions\n",
    "    file_anat = os.path.join(ROOT, 'anat', f'anat_s-{subject_nb}.xlsx')\n",
    "    anat = pd.read_excel(file_anat)\n",
    "\n",
    "    # load the behavior\n",
    "    file_beh = os.path.join(ROOT, 'beh', f'beh_s-{subject_nb}.xlsx')\n",
    "    beh = pd.read_excel(file_beh)\n",
    "    \n",
    "    return hga, anat, beh\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ab7ea866-009c-451c-853b-8dc78017eb6f",
   "metadata": {},
   "source": [
    "---\n",
    "# **1 - Define a `DatasetEphy` using Xarray**\n",
    "## 1.1 Load the data of a single subject"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dbfa1b0b-545e-4be6-b8e7-8322b6dba0ca",
   "metadata": {},
   "outputs": [],
   "source": [
    "# load the data of subject #2\n",
    "subject_nb = 2\n",
    "hga, anat, beh = load_ss(subject_nb)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ce9a6045-3dc3-4da9-88fc-2a6df153600a",
   "metadata": {},
   "source": [
    "## 1.2 Define the `DatasetEphy`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5aae800-6ec0-4027-8ff8-fe8275b4cd92",
   "metadata": {},
   "outputs": [],
   "source": [
    "ds = DatasetEphy([hga.copy()])\n",
    "\n",
    "# checkout the warnings !!\n",
    "ds"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b4a4e10c-8d5c-42bf-b674-8d818eb614f8",
   "metadata": {},
   "source": [
    "## 1.3 Define the `DatasetEphy` with dimensions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b394f2cd-00fc-46b5-8bdd-00a9803459bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "hga"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "34537838-6781-4ecd-b300-b2d5a8e83004",
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "When defining the `DatasetEphy`, we can specify the name of the dimensions\n",
    "for the time (`times=`) and for the space (`roi`)\n",
    "\"\"\"\n",
    "ds = DatasetEphy([hga.copy()], times='times', roi='channels')\n",
    "ds\n",
    "\n",
    "# - Checkout the html output !\n",
    "# - \"Supported MI definition\" is none? wtf"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1e3320d3-417e-45ad-adcf-b610a0a292ce",
   "metadata": {},
   "source": [
    "## 1.4 Specify the stimulus type"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4e0fb1ac-c4ed-416d-a398-9d725061c219",
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "Each trial contain the information of the stimulus type !\n",
    "\n",
    "* -2 = \"-1€\"\n",
    "* -1 = \"-0€\"\n",
    "* +1 = \"+0€\"\n",
    "* +2 = \"+1€\"\n",
    "\"\"\"\n",
    "hga['trials']\n",
    "\n",
    "\"\"\"We can specify in the `DatasetEphy` the name of the dimension containing the\n",
    "stimulus type\n",
    "\"\"\"\n",
    "ds = DatasetEphy([hga.copy()], y='trials', times='times', roi='channels')\n",
    "# ds"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf737b0a-2f59-48d9-ac6a-bc24cfb39586",
   "metadata": {},
   "source": [
    "## 1.5 Define a `DatasetEphy` for all of the subjects"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1d11f1ec-5192-46f6-9a43-58e884c91147",
   "metadata": {},
   "outputs": [],
   "source": [
    "# start by loading the data coming from multiple subjects\n",
    "hga = []\n",
    "for n_s in range(12):  # 12 subjects in total\n",
    "    # load the data of a single subject\n",
    "    _hga, _, _ = load_ss(n_s)\n",
    "    \n",
    "    # append the data to the list\n",
    "    hga.append(_hga)\n",
    "\n",
    "# create the dataset ephy\n",
    "ds = DatasetEphy(hga, y='trials', times='times', roi='channels')\n",
    "ds"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9a0e3613-3b82-48ce-bf1e-fd6fc8680600",
   "metadata": {},
   "source": [
    "# **2 - Define a `DatasetEphy` with NumPy**\n",
    "## 2.1 Single subject"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "63d3e712-7eab-4dc8-9b2e-9b931c485df6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# load the data of subject #2\n",
    "subject_nb = 2\n",
    "hga, anat, beh = load_ss(subject_nb)\n",
    "\n",
    "# get the contact names\n",
    "channels = hga['channels'].data\n",
    "\n",
    "# get the time vector\n",
    "times = hga['times'].data\n",
    "\n",
    "# get the stimulus types\n",
    "stim = hga['trials'].data\n",
    "\n",
    "# get the hga as a NumPy array\n",
    "hga_np = hga.data\n",
    "\n",
    "# create the dataset ephy\n",
    "ds = DatasetEphy(\n",
    "    [hga_np], y=[stim], times=times, roi=[channels])\n",
    "ds"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dbfea23f-45f6-48fa-89ea-5473115441cf",
   "metadata": {},
   "source": [
    "## 2.2 Multi-subjects"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1fd5f30f-2f37-4d8c-9f12-be27cfa8db77",
   "metadata": {},
   "outputs": [],
   "source": [
    "# start by loading the data coming from multiple subjects\n",
    "hga, channels, stim = [], [], []\n",
    "for n_s in range(12):  # 12 subjects in total\n",
    "    # load the data of a single subject\n",
    "    _hga, _, _ = load_ss(n_s)\n",
    "    \n",
    "    # get the contact names\n",
    "    _channels = _hga['channels'].data\n",
    "\n",
    "    # get the time vector\n",
    "    times = _hga['times'].data\n",
    "\n",
    "    # get the stimulus types\n",
    "    _stim = _hga['trials'].data\n",
    "\n",
    "    # get the hga as a NumPy array\n",
    "    hga_np = _hga.data\n",
    "\n",
    "    # append the data to the list\n",
    "    hga.append(_hga)\n",
    "    channels.append(_channels)\n",
    "    stim.append(_stim)\n",
    "\n",
    "# create the dataset ephy\n",
    "# ds = DatasetEphy(hga, y=stim, times=times, roi=channels)\n",
    "# ds"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d838e2d5-d7fe-4372-bb29-52cb21089d09",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "9317b91f-f777-40ad-9ad2-13d609311949",
   "metadata": {},
   "source": [
    "# **3 - Define a `DatasetEphy` using MNE-Python objects**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e520e749-da90-44db-bb77-5028000fb36b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# _____________________________ CONVERSION TO MNE _____________________________\n",
    "# load the data of subject #2\n",
    "subject_nb = 2\n",
    "hga, anat, beh = load_ss(subject_nb)\n",
    "\n",
    "# create the information used by MNE\n",
    "info = create_info(\n",
    "    hga['channels'].data.tolist(), hga.attrs['sfreq'], ch_types='seeg'\n",
    ")\n",
    "\n",
    "# create the epoch\n",
    "epoch = EpochsArray(hga.data, info, tmin=hga['times'].data[0])\n",
    "\n",
    "# ________________________________ DATASET EPHY _______________________________\n",
    "ds = DatasetEphy([epoch], y=[hga['trials'].data])\n",
    "ds"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "68b86844-e9ee-4b7c-8aa8-84df9514f05c",
   "metadata": {},
   "source": [
    "---\n",
    "# **---- Test yourself ! ----**\n",
    "## **1. Load fresh data !**\n",
    "\n",
    "<div class=\"alert alert-warning\"><p>\n",
    "\n",
    "**[Instructions]** Load the data, behavior and anatomy of subject #0\n",
    "</p></div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "194ac176-e297-44dc-85fd-c4a605fd4241",
   "metadata": {},
   "outputs": [],
   "source": [
    "# write your answer"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8c8f370f-d756-48ea-acfb-d42ee9bdf7ea",
   "metadata": {},
   "source": [
    "## **2. `DatasetEphy` with the Prediction Error**\n",
    "### 2.1 Plot the Prediction error\n",
    "\n",
    "In the behavioral table (output `beh` of the function `load_ss()`) there's a column called `'PE'` which stands for `prediction error`. This model is estimated using the behavior of the subject. It represents the mismatch between a prior expectation of the future outcome and the actual outcome obtained (i.e. $Outcome_{expected} - Outcome_{obtained}$).\n",
    "    \n",
    "Said differently, it's kind of a learning curve estimated trial after trial, where at the beginning the subject is not really good at predicting the outcome he's going to obtained (PE very high) but at the end, he nails it like a pro (PE at 0) ! \n",
    "\n",
    "The table `beh` also contains a column `block` indicating the session (or block).\n",
    "\n",
    "<div class=\"alert alert-warning\"><p>\n",
    "\n",
    "**[Instructions]**\n",
    "    \n",
    "Let's start \"easy\" by plotting the PE, only for the first block. You should see it decreasing to almost 0 !\n",
    "</p></div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3f485d40-dbbb-42ad-952d-6f25723aa278",
   "metadata": {},
   "outputs": [],
   "source": [
    "# write your answer"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "609e012d-10de-47e7-9467-48e700a22fe4",
   "metadata": {},
   "source": [
    "### 2.2 Set the Prediction error as the trial coordinate\n",
    "\n",
    "One typical question you can ask using the PE is **\"what are the brain regions for which the brain data are modulated accordingly to the PE?\"** or said differently, what are the brain regions involved during learning.\n",
    "\n",
    "The first step to answer this question is to tell to the `DatasetEphy` that the external variable `y` is going to be the PE.\n",
    "\n",
    "<div class=\"alert alert-warning\"><p>\n",
    "\n",
    "**[Instructions]**\n",
    "    \n",
    "1. Rename the dimension `trials` in the hga `DataArray` by `pe`\n",
    "2. Fill this dimension `pe` with the values of the PE\n",
    "</p></div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cf9199b7-520e-49ad-a048-1cc0503c76f6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# write your answer"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fee791bc-5183-472f-99d0-7b1b3b492a00",
   "metadata": {},
   "source": [
    "### 2.3 Define a `DatasetEphy` for a single subject\n",
    "\n",
    "<div class=\"alert alert-warning\"><p>\n",
    "\n",
    "**[Instructions]**\n",
    "    \n",
    "Define a `DatasetEphy` for the subject #0 you just loaded and specify the coordinates of the PE (`y`), the time (`times`) and the spatial dimension (`roi`)\n",
    "</p></div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4b45fabe-e9f7-44ad-8f75-a9e769f37fb7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# write your answer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "827fb122-ed5c-4ac3-869c-f58b452f0979",
   "metadata": {},
   "outputs": [],
   "source": [
    "a = input('What is the `Supported MI definition`?')\n",
    "\n",
    "if a == 'I(x; y (continuous)) (cc)':\n",
    "    print(\"Yes, you're right !\")\n",
    "else:\n",
    "    print('Nope ! Try again :)')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c831d273-fcca-4731-96e9-e155f4b01eb0",
   "metadata": {},
   "source": [
    "## **3. `DatasetEphy` with brain regions**\n",
    "### 3.1 Drop channels - replace with brain region names\n",
    "\n",
    "Here, we're going to replace the name of the channels by their corresponding brain region names and define a `DatasetEphy` with those names of brain regions.\n",
    "\n",
    "<div class=\"alert alert-warning\"><p>\n",
    "\n",
    "**[Instructions]**\n",
    "    \n",
    "1. Get the name of the brain regions (`anat` output, column `roi`)\n",
    "2. Rename the dimension name `channels` of the `hga` to be `parcels`\n",
    "3. Fill this dimension `roi` with the name of the brain regions\n",
    "4. Define the `DatasetEphy` and specify the the `roi` dimension is now called `parcels`\n",
    "</p></div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "76db7ce2-b82c-4e2d-9656-5b43d6d00b65",
   "metadata": {},
   "outputs": [],
   "source": [
    "# write your answer"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ba4d895-788f-4046-afc5-af7835797fdb",
   "metadata": {},
   "source": [
    "### 3.2 Do the same, for all of the subjects\n",
    "\n",
    "<div class=\"alert alert-warning\"><p>\n",
    "\n",
    "**[Instructions]**\n",
    "    \n",
    "Build a `DatasetEphy` with the 12 subjects, each having the PE dimension and the brain region names instead of the channel names\n",
    "</p></div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "07e7e69f-ec63-4704-9d9b-0d47afa6b658",
   "metadata": {},
   "outputs": [],
   "source": [
    "# write your answer"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}