Browse source code

Added PCA->FA analysis

Michael Denker 3 weeks ago
parent
commit
d98b04e761
2 changed files with 340 additions and 0 deletions
  1. Yu/tutorials/Exercises_PCA_to_FA.ipynb (+340 −0)
  2. Yu/tutorials/data_for_exercises.mat (BIN)

+ 340 - 0
Yu/tutorials/Exercises_PCA_to_FA.ipynb

@@ -0,0 +1,340 @@
+{
+ "cells": [
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "# ANDA-NI 2024 Lecture Exercises: Byron Yu - Dimensionality Reduction ",
+   "id": "902e1319dc9042d1"
+  },
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": [
+    "import numpy as np\n",
+    "import scipy.io as sio\n",
+    "import matplotlib.pyplot as plt\n",
+    "from mpl_toolkits.mplot3d import Axes3D"
+   ],
+   "id": "64622552e0387a21",
+   "outputs": [],
+   "execution_count": null
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "The dataset for the following problems is contained in `data_for_exercises.mat`. You will\n",
+    "find the following variables:\n",
+    "\n",
+    "- `Xplan`: a $728 \\times 97$ matrix of real data, where each row is a spike count vector across 97 simultaneously-recorded neurons in dorsal premotor cortex (PMd) of a macaque monkey. Spike counts are taken in a 200 ms bin while the monkey is planning to make an arm reach. There are 91 trials for each of 8 reaching angles, for a total of 728 trials. Trials 1 to 91 correspond to reaching angle 1, trials 92 to 182 correspond to reaching angle 2, etc.\n",
+    "- `Xsim`: an $8 \\times 2$ matrix of simulated data, where each row is a data point.\n",
+    "\n",
+    "The neural data have been generously provided by the laboratory of Prof. Krishna Shenoy at Stanford\n",
+    "University. The data are to be used exclusively for educational purposes in this course."
+   ],
+   "id": "c8d5c5667c4691e7"
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "## Problem 1: Visualization of high-dimensional neural activity using PCA",
+   "id": "77a1140c20567fd1"
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "## Problem 1a",
+   "id": "7ed21ee03f3ba20e"
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "First, we will apply PCA to `Xplan` to gain some intuition about plan activity in PMd. The data points are $\\mathbf{x_n} \\in \\mathbb{R}^D$ $(n = 1,\\ldots, N)$, where $D = 97$ is the data dimensionality and $N = 728$ is the number of data points.\n",
+    "\n",
+    "Plot the square-rooted eigenvalue spectrum. If you had to identify an elbow in the eigenvalue spectrum, how many dominant eigenvalues would there be? What percentage of the overall variance is captured by the top 3 principal components?"
+   ],
+   "id": "5cda8903962956a3"
+  },
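A minimal sketch of one way to approach 1a, assuming a stand-in `Xplan` (a Poisson random matrix of the same shape) in place of the real data, so the printed percentage will not match the 44.8% obtained on the actual recordings:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt

# Stand-in for the real data; the notebook itself would use
# Xplan = sio.loadmat("data_for_exercises.mat")["Xplan"]
rng = np.random.default_rng(0)
Xplan = rng.poisson(5.0, size=(728, 97)).astype(float)

# PCA via an eigendecomposition of the sample covariance of the centered data
Xc = Xplan - Xplan.mean(axis=0)
cov = Xc.T @ Xc / Xplan.shape[0]
eigvals = np.linalg.eigvalsh(cov)[::-1]  # sorted descending

plt.plot(np.arange(1, len(eigvals) + 1), np.sqrt(eigvals), "o-")
plt.xlabel("principal component")
plt.ylabel("square-rooted eigenvalue")

frac_top3 = eigvals[:3].sum() / eigvals.sum()
print(f"variance captured by top 3 PCs: {100 * frac_top3:.1f}%")
```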
+  {
+   "cell_type": "code",
+   "id": "6b7ede524b085671",
+   "metadata": {},
+   "source": "",
+   "outputs": [],
+   "execution_count": null
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "There is an elbow after the 3rd dominant eigenvalue. The top three eigenvectors explain 44.8% of the data variance.",
+   "id": "b4b85cba5a45319e"
+  },
+  {
+   "metadata": {
+    "collapsed": true
+   },
+   "cell_type": "code",
+   "source": "",
+   "id": "initial_id",
+   "outputs": [],
+   "execution_count": null
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "## Problem 1b",
+   "id": "b9135778e1d5ed"
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "For the purposes of visualization, we’ll consider the PC space defined by the top $M = 3$ eigenvectors. Project the data into the three-dimensional PC space. Plot the projected points using `plot_3d_points`, and color each dot appropriately according to reaching angle (there should be a total of 728 dots). Use your mouse to rotate the three-dimensional plot. Show a view in which the clusters are well-separated.",
+   "id": "8ff4abd1ba31aa3b"
+  },
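A sketch of the projection and 3-D scatter, using plain Matplotlib in place of the course helper `plot_3d_points` and the same random stand-in for `Xplan`:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
Xplan = rng.poisson(5.0, size=(728, 97)).astype(float)  # stand-in for the real data

Xc = Xplan - Xplan.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc / len(Xplan))
UM = eigvecs[:, ::-1][:, :3]  # top M = 3 PC directions (D x M)
Z = Xc @ UM                   # 728 x 3 projected points

# Trials 1-91 belong to reaching angle 1, trials 92-182 to angle 2, etc.
angles = np.repeat(np.arange(1, 9), 91)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(Z[:, 0], Z[:, 1], Z[:, 2], c=angles, cmap="hsv", s=10)
ax.set_xlabel("PC 1"); ax.set_ylabel("PC 2"); ax.set_zlabel("PC 3")
```

In an interactive backend, the view can then be rotated with the mouse as the prompt describes.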
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": "",
+   "id": "824373e992974460",
+   "outputs": [],
+   "execution_count": null
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "## Problem 1c",
+   "id": "8669f1371bc9dc4"
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "Define a matrix $U_M \\in \\mathbb{R}^{D\\times M}$ containing the top three eigenvectors (i.e., PC directions), where $U_M(d, m)$ indicates the contribution of the $d$-th neuron to the $m$-th principal component. Show the values in $U_M$ by calling `imshow(UM)`. (Note: Also call `plt.colorbar()` to show the scale.) Are there any obvious groupings among the neurons in each column of $U_M$?",
+   "id": "7e857be9c02a37d1"
+  },
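One way to display $U_M$, again with a random stand-in for `Xplan`:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
Xplan = rng.poisson(5.0, size=(728, 97)).astype(float)  # stand-in for the real data
Xc = Xplan - Xplan.mean(axis=0)
_, eigvecs = np.linalg.eigh(Xc.T @ Xc / len(Xplan))
UM = eigvecs[:, ::-1][:, :3]  # D x M: row d gives neuron d's weight on each PC

plt.imshow(UM, aspect="auto", cmap="RdBu_r")
plt.colorbar()
plt.xlabel("principal component")
plt.ylabel("neuron")
```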
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": "",
+   "id": "a58e715f9104595a",
+   "outputs": [],
+   "execution_count": null
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "## Problem 2: From PCA to PPCA and FA",
+   "id": "a12b006aa359525e"
+  },
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": "",
+   "id": "30ba2f38caa8835e",
+   "outputs": [],
+   "execution_count": null
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "## Problem 2a",
+   "id": "7a38b1d31a7b0881"
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "Create one plot containing all of the following for PCA:\n",
+    "- Plot each data point $\\mathbf{x_n}$ as a black dot in a two-dimensional space.\n",
+    "- Plot the mean of the data $\\mathbf{\\mu}$ as a big green point.\n",
+    "- Plot the PC space defined by $u_1$ as a black line. (Hint: This line should pass through $\\mu$.)\n",
+    "- Project each data point into the PC space, and plot each projected data point $\\mathbf{\\hat x_n}$ as a red dot. (Hint: The projected points should lie on the $u_1$ line.)\n",
+    "- Connect each data point $\\mathbf{x_n}$ with its projection $\\mathbf{\\hat x_n}$ using a red line. (Hint: The red lines should be orthogonal to the PC space. To see this, you will need to call `plt.axis('equal')`.)"
+   ],
+   "id": "ef663e6b30c8eaa9"
+  },
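A sketch of the five plot elements, using a correlated Gaussian stand-in for `Xsim`:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

# Stand-in for Xsim (8 points in 2-D); the notebook loads it from the .mat file
rng = np.random.default_rng(1)
Xsim = rng.normal(size=(8, 2)) @ np.array([[2.0, 1.0], [0.0, 0.5]])

mu = Xsim.mean(axis=0)
Xc = Xsim - mu
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc / len(Xsim))
u1 = eigvecs[:, -1]                # leading PC direction

Xhat = mu + np.outer(Xc @ u1, u1)  # orthogonal projection onto the u1 line

plt.plot(Xsim[:, 0], Xsim[:, 1], "ko")                 # data points
plt.plot(mu[0], mu[1], "go", markersize=12)            # mean
t = np.array([-4.0, 4.0])
plt.plot(mu[0] + t * u1[0], mu[1] + t * u1[1], "k-")   # PC space through mu
plt.plot(Xhat[:, 0], Xhat[:, 1], "r.")                 # projected points
for x, xh in zip(Xsim, Xhat):
    plt.plot([x[0], xh[0]], [x[1], xh[1]], "r-")       # residual segments
plt.axis("equal")
```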
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": "",
+   "id": "3dfdbf29782fcb76",
+   "outputs": [],
+   "execution_count": null
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "## Problem 2b",
+   "id": "2d223af176b0f288"
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "Implement the EM algorithm for PPCA in Python and run the algorithm on the data in `Xsim`. Plot the log PPCA likelihood $\\sum_{n=1}^N \\log P(\\mathbf{x_n})$ versus EM iteration. (Hint: The log data likelihood should increase monotonically with EM iteration. You should run enough EM iterations to see a long, flat plateau.)",
+   "id": "c29a67f52c2a6908"
+  },
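One possible EM implementation for PPCA, following the Tipping & Bishop update equations, run on a correlated Gaussian stand-in for `Xsim`:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

def ppca_em(X, q, n_iter=200, seed=0):
    """EM for PPCA: x = W z + mu + eps, eps ~ N(0, sigma^2 I)."""
    N, D = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    S = Xc.T @ Xc / N                           # sample covariance
    W = np.random.default_rng(seed).normal(size=(D, q))
    sigma2 = 1.0
    lls = []
    for _ in range(n_iter):
        # E-step: posterior moments of the latents
        Minv = np.linalg.inv(W.T @ W + sigma2 * np.eye(q))
        Ez = Xc @ W @ Minv                      # E[z_n | x_n], one row per point
        sumEzz = N * sigma2 * Minv + Ez.T @ Ez  # sum_n E[z_n z_n^T]
        # M-step
        W = Xc.T @ Ez @ np.linalg.inv(sumEzz)
        sigma2 = (np.sum(Xc**2) - 2.0 * np.sum(Ez * (Xc @ W))
                  + np.trace(sumEzz @ W.T @ W)) / (N * D)
        # log data likelihood under the current model covariance
        C = W @ W.T + sigma2 * np.eye(D)
        ll = -0.5 * N * (D * np.log(2 * np.pi)
                         + np.linalg.slogdet(C)[1]
                         + np.trace(np.linalg.solve(C, S)))
        lls.append(ll)
    return W, sigma2, mu, np.array(lls)

rng = np.random.default_rng(1)
Xsim = rng.normal(size=(8, 2)) @ np.array([[2.0, 1.0], [0.0, 0.5]])  # stand-in
W, sigma2, mu, lls = ppca_em(Xsim, q=1)

plt.plot(lls, "o-")
plt.xlabel("EM iteration")
plt.ylabel("log likelihood")
```

As the hint says, the log likelihood should rise monotonically and plateau.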
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": "",
+   "id": "9b7923779b15d8a0",
+   "outputs": [],
+   "execution_count": null
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "## Problem 2c",
+   "id": "5a1b922c517f136b"
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "Using the parameters found in 2b, what is the PPCA covariance $(WW^T+\\sigma^2I)$? If you did part 2b correctly, the PPCA covariance should be very similar to the sample covariance.",
+   "id": "8ca4758def4d05b1"
+  },
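A sketch of the check. Instead of EM output, it uses the closed-form maximum-likelihood PPCA solution (the fixed point EM converges to); note that with $D = 2$ and $q = 1$ the PPCA covariance reproduces the sample covariance exactly, not just approximately:

```python
import numpy as np

rng = np.random.default_rng(1)
Xsim = rng.normal(size=(8, 2)) @ np.array([[2.0, 1.0], [0.0, 0.5]])  # stand-in
Xc = Xsim - Xsim.mean(axis=0)
S = Xc.T @ Xc / len(Xsim)                  # sample covariance

# Closed-form ML PPCA solution
eigvals, eigvecs = np.linalg.eigh(S)
sigma2 = eigvals[0]                        # mean of the D - q discarded eigenvalues
W = eigvecs[:, -1:] * np.sqrt(eigvals[-1] - sigma2)
C = W @ W.T + sigma2 * np.eye(2)           # PPCA covariance

print(np.round(S, 4))
print(np.round(C, 4))
```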
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": "",
+   "id": "a6008a1710ed90a8",
+   "outputs": [],
+   "execution_count": null
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "## Problem 2d",
+   "id": "3b87048033ea04cd"
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "Create one plot containing all of the following for PPCA:\n",
+    "- Plot each data point $\\mathbf{x}_n$ as a black dot in a two-dimensional space.\n",
+    "- Plot the mean of the data $\\mathbf{\\mu}$ as a big green point.\n",
+    "- The PC space found by PPCA is defined by $W$, which in this case is a two-dimensional vector. Check that the PC space defined by $W$ is identical to that found by PCA. Plot the PC space as a black line. (Hint: If there is some small discrepancy between the PC spaces found by PCA and PPCA, run more EM iterations for PPCA.)\n",
+    "- Project each data point into the PC space using PPCA, and plot each projected data point $\\mathbf{\\hat x}_n = W E[z_n | \\mathbf{x}_n ] + \\mathbf{\\mu}$ as a red dot. (Hint: The projected points should lie on the line defined by $W$.)\n",
+    "- Connect each data point $x_n$ with its projection $\\mathbf{\\hat x_n}$ using a red line. Why are the red lines no longer orthogonal to the PC space?"
+   ],
+   "id": "ead3408b6fbab165"
+  },
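A sketch of the PPCA projection plot with the same stand-in data. Because $\sigma^2 > 0$, the posterior mean $E[z_n \mid x_n]$ shrinks each projection toward $\mu$, which is why the red connecting lines are no longer orthogonal to the PC space:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
Xsim = rng.normal(size=(8, 2)) @ np.array([[2.0, 1.0], [0.0, 0.5]])  # stand-in
mu = Xsim.mean(axis=0)
Xc = Xsim - mu
S = Xc.T @ Xc / len(Xsim)

# Closed-form ML PPCA parameters (equivalently, the converged EM solution)
eigvals, eigvecs = np.linalg.eigh(S)
sigma2 = eigvals[0]
W = eigvecs[:, -1:] * np.sqrt(eigvals[-1] - sigma2)

M = W.T @ W + sigma2 * np.eye(1)
Ez = Xc @ W @ np.linalg.inv(M)   # posterior means E[z_n | x_n]
Xhat = Ez @ W.T + mu             # shrunk toward mu relative to the PCA projection

plt.plot(Xsim[:, 0], Xsim[:, 1], "ko")
plt.plot(mu[0], mu[1], "go", markersize=12)
t = np.array([-3.0, 3.0])
plt.plot(mu[0] + t * W[0, 0], mu[1] + t * W[1, 0], "k-")
plt.plot(Xhat[:, 0], Xhat[:, 1], "r.")
for x, xh in zip(Xsim, Xhat):
    plt.plot([x[0], xh[0]], [x[1], xh[1]], "r-")
plt.axis("equal")
```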
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": "",
+   "id": "87c5089a33f08782",
+   "outputs": [],
+   "execution_count": null
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "## Problem 2e",
+   "id": "94b7f7fde28c8038"
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "Implement the EM algorithm for FA. You should be able to do this with only a small modification to your PPCA code. Run the algorithm on the data in `Xsim`. Plot the log FA likelihood $\\sum_{n=1}^N \\log P(\\mathbf{x_n})$ versus EM iteration. (Hint: The log data likelihood should increase monotonically with EM iteration. You should run enough EM iterations to see a long, flat plateau.)",
+   "id": "56b2ba005114dfd2"
+  },
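One possible FA EM sketch; relative to PPCA only the noise model changes (a diagonal $\Psi$ instead of $\sigma^2 I$). A larger synthetic dataset is used here instead of the 8-point `Xsim` so that the fit is well conditioned:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

def fa_em(X, q, n_iter=200, seed=0):
    """EM for factor analysis: like PPCA EM, but with diagonal noise Psi."""
    N, D = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    S = Xc.T @ Xc / N
    W = np.random.default_rng(seed).normal(size=(D, q))
    psi = np.ones(D)                              # diagonal of Psi
    lls = []
    for _ in range(n_iter):
        # E-step, written so that only q x q matrices are inverted
        G = np.linalg.inv(np.eye(q) + (W / psi[:, None]).T @ W)
        Ez = Xc @ (W / psi[:, None]) @ G          # E[z_n | x_n]
        sumEzz = N * G + Ez.T @ Ez
        # M-step
        W = Xc.T @ Ez @ np.linalg.inv(sumEzz)
        psi = np.diag(S - W @ (Ez.T @ Xc) / N)    # keep only the diagonal
        C = W @ W.T + np.diag(psi)
        ll = -0.5 * N * (D * np.log(2 * np.pi)
                         + np.linalg.slogdet(C)[1]
                         + np.trace(np.linalg.solve(C, S)))
        lls.append(ll)
    return W, psi, mu, np.array(lls)

# Synthetic data drawn from a ground-truth FA model (hypothetical parameters)
rng = np.random.default_rng(2)
W0 = rng.normal(size=(5, 2))
X = rng.normal(size=(200, 2)) @ W0.T + rng.normal(size=(200, 5)) * 0.5
W, psi, mu, lls = fa_em(X, q=2)

plt.plot(lls, "o-")
plt.xlabel("EM iteration")
plt.ylabel("log likelihood")
```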
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "Solution: See function `fastfa` above.",
+   "id": "fabd6eb210c6f4d5"
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "## Problem 2f",
+   "id": "a5eace9c10a19fc3"
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "Using the parameters found in 2e, what is the FA covariance $(WW^T+\\Psi)$? If you did part 2e correctly, the FA covariance should be very similar to the sample covariance.",
+   "id": "b600ee12e912fb64"
+  },
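A sketch illustrating the comparison itself: for data drawn from an FA model, the sample covariance approaches $WW^T + \Psi$ as $N$ grows (here with hypothetical ground-truth parameters `W0`, `psi0` standing in for the fitted ones):

```python
import numpy as np

rng = np.random.default_rng(3)
W0 = rng.normal(size=(5, 2))
psi0 = rng.uniform(0.3, 1.0, size=5)
C0 = W0 @ W0.T + np.diag(psi0)           # FA covariance

N = 50_000
X = rng.normal(size=(N, 2)) @ W0.T + rng.normal(size=(N, 5)) * np.sqrt(psi0)
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / N                        # sample covariance

print(np.max(np.abs(S - C0)))            # small for large N
```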
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": "",
+   "id": "fc6f7e78ef385eb4",
+   "outputs": [],
+   "execution_count": null
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "## Problem 2g",
+   "id": "fcbac96ee25ba619"
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "Create one plot containing all of the following for FA:\n",
+    "- Plot each data point $\\mathbf{x}_n$ as a black dot in a two-dimensional space.\n",
+    "- Plot the mean of the data $\\mathbf{\\mu}$ as a big green point.\n",
+    "- The PC space found by FA is defined by $W$, which in this case is a two-dimensional vector. Plot the PC space as a black line. (Hint: This line should pass through $\\mathbf{\\mu}$.) Why is the low-dimensional space found by FA different from that found by PCA and PPCA?\n",
+    "- Project each data point into the PC space using FA, and plot each projected data point $\\mathbf{\\hat x}_n = W E[z_n | \\mathbf{x}_n ] + \\mathbf{\\mu}$ as a red dot. (Hint: The projected points should lie on the line defined by $W$.)\n",
+    "- Connect each data point $x_n$ with its projection $\\mathbf{\\hat x_n}$ using a red line."
+   ],
+   "id": "70b20c3f6dcc673"
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "Plot FA Projection:",
+   "id": "18eda18d38abd3a3"
+  },
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": "",
+   "id": "1cc70f6d52fe2653",
+   "outputs": [],
+   "execution_count": null
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "Plot FA Log-Likelihood:",
+   "id": "48d30404a840dc2b"
+  },
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": "",
+   "id": "e52c04eb1c995ee6",
+   "outputs": [],
+   "execution_count": null
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}

BIN
Yu/tutorials/data_for_exercises.mat