\documentclass[letterpaper, 11pt]{article}
\usepackage{amsmath}
\usepackage[final]{graphicx}
\usepackage[left = 1.5in, right =1.5in, top = 1.25in, bottom = 1.25in]{geometry}
\usepackage{booktabs}
\usepackage{tabularx}
\usepackage{longtable}

\title{FAnalyze v0.1 \\User's Manual}
\author{Dan Valente\\\small{Mitra Lab}\\\small{Cold Spring Harbor Laboratory}}
\date{\today}

\begin{document}
\maketitle

\section{Introduction}
FAnalyze is a suite of functions written in Matlab to aid in the analysis of animal behavior in the Open Field. It was written and tested primarily for fly exploratory behavior in a circular arena; a description and illustration of the measures calculated by FAnalyze can be found in Ref.\ \cite{valente1}. FAnalyze assumes that the user has already tracked the $(x,y)$ location of the animal to obtain the trajectory that the animal took over the course of the experiment. FAnalyze smooths the data, calculates the animal's velocity, and allows one to segment space and speed based on hard thresholds. This inaugural version only allows segmentation into circular zones (it is assumed that the Open Field arena is circular). FAnalyze then allows the user to calculate and display probability distributions of the relevant variables to obtain a quantitative phenotypic characterization of the exploratory behavior. Please note that these calculations, the probability distributions, are the focus of the program.

Although FAnalyze has been rigorously tested, this is version 0.1, so please be aware that bugs may still exist. If you find them, feel free to fix them (if you are Matlab savvy), or send an email to the author. To use FAnalyze, a basic working knowledge of Matlab is assumed.

\section{Placing FAnalyze in the Matlab path}
In order to use FAnalyze, the FAnalyze folder must first be placed in the Matlab path.
\begin{enumerate}
\item Open Matlab.
\item Select \textbf{File $>>$ Set Path\ldots} from the toolbar.
The Set Path dialogue will open up.
\item Click on the \textbf{Add Folder\ldots} button.
\item Find the FAnalyze directory and select the \textbf{functions} folder. Click \textbf{OK}. The path and name of the FAnalyze functions folder should show up in the Matlab search path window.
\item Click \textbf{Save}, and then click \textbf{Close}. FAnalyze is now ready to use. To test whether it is installed correctly, type \texttt{help FAnalyze} in the Matlab command window. If this command returns an error, the folder was not added to the path correctly, so repeat the steps above. If the installation worked, you will be able to open the program or use any of the functions regardless of your current directory.
\end{enumerate}

\section{The FAnalyze GUI}
The FAnalyze graphical user interface (GUI) was written to facilitate use of the analysis functions for quick data exploration. It is not intended for large-scale data analysis (the individual FAnalyze Matlab functions are best for this), although the user may find the GUI useful in high-throughput studies.
\begin{figure}
\centering
\hspace{-0cm}
\includegraphics[bb= 0.0 0.0 622.0 367.0, clip=true, scale = 0.7]{GUI.jpg}
\caption{The FAnalyze GUI}
\label{GUI}
\end{figure}
The interface is shown in Fig.\ \ref{GUI} and is divided into three general sections: Trajectory Data, Probability Distributions, and Segmentation. The functionality of these sections is described below.

\subsection*{Loading a Trajectory}
To begin, the user must load a trajectory file for analysis by clicking the \textbf{Load Trajectory} button. FAnalyze assumes that the data is contained in a \texttt{.mat} file. That file \emph{must} contain variables named \texttt{x}, \texttt{y}, and \texttt{t}, which describe the spatial coordinates and time of every point in the trajectory. FAnalyze permits analysis of only one file per session. Once the file is loaded, a message is displayed in the Command Window informing the user of the chosen file's name.
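
As a minimal sketch of the expected file layout, the following Matlab commands build a synthetic circular trajectory and save it under the three required variable names. The frame rate, the units, the column-vector orientation, and the file name \texttt{example\_trajectory.mat} are assumptions made purely for illustration; this manual does not specify them.
\begin{verbatim}
% Hypothetical example: create a synthetic trajectory and save it
% with the variable names x, y, and t in a .mat file.
fs = 30;                      % assumed frame rate (frames per second)
t  = (0:fs*60-1)' / fs;       % one minute of time stamps, in seconds
x  = 5*cos(2*pi*0.05*t);      % x-coordinate (arbitrary units)
y  = 5*sin(2*pi*0.05*t);      % y-coordinate (arbitrary units)
save('example_trajectory.mat', 'x', 'y', 't');

% List the variables stored in the file to confirm the layout.
info = whos('-file', 'example_trajectory.mat');
disp({info.name});
\end{verbatim}
A file written this way should then be selectable with the \textbf{Load Trajectory} button.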
\subsection*{Instructions for Smoothing}
Due to jitter in the object's location caused by non-translational object movements and artifacts of the tracking method, it is typically good practice to smooth the resulting trajectory. The smoothing is executed using the function \texttt{runline} from the Chronux neural data analysis software package (http://www.chronux.org), which performs a local linear regression on the data.\footnote{\texttt{runline} is included with FAnalyze, so having Chronux is not a prerequisite for use of this program.}
\begin{enumerate}
\item Select the length of the running window (in samples) to be used in the smoothing process.
\item Select the step size (in samples) that the window will take as it moves across the data (also known as the Window Overlap). Enter these numbers into the appropriate boxes.
\item Click the \textbf{Smooth Trajectory} button. In the Command Window, a message informs the user that the smoothing process is underway.\footnote{Smoothing may take a long time, depending on the length of the trajectory.} When smoothing has finished, a message is displayed confirming that the smoothing algorithm completed successfully.
\end{enumerate}
Because smoothing can eliminate a significant part of the fluctuations in the data, the user is advised to investigate how changing the smoothing parameters affects the resulting trajectory. This will allow you to decide which fluctuations are physically relevant in your videos. For instance, if the data is smoothed too heavily, a short but visible stop of the object can be entirely smoothed into an apparent movement!

\subsection*{Viewing the Trajectory}
Once the data is smoothed, the user is able to view the resulting trajectory. Simply select the plot of interest from the pull-down menu in the Trajectory Data section and click \textbf{View Trajectory}.
The following plots are available (note that a circular arena is assumed, hence the availability of data in polar coordinates):

\begin{tabular}{l l}
\texttt{(x,y)} & The complete position trajectory\\
\texttt{x} & The x-coordinate as a function of time\\
\texttt{y} & The y-coordinate as a function of time\\
\texttt{r} & The radial coordinate as a function of time \\
\texttt{theta} & The angular coordinate as a function of time\\
\texttt{vx} & The x-velocity as a function of time\\
\texttt{vy} & The y-velocity as a function of time\\
\texttt{v} & The speed as a function of time\\
\texttt{vtheta} & The direction of the velocity vector as a function of time
\end{tabular}

\subsection*{Creating and Viewing Probability Distributions}
The philosophy behind FAnalyze is as follows\footnote{This is adapted from Ref.\ \cite{valente1}.}: We regard the trajectory as a stochastic process $\mathbf{x}(t)$. This process would be fully characterized if the joint distributions $P(\mathbf{x}(t_1), \mathbf{x}(t_2), \ldots ,\mathbf{x}(t_n))$ were specified for all choices of time points $(t_1,t_2, \ldots ,t_n )$. Unfortunately, the full distribution, $P(\mathbf{x}(t_1), \mathbf{x}(t_2), \ldots ,\mathbf{x}(t_n))$, is difficult (if not impossible) to measure. However, by examining joint distributions of position and velocity along with distributions of path curvature, reorientation angle, and event durations, we can obtain a convenient summary of the animal's behavior in the arena and its interaction with the environment.

The distributions are estimated using histograms of the data, so it is recommended that the organism be studied for a long period of time to obtain ``clean-looking'' distributions (a ``good'' length of time will depend on the activity of the animal and the frame rate of the video). These probability histograms are calculated with the functions \texttt{ProbDist1D}, \texttt{ProbDist2D}, and \texttt{JointDist}.
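
To make the idea of a histogram estimate concrete, the sketch below turns raw bin counts from Matlab's \texttt{hist} into an estimate of a one-dimensional probability density. It is illustrative only and is not the implementation of \texttt{ProbDist1D}; the synthetic speed samples, the variable names, and the choice of 50 bins are all assumptions.
\begin{verbatim}
% Illustrative only: NOT the FAnalyze implementation of ProbDist1D.
v = abs(randn(1, 10000));             % synthetic stand-in for speed data
nbins = 50;                           % assumed number of bins

[counts, centers] = hist(v, nbins);   % counts per bin and bin centers
binwidth = centers(2) - centers(1);   % a scalar nbins gives equal widths

% Normalize so that the estimate integrates to one:
% sum(p) * binwidth == 1 (up to rounding).
p = counts / (sum(counts) * binwidth);

bar(centers, p, 1);
xlabel('v');
ylabel('p(v)');
\end{verbatim}
Rerunning the sketch with fewer samples gives a visibly noisier estimate, which is the reason long recordings are recommended above.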
When examining histogram estimates of probability distributions, one needs to take care with phase space factors in order to obtain accurate estimates. For example, if the animal is moving in two dimensions, the probability density for the speed $v$ along with the phase space factors is given by $p(v)\,v\,dv\,d\theta$ (where $\theta$ is the polar angle of the point $(v_x,v_y)$ in velocity space). Therefore, binning data in bins of size $\Delta v\,\Delta\theta$ would yield an estimate for $p(v)\,v$. When this is the case, we eliminate the need to divide by $v$ (which could be an unstable calculation for small $v$) by binning in $v^2$, since $p(v)\,v\,dv\,d\theta \sim p(v^2/2)\,d(v^2/2)\,d\theta$. For one-dimensional motion, such as movement along the arena boundary, there are no phase space factors and it is sufficient to bin the data in $v$. FAnalyze allows the user to select whether to calculate the distributions assuming a 1D or 2D phase space. For 2D phase space calculations, the user should take note of the non-constant bin widths of these histograms.

As soon as the data is smoothed (and again whenever the data is segmented), the Probability Distributions list box becomes populated with the variables that are available for analysis. The naming convention for variables in the list box is described in Appendix A. To calculate and view the probability distributions of interest, proceed as follows:
\begin{enumerate}
\item For a single-variable marginal distribution, highlight the variable of interest by clicking on it. For a joint distribution of two variables, select the first variable of interest, hold down the CTRL key on the keyboard, and select the second variable of interest.
\item Enter the \textbf{Bins} to use for the calculation. This field will accept any bin description that the Matlab \texttt{hist} or \texttt{hist3} commands accept. See the help files of those functions for details, and make sure that brackets, commas, and other necessary punctuation are used.
Also note that none of the other options of \texttt{hist} or \texttt{hist3} are available in the FAnalyze functions in this release (v0.1). As an example, if you wish to calculate a joint distribution having 100 bins in the first variable and 150 in the second, you would enter: \texttt{[100 150]}. If, instead, you wanted bin centers from 0 to 10 in steps of 0.1 for the first variable, and bin centers starting at 2 in steps of 0.3 up to at most 4 for the second, you would enter: \texttt{\{[0:0.1:10] [2:0.3:4]\}}. (Note the curly brackets; note also that the colon operator stops at 3.8 for the second variable, the last value that does not exceed 4.)
\item Select whether the variable of interest exists in a one- or two-dimensional phase space (see above).
\item Click the \textbf{View Distribution} button. The distribution will be calculated and a plot of the distribution will be displayed.
\end{enumerate}
Every calculation that is performed is held in memory until FAnalyze is closed. At any time, you may click the \textbf{Save} button to save your calculations (the structure of the saved data is described below). Unfortunately, at this point, the user cannot access the calculated data from the command window until after the data is saved and reloaded.

\subsection*{Segmenting Space}
Often, an examination of the joint distributions $p(x,v)$ or $p(r,\theta)$ will show that the animal has a spatial preference for some part of the arena. FAnalyze allows the user to segment the arena into any number of circular spatial ``zones.'' Please note that only concentric circular zones are allowed (or, rather, annular zones). To segment space:
\begin{enumerate}
\item Enter the \textbf{Number of Zones} that you wish to segment the arena into.
\item Click the \textbf{Segment Space} button. A window will pop up asking the user to input relevant information.
\item Input the location of the threshold defining the boundary between zones 1 and 2, in terms of the radial distance from the center. Enter names for these zones. Click \textbf{OK}.
\item If more than two zones are requested, another window will pop up asking for similar information. Make sure that the first zone name in this window matches the second zone name in the previous window, and that the second threshold is larger than the first; otherwise, you will get an error.
\item Repeat this for all the zones you requested.
\item Once the segmentation is finished, a message will be displayed in the command window, and the Probability Distributions list box will be populated with variables available for analysis.
\end{enumerate}

\subsection*{Segmenting Speed}
As with the spatial distributions, when examining the speed distribution $p(v)$, the user may find that the distribution appears to be a mixture of a few different types of motion. Because of this, investigators often find it useful to segment the speed into distinct modes of motion. In FAnalyze, speed segmentation is performed in almost exactly the same way as space segmentation. To segment speed:
\begin{enumerate}
\item Enter the \textbf{Noise Threshold} for your data. The noise threshold is the lowest speed that you can accurately resolve. It can be obtained by examining the speed vs.\ time plot and noting the maximum speed attained in regions where the animal is visibly stationary. Velocity and speed points below this threshold are assigned a value of 0.
\item Click the \textbf{Segment Speed} button. A window will pop up asking the user to input relevant information \emph{for each spatial zone}.
\item Input the location of the threshold defining the boundary between segments 1 and 2, in terms of the absolute speed. Enter names for these segments. Click \textbf{OK}.
\item If more than two segments are requested, another window will pop up asking for similar information.
Make sure that the first segment name in this window matches the second segment name in the previous window, and that the second threshold is larger than the first; otherwise, you will get an error.
\item Repeat this for all the segments and zones you requested.
\item FAnalyze now segments the data according to the user's requests and also determines where the animal has stopped (points below the noise threshold). Stops are considered a ``segment.''
\item Once the segmentation is finished, a message will be displayed in the command window, and the Probability Distributions list box will be populated with variables available for analysis.
\end{enumerate}

\subsection*{Saving the Data}
To save the data from the session as a \texttt{.mat} file, click on the \textbf{Save} button in the lower right-hand corner of FAnalyze. The user is asked to choose a location and a filename in which to save. The data is saved as two cells, \texttt{traj} and \texttt{P}.

\noindent\texttt{traj} is a $1 \times N$ cell, where $N$ is the number of zones in the arena. The $i^{\text{th}}$ cell contains a structure with the trajectory information from that zone. Within that structure, each variable is itself a cell containing one structure for each speed segment.
For example, data from the second zone is accessed by typing \texttt{traj\{2\}} and is organized as follows:
\begin{verbatim}
traj{2} =

    zone_label: {'CZ'}
     seg_label: {'all'  'stops'  'NZS'  'FSS'}
             t: {[1x1 struct]  [1x1 struct]  [1x1 struct]  [1x1 struct]}
             x: {[1x1 struct]  [1x1 struct]  [1x1 struct]  [1x1 struct]}
             y: {[1x1 struct]  [1x1 struct]  [1x1 struct]  [1x1 struct]}
             r: {[1x1 struct]  [1x1 struct]  [1x1 struct]  [1x1 struct]}
         theta: {[1x1 struct]  [1x1 struct]  [1x1 struct]  [1x1 struct]}
            vx: {[1x1 struct]  [1x1 struct]  [1x1 struct]  [1x1 struct]}
            vy: {[1x1 struct]  [1x1 struct]  [1x1 struct]  [1x1 struct]}
             v: {[1x1 struct]  [1x1 struct]  [1x1 struct]  [1x1 struct]}
        vtheta: {[1x1 struct]  [1x1 struct]  [1x1 struct]  [1x1 struct]}
           tau: {[1x1 struct]  [1x1 struct]  [1x1 struct]  [1x1 struct]}
         kappa: {[1x1 struct]  [1x1 struct]  [1x1 struct]  [1x1 struct]}
          beta: {[1x1 struct]}
\end{verbatim}
The labels are fairly self-explanatory. Note that \texttt{beta} can only be calculated if all the points in the trajectory are considered, so its cell contains a single entry. The $j^{\text{th}}$ entry of each variable's cell corresponds to the $j^{\text{th}}$ speed segment in \texttt{seg\_label} and contains a structure with a single field, called \texttt{data}. So, to extract the x-position of the organism in zone \texttt{`CZ'} (the second zone) while the organism walked in the speed segment labeled \texttt{`FSS'} (the fourth segment), one would type
\begin{verbatim}
traj{2}.x{4}.data
\end{verbatim}

The probability histograms are saved in the \texttt{P} cell. \texttt{P} is a $1 \times M$ cell, where $M$ is the number of times the \textbf{View Distribution} button was pressed during the session. Each entry of the cell contains a structure that is organized as follows:
\begin{verbatim}
P{1} =

        label: 'x_Full Arena_all'
    phase_opt: 'phase1D'
         data: [1x50 double]
         bins: [1x50 double]
\end{verbatim}
The field \texttt{label} is the name of the variable from which the probability distribution was calculated.
\texttt{phase\_opt} denotes whether the user chose to calculate the distribution assuming a 1D or 2D phase space. \texttt{data} contains the bin-by-bin data from the calculated histogram, and \texttt{bins} contains the bin centers. If an error occurs during a calculation, the corresponding entry is completely empty. Admittedly, this seems complicated, but the author felt it was a decent way to organize the data file.

\section{Known Problems}
There is only one known problem with FAnalyze at this point. Please be aware that you cannot segment speed and \emph{then} segment space. You must segment space first and \emph{then} segment speed. You can, however, choose to segment only speed or only space.

\section{Concluding Comments}
For those who wish to use the functions from the Matlab command line, complete descriptions of their use and workings can be found in the Matlab help files; simply type \texttt{help function\_name}. Describing them in detail here would be superfluous. The scripts are also commented, so they should be relatively easy to follow.

Suggestions for improvements to the algorithms, the GUI, or the coding style are highly encouraged! Comments on the ease of use of the GUI and functions are also important for refining this program. Since this is v0.1, FAnalyze needs quite a bit of testing in order to find all of the bugs (pun intended). Until then, please check and double-check any results that you obtain from this program, and make sure that they make sense!

\noindent Enjoy!

\section{Appendix A: Variable Naming Convention}
There are eleven variables that are available for analysis in FAnalyze.
They are:\\
\begin{tabular}{l l}
\texttt{x} & The x-coordinate as a function of time\\
\texttt{y} & The y-coordinate as a function of time\\
\texttt{r} & The radial coordinate as a function of time \\
\texttt{theta} & The angular coordinate as a function of time\\
\texttt{vx} & The x-velocity as a function of time\\
\texttt{vy} & The y-velocity as a function of time\\
\texttt{v} & The speed as a function of time\\
\texttt{vtheta} & The direction of the velocity vector as a function of time\\
\texttt{tau} & Duration of speed segments\\
\texttt{kappa} & Curvature of the path \\
\texttt{beta} & Reorientation angle
\end{tabular}\\
In the list box, these variable names are followed by the zone name and the speed segment name as given by the user. For example, for a zone named \texttt{`RZ'} and a speed segment within that zone named \texttt{`FSS'}, the speed would appear as \texttt{v\_RZ\_FSS}. For the full arena, the zone label is automatically (and appropriately) set to \texttt{`Full Arena'}. If all the velocity points are included, the speed segment label is \texttt{`all'}. Therefore, after smoothing and before any segmentation, the variables in the list carry the suffix \texttt{`\_Full Arena\_all'}.

\begin{thebibliography}{9}
\bibitem{valente1} Valente, D., Golani, I., and Mitra, P.P., ``Analysis of the trajectory of \emph{Drosophila melanogaster} in a circular open field arena.'' PLoS ONE 2(10), e1083, doi:10.1371/journal.pone.0001083 (2007).
\bibitem{valente2} Valente, D., Wang, H., Andrews, P., Saar, S., Tchernichovski, O., Benjamini, Y., Golani, I., and Mitra, P.P., ``Characterization of animal behavior through the use of audio and video signal processing.'' IEEE Multimedia 14(2), 32--41 (2007).
\end{thebibliography}
\end{document}