# Assessing the quantity of data for functional alignment to estimate responses in the "parahippocampal place area": from raw data to results

[![made-with-datalad](https://www.datalad.org/badges/made_with.svg)](https://datalad.org)

This repository contains all data and code to generate the results in Chapter 5 of the PhD thesis "Exploring naturalistic stimuli as an alternative to a traditional functional localizer" written by Häusler, C.O.

If you have never used [DataLad](https://www.datalad.org/) before, please read the section on DataLad datasets below.

## DataLad datasets and how to use them

This repository is a [DataLad](https://www.datalad.org/) dataset. It allows fine-grained data access down to the level of individual files. In order to use this repository for data retrieval, [DataLad](https://www.datalad.org/) is required. It is a free and open-source command line tool, available for all major operating systems, and builds on top of Git and [git-annex](https://git-annex.branchable.com/) to allow sharing, synchronizing, and version-controlling collections of large files. You can find information on how to install DataLad at [handbook.datalad.org/en/latest/intro/installation.html](http://handbook.datalad.org/en/latest/intro/installation.html).

### Get the dataset

A DataLad dataset can be `cloned` by running

```
datalad clone <url>
```

Once a dataset is cloned, it is a light-weight directory on your local machine. At this point, it contains only small metadata and information on the identity of the files in the dataset, but not the actual *content* of the (sometimes large) data files.

### Retrieve dataset content

After cloning a dataset, you can retrieve file contents by running

```
datalad get <path/to/directory/or/file>
```

This command will trigger a download of the files, directories, or subdatasets you have specified.

DataLad datasets can contain other datasets, so-called *subdatasets*. If you clone the top-level dataset, subdatasets do not yet contain metadata and information on the identity of files, but appear to be empty directories. In order to retrieve file availability metadata in subdatasets, run

```
datalad get -n <path/to/subdataset>
```

Afterwards, you can browse the retrieved metadata to find out about subdataset contents, and retrieve individual files with `datalad get`. If you use `datalad get <path/to/subdataset>`, all contents of the subdataset will be downloaded at once.

### Stay up-to-date

DataLad datasets can be updated. The command `datalad update` will *fetch* updates and store them on a different branch (by default `remotes/origin/master`). Running

```
datalad update --merge
```

will *pull* available updates and integrate them in one go.

### More information

More information on DataLad and how to use it can be found in the DataLad Handbook at [handbook.datalad.org](http://handbook.datalad.org/en/latest/index.html). The chapter "DataLad datasets" can help you to familiarize yourself with the concept of a dataset.
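For this particular dataset, the commands above combine into the following workflow. The clone URL and the name of the local directory are placeholders; substitute the address this repository is published under.

```
# clone this dataset (URL and target directory are placeholders)
datalad clone <url> <local-directory>
cd <local-directory>

# obtain availability metadata for an input subdataset without downloading its content
datalad get -n inputs/studyforrest-data-visualrois

# retrieve actual file content, e.g. the group masks and atlases
datalad get masks/
```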
## Dataset structure and contents

- `code/`: a local installation of Python 3.7 and all custom code
  - `localpython/python37`: the local installation of Python 3.7.17
  - `localpython/venv/python37`: a virtual environment comprising the packages necessary to run the calculations
  - `*.py`: Python 3.7 scripts
  - `*.sh`: Bash scripts
  - `*.submit`: submit files for HTCondor, which distributes calculations across a computer cluster
- `inputs/`: building blocks from other sources; DataLad datasets installed as subdatasets
  - `studyforrest-data-templatetransforms`: participant/scan-specific template images and transformations between these respective image spaces (cf. [repo on GitHub](https://github.com/psychoinformatics-de/studyforrest-data-templatetransforms/))
  - `studyforrest-data-visualrois`: data and results of [Sengupta et al. (2016)](https://github.com/psychoinformatics-de/studyforrest-data-visualrois.git)
  - `studyforrest-ppa-analysis`: data and results of [Häusler, Eickhoff, & Hanke (2022)](https://gin.g-node.org/chaeusler/studyforrest-ppa-analysis)
- `masks/`: group masks and atlases
- `sub-*/`: individual subject folders that contain
  - `masks/`: masks in the corresponding subject's voxel space (e.g., PPA and field of view)
  - `sub-*_task-*_run-*_bold_filtered.nii.gz`: time series that were used as input for FSL in [Sengupta et al. (2016)](https://www.nature.com/articles/sdata201693) and [Häusler, Eickhoff, & Hanke (2022)](https://www.nature.com/articles/s41597-022-01250-4)
  - `sub-*_task_*_run-*-*_bold-filtered.npy`: masked time series, z-scored per paradigm/run
  - `sub-*_ao-av-vis_concatenated_zscored.npy`: all paradigms concatenated and z-scored
  - `models/`: the shared response models calculated from the training subjects' data
  - `matrices/`: the transformation matrices obtained from aligning the subject to a shared response model
  - `predictions/`: the predicted Z-maps
- `results/`: final statistics and figures
  - `statistics_cronbachs.csv`: Cronbach's alpha of the empirical Z-maps
  - `corr_*-ppa-vs-estimation_srm-ao-av-vis_feat*.csv`: correlations between empirical and predicted Z-maps
  - `statistics_t-tests.csv`: results of the t-tests

## Cookbook -- How to reproduce this dataset from scratch

### Setting up variables and the virtual environment

```
# the present working directory
mainDir=$PWD

# activate the virtual environment
source code/localpython/venv/python37/bin/activate

# FSL v5.0.9 directory
FSLDIR=~/fsl
PATH=${FSLDIR}/bin:${PATH}
export FSLDIR PATH
. ${FSLDIR}/etc/fslconf/fsl.sh
```

### Reproduce results of [Sengupta et al. (2016)](https://www.nature.com/articles/sdata201693)

```
# install the dataset
datalad install -d . -s https://github.com/psychoinformatics-de/studyforrest-data-visualrois.git inputs/studyforrest-data-visualrois

# create union of visual ROIs
# manually add 'create-roi-overlaps.sh'
datalad save -m 'add script to create union of visual rois ([Sengupta et al. (2016)](https://github.com/psychoinformatics-de/studyforrest-data-visualrois.git))'
```
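The union of the visual ROIs is created by `create-roi-overlaps.sh`, which is added by hand in the step above. The script itself is not reproduced here; as a rough sketch of the idea (not the actual script's contents), a union of binary ROI masks can be built with FSL's `fslmaths`, where all file names below are placeholders:

```
# illustration only, not part of the reproduction steps:
# add two binary ROI masks and re-binarize the sum to obtain their union
fslmaths roi_a.nii.gz -add roi_b.nii.gz -bin roi_union.nii.gz
```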
```
# run it; outputs results to 'masks/in_mni'
datalad run -m 'create union of visual rois' ./code/create-roi-overlaps.sh

# retrieve file availability metadata of the subdataset containing the time series data
datalad get -n inputs/studyforrest-data-visualrois/src/aligned

# get the data
datalad get inputs/studyforrest-data-visualrois/src/aligned/sub-*/in_bold3Tp2/sub-*_task-objectcategories_run-*_bold.nii.gz

# rerun the first-level analysis
# get the FSL onset files for each subject
datalad get inputs/studyforrest-data-visualrois/sub-*/onsets/run-*/*.txt

# go into the dataset's directory
cd inputs/studyforrest-data-visualrois

# manually adjust 'code/despike.submit' and save
datalad save -m 'adjust (absolute) paths in code/despike.submit'

# run the despiking on a computer cluster
condor_submit code/despike.submit

# save results
datalad save -m 'despike fMRI data'

# manually adjust paths to the current environment (FSL seems to require absolute paths)
datalad save -m 'adjust (absolute) paths 1stlevel_design.fsf'
datalad save -m 'adjust (absolute) paths generate_1st_level_design.sh'

# generate first-level design files
./code/generate_1st_level_design.sh

# run first-level analyses on a computer cluster
# outputs results to 'inputs/studyforrest-data-visualrois/sub-*/run-*.feat'
condor_submit code/compute_1stlvl_glm.submit

# save results
datalad save -m '1st lvl results'

# back to the main directory
cd $mainDir
```

### Reproduce results of [Häusler, Eickhoff, & Hanke (2022)](https://www.nature.com/articles/s41597-022-01250-4)

```
# install the dataset
datalad install -d . -s https://gin.g-node.org/chaeusler/studyforrest-ppa-analysis inputs/studyforrest-ppa-analysis

# retrieve file availability metadata of the subdataset containing the time series data
datalad get -n inputs/studyforrest-ppa-analysis/inputs/studyforrest-data-aligned

# get the data
datalad get inputs/studyforrest-ppa-analysis/inputs/studyforrest-data-aligned/sub-??/in_bold3Tp2/sub-??_task-a?movie_run-?_bold*.*

# retrieve file availability metadata of the subdataset containing the motion correction parameters of the audio-description
datalad get -n inputs/studyforrest-ppa-analysis/inputs/phase1

# get the correction parameters
datalad get inputs/studyforrest-ppa-analysis/inputs/phase1/sub???/BOLD/task001_run00?/bold_dico_moco.txt

# retrieve file availability metadata of the subdataset containing templates and transforms
datalad get -n inputs/studyforrest-ppa-analysis/inputs/studyforrest-data-templatetransforms

# get the actual data
datalad get inputs/studyforrest-ppa-analysis/inputs/studyforrest-data-templatetransforms/sub-*/bold3Tp2/
datalad get inputs/studyforrest-ppa-analysis/inputs/studyforrest-data-templatetransforms/templates/*

# rerun the first-level analysis
# go into the dataset's directory
cd inputs/studyforrest-ppa-analysis

# manually adjust paths to the current environment (FSL seems to require absolute paths)
datalad save -m 'adjust paths in 1st lvl FEAT design files (movie & group, individuals)'

# run first-level analyses on a computer cluster
# outputs results to
# 'inputs/studyforrest-ppa-analysis/sub-*/run-*_movie-ppa-ind.feat', and
# 'inputs/studyforrest-ppa-analysis/sub-*/run-*_audio-ppa-ind.feat'
condor_submit code/compute_1st-lvl_movie-ppa-ind.submit
condor_submit code/compute_1st-lvl_audio-ppa-ind.submit

# save results
datalad save -m '1st results movie & audio (individuals)'
```
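Note that `condor_submit` only queues the jobs; they run asynchronously on the cluster, so make sure they have finished before saving their outputs with `datalad save`. A sketch using HTCondor's standard monitoring tools (the log file path is a placeholder and depends on the respective submit file):

```
# show the current state of your jobs in the queue
condor_q

# or block until all jobs writing to a given log file have finished
condor_wait <path/to/condor/log>
```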
```
# rerun the second-level analysis
# create templates
./code/reg2std4feat inputs/studyforrest-data-templatetransforms bold3Tp2 bold3Tp2 sub-*/run-?_movie-ppa-ind.feat
./code/reg2std4feat inputs/studyforrest-data-templatetransforms bold3Tp2 bold3Tp2 sub-*/run-?_audio-ppa-ind.feat

# save it
datalad save -m 'add templates & transformation matrices to 1st lvl result directories of Feat'

# manually adjust paths to the current environment (FSL seems to require absolute paths)
datalad save -m 'adjust (absolute) paths in generate_2nd-lvl-design_*-ppa-ind.sh'
datalad save -m 'adjust (absolute) paths in 2nd-lvl_*-ppa-ind.fsf'

# from template to individual design files (audio, individuals); adjusted paths
datalad rerun c0ada988773
# from template to individual design files (movie, individuals); adjusted paths
datalad rerun 9ece407eb20

# run the second-level analyses on a computer cluster
# outputs results to:
# 'inputs/studyforrest-ppa-analysis/sub-*/2nd-lvl_audio-ppa-ind.gfeat'
# 'inputs/studyforrest-ppa-analysis/sub-*/2nd-lvl_movie-ppa-ind.gfeat'
# 'inputs/studyforrest-ppa-analysis/sub-*/2nd-lvl_audio-ppa-grp.gfeat'
# 'inputs/studyforrest-ppa-analysis/sub-*/2nd-lvl_movie-ppa-grp.gfeat'
condor_submit code/compute_2nd-lvl_audio-ppa-ind.submit
condor_submit code/compute_2nd-lvl_movie-ppa-ind.submit
condor_submit code/compute_2nd-lvl_audio-ppa-grp.submit
condor_submit code/compute_2nd-lvl_movie-ppa-grp.submit

# save results
datalad save -m '2nd lvl results movie & audio'

# back to the main directory
cd $mainDir
```

### Masks & probabilistic ROIs

```
# add probabilistic ROIs created in FSLeyes to 'masks/in_mni' manually
datalad save -m 'add probabilistic ROIs extracted from fsleyes (Harvard-Oxford & MNI Prob Atlas)'

# add the field of view of the audio-description ('fov_tmpl_0.5.nii.gz') to 'masks/in_mni' manually
datalad save -m 'add AO study FoV mask (in MNI152 space)'

# warp masks from MNI152 into subject spaces
# manually add the script 'masks-from-mni-to-bold3Tp2.py' that will call FSL's 'applywarp'
datalad save -m 'add masks-from-mni-to-bold3Tp2.py'

# do the warping from MNI152 into subject spaces
# inputs from 'studyforrest-data-templatetransforms' are downloaded by the script
# outputs results to 'sub-*/masks/in_bold3Tp2'
datalad run -m 'warp MNI masks into individual bold3Tp2 spaces' \
    ./code/masks-from-mni-to-bold3Tp2.py

# manually add the script that aligns t1w images with bold3Tp2
datalad save -m 'add masks-from-t1w-to-bold3Tp2.py'

# run it; outputs results to 'sub-*/masks/in_bold3Tp2'
datalad run -m 'transform masks in t1w to individual bold3Tp2' \
    ./code/masks-from-t1w-to-bold3Tp2.py

# create a binary AO FoV from the 4D data, and binary gray matter masks for each subject
# manually add the script
datalad save -m 'add mask-builder-voxel-counter.py'

# run it; outputs results to 'sub-*/masks/in_bold3Tp2'
datalad run -m 'create individual AO FoV and gray matter masks' \
    ./code/masks-builder-voxel-counter.py
```
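The warping scripts above wrap FSL's `applywarp`. As an illustration of the underlying operation (a sketch with placeholder file names, not a call taken from the scripts), warping a single mask from MNI152 space into a subject's bold3Tp2 space looks roughly like this:

```
# illustration only; all file names are placeholders
applywarp \
    --in=masks/in_mni/some_mask.nii.gz \
    --ref=subject_bold3Tp2_reference.nii.gz \
    --warp=mni2subject_warpfield.nii.gz \
    --out=sub-01/masks/in_bold3Tp2/some_mask.nii.gz \
    --interp=nn   # nearest-neighbour interpolation keeps the mask binary
```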
### Reproduce time series of [Sengupta et al. (2016)](https://www.nature.com/articles/sdata201693) and [Häusler, Eickhoff, & Hanke (2022)](https://www.nature.com/articles/s41597-022-01250-4)

```
# FSL does not save a grand mean scaled version of the time series, so let's do it
# manually add 'grand_mean_for_4d.py' and 'grand_mean_for_4d.submit'
datalad save -m 'add scripts that apply grand mean scaling to filtered functional data per subj & run'

# run it on a computer cluster
# outputs results to 'sub-*/sub-*_task-*_run-*_bold_filtered.nii.gz'
condor_submit code/grand_mean_for_4d.submit

# save results
datalad save -m 'save grand mean scaled runs (AV, AO, VIS) per subject'
```

### Preprocessing for the shared response model (SRM; [Chen et al., 2015](https://proceedings.neurips.cc/paper_files/paper/2015/file/b3967a0e938dc2a6340e258630febd5a-Paper.pdf))

```
# mask, z-score, and concatenate the time series
# manually add 'data_mask_concat_runs.py', 'data_mask_concat_jobs.sh', and 'data_mask_concat_runs.submit'
datalad save -m 'add scripts that mask & concat 4D data using HTcondor'

# run it on a computer cluster
# outputs results to:
# 'sub-*/sub-*_task_aomovie-avmovie_run-1-8_bold-filtered.npy'
# 'sub-*/sub-*_task_visloc_run-1-4_bold-filtered.npy'
# 'sub-*/sub-*_ao-av-vis_concatenated_zscored.npy'
condor_submit code/data_mask_concat_runs.submit

# save the output
datalad save -m 'save masked (individual gray matter & FoV) & concatenated runs'

# manually merge the branches created by the cluster jobs
git merge -m "Merge results from job cluster" $(git branch -l | grep 'job-' | tr -d ' ')

# delete the merged job branches
git branch | grep 'job-' | xargs git branch -D
```

### Perform the model fit on data of the training subjects

```
# manually add the scripts that calculate the common functional space and the transformations of the training subjects
# ('data_srm_fitting.py', 'data_srm_fitting_venv.sh', 'data_srm_fitting.submit')
datalad save -m 'add scripts to perform SRM fitting (.py, .sh, .sh)'

# run it on a computer cluster
# outputs results to 'sub-*/models'
condor_submit code/data_srm_fitting.submit

# save it
datalad save -m 'save SRM models (computed on a cluster)'
```

### Obtain transformation matrices of the left-out subjects

```
# manually add the script that calculates the transformation matrices
datalad save -m 'add template of get_wmatrix_for_left-out.py'

# manually add the scripts that allow calculations on a cluster
# ('get_wmatrix_for_left-out_venv.sh', 'get_wmatrix_for_left-out.submit')
datalad save -m 'add scripts (.submit, .sh) that allow calculation matrices on a cluster'

# run it on a computer cluster
# outputs results to 'sub-*/matrices'
condor_submit code/get_wmatrix_for_left-out.submit

# save the output
datalad save -m 'save test subjects matrices (computed on a cluster)'
```

### Predict PPA

```
# manually add the script that maps individual results through the common space
# into a test subject's voxel space to predict her/his PPA;
# it also warps the ROIs of the training subjects via FSL's applywarp into the
# voxel space of the test subject, and calculates the correlations between
# empirical and predicted Z-maps
datalad save -m 'add predict_ppa.py'

# manually add an HTCondor submit file
datalad save -m 'add submit file predict_ppa.submit'

# run it on a computer cluster
# outputs results to:
# 'masks/in_mni/sub-*_*-PPA.nii.gz'
# 'sub-*/masks/in_bold3Tp2'
# 'sub-*/predictions', and
# 'results/corr_*-ppa-vs-estimation_srm-ao-av-vis_feat10.csv'
condor_submit code/predict_ppa.submit

# save results
datalad save -m 'save predicted z-maps (computed on a cluster)'
```
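The correlations between empirical and predicted Z-maps that end up in `results/` are computed within `predict_ppa.py`. Independently of that pipeline, a single pair of maps can be compared quickly with FSL's `fslcc`, assuming both maps are NIfTI images in the same voxel space; all file names below are placeholders:

```
# illustration only; file names are placeholders
# spatial correlation of an empirical and a predicted Z-map within a mask
fslcc --noabs -m subject_gray_matter_mask.nii.gz \
    empirical_zmap.nii.gz predicted_zmap.nii.gz
```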
### Calculate statistics

```
# manually add 'statistics_cronbachs.py'
datalad save -m 'add script that calculates cronbachs alpha for VIS runs, and AV & AO segments'

# run it; outputs results to 'results/statistics_cronbachs.csv'
datalad run -m 'calculate Cronbachs Alpha of empirical Z-maps' \
    ./code/statistics_cronbachs.py

# manually add 'statistics_t-test-correlations.py'
datalad save -m 'add scripts for t-test of correlations'

# run it; outputs results to 'results/statistics_t-tests.csv'
datalad run -m 'calculate t-tests' \
    ./code/statistics_t-test-correlations.py
```

### Create plots

```
# correlations of modeled hemodynamic responses and shared responses
# manually add 'plot_corr-of-glm-and-srm.py'
datalad save -m 'add plot_corr-of-glm-and-srm.py'

# plot the actual time series
datalad run -m "plot similarity of regressors and shared responses" \
    ./code/plot_corr-of-glm-and-srm.py \
    -model 'sub-01/models/srm-ao-av-vis_feat10-iter30.npz' \
    -o results

# plot the mean of 1000 models
datalad run -m "plot similarity of regressors and shared responses based on shuffled runs" \
    ./code/plot_corr-of-glm-and-srm.py \
    -model 'sub-01/models/srm-ao-av-vis_shuffled-across_feat10-iter30-0001.npz' \
    -o results

# reliability of the empirical Z-maps
# manually add 'plot_cronbachs.py'
datalad save -m 'add script for plotting stripplot Cronbachs'

# plot it
datalad run -m "plot cronbachs alpha of 1st lvl z-maps" \
    ./code/plot_cronbachs.py

# correlations of empirical and predicted Z-maps
# manually add 'plot_corr-emp-vs-estimation.py'
datalad save -m 'add script for plotting stripplot of correlations'

# plot it
datalad run -m "plot stripplot of correlations empirical vs. estimation" \
    ./code/plot_corr-emp-vs-estimation.py \
    -invis './results/corr_vis-ppa-vs-estimation_srm-ao-av-vis_feat10.csv' \
    -inav './results/corr_av-ppa-vs-estimation_srm-ao-av-vis_feat10.csv' \
    -inao './results/corr_ao-ppa-vs-estimation_srm-ao-av-vis_feat10.csv'

# size of the ROIs
# manually add 'plot_voxel-counts.py'
datalad save -m 'add script for plotting voxels per subject-specific mask'

# plot it
datalad run -m 'plot size of ROIs per subject' \
    ./code/plot_voxel-counts.py
```
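Results that were produced via `datalad run` carry a machine-readable provenance record, so individual outputs can be regenerated later without retyping the command. For example, to re-create one of the statistics files (the commit hash is a placeholder):

```
# find the commit that produced a given result file
git log --oneline -- results/statistics_cronbachs.csv

# re-execute the command recorded in that commit
datalad rerun <commit-hash>
```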