# Processing of visual and non-visual naturalistic spatial information in the "parahippocampal place area": from raw data to results

[![made-with-datalad](https://www.datalad.org/badges/made_with.svg)](https://datalad.org)

This repository contains the raw data and all code to generate the results in
Häusler, C. O., Eickhoff, S. B., & Hanke, M. (2022). Processing of visual and
non-visual naturalistic spatial information in the "parahippocampal place area".
Scientific Data, 9(1). doi:
[10.1038/s41597-022-01250-4](https://doi.org/10.1038/s41597-022-01250-4).

If you have never used [DataLad](https://www.datalad.org/) before, please read
the section on DataLad datasets below.

## DataLad datasets and how to use them

This repository is a [DataLad](https://www.datalad.org/) dataset. It allows
fine-grained data access down to the level of individual files. In order to use
this repository for data retrieval, [DataLad](https://www.datalad.org/) is
required. It is a free and open-source command line tool, available for all
major operating systems, that is built on top of Git and
[git-annex](https://git-annex.branchable.com/) to allow sharing, synchronizing,
and version controlling collections of large files. Information on how to
install DataLad is available at
[handbook.datalad.org/en/latest/intro/installation.html](http://handbook.datalad.org/en/latest/intro/installation.html).

### Get the dataset

A DataLad dataset can be `cloned` by running

```
datalad clone <url>
```

Once a dataset is cloned, it is a light-weight directory on your local machine.
At this point, it contains only small metadata and information on the identity
of the files in the dataset, but not the actual *content* of the (sometimes
large) data files.

### Retrieve dataset content

After cloning a dataset, you can retrieve file contents by running

```
datalad get <path/to/directory/or/file>
```

This command will trigger a download of the files, directories, or subdatasets
you have specified.

DataLad datasets can contain other datasets, so-called *subdatasets*. If you
clone the top-level dataset, subdatasets do not yet contain metadata and
information on the identity of files, but appear to be empty directories. In
order to retrieve file availability metadata in subdatasets, run

```
datalad get -n <path/to/subdataset>
```

Afterwards, you can browse the retrieved metadata to find out about subdataset
contents, and retrieve individual files with `datalad get`. If you use
`datalad get <path/to/subdataset>`, all contents of the subdataset will be
downloaded at once.
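As a concrete illustration, the commands below clone this dataset and retrieve the
content of a single result file (see the "Dataset structure" section below for the
layout). `<url>` is a placeholder for the location you obtained this repository
from, and the target directory name is arbitrary:

```
# clone the repository into a local directory of your choice
datalad clone <url> ppa-dataset
cd ppa-dataset

# download the content of one specific file, here a thresholded z-map
# from a second-level analysis of sub-01
datalad get sub-01/2nd-lvl_audio-ppa-grp.gfeat/cope1.feat/thresh_zstat1.nii.gz
```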
### Stay up-to-date

DataLad datasets can be updated. The command `datalad update` will *fetch*
updates and store them on a different branch (by default
`remotes/origin/master`). Running

```
datalad update --merge
```

will *pull* available updates and integrate them in one go.

### More information

More information on DataLad and how to use it can be found in the DataLad
Handbook at
[handbook.datalad.org](http://handbook.datalad.org/en/latest/index.html). The
chapter "DataLad datasets" can help you to familiarize yourself with the
concept of a dataset.

## Dataset structure

- All inputs (i.e. building blocks from other sources) are located in `inputs/`.
- All custom code and (templates of) FEAT design files are located in `code/`.
- Segmented annotations are located in `events/segments/`.
- Templates of event files for FEAT created from segmented annotations are located in `events/onsets/`.
- Individual subject folders (`sub-*/`) contain individualized FEAT setup files (i.e. GLM design files) for the first-level analyses that estimated parameters for each fMRI run of each subject separately (e.g. `run-1_1st_audio-ppa-grp.fsf`), and for the second-level analyses that averaged parameter estimates across runs of every subject (e.g. `2nd-lvl_audio-ppa-grp.fsf`).
- Individual subject folders (`sub-*/`) also contain the results of the first-level (e.g. `run-1_audio-ppa-grp.feat`) and second-level analyses (e.g. `2nd-lvl_audio-ppa-grp.gfeat`).
- Results of first-level analyses (e.g. `sub-01/run-1_audio-ppa-grp.feat`) contain thresholded (`thresh_zstat*.nii.gz`) and unthresholded (`stats/zstat*.nii.gz`) z-maps for every contrast (contrast of parameter estimates; COPE), parameter estimates for every regressor and temporal derivative (`stats/pe*.nii.gz`), and the anatomical image that was used as standard space (`reg/standard.nii.gz`).
- Results of second-level analyses (e.g. `sub-01/2nd-lvl_audio-ppa-grp.gfeat`) contain thresholded (`cope*.feat/thresh_zstat1.nii.gz`) and unthresholded (`cope*.feat/stats/zstat1.nii.gz`) z-maps for every contrast.
- Results of third-level GLM analyses that averaged parameter estimates across subjects are located in `3rd-lvl/`.
- Each folder (e.g. `3rd-lvl/audio-ppa_c1_z3.4.gfeat`) contains the thresholded (`cope1.feat/thresh_zstat1.nii.gz`) and unthresholded (`cope1.feat/stats/zstat1.nii.gz`) z-maps of the corresponding contrast (see the retrieval example after this list).
- A detailed description of FEAT (output) directories can be found in the [FEAT UserGuide](https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FEAT/UserGuide#FEAT_Output).
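For example, the thresholded z-map of one third-level group result can be
retrieved and inspected as follows. The `fslstats` call assumes a local FSL
installation; any other NIfTI-capable tool works just as well:

```
# download the thresholded z-map of one third-level analysis
datalad get 3rd-lvl/audio-ppa_c1_z3.4.gfeat/cope1.feat/thresh_zstat1.nii.gz

# print the minimum and maximum z-value of the thresholded map (requires FSL)
fslstats 3rd-lvl/audio-ppa_c1_z3.4.gfeat/cope1.feat/thresh_zstat1.nii.gz -R
```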
## Cookbook -- How this dataset was assembled

### install subdatasets and get the raw data

```
# install subdataset that provides motion-corrected fMRI data from the audio-visual movie and its audio-description
datalad install -d . -s https://github.com/psychoinformatics-de/studyforrest-data-aligned inputs/studyforrest-data-aligned

# download 4D fMRI data (and motion correction parameters of the movie)
datalad get inputs/studyforrest-data-aligned/sub-??/in_bold3Tp2/sub-??_task-a?movie_run-?_bold*.*

# install subdataset that provides the original 7 Tesla data to get the motion correction parameters of the audio-description
datalad install -d . -s juseless.inm7.de:/data/project/studyforrest/collection/phase1 inputs/phase1
datalad get inputs/phase1/sub???/BOLD/task001_run00?/bold_dico_moco.txt

# install subdataset "template & transforms", and download the relevant images
datalad install -d . -s https://github.com/psychoinformatics-de/studyforrest-data-templatetransforms inputs/studyforrest-data-templatetransforms
datalad get inputs/studyforrest-data-templatetransforms/sub-*/bold3Tp2/
datalad get inputs/studyforrest-data-templatetransforms/templates/*

# install subdataset "studyforrest-data-annotations" that contains the annotation of cuts & locations
# and "code/researchcut2segments.py" that we need to segment the (continuous) annotations
datalad install -d . -s https://github.com/psychoinformatics-de/studyforrest-data-annotations inputs/studyforrest-data-annotations

# install the annotation of speech as subdataset
datalad install -d . -s juseless.inm7.de:/data/group/psyinf/studyforrest-speechannotation inputs/studyforrest-speechannotation

# download the annotation as TSV file (BIDS)
datalad get inputs/studyforrest-speechannotation/annotation/fg_rscut_ad_ger_speech_tagged.tsv
```

### segmenting of continuous annotations

```
# segment the annotation of cuts & locations using timings of the audio-visual movie segments
datalad run \
  -i inputs/studyforrest-data-annotations/researchcut/locations.tsv \
  -o events/segments \
  ./inputs/studyforrest-data-annotations/code/researchcut2segments.py \
  '{inputs}' \
  avmovie avmovie \
  '{outputs}'

# segment the annotation of speech using timings of the audio-description segments
datalad run \
  -i inputs/studyforrest-speechannotation/annotation/fg_rscut_ad_ger_speech_tagged.tsv \
  -o events/segments \
  ./inputs/studyforrest-data-annotations/code/researchcut2segments.py \
  '{inputs}' \
  aomovie aomovie \
  '{outputs}'

# for control contrasts, segment the speech annotation using timings of the audio-visual movie segments
datalad run \
  -i inputs/studyforrest-speechannotation/annotation/fg_rscut_ad_ger_speech_tagged.tsv \
  -o events/segments \
  ./inputs/studyforrest-data-annotations/code/researchcut2segments.py \
  '{inputs}' \
  avmovie avmovie \
  '{outputs}'

# for control contrasts, segment the location annotation using timings of the audio-description segments
datalad run \
  -i inputs/studyforrest-data-annotations/researchcut/locations.tsv \
  -o events/segments \
  ./inputs/studyforrest-data-annotations/code/researchcut2segments.py \
  '{inputs}' \
  aomovie aomovie \
  '{outputs}'
```
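All processing steps in this cookbook were captured with `datalad run`, which
records the executed command together with its inputs and outputs in the
corresponding commit. Any such step can therefore be re-executed verbatim later
on; a minimal sketch (the commit hash is a placeholder for the commit that
recorded the step you want to repeat):

```
# find the commit that recorded a given processing step
git log --oneline

# re-execute the command stored in that commit
datalad rerun <commit-hash>
```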
### manual addition of confound annotations and a script that gets the annotation in shape for the subsequent FEAT analyses

```
# add low-level confound files of audio-visual movie manually & save (folder "avconfounds")
datalad save -m 'add low-level confound files for audio-visual movie to /events/segments'

# add low-level confound files of audio-description manually & save (folder "aoconfounds")
datalad save -m 'add low-level confound files for audio-description to /events/segments'
```

### convert confound annotations into FEAT onset files

```
# add script code/confounds2onsets.py
datalad save -m 'add script that converts & copies confound files to onsets directories'

# perform the conversion considering the directories of corresponding fMRI runs and
# rename according to conventions used in FSL design files
datalad run \
  -i events/segments \
  -o events/onsets \
  ./code/confounds2onsets.py -i '{inputs}' -o '{outputs}'
```

### create FEAT onset files from the segmented annotation of cuts & locations

```
# add the script that performs the conversion
datalad save -m 'add script that creates event files for FSL from the segmented location annotation'

# create event onset files from segmented location annotation (timings of audio-visual movie)
datalad run \
  -m "create the event files with movie timing" \
  -i events/segments/avmovie \
  -o events/onsets \
  ./code/locationsanno2onsets.py \
  -ind '{inputs}' \
  -inp 'locations_run-?_events.tsv' \
  -outd '{outputs}'

# create event onset files from segmented location annotation (timings of audio-description)
datalad run \
  -m "create the event files with audio-track timing" \
  -i events/segments/aomovie \
  -o events/onsets \
  ./code/locationsanno2onsets.py \
  -ind '{inputs}' \
  -inp 'locations_run-?_events.tsv' \
  -outd '{outputs}'
```

### create FEAT onset files from the segmented annotation of speech

```
# add the script that performs the conversion
datalad save -m 'add script that creates event files for FSL from the segmented speech annotation'

# create event onset files from segmented speech annotation (timings of audio-visual movie)
datalad run \
  -i events/segments/avmovie \
  -o events/onsets \
  ./code/speechanno2onsets.py \
  -ind '{inputs}' \
  -inp 'fg_rscut_ad_ger_speech_tagged_run-*.tsv' \
  -outd '{outputs}'

# create event onset files from segmented speech annotation (timings of audio-description)
datalad run \
  -i events/segments/aomovie \
  -o events/onsets \
  ./code/speechanno2onsets.py \
  -ind '{inputs}' \
  -inp 'fg_rscut_ad_ger_speech_tagged_run-*.tsv' \
  -outd '{outputs}'
```
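The generated onset files in `events/onsets/` are plain-text event files for
FEAT. FSL's "Custom (3 column format)" EVs expect one line per event with
onset, duration, and weight; whether every file here follows exactly that
layout is an assumption, so simply peek at the files to check:

```
# preview the event files generated for run 1 of the audio-description
# (expected columns, if the 3-column EV layout applies: onset (s), duration (s), weight)
head events/onsets/aomovie/run-1/*.txt
```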
### copy FEAT event files to folders of individual subjects

```
# manually add the script that creates directories & handles the copying
datalad save -m 'add script that creates subject directories and copies FSL event files into it'

# create subject folders & copy events with timing of the audio-visual movie
datalad run \
  -m "create subject folders & copy event files to it" \
  ./code/onsets2subfolders.py \
  -fmri 'inputs/studyforrest-data-aligned/sub-??/in_bold3Tp2/sub-??_task-aomovie_run-1_bold.nii.gz' \
  -onsets 'events/onsets/avmovie/run-?/*.txt' \
  -o './'

# copy events with timing of the audio-description
datalad run \
  -m "copy event files with audio-description with movie timings to subject folders" \
  ./code/onsets2subfolders.py \
  -fmri 'inputs/studyforrest-data-aligned/sub-??/in_bold3Tp2/sub-??_task-aomovie_run-1_bold.nii.gz' \
  -onsets 'events/onsets/aomovie/run-?/*.txt' \
  -o './'
```

### manually add the templates of FEAT design files

```
# manually add the script that creates first-level individual design files from template
datalad save -m 'add python script that creates individual (1st level) design files from templates'

# analyses in group space, level 1-3
# (e.g. 1st-lvl_movie-ppa-grp.fsf, 2nd-lvl_movie-ppa-grp.fsf, 3rd-lvl_movie-ppa-grp-1.fsf);
# both steps include adding the bash scripts that take 2nd level templates as input and
# create design files in individual directories (e.g. generate_2nd-lvl-design_movie-ppa-grp.sh)
datalad save -m 'add FSL design files (lvl 1-3) for movie (group)'
datalad save -m 'add FSL design files (lvl 1-3) for audio (group)'

# analyses in subject space, level 1-2 (e.g. 1st-lvl_movie-ppa-ind.fsf, 2nd-lvl_movie-ppa-ind.fsf);
# both steps include adding the bash scripts that take 2nd level templates as input and
# create design files in individual directories (e.g. generate_2nd-lvl-design_movie-ppa-ind.sh)
datalad save -m 'add FSL design files (lvl 1-2) for movie (individuals)'
datalad save -m 'add FSL design files (lvl 1-2) for audio (individuals)'
```

### from templates, create FEAT design files for individual subjects

```
# movie, group space, first level
datalad run \
  -m 'for movie analysis (group), create individual (1st level) design files from template' \
  code/generate_1st-lvl-design.py \
  -fmri 'inputs/studyforrest-data-aligned/sub-01/in_bold3Tp2/sub-01_task-avmovie_run-1_bold.nii.gz' \
  -design 'code/1st-lvl_movie-ppa-grp.fsf'

# movie, group space, second level
datalad run \
  -m "for movie analysis (group), generate individual 2nd lvl design files from template" \
  "./code/generate_2nd-lvl-design_movie-ppa-grp.sh"

# audio-description, group space, first level
datalad run \
  -m 'for audio analysis (group), create individual 1st level design files from template' \
  code/generate_1st-lvl-design.py \
  -fmri 'inputs/studyforrest-data-aligned/sub-01/in_bold3Tp2/sub-01_task-aomovie_run-1_bold.nii.gz' \
  -design 'code/1st-lvl_audio-ppa-grp.fsf'

# audio-description, group space, second level
datalad run \
  -m "for audio analysis (group), generate individual 2nd lvl design files from template" \
  "./code/generate_2nd-lvl-design_audio-ppa-grp.sh"

# movie, subject space, first level
datalad run \
  -m 'for movie analysis (individuals), create individual 1st level design files from template' \
  code/generate_1st-lvl-design.py \
  -fmri 'inputs/studyforrest-data-aligned/sub-01/in_bold3Tp2/sub-01_task-avmovie_run-1_bold.nii.gz' \
  -design 'code/1st-lvl_movie-ppa-ind.fsf'

# movie, subject space, second level
datalad run \
  -m "for movie analysis (individuals), generate individual 2nd lvl design files from template" \
  "./code/generate_2nd-lvl-design_movie-ppa-ind.sh"

# audio-description, subject space, first level
datalad run \
  -m 'for audio analysis (individuals), create individual 1st level design files from template' \
  code/generate_1st-lvl-design.py \
  -fmri 'inputs/studyforrest-data-aligned/sub-01/in_bold3Tp2/sub-01_task-aomovie_run-1_bold.nii.gz' \
  -design 'code/1st-lvl_audio-ppa-ind.fsf'

# audio-description, subject space, second level
datalad run \
  -m "for audio analysis (individuals), generate individual 2nd lvl design files from template" \
  "./code/generate_2nd-lvl-design_audio-ppa-ind.sh"
```
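The actual work here is done by `code/generate_1st-lvl-design.py` and the
`generate_2nd-lvl-design_*.sh` scripts; conceptually, they fill subject- and
run-specific values into the `.fsf` templates in `code/`. A rough, purely
hypothetical illustration of that idea using `sed` (the substitutions below are
made up for illustration and do not reproduce what the scripts actually do):

```
# purely illustrative: derive a design file for sub-02, run 2 from the
# group-space movie template by swapping subject and run identifiers
sed -e 's/sub-01/sub-02/g' -e 's/run-1/run-2/g' \
    code/1st-lvl_movie-ppa-grp.fsf > sub-02/run-2_1st_movie-ppa-grp.fsf
```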
### manually add bash script that handles custom standard space templates & matrices for FEAT

```
datalad save -m "add script that add templates & transformation matrices to 1st lvl result directories of Feat"
```

### run the analyses via condor_submit on a computer cluster & manually save results

```
# add file "condor-commands-for-cm.txt" that contains the following commands to
# manually submit the subsequent analyses to HTCondor
datalad save -m "add txt file with instructions for manually starting Condor Jobs from CM"

# movie, group space, first level
condor_submit code/compute_1st-lvl_movie-ppa-grp.submit
# in .feat directories, create templates and transforms
./code/reg2std4feat inputs/studyforrest-data-templatetransforms bold3Tp2 grpbold3Tp2 sub-*/run-?_movie-ppa-grp.feat
# movie, group space, second level
condor_submit code/compute_2nd-lvl_movie-ppa-grp.submit
# movie, group space, third level
condor_submit code/compute_3rd-lvl_movie-ppa-grp.submit
# save results of first to third level
datalad save -m '3rd lvl results movie (group)'

# audio-description, group space, first level
condor_submit code/compute_1st-lvl_audio-ppa-grp.submit
# in .feat directories, create templates and transforms
./code/reg2std4feat inputs/studyforrest-data-templatetransforms bold3Tp2 grpbold3Tp2 sub-*/run-?_audio-ppa-grp.feat
# audio-description, group space, second level
condor_submit code/compute_2nd-lvl_audio-ppa-grp.submit
# audio-description, group space, third level
condor_submit code/compute_3rd-lvl_audio-ppa-grp.submit
# save results of first to third level
datalad save -m '3rd lvl results audio (group)'

# movie, subject space, first level
condor_submit code/compute_1st-lvl_movie-ppa-ind.submit
# in .feat directories, create templates and transforms
./code/reg2std4feat inputs/studyforrest-data-templatetransforms bold3Tp2 bold3Tp2 sub-*/run-?_movie-ppa-ind.feat
# movie, subject space, second level
condor_submit code/compute_2nd-lvl_movie-ppa-ind.submit
# save results of first to second level
datalad save -m '2nd lvl results movie (individuals)'

# audio-description, subject space, first level
condor_submit code/compute_1st-lvl_audio-ppa-ind.submit
# in .feat directories, create templates and transforms
./code/reg2std4feat inputs/studyforrest-data-templatetransforms bold3Tp2 bold3Tp2 sub-*/run-?_audio-ppa-ind.feat
# audio-description, subject space, second level
condor_submit code/compute_2nd-lvl_audio-ppa-ind.submit
# save results of first to second level
datalad save -m '2nd lvl results audio (individuals)'
```

### comment: some cleaning that we did

In order to limit the dataset to an appropriate size, we dropped some files
that were generated by FEAT during an intermediate stage of the first-level
analyses. More specifically, we dropped `filtered_func_data.nii.gz` (4D fMRI
data after all filtering) and `res4d.nii.gz` (residual noise images) for every
subject and run using the following commands:

```
git annex unused
git annex dropunused all --force
datalad drop --nocheck sub*/*.feat/filtered_func_data.nii.gz
datalad drop --nocheck sub*/*.feat/stats/res4d.nii.gz
git rm sub-*/run-*.feat/filtered_func_data.nii.gz
git rm sub-*/run-*.feat/stats/res4d.nii.gz
```

If necessary, the files can be obtained by rerunning the corresponding
first-level analysis.
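A sketch of such a re-run for a single subject and run, assuming FSL is
installed locally and all inputs referenced by the design file have been
retrieved via `datalad get` and resolve on your system:

```
# re-run the first-level analysis for sub-01, run 1 of the audio-description;
# this regenerates filtered_func_data.nii.gz and stats/res4d.nii.gz
feat sub-01/run-1_1st_audio-ppa-grp.fsf
```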