Processing of visual and non-visual naturalistic spatial information in the "parahippocampal place area"


Processing of visual and non-visual naturalistic spatial information in the "parahippocampal place area": from raw data to results


This repository contains the raw data and all code to generate the results in Häusler, C. O., Eickhoff, S. B., & Hanke, M. (2022). Processing of visual and non-visual naturalistic spatial information in the "parahippocampal place area". Scientific Data, 9(1). doi: 10.1038/s41597-022-01250-4.

If you have never used DataLad before, please read the section on DataLad datasets below.

DataLad datasets and how to use them

This repository is a DataLad dataset. It allows fine-grained data access down to the level of single files. In order to use this repository for data retrieval, DataLad is required. It is a free and open-source command line tool, available for all major operating systems, that builds on top of Git and git-annex to allow sharing, synchronizing, and version controlling collections of large files. You can find information on how to install DataLad at handbook.datalad.org/en/latest/intro/installation.html.
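
For example, on many systems a recent DataLad release can be installed via pip (note that git-annex needs to be available as well; the handbook linked above covers system-specific instructions):

pip install datalad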

Get the dataset

A DataLad dataset can be cloned by running

datalad clone <url>

Once a dataset is cloned, it is a lightweight directory on your local machine. At this point, it contains only small metadata and information on the identity of the files in the dataset, but not the actual content of the (sometimes large) data files.

Retrieve dataset content

After cloning a dataset, you can retrieve file contents by running

datalad get <path/to/directory/or/file>

This command will trigger a download of the files, directories, or subdatasets you have specified.

DataLad datasets can contain other datasets, so-called subdatasets. If you clone the top-level dataset, subdatasets do not yet contain metadata and information on the identity of files, but appear to be empty directories. In order to retrieve file availability metadata in subdatasets, run

datalad get -n <path/to/subdataset>

Afterwards, you can browse the retrieved metadata to find out about subdataset contents, and retrieve individual files with datalad get. If you use datalad get <path/to/subdataset>, all contents of the subdataset will be downloaded at once.
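
As a concrete example for this dataset, the following commands first retrieve the file availability metadata of the aligned fMRI subdataset that is also used in the Cookbook below, and then download a single functional run of one subject; adjust the paths to whatever files you actually need:

datalad get -n inputs/studyforrest-data-aligned
datalad get inputs/studyforrest-data-aligned/sub-01/in_bold3Tp2/sub-01_task-avmovie_run-1_bold.nii.gz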

Stay up-to-date

DataLad datasets can be updated. The command datalad update will fetch updates and store them on a different branch (by default remotes/origin/master). Running

datalad update --merge

will pull available updates and integrate them in one go.
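
If you prefer to inspect incoming changes before integrating them, one possible workflow (assuming the default master branch mentioned above) is to fetch first and compare the local and remote branches with standard Git commands:

datalad update
git log --oneline master..remotes/origin/master
datalad update --merge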

More information

More information on DataLad and how to use it can be found in the DataLad Handbook at handbook.datalad.org. The chapter "DataLad datasets" can help you to familiarize yourself with the concept of a dataset.

Dataset structure

  • All inputs (i.e. building blocks from other sources) are located in inputs/.
  • All custom code and (templates of) FEAT design files are located in code/.
  • Segmented annotations are located in events/segments/.
  • Templates of event files for FEAT created from segmented annotations are located in events/onsets/.
  • Individual subject folders (sub-*/) contain individualized FEAT setup files (i.e. GLM design files) for the first level analyses that estimated parameters for each fMRI run of each subject separately (e.g. run-1_1st_audio-ppa-grp.fsf) and for the second level analyses that averaged parameter estimates across runs of every subject (e.g. 2nd-lvl_audio-ppa-grp.fsf).
  • Individual subject folders (sub-*) also contain the results of the first level (e.g. run-1_audio-ppa-grp.feat) and second level analyses (e.g. 2nd-lvl_audio-ppa-grp.gfeat).
  • Results of first level analyses (e.g. sub-01/run-1_audio-ppa-grp.feat) contain thresholded (thresh_zstat*.nii.gz) and unthresholded (stats/zstat*.nii.gz) z-maps for every contrast (contrast of parameter estimates; COPE), parameter estimates for every regressor and temporal derivative (stats/pe*.nii.gz), and the anatomical image that was used as standard space (reg/standard.nii.gz).
  • Results of second level analyses (e.g. sub-01/2nd-lvl_audio-ppa-grp.gfeat) contain thresholded (cope*.feat/thresh_zstat1.nii.gz) and unthresholded (cope*.feat/stats/zstat1.nii.gz) z-maps for every contrast.
  • Results of third level GLM analyses that averaged parameter estimates across subjects are located in 3rd-lvl/.
  • Each folder (e.g. 3rd-lvl/audio-ppa_c1_z3.4.gfeat) contains the thresholded (cope1.feat/thresh_zstat1.nii.gz) and unthresholded (cope1.feat/stats/zstat1.nii.gz) z-maps of the corresponding contrast.
  • A detailed description of FEAT (output) directories can be found in the FEAT User Guide; a brief example of retrieving and inspecting one of the resulting z-maps follows below.
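
As a minimal sketch of how to access the results (assuming FSL is installed so that fslstats is available; any NIfTI-aware tool works just as well), the following commands retrieve the thresholded group-level z-map of the first contrast of the audio-description analysis and print its value range:

# download a single thresholded z-map and print its minimum and maximum z-value
datalad get 3rd-lvl/audio-ppa_c1_z3.4.gfeat/cope1.feat/thresh_zstat1.nii.gz
fslstats 3rd-lvl/audio-ppa_c1_z3.4.gfeat/cope1.feat/thresh_zstat1.nii.gz -R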

Cookbook -- How this dataset was assembled

install subdatasets and get the raw data

# install subdataset that provides motion corrected fMRI data from the audio-visual movie and its audio-description
datalad install -d . -s https://github.com/psychoinformatics-de/studyforrest-data-aligned inputs/studyforrest-data-aligned
# download 4D fMRI data (and motion correction parameters of the movie) 
datalad get inputs/studyforrest-data-aligned/sub-??/in_bold3Tp2/sub-??_task-a?movie_run-?_bold*.*

# install subdataset that provides the original 7 Tesla data to get the motion correction parameters of the audio-description
datalad install -d . -s juseless.inm7.de:/data/project/studyforrest/collection/phase1 inputs/phase1
datalad get inputs/phase1/sub???/BOLD/task001_run00?/bold_dico_moco.txt

# install subdataset "template & transforms", and download the relevant images
datalad install -d . -s https://github.com/psychoinformatics-de/studyforrest-data-templatetransforms inputs/studyforrest-data-templatetransforms
datalad get inputs/studyforrest-data-templatetransforms/sub-*/bold3Tp2/
datalad get inputs/studyforrest-data-templatetransforms/templates/*

# install subdataset "studyforrest-data-annotations" that contains the annotation of cuts & locations as subdataset
# and "code/researchcut2segments.py" that we need to segment the (continuous) annotations
datalad install -d . -s https://github.com/psychoinformatics-de/studyforrest-data-annotations inputs/studyforrest-data-annotations

# install the annotation of speech as subdataset
datalad install -d . -s juseless.inm7.de:/data/group/psyinf/studyforrest-speechannotation inputs/studyforrest-speechannotation
# download the annotation as TSV-file (BIDS)
datalad get inputs/studyforrest-speechannotation/annotation/fg_rscut_ad_ger_speech_tagged.tsv

segment the continuous annotations

# segment the annotation of cuts & locations using timings of the audio-visual movie segments
datalad run \
-i inputs/studyforrest-data-annotations/researchcut/locations.tsv \
-o events/segments \
./inputs/studyforrest-data-annotations/code/researchcut2segments.py \
'{inputs}' \
avmovie avmovie \
'{outputs}'

# segment the annotation of speech using timings of the audio-description segments
datalad run \
-i inputs/studyforrest-speechannotation/annotation/fg_rscut_ad_ger_speech_tagged.tsv \
-o events/segments \
./inputs/studyforrest-data-annotations/code/researchcut2segments.py \
'{inputs}' \
aomovie aomovie \
'{outputs}'

# for control contrasts, segment the speech annotation using timings of the audio-visual movie segments
datalad run \
-i inputs/studyforrest-speechannotation/annotation/fg_rscut_ad_ger_speech_tagged.tsv \
-o events/segments \
./inputs/studyforrest-data-annotations/code/researchcut2segments.py \
'{inputs}' \
avmovie avmovie \
'{outputs}'

# for control contrasts, segment the location annotation using timings of the audio-description segments
datalad run \
-i inputs/studyforrest-data-annotations/researchcut/locations.tsv \
-o events/segments \
./inputs/studyforrest-data-annotations/code/researchcut2segments.py \
'{inputs}' \
aomovie aomovie \
'{outputs}'

manually add confound annotations and a script that gets the annotations in shape for the subsequent FEAT analyses

# add low-level confound files of audio-visual movie manually & save (folder "avconfounds")
datalad save -m 'add low-level confound files for audio-visual movie to /events/segments'
# add low-level confound files of audio-description manually & save (folder "aoconfounds")
datalad save -m 'add low-level confound files for audio-description to /events/segments'

convert confound annotations into FEAT onset files

# add script code/confounds2onsets.py
datalad save -m 'add script that converts & copies confound files to onsets directories'
# perform the conversion considering the directories of corresponding fMRI runs and
# rename according to conventions used in FSL-design files
datalad run \
-i events/segments \
-o events/onsets \
./code/confounds2onsets.py -i '{inputs}' -o '{outputs}'

create FEAT onset files from the segmented annotation of cuts & locations

# add the script that performs the conversion
datalad save -m 'add script that creates event files for FSL from the segmented location annotation'

# create event onset files from segmented location annotation (timings of audio-visual movie)
datalad run \
-m "create the event files with movie timing" \
-i events/segments/avmovie \
-o events/onsets \
./code/locationsanno2onsets.py \
-ind '{inputs}' \
-inp 'locations_run-?_events.tsv' \
-outd '{outputs}'

# create event onset files from segmented location annotation (timings of audio-description)
datalad run \
-m "create the event files with audio-track timing" \
-i events/segments/aomovie \
-o events/onsets \
./code/locationsanno2onsets.py \
-ind '{inputs}' \
-inp 'locations_run-?_events.tsv' \
-outd '{outputs}'

create FEAT onset files from the segmented annotation of speech

# add the script that performs the conversion
datalad save -m 'add script that creates event files for FSL from the segmented speech annotation'

# create event onset files from segmented speech annotation (timings of audio-visual movie)
datalad run \
-i events/segments/avmovie \
-o events/onsets \
./code/speechanno2onsets.py \
-ind '{inputs}' \
-inp 'fg_rscut_ad_ger_speech_tagged_run-*.tsv' \
-outd '{outputs}'

# create event onset files from segmented speech annotation (timings of audio-description)
datalad run \
-i events/segments/aomovie \
-o events/onsets \
./code/speechanno2onsets.py \
-ind '{inputs}' \
-inp 'fg_rscut_ad_ger_speech_tagged_run-*.tsv' \
-outd '{outputs}'

copy FEAT event files to folders of individual subjects

# manually add the script that creates directories & handles the copying
datalad save -m 'add script that creates subject directories and copies FSL event files into them'

# create subject folders & copy event files with timings of the audio-visual movie
datalad run \
-m "create subject folders & copy event files to it" \
./code/onsets2subfolders.py \
-fmri 'inputs/studyforrest-data-aligned/sub-??/in_bold3Tp2/sub-??_task-aomovie_run-1_bold.nii.gz' \
-onsets 'events/onsets/avmovie/run-?/*.txt' \
-o './'

# copy events with timing of the audio-description 
datalad run \
-m "copy event files with audio-description with movie timings to subject folders" \
./code/onsets2subfolders.py \
-fmri 'inputs/studyforrest-data-aligned/sub-??/in_bold3Tp2/sub-??_task-aomovie_run- 1_bold.nii.gz' \
-onsets 'events/onsets/aomovie/run-?/*.txt' \
-o './'

manually add the templates of FEAT design files

# manually add the script that creates first level individual design files from template
datalad save -m 'add python script that creates individual (1st level) design files from templates'

# analyses in group space, level 1-3 (e.g. 1st-lvl_movie-ppa-grp.fsf, 2nd-lvl_movie-ppa-grp.fsf, 3rd-lvl_movie-ppa-grp-1.fsf)
# both steps include adding the bash scripts that take 2nd level templates as input and create design-files in individual directories 
# (e.g. generate_2nd-lvl-design_movie-ppa-grp.sh)
datalad save -m 'add FSL design files (lvl 1-3) for movie (group)'
datalad save -m 'add FSL design files (lvl 1-3) for audio (group)'

# analyses in subject space, level 1-2 (e.g. 1st-lvl_movie-ppa-ind.fsf, 2nd-lvl_movie-ppa-ind.fsf)
# both steps include adding the bash scripts that take 2nd level templates as input and create design-files in individual directories 
# (e.g. generate_2nd-lvl-design_movie-ppa-ind.sh)
datalad save -m 'add FSL design files (lvl 1-2) for movie (individuals)'
datalad save -m 'add FSL design files (lvl 1-2) for audio (individuals)'

from templates, create FEAT design files for individual subjects

# movie, group space, first level
datalad run \
-m 'for movie analysis (group), create individual (1st level) design files from template' \
code/generate_1st-lvl-design.py \
-fmri 'inputs/studyforrest-data-aligned/sub-01/in_bold3Tp2/sub-01_task-avmovie_run-1_bold.nii.gz' \
-design 'code/1st-lvl_movie-ppa-grp.fsf'

# movie, group space, second level
datalad run \
-m "for movie analysis (group), generate individual 2nd lvl design files from template" \
"./code/generate_2nd-lvl-design_movie-ppa-grp.sh"

# audio-description, group space, first level
datalad run \
-m 'for audio analysis (group), create individual 1st level design files from template' \
code/generate_1st-lvl-design.py \
-fmri 'inputs/studyforrest-data-aligned/sub-01/in_bold3Tp2/sub-01_task-aomovie_run-1_bold.nii.gz' \
-design 'code/1st-lvl_audio-ppa-grp.fsf'

# audio-description, group space, second level
datalad run \
-m "for audio analysis (group), generate individual 2nd lvl design files from template" \
"./code/generate_2nd-lvl-design_audio-ppa-grp.sh"

# movie, subject space, first level
datalad run \
-m 'for movie analysis (individuals), create individual 1st level design files from template' \
code/generate_1st-lvl-design.py \
-fmri 'inputs/studyforrest-data-aligned/sub-01/in_bold3Tp2/sub-01_task-avmovie_run-1_bold.nii.gz' \
-design 'code/1st-lvl_movie-ppa-ind.fsf'

# movie, subject space, second level
datalad run \
-m "for movie analysis (individuals), generate individual 2nd lvl design files from template" \
"./code/generate_2nd-lvl-design_movie-ppa-ind.sh"

# audio-description, subject space, first level
datalad run \
-m 'for audio analysis (individuals), create individual 1st level design files from template' \
code/generate_1st-lvl-design.py \
-fmri 'inputs/studyforrest-data-aligned/sub-01/in_bold3Tp2/sub-01_task-aomovie_run-1_bold.nii.gz' \
-design 'code/1st-lvl_audio-ppa-ind.fsf'

# audio-description, subject space, second level
datalad run \
-m "for audio analysis (individuals), generate individual 2nd lvl design files from template" \
"./code/generate_2nd-lvl-design_audio-ppa-ind.sh"

manually add bash script that handles custom standard space templates & matrices for FEAT

datalad save -m "add script that add templates & transformation matrices to 1st lvl result directories of Feat"

run the analyses via condor_submit on a computer cluster & manually save results

# add file "condor-commands-for-cm.txt" that contains the following commands to manually submit the subsequent analyses to HTCondor
datalad save -m "add txt file with instructions for manually starting Condor Jobs from CM"

# movie, group space, first level
condor_submit code/compute_1st-lvl_movie-ppa-grp.submit
# in .feat-directories, create templates and transforms
./code/reg2std4feat inputs/studyforrest-data-templatetransforms bold3Tp2 grpbold3Tp2 sub-*/run-?_movie-ppa-grp.feat
# movie, group space, second level
condor_submit code/compute_2nd-lvl_movie-ppa-grp.submit
# movie, group space, third level
condor_submit code/compute_3rd-lvl_movie-ppa-grp.submit
# save results of first to third level
datalad save -m '3rd lvl results movie (group)'

# audio-description, group space, first level
condor_submit code/compute_1st-lvl_audio-ppa-grp.submit
# in .feat-directories, create templates and transforms
./code/reg2std4feat inputs/studyforrest-data-templatetransforms bold3Tp2 grpbold3Tp2 sub-*/run-?_audio-ppa-grp.feat
# audio-description, group space, second level
condor_submit code/compute_2nd-lvl_audio-ppa-grp.submit    
# audio-description, group space, third level
condor_submit code/compute_3rd-lvl_audio-ppa-grp.submit
# save results of first to third level
datalad save -m '3rd lvl results audio (group)'

# movie, subject space, first level
condor_submit code/compute_1st-lvl_movie-ppa-ind.submit
# in .feat-directories, create templates and transforms
./code/reg2std4feat inputs/studyforrest-data-templatetransforms bold3Tp2 bold3Tp2 sub-*/run-?_movie-ppa-ind.feat
# movie, subject space, second level
condor_submit code/compute_2nd-lvl_movie-ppa-ind.submit
# save results of first to second level
datalad save -m '2nd lvl results movie (individuals)'

# audio-description, subject space, first level
condor_submit code/compute_1st-lvl_audio-ppa-ind.submit
# in .feat-directories, create templates and transforms
./code/reg2std4feat inputs/studyforrest-data-templatetransforms bold3Tp2 bold3Tp2 sub-*/run-?_audio-ppa-ind.feat
# audio-description, subject space, second level
condor_submit code/compute_2nd-lvl_audio-ppa-ind.submit
# save results of first to second level
datalad save -m '2nd lvl results audio (individuals)'
comment: some cleaning that we did

In order to keep the dataset at a manageable size, we dropped some files that FEAT generated during an intermediate stage of the first level analyses. More specifically, we dropped filtered_func_data.nii.gz (the 4D fMRI data after all filtering) and res4d.nii.gz (residual noise images) for every subject and run using the following commands:

git annex unused
git annex dropunused all --force
datalad drop --nocheck sub*/*.feat/filtered_func_data.nii.gz
datalad drop --nocheck sub*/*.feat/stats/res4d.nii.gz
git rm sub-*/run-*.feat/filtered_func_data.nii.gz
git rm sub-*/run-*.feat/stats/res4d.nii.gz

If necessary, the files can be obtained by rerunning the corresponding first level analysis.
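
For instance, regenerating them for a single run of one subject could look like the following sketch, assuming FSL's feat command is on the PATH, the required input data have been retrieved with datalad get, and the individualized design file shipped in the subject folder is used as-is:

# re-run the first level analysis of run 1 of sub-01 (audio-description, group space)
feat sub-01/run-1_1st_audio-ppa-grp.fsf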

datacite.yml
Title Processing of visual and non-visual naturalistic spatial information in the "parahippocampal place area": from raw data to results
Authors Häusler, Christian O.; Psychoinformatics Lab, Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Centre Jülich, Jülich, Germany; Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany; ORCID: 0000-0002-0936-317X
Eickhoff, Simon B.; Psychoinformatics Lab, Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Centre Jülich, Jülich, Germany; Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany; ORCID: 0000-0001-6363-2759
Hanke, Michael; Psychoinformatics Lab, Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Centre Jülich, Jülich, Germany; Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany; ORCID: 0000-0001-6398-6370
Description This repository contains the fMRI data, annotations, analysis scripts to generate the results, and results in Häusler, C. O., Eickhoff, S. B., & Hanke, M. (2022), Scientific Data, as DataLad datasets (https://github.com/datalad).
License Creative Commons Attribution 4.0 International Public License (https://creativecommons.org/licenses/by/4.0/)
References Häusler, C. O. & Hanke, M. (2016). Dataset 1 in: An annotation of cuts, depicted locations, and temporal progression in the motion picture "Forrest Gump". F1000Research. [https://doi.org/10.5256/f1000research.9536.d134823] (IsReferencedBy)
Häusler, C. O., & Hanke, M. (2020). studyforrest-paper-speechannotation. OSF. [https://doi.org/10.17605/OSF.IO/GFRME] (IsReferencedBy)
Hanke, M., Baumgartner, F.J., Ibe, P., Kaule, F.R., Pollmann, S., Speck, O., Zinke, W., & Stadler, J. (2014). A high-resolution 7-Tesla fMRI dataset from complex natural stimulation with an audio movie. OpenfMRI. [https://legacy.openfmri.org/dataset/ds000113] (IsReferencedBy)
Hanke, M., Kottke, D., Iacovella, V., Hoffmann, M.B., Sengupta, A., Kaule, F.R., Häusler, C., Guntupalli, S.J., Baumgartner, F.J., & Stadler, J. (2016). Simultaneous fMRI/eyetracking while movie watching, plus visual localizers. OpenfMRI. [https://legacy.openfmri.org/dataset/ds000113d] (IsReferencedBy)
Hanke, M. (2016). studyforrest.org Dataset. Pre-aligned MRI data. GitHub. [https://github.com/psychoinformatics-de/studyforrest-data-aligned] (IsReferencedBy)
Sengupta, A., Kaule, F.R., Guntupalli, S.J., Hoffmann, M.B., Häusler, C., Stadler, J., & Hanke, M. (2016). studyforrest.org Dataset. Localization of higher-level visual ROIs. GitHub. [https://github.com/psychoinformatics-de/studyforrest-data-visualrois] (IsReferencedBy)
Hanke, M. (2016). studyforrest.org Dataset. Reconstruction of cortical surfaces. GitHub. [https://github.com/psychoinformatics-de/studyforrest-data-freesurfer] (IsReferencedBy)
Funding BMBF, 01GQ1112
NSF, 1129855
Keywords fMRI
naturalistic stimulus
spatial perception
vision
language
speech
narrative
studyforrest
datalad
Resource Type Dataset