
# Processing of visual and non-visual naturalistic spatial information in the "parahippocampal place area": from raw data to results


This repository contains the raw data and all code to generate the results in Häusler, C.O., & Hanke, M. (submitted).

If you have never used DataLad before, please read the section on DataLad datasets below.

## DataLad datasets and how to use them

This repository is a DataLad dataset. It allows fine-grained data access down to the level of single files. In order to use this repository for data retrieval, DataLad is required. It is a free and open-source command line tool, available for all major operating systems, that builds on top of Git and git-annex to allow sharing, synchronizing, and version controlling collections of large files. You can find information on how to install DataLad at http://handbook.datalad.org/en/latest/intro/installation.html.
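
For example, on a system with Python available, DataLad can typically be installed via pip (one common route; see the handbook for OS-specific alternatives):

```
# install DataLad from PyPI; note that git-annex is a separate dependency
# on most systems (see the handbook's installation chapter)
pip install datalad
```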

### Get the dataset

A DataLad dataset can be cloned by running

```
datalad clone <url>
```

Once a dataset is cloned, it is a lightweight directory on your local machine. At this point, it contains only small metadata and information on the identity of the files in the dataset, but not the actual content of the (sometimes large) data files.
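
For example, one of the input datasets used in the cookbook below can also be cloned on its own:

```
# clone the aligned studyforrest fMRI data as a standalone dataset
datalad clone https://github.com/psychoinformatics-de/studyforrest-data-aligned
```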

### Retrieve dataset content

After cloning a dataset, you can retrieve file contents by running

```
datalad get <path/to/directory/or/file>
```

This command will trigger a download of the files, directories, or subdatasets you have specified.
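
In a clone of this dataset, for instance, a single subject's motion-corrected movie data could be fetched along these lines (using the paths that appear in the cookbook below):

```
# fetch one subject's 4D fMRI data for run 1 of the audio-visual movie
datalad get inputs/studyforrest-data-aligned/sub-01/in_bold3Tp2/sub-01_task-avmovie_run-1_bold.nii.gz
```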

DataLad datasets can contain other datasets, so-called subdatasets. If you clone the top-level dataset, subdatasets do not yet contain metadata and information on the identity of files, but appear to be empty directories. In order to retrieve file availability metadata in subdatasets, run

```
datalad get -n <path/to/subdataset>
```

Afterwards, you can browse the retrieved metadata to find out about subdataset contents, and retrieve individual files with `datalad get`. If you use `datalad get <path/to/subdataset>`, all contents of the subdataset will be downloaded at once.
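
In this dataset, for example, the file listing of the aligned fMRI input data becomes browsable only after a call such as:

```
# obtain the subdataset's file metadata without downloading any file content
datalad get -n inputs/studyforrest-data-aligned
```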

### Stay up-to-date

DataLad datasets can be updated. The command `datalad update` will fetch updates and store them on a different branch (by default `remotes/origin/master`). Running

```
datalad update --merge
```

will pull available updates and integrate them in one go.
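
If you prefer to inspect incoming changes before integrating them, you can fetch first and review the update branch with plain Git (a sketch of one possible workflow, not specific to this dataset):

```
# fetch updates without integrating them
datalad update
# review what would change relative to the local state
git diff HEAD..remotes/origin/master
```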

### More information

More information on DataLad and how to use it can be found in the DataLad Handbook at http://handbook.datalad.org. The chapter "DataLad datasets" can help you to familiarize yourself with the concept of a dataset.

## Dataset structure

- All inputs (i.e., building blocks from other sources) are located in `inputs/`.
- All custom code and (templates of) FEAT design files are located in `code/`.
- Segmented annotations are located in `events/segments/`.
- Templates of event files for FEAT created from segmented annotations are located in `events/onsets/`.
- Individualized design files, and results of the first- and second-level GLM analyses, are located in `sub-*/`.
- Results of the third-level (group) GLM analyses are located in `3rd-lvl/`.
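
Put together, an abbreviated sketch of the resulting layout (with `sub-01` standing in for all subject directories):

```
.
├── 3rd-lvl/            # group-level GLM results
├── code/               # custom code & FEAT design templates
├── events/
│   ├── onsets/         # FEAT event-file templates
│   └── segments/       # segmented annotations
├── inputs/             # building blocks from other sources
└── sub-01/             # per-subject design files & 1st/2nd-level results
```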

## Cookbook: how this dataset was assembled

### install subdatasets and get the raw data

```
# install subdataset that provides motion-corrected fMRI data from the audio-visual movie and its audio-description
datalad install -d . -s https://github.com/psychoinformatics-de/studyforrest-data-aligned inputs/studyforrest-data-aligned
# download 4D fMRI data (and motion correction parameters of the movie)
datalad get inputs/studyforrest-data-aligned/sub-??/in_bold3Tp2/sub-??_task-a?movie_run-?_bold*.*

# install subdataset that provides the original 7-Tesla data to get the motion correction parameters of the audio-description
datalad install -d . -s juseless.inm7.de:/data/project/studyforrest/collection/phase1 inputs/phase1
datalad get inputs/phase1/sub???/BOLD/task001_run00?/bold_dico_moco.txt

# install subdataset "template & transforms", and download the relevant images
datalad install -d . -s https://github.com/psychoinformatics-de/studyforrest-data-templatetransforms inputs/studyforrest-data-templatetransforms
datalad get inputs/studyforrest-data-templatetransforms/sub-*/bold3Tp2/
datalad get inputs/studyforrest-data-templatetransforms/templates/*

# install subdataset "studyforrest-data-annotations" that contains the annotation of cuts & locations,
# and "code/researchcut2segments.py", which we need to segment the (continuous) annotations
datalad install -d . -s https://github.com/psychoinformatics-de/studyforrest-data-annotations inputs/studyforrest-data-annotations

# install the annotation of speech as a subdataset
datalad install -d . -s juseless.inm7.de:/data/group/psyinf/studyforrest-speechannotation inputs/studyforrest-speechannotation
# download the annotation as a TSV file (BIDS)
datalad get inputs/studyforrest-speechannotation/annotation/fg_rscut_ad_ger_speech_tagged.tsv
```

### segment the continuous annotations

```
# segment the annotation of cuts & locations using timings of the audio-visual movie segments
datalad run \
-i inputs/studyforrest-data-annotations/researchcut/locations.tsv \
-o events/segments \
./inputs/studyforrest-data-annotations/code/researchcut2segments.py \
'{inputs}' \
avmovie avmovie \
'{outputs}'

# segment the annotation of speech using timings of the audio-description segments
datalad run \
-i inputs/studyforrest-speechannotation/annotation/fg_rscut_ad_ger_speech_tagged.tsv \
-o events/segments \
./inputs/studyforrest-data-annotations/code/researchcut2segments.py \
'{inputs}' \
aomovie aomovie \
'{outputs}'

# for control contrasts, segment the speech annotation using timings of the audio-visual movie segments
datalad run \
-i inputs/studyforrest-speechannotation/annotation/fg_rscut_ad_ger_speech_tagged.tsv \
-o events/segments \
./inputs/studyforrest-data-annotations/code/researchcut2segments.py \
'{inputs}' \
avmovie avmovie \
'{outputs}'

# for control contrasts, segment the location annotation using timings of the audio-description segments
datalad run \
-i inputs/studyforrest-data-annotations/researchcut/locations.tsv \
-o events/segments \
./inputs/studyforrest-data-annotations/code/researchcut2segments.py \
'{inputs}' \
aomovie aomovie \
'{outputs}'
```

### manually add confound annotations and a script that gets the annotations into shape for the subsequent FEAT analyses

```
# add low-level confound files of audio-visual movie manually & save (folder "avconfounds")
datalad save -m 'add low-level confound files for audio-visual movie to /events/segments'
# add low-level confound files of audio-description manually & save (folder "aoconfounds")
datalad save -m 'add low-level confound files for audio-description to /events/segments'
```
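
Because these files were added by hand (outside of `datalad run`), the pending changes can be reviewed before each save with a standard status query (a usage note, not part of the recorded history):

```
# list files manually placed into events/segments but not yet saved
datalad status events/segments
```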

### convert confound annotations into FEAT onset files

```
# add script code/confounds2onsets.py
datalad save -m 'add script that converts & copies confound files to onsets directories'
# perform the conversion considering the directories of corresponding fMRI runs and
# rename according to conventions used in FSL design files
datalad run \
-i events/segments \
-o events/onsets \
./code/confounds2onsets.py -i '{inputs}' -o '{outputs}'
```

### create FEAT onset files from the segmented annotation of cuts & locations

```
# add the script that performs the conversion
datalad save -m 'add script that creates event files for FSL from the segmented location annotation'

# create event onset files from segmented location annotation (timings of audio-visual movie)
datalad run \
-m "create the event files with movie timing" \
-i events/segments/avmovie \
-o events/onsets \
./code/locationsanno2onsets.py \
-ind '{inputs}' \
-inp 'locations_run-?_events.tsv' \
-outd '{outputs}'

# create event onset files from segmented location annotation (timings of audio-description)
datalad run \
-m "create the event files with audio-track timing" \
-i events/segments/aomovie \
-o events/onsets \
./code/locationsanno2onsets.py \
-ind '{inputs}' \
-inp 'locations_run-?_events.tsv' \
-outd '{outputs}'
```

### create FEAT onset files from the segmented annotation of speech

```
# add the script that performs the conversion
datalad save -m 'add script that creates event files for FSL from the segmented speech annotation'

# create event onset files from segmented speech annotation (timings of audio-visual movie)
datalad run \
-i events/segments/avmovie \
-o events/onsets \
./code/speechanno2onsets.py \
-ind '{inputs}' \
-inp 'fg_rscut_ad_ger_speech_tagged_run-*.tsv' \
-outd '{outputs}'

# create event onset files from segmented speech annotation (timings of audio-description)
datalad run \
-i events/segments/aomovie \
-o events/onsets \
./code/speechanno2onsets.py \
-ind '{inputs}' \
-inp 'fg_rscut_ad_ger_speech_tagged_run-*.tsv' \
-outd '{outputs}'
```

### copy FEAT event files to folders of individual subjects

```
# manually add the script that creates directories & handles the copying
datalad save -m 'add script that creates subject directories and copies FSL event files into them'

# create subject folders & copy events with timing of the audio-visual movie
datalad run \
-m "create subject folders & copy event files to them" \
./code/onsets2subfolders.py \
-fmri 'inputs/studyforrest-data-aligned/sub-??/in_bold3Tp2/sub-??_task-aomovie_run-1_bold.nii.gz' \
-onsets 'events/onsets/avmovie/run-?/*.txt' \
-o './'

# copy events with timing of the audio-description
datalad run \
-m "copy event files with audio-description timings to subject folders" \
./code/onsets2subfolders.py \
-fmri 'inputs/studyforrest-data-aligned/sub-??/in_bold3Tp2/sub-??_task-aomovie_run-1_bold.nii.gz' \
-onsets 'events/onsets/aomovie/run-?/*.txt' \
-o './'
```

### manually add the templates of FEAT design files

```
# manually add the script that creates first level individual design files from template
datalad save -m 'add python script that creates individual (1st level) design files from templates'

# analyses in group space, level 1-3 (e.g. 1st-lvl_movie-ppa-grp.fsf, 2nd-lvl_movie-ppa-grp.fsf, 3rd-lvl_movie-ppa-grp-1.fsf)
# both steps include adding the bash scripts that take 2nd level templates as input and create design files in individual directories
# (e.g. generate_2nd-lvl-design_movie-ppa-grp.sh)
datalad save -m 'add FSL design files (lvl 1-3) for movie (group)'
datalad save -m 'add FSL design files (lvl 1-3) for audio (group)'

# analyses in subject space, level 1-2 (e.g. 1st-lvl_movie-ppa-ind.fsf, 2nd-lvl_movie-ppa-ind.fsf)
# both steps include adding the bash scripts that take 2nd level templates as input and create design files in individual directories
# (e.g. generate_2nd-lvl-design_movie-ppa-ind.sh)
datalad save -m 'add FSL design files (lvl 1-2) for movie (individuals)'
datalad save -m 'add FSL design files (lvl 1-2) for audio (individuals)'
```

### from templates, create FEAT design files for individual subjects

```
# movie, group space, first level
datalad run \
-m 'for movie analysis (group), create individual (1st level) design files from template' \
code/generate_1st-lvl-design.py \
-fmri 'inputs/studyforrest-data-aligned/sub-01/in_bold3Tp2/sub-01_task-avmovie_run-1_bold.nii.gz' \
-design 'code/1st-lvl_movie-ppa-grp.fsf'

# movie, group space, second level
datalad run \
-m "for movie analysis (group), generate individual 2nd lvl design files from template" \
"./code/generate_2nd-lvl-design_movie-ppa-grp.sh"

# audio-description, group space, first level
datalad run \
-m 'for audio analysis (group), create individual 1st level design files from template' \
code/generate_1st-lvl-design.py \
-fmri 'inputs/studyforrest-data-aligned/sub-01/in_bold3Tp2/sub-01_task-aomovie_run-1_bold.nii.gz' \
-design 'code/1st-lvl_audio-ppa-grp.fsf'

# audio-description, group space, second level
datalad run \
-m "for audio analysis (group), generate individual 2nd lvl design files from template" \
"./code/generate_2nd-lvl-design_audio-ppa-grp.sh"

# movie, subject space, first level
datalad run \
-m 'for movie analysis (individuals), create individual 1st level design files from template' \
code/generate_1st-lvl-design.py \
-fmri 'inputs/studyforrest-data-aligned/sub-01/in_bold3Tp2/sub-01_task-avmovie_run-1_bold.nii.gz' \
-design 'code/1st-lvl_movie-ppa-ind.fsf'

# movie, subject space, second level
datalad run \
-m "for movie analysis (individuals), generate individual 2nd lvl design files from template" \
"./code/generate_2nd-lvl-design_movie-ppa-ind.sh"

# audio-description, subject space, first level
datalad run \
-m 'for audio analysis (individuals), create individual 1st level design files from template' \
code/generate_1st-lvl-design.py \
-fmri 'inputs/studyforrest-data-aligned/sub-01/in_bold3Tp2/sub-01_task-aomovie_run-1_bold.nii.gz' \
-design 'code/1st-lvl_audio-ppa-ind.fsf'

# audio-description, subject space, second level
datalad run \
-m "for audio analysis (individuals), generate individual 2nd lvl design files from template" \
"./code/generate_2nd-lvl-design_audio-ppa-ind.sh"
```

### manually add bash script that handles custom standard space templates & matrices for FEAT

datalad save -m "add script that add templates & transformation matrices to 1st lvl result directories of Feat"

### run the analyses via condor_submit on a computer cluster & manually save results

# add file "condor-commands-for-cm.txt" that contains the following commands to manually submit the subsequent analyses to HTCondor
datalad save -m "add txt file with instructions for manually starting Condor Jobs from CM"

# movie, group space, first level
condor_submit code/compute_1st-lvl_movie-ppa-grp.submit
# in .feat-directories, create templates and transforms
./code/reg2std4feat inputs/studyforrest-data-templatetransforms bold3Tp2 grpbold3Tp2 sub-*/run-?_movie-ppa-grp.feat
# movie, group space, second level
condor_submit code/compute_2nd-lvl_movie-ppa-grp.submit
# movie, group space, third level
condor_submit code/compute_3rd-lvl_movie-ppa-grp.submit
# save results of first to third level
datalad save -m '3rd lvl results movie (group)'

# audio-description, group space, first level
condor_submit code/compute_1st-lvl_audio-ppa-grp.submit
# in .feat-directories, create templates and transforms
./code/reg2std4feat inputs/studyforrest-data-templatetransforms bold3Tp2 grpbold3Tp2 sub-*/run-?_audio-ppa-grp.feat
# audio-description, group space, second level
condor_submit code/compute_2nd-lvl_audio-ppa-grp.submit    
# audio-description, group space, third level
condor_submit code/compute_3rd-lvl_audio-ppa-grp.submit
# save results of first to third level
datalad save -m '3rd lvl results audio (group)'

```
# movie, subject space, first level
condor_submit code/compute_1st-lvl_movie-ppa-ind.submit
# in .feat directories, create templates and transforms
./code/reg2std4feat inputs/studyforrest-data-templatetransforms bold3Tp2 bold3Tp2 sub-*/run-?_movie-ppa-ind.feat
# movie, subject space, second level
condor_submit code/compute_2nd-lvl_movie-ppa-ind.submit
# save results of first to second level
datalad save -m '2nd lvl results movie (individuals)'
```

```
# audio-description, subject space, first level
condor_submit code/compute_1st-lvl_audio-ppa-ind.submit
# in .feat directories, create templates and transforms
./code/reg2std4feat inputs/studyforrest-data-templatetransforms bold3Tp2 bold3Tp2 sub-*/run-?_audio-ppa-ind.feat
# audio-description, subject space, second level
condor_submit code/compute_2nd-lvl_audio-ppa-ind.submit
# save results of first to second level
datalad save -m '2nd lvl results audio (individuals)'
```

### comment: some cleaning that we did

```
# identify annexed file content that is no longer referenced by any branch
git annex unused
# drop all unused content for good
git annex dropunused all --force
# drop large intermediate FEAT outputs without checking for other copies
datalad drop --nocheck sub*/*.feat/filtered_func_data.nii.gz
datalad drop --nocheck sub*/*.feat/stats/res4d.nii.gz
```