
```{r, echo=FALSE, message=FALSE, include=FALSE}
if (!requireNamespace("pacman")) install.packages("pacman")
packages_cran <- c("here")
pacman::p_load(char = packages_cran)
if (basename(here::here()) == "highspeed") {
  path_root = here::here("highspeed-bids")
} else {
  path_root = here::here()
}
```
## MRI data to BIDS
### Overview
#### Data availability
The data is freely available from https://github.com/lnnrtwttkhn/highspeed-bids and https://gin.g-node.org/lnnrtwttkhn/highspeed-bids.
#### License
The dataset is licensed under Creative Commons Attribution-ShareAlike 4.0.
Please see https://creativecommons.org/licenses/by-sa/4.0/ for details.
### Step 1: Conversion of MRI data to the Brain Imaging Data Structure (BIDS) using HeuDiConv
#### Overview
After MRI data was acquired at the MRI scanner, we converted all data to adhere to the [Brain Imaging Data Structure (BIDS)](http://bids.neuroimaging.io/) standard.
Please see the [paper by Gorgolewski et al., 2016, *Scientific Data*](https://www.nature.com/articles/sdata201644) for details.
In short, BIDS is a community standard for organizing and describing MRI datasets.
#### Container: `heudiconv` container, version 0.6.0
We used [HeuDiConv](https://github.com/nipy/heudiconv), version 0.6.0, to convert our MRI DICOM data to the BIDS structure.
First, we created a Singularity container with HeuDiConv in our cluster environment at the Max Planck Institute for Human Development, Berlin, Germany.
The Singularity container is separately available at [https://github.com/lnnrtwttkhn/tools](https://github.com/lnnrtwttkhn/tools) and was created using:
```bash
singularity pull docker://nipy/heudiconv:0.6.0
```
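To quickly check that the container works, one can query the HeuDiConv version through it (a minimal sketch; the image filename `heudiconv_0.6.0.sif` is an assumption and depends on the Singularity version used for the pull):
```bash
# Assumption: the pull above produced an image file named "heudiconv_0.6.0.sif";
# older Singularity versions may use a different filename (e.g., "*.simg")
singularity exec heudiconv_0.6.0.sif heudiconv --version
```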
For the conversion of DICOM data acquired at the MRI scanner to BIDS-formatted NIfTI files, the following scripts were used (these scripts can be found in the `code/heudiconv/` directory):
#### Mapping raw DICOMs to BIDS: `highspeed-heudiconv-heuristic.py`
`highspeed-heudiconv-heuristic.py` is a Python script that creates a mapping between the DICOMs and the NIfTI-converted files in the BIDS structure.
```{python, echo=TRUE, code=readLines(file.path(path_root, "code", "heudiconv", "highspeed-heudiconv-heuristic.py")), eval=FALSE, python.reticulate=FALSE}
```
#### Changing participant IDs: `highspeed-heudiconv-anonymizer.py`
`highspeed-heudiconv-anonymizer.py` is an executable Python script that is used in combination with the `--anon-cmd` flag of the `heudiconv` command. It turns the original participant IDs (which we used when running the study in the lab) into consecutive zero-padded numbers (see [this thread on neurostars.org](https://neurostars.org/t/heudiconv-how-to-turn-subject-ids-into-consecutive-zero-padded-numbers-as-required-by-bids/2240) for details).
```{python, echo=TRUE, code=readLines(file.path(path_root, "code", "heudiconv", "highspeed-heudiconv-anonymizer.py")), eval=FALSE, python.reticulate=FALSE}
```
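For illustration, a `heudiconv` call combining the heuristic and the anonymizer might look like the following sketch; the DICOM path template, subject ID, and output directory are placeholders, not the exact arguments used in the study (those are set in the cluster script below):
```bash
# Sketch only: convert one participant's DICOMs to BIDS using the heuristic
# and the anonymizer; all paths and the subject ID are placeholders
heudiconv \
  -d "dicoms/{subject}/*/*.IMA" \
  -s 101 \
  -f code/heudiconv/highspeed-heudiconv-heuristic.py \
  --anon-cmd code/heudiconv/highspeed-heudiconv-anonymizer.py \
  -c dcm2niix -b \
  -o bids \
  --overwrite
```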
As a side note, this last step is not strictly necessary, since zero-padded numbers are not required by the BIDS standard.
#### Running `heudiconv` on the cluster: `highspeed-heudiconv-cluster.sh`
`highspeed-heudiconv-cluster.sh` is a bash script that parallelizes the HeuDiConv command across participants on the high-performance cluster of the Max Planck Institute for Human Development, Berlin, Germany.
Note that, due to privacy concerns, we only saved the BIDS-converted data in the repository after running the defacing procedure (see details below).
```{bash, echo=TRUE, code=readLines(file.path(path_root, "code", "heudiconv", "highspeed-heudiconv-cluster.sh")), eval=FALSE}
```
We acquired both pre-normalized and non-normalized MRI data (see e.g., [here](https://practicalfmri.blogspot.com/2012/04/common-persistent-epi-artifacts-receive.html) for more information on pre-normalization).
All analyses reported in the paper were based on the **pre-normalized data**.
Only the pre-normalized dataset is published, because uploading the dataset in two versions (with and without pre-normalization) would cause interference when running fMRIPrep (see [here](https://neurostars.org/t/addressing-multiple-t1w-images/4959)).
#### Resources
The following resources helped along the way (thank you, people on the internet!):
* [Heudiconv documentation](https://heudiconv.readthedocs.io/en/latest/index.html)
* ["Heudiconv: Example Usage" - Tutorial by James Kent](https://slides.com/jameskent/deck-3#/)
* ["Heudiconv on multiple subjects" - Discussion on neurostars.org](https://neurostars.org/t/heudiconv-on-multiple-subjects/1344)
* ["Heudiconv across multiple sessions" - Discussion on neurostars.org](https://neurostars.org/t/heudiconv-across-multiple-sessions/1281/9)
* ["Heudiconv: How to turn subject IDs into consecutive zero-padded numbers as required by BIDS?" - Discussion on neurostars.org](https://neurostars.org/t/heudiconv-how-to-turn-subject-ids-into-consecutive-zero-padded-numbers-as-required-by-bids/2240)
* ["DICOM to BIDS conversion" - A YouTube tutorial](https://www.youtube.com/watch?time_continue=4&v=pAv9WuyyF3g)
* ["BIDS Tutorial Series: HeuDiConv Walkthrough" - A tutorial by the Stanford Center for Reproducible Neuroscience](http://reproducibility.stanford.edu/bids-tutorial-series-part-2a/)
### Step 2: Removal of facial features using [pydeface](https://github.com/poldracklab/pydeface)
#### Overview
Facial features need to be removed from structural images before sharing the data online.
See the statement from [openfmri.org](https://openfmri.org/de-identification/) below regarding the importance of defacing:
> *To protect the privacy of the individuals who have been scanned we require that all subjects be de-identified before publishing a dataset. For the purposes of fMRI de-facing is the preferred method de-identification of scan data. Skull stripped data will not be accepted for publication.*
and a second statement from the [openneuro.org FAQs](https://openneuro.org/faq):
> *Yes. We recommend using pydeface. Defacing is strongly prefered over skullstripping, because the process is more robust and yields lower chance of accidentally removing brain tissue.*
#### Container: `pydeface` container, version 2.0.0
Defacing of all structural images was performed using [`pydeface`](https://github.com/poldracklab/pydeface), version 2.0.0 (with nipype version 1.3.0-rc1).
To ensure robustness of the defacing procedure, we used a Singularity container for `pydeface`, which we installed as follows:
```bash
singularity pull docker://poldracklab/pydeface:37-2e0c2d
```
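For illustration, the basic `pydeface` call wrapped by the cluster script below might look like this minimal sketch (the image filename and the input path are placeholders, not the exact values used in the study):
```bash
# Sketch only: deface a single structural image through the container;
# the image filename "pydeface_37-2e0c2d.sif" is an assumption.
# By default, pydeface writes the output with a "_defaced.nii.gz" suffix.
singularity exec pydeface_37-2e0c2d.sif \
  pydeface sub-01/ses-01/anat/sub-01_ses-01_T1w.nii.gz
```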
All scripts that we used for defacing can be found in the `code/defacing` directory.
#### Defacing structural images: `highspeed-defacing-cluster.sh`
First, we ran `highspeed-defacing-cluster.sh` to deface all structural images in the corresponding BIDS dataset.
Note that this script was optimized to run on the high-performance cluster of the Max Planck Institute for Human Development, Berlin.
```{bash, echo=TRUE, code=readLines(file.path(path_root, "code", "defacing", "highspeed-defacing-cluster.sh")), eval=FALSE}
```
#### Replacing original with defaced images: `highspeed-defacing-cleanup.sh`
`pydeface` creates a new file with the suffix `T1w_defaced.nii.gz`.
As `fMRIPrep`, `MRIQC`, and other tools need to use the defaced image instead of the original one, we need to replace the original with the defaced image.
This can be done separately after `pydeface` has been run.
In order to replace the original structural images with the defaced ones ([as recommended](https://neurostars.org/t/defaced-anatomical-data-fails-bids-validator/3636)), we ran `highspeed-defacing-cleanup.sh`.
```{bash, echo=TRUE, code=readLines(file.path(path_root, "code", "defacing", "highspeed-defacing-cleanup.sh")), eval=FALSE}
```
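In essence, the cleanup script renames each defaced image over its original counterpart, roughly like this (a sketch with placeholder paths; the actual loop over all structural images is in the script above):
```bash
# Sketch only: overwrite the original T1w image with its defaced version
mv sub-01/ses-01/anat/sub-01_ses-01_T1w_defaced.nii.gz \
   sub-01/ses-01/anat/sub-01_ses-01_T1w.nii.gz
```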
#### Resources
* ["Pydeface defaces structural data only?!"](https://neurostars.org/t/pydeface-defaces-structural-data-only/903) - Discussion on neurostars.org on whether any data other than structural acquisitions should be defaced (short answer: no!)
* ["Is/how much fmriprep (freesurfer et al) is resilient to “defacing”?"](https://neurostars.org/t/is-how-much-fmriprep-freesurfer-et-al-is-resilient-to-defacing/2642) - Discussion on neurostars.org on whether `fMRIPrep` works well with defaced data (short answer: yes!)
### Step 3: Adding BIDS `*events.tsv` files
#### Overview
According to the [BIDS specification](https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/05-task-events.html) ...
> The purpose of this file is to describe timing and other properties of events recorded during the scan.
Thus, we transformed all behavioral data that was acquired during the scanning sessions into run-specific `*events.tsv` files that are compatible with the BIDS standard.
#### Code and software
We mainly used two MATLAB scripts that were run in MATLAB version R2017b. The code takes the original behavioral data files acquired during scanning as input and returns run-specific BIDS-compatible `*events.tsv` files.
```{octave, echo=TRUE, code=readLines(file.path(path_root, "code", "events", "highspeed_bids_events.m")), eval=FALSE}
```
```{octave, echo=TRUE, code=readLines(file.path(path_root, "code", "events", "extract_bids_events.m")), eval=FALSE}
```
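For orientation, a resulting run-specific events file might look roughly like the following (hypothetical excerpt; the file name, task label, and all columns except the BIDS-required `onset` and `duration` are placeholders, not the actual columns of our dataset):
```bash
# Hypothetical excerpt, values are placeholders
$ head -4 sub-01/ses-01/func/sub-01_ses-01_task-highspeed_run-01_events.tsv
onset	duration	trial_type
12.35	0.50	stimulus
14.85	0.50	stimulus
16.10	1.25	response
```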
### Step 4: Adding additional information to the BIDS dataset
#### Code and software
First, we created a virtual environment using Python version 3.8.5 to install all dependencies and required packages.
A list of all required packages can be found in `requirements.txt` (created using `pip freeze > requirements.txt`); they can be installed via `pip install -r requirements.txt`.
```bash
mkvirtualenv highspeed-bids -p $(which python3)
```
```bash
$ python --version
Python 3.8.5
```
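Inside the activated virtual environment, the packages listed in `requirements.txt` can then be installed as described above:
```bash
pip install -r requirements.txt
```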
#### Updating `dataset_description.json`: `python highspeed-bids-description.py`
`highspeed-bids-description.py` is a short Python script that loads the `dataset_description.json` file that is pre-generated by HeuDiConv and populates it with the relevant study information.
```bash
datalad run -m "create dataset_description.json" --output "dataset_description.json" "python3 code/bids_conversion/highspeed-bids-description.py"
```
```{python, echo=TRUE, code=readLines(file.path(path_root, "code", "bids_conversion", "highspeed-bids-description.py")), eval=FALSE, python.reticulate=FALSE}
```
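For orientation, the populated `dataset_description.json` might contain fields along these lines (hypothetical excerpt; the field values shown here are placeholders, only the field names are standard BIDS fields):
```bash
# Hypothetical excerpt, values are placeholders
$ cat dataset_description.json
{
  "Name": "highspeed",
  "BIDSVersion": "1.1.0",
  "License": "CC-BY-SA-4.0",
  "Authors": ["..."]
}
```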
#### Updating the `.json` files of fieldmap data: `python highspeed-bids-fieldmaps.py`
During later pre-processing of the MRI data with `fMRIPrep`, we want to use our fieldmap data for distortion correction.
To ensure that `fMRIPrep` detects and uses the fieldmap data, we need to add the `IntendedFor` field to the `.json` files of the fieldmap data and provide relative paths to the functional task data (see [here](https://bids-specification.readthedocs.io/en/latest/04-modality-specific-files/01-magnetic-resonance-imaging-data.html#fieldmap-data) for more information).
This is done using the `code/bids_conversion/highspeed-bids-fieldmaps.py` file.
To record the provenance of this command in our DataLad dataset, we ran:
```bash
datalad run -m "create IntendedFor fields in fieldmap .json files" \
--input "./*/*/*/*.nii.gz" --output "./*/*/fmap/*.json" \
"python3 code/bids_conversion/highspeed-bids-fieldmaps.py"
```
```{python, echo=TRUE, code=readLines(file.path(path_root, "code", "bids_conversion", "highspeed-bids-fieldmaps.py")), eval=FALSE, python.reticulate=FALSE}
```
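For orientation, a fieldmap `.json` file after this step might contain an `IntendedFor` field along these lines (hypothetical excerpt; the subject, session, task, and file names are placeholders, and the paths are given relative to the subject directory as required by BIDS):
```bash
# Hypothetical excerpt, file names are placeholders
$ cat sub-01/ses-01/fmap/sub-01_ses-01_phasediff.json
{
  ...
  "IntendedFor": [
    "ses-01/func/sub-01_ses-01_task-highspeed_run-01_bold.nii.gz",
    "ses-01/func/sub-01_ses-01_task-highspeed_run-02_bold.nii.gz"
  ]
}
```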
#### Creating the `participants.tsv` and `participants.json` files
According to the [BIDS specification, version 1.1.0](https://bids.neuroimaging.io/bids_spec1.1.0.pdf) ...
> [...] the purpose of this file is to describe properties of participants such as age, handedness, sex, etc. In case of single session studies this file has one compulsory column participant_id that consists of sub-<participant_label> , followed by a list of optional columns describing participants. Each participant needs to be described by one and only one row.
In our case, this information was saved in the original behavioral data files that were created when participants performed the task during scanning.
To extract this information from the behavioral files into the `participants.tsv` file, we used the `code/events/highspeed_bids_participants.m` script.
Since wrapping the execution of this script into a `datalad run` command proved challenging, we simply ran the script (with paths that are not self-contained) and saved the resulting file changes afterwards.
First, we had to unlock the `participants.tsv` file using `datalad unlock participants.tsv`.
Then we ran `highspeed_bids_participants.m` using MATLAB version R2017b and saved the changes using `datalad save`, as sketched below.
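The sequence therefore amounted to something like the following (the commit message is a placeholder):
```bash
# unlock the annexed file so that MATLAB can modify it
datalad unlock participants.tsv
# ... run highspeed_bids_participants.m in MATLAB R2017b ...
# save the modified file back into the dataset history
datalad save -m "update participants.tsv"
```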
```{octave, echo=TRUE, code=readLines(file.path(path_root, "code", "events", "highspeed_bids_participants.m")), eval=FALSE}
```
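A resulting `participants.tsv` might look roughly like this (hypothetical excerpt; `participant_id` is the only column required by BIDS, the remaining columns and all values are placeholders):
```bash
# Hypothetical excerpt, values are placeholders
$ head -3 participants.tsv
participant_id	age	sex	handedness
sub-01	24	f	right
sub-02	27	m	right
```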
Finally, we created the `participants.json` file using:
```bash
datalad run -m "create participants.json" \
--output "participants.json" \
"python3 code/bids_conversion/highspeed-bids-participants.py"
```
```{python, echo=TRUE, code=readLines(file.path(path_root, "code", "bids_conversion", "highspeed-bids-participants.py")), eval=FALSE, python.reticulate=FALSE}
```