NeuroVault data for the NARPS open pipeline
Getting the data from NeuroVault
Download code taken and adapted from:
https://github.com/poldrack/narps/tree/master/ImageAnalyses
Note that it is also possible to grab all the data from the Zenodo archive that
was generated when the NARPS paper was released:
https://zenodo.org/record/3528329/
Requirements
Necessary packages are listed in requirements.txt.
Teams
The team_id.xlsx file is required for the script to run; it lists all the
teams and the links to their collections.
Excluded teams are hard-coded in TEAMS_TO_SKIP in PrepareData.py:
50GV
C22U
94GU
5G9K
2T7P
R42Q
16IN
VG39
1K0E
X1Z4
L1A8
XU70
They are listed in this Google spreadsheet:
https://docs.google.com/spreadsheets/d/1FU_F6kdxOD4PRQDIHXGHS4zTi_jEVaUqY_Zwg0z6S64/
Reasons for exclusion are listed in PrepareData.py and here:
https://gitlab.inria.fr/egermani/analytic_variability_fmri/-/blob/master/src/variable_selection.ipynb
Get the data
The data provided for download were obtained from NeuroVault using
PrepareData.py. The tarball includes files describing the provenance of the
downloaded data (including MD5 hashes for integrity checking).
python PrepareData.py -b $PWD
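The MD5 hashes in the provenance files can be used to check that downloaded files arrived intact. A minimal sketch of such a check, assuming a simple `filename,md5` CSV manifest (the actual layout of the provenance files may differ; adapt the reader accordingly):

```python
import csv
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_manifest(manifest_csv):
    """Yield (filename, ok) pairs for a 'filename,md5' CSV manifest.

    Files are looked up relative to the manifest's own directory.
    """
    base = Path(manifest_csv).parent
    with open(manifest_csv, newline="") as f:
        for row in csv.DictReader(f):
            target = base / row["filename"]
            yield row["filename"], md5sum(target) == row["md5"]
```

A manifest entry whose file was truncated or corrupted in transit would then show up with `ok == False`.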
DataLad datasets and how to use them
This repository is a DataLad dataset. It provides
fine-grained data access down to the level of individual files, and allows for
tracking future updates. In order to use this repository for data retrieval,
DataLad is required. It is a free and open source
command line tool, available for all major operating systems, and builds on
Git and git-annex to allow sharing,
synchronizing, and version controlling collections of large files. You can find
information on how to install DataLad at
handbook.datalad.org/en/latest/intro/installation.html.
Get the dataset
A DataLad dataset can be installed
by running
datalad install <url>
Once a dataset is installed, it is a light-weight directory on your local
machine. At this point, it contains only small metadata and information on the
identity of the files in the dataset, but not actual content of the (sometimes
large) data files.
Given that this dataset is hosted on GIN, you will need to set up an SSH key to
get this data.
See the DataLad Handbook for more information:
http://handbook.datalad.org/en/latest/basics/101-139-gin.html
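Setting up such a key could look like the sketch below (the temporary path is only for demonstration; in practice you would write the key to ~/.ssh/ and protect it with a passphrase, then paste the public key into your GIN account settings):

```shell
# A sketch: generate an ed25519 key pair to register with GIN.
keyfile=$(mktemp -u)                  # temporary path for this demonstration only
ssh-keygen -t ed25519 -N "" -q -f "$keyfile"
cat "${keyfile}.pub"                  # paste this public key into the GIN web interface
```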
Retrieve dataset content
After cloning a dataset, you can retrieve file contents by running
datalad get <path/to/directory/or/file>
This command will trigger a download of the files, directories, or subdatasets
you have specified.
DataLad datasets can contain other datasets, so-called subdatasets. If you
clone the top-level dataset, subdatasets do not yet contain metadata and
information on the identity of files, but appear to be empty directories. In
order to retrieve file availability metadata in subdatasets, run
datalad get -n <path/to/subdataset>
Afterwards, you can browse the retrieved metadata to find out about subdataset
contents, and retrieve individual files with datalad get. If you use
datalad get <path/to/subdataset>
, all contents of the subdataset will be
downloaded at once.
Stay up-to-date
DataLad datasets can be updated. The command datalad update
will fetch
updates and store them on a different branch (by default
remotes/origin/master
). Running
datalad update --merge
will pull available updates and integrate them in one go.
Find out what has been done
DataLad datasets contain their history in the git log
. By running git log
(or a tool that displays Git history) in the dataset or on specific files, you
can find out what has been done to the dataset or to individual files by whom,
and when.
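For example, the history of a single file can be inspected like this (`path/to/file` is a placeholder for any file tracked in the dataset):

```shell
# Show who changed a file and when: short hash, author, date, subject
git log --follow --date=short --format='%h  %an  %ad  %s' -- path/to/file
```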
More information
More information on DataLad and how to use it can be found in the DataLad
Handbook at
handbook.datalad.org. The
chapter "DataLad datasets" can help you to familiarize yourself with the concept
of a dataset.