BOLD fMRI responses in human subjects reading and listening to a set of natural stories.

More information can be found in:

Deniz, F., Nunez-Elizalde, A.O., Huth, A.G. and Gallant, J.L., 2019. The representation of semantic information across human cerebral cortex during listening versus reading is invariant to stimulus modality. Journal of Neuroscience, 39(39), pp.7722-7736.


README.md

narratives_reading_listening_fmri

This folder contains stimuli, models, and fMRI data originally created for and collected in Deniz et al. 2019 (see below for full reference).

Some results from this study can be viewed online at:

https://www.gallantlab.org/brainviewer/Deniz2019/

Stimuli

The stimuli folder contains the .wav files presented to the subjects in the experiment. The reading stimuli were based on the transcripts of the stories, which are included in this folder as text files. story_11.wav is the validation story. For full details regarding stimulus presentation, please see the Methods section of Deniz et al. 2019.
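As a small example (assuming standard PCM .wav files), a stimulus can be loaded with SciPy as follows; only story_11.wav is named in this README, so the stimuli/ path prefix is an assumption about the folder layout.

```python
from scipy.io import wavfile

# Load the validation story; returns the sampling rate (Hz) and the samples.
sample_rate, audio = wavfile.read("stimuli/story_11.wav")
print(sample_rate, audio.shape)  # (n_samples,) for mono or (n_samples, n_channels)
```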

Model Features

In the features folder, the two files features_trn_NEW.hdf and features_val_NEW.hdf store the feature values of all models for each TR of the stimulus stories. These feature values are downsampled to the sampling rate of the MRI data. For example, the stored array for the semantic model of each story has dimensions (time x 985), because there are 985 features in the semantic feature space. The stimulus features of each story are 10 seconds (5 TRs) shorter than the fMRI data, because the 10 s (5 TRs) of silence after each story during the scan are not reflected in the stimulus features. However, the 10 s (5 TRs) of silence before each story are included. This discrepancy should be taken into account when the data are trimmed as described in the paper. The file moth_en_moten_20210928.npz contains the motion energy features and is already trimmed and concatenated across training and test stories.
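As a quick illustration, the following sketch shows one way to inspect and load these files with h5py and NumPy; the per-story and per-feature-space key names are assumptions and should be checked against the actual contents of the files.

```python
import h5py
import numpy as np

# Inspect the training features; the internal key layout (e.g. one dataset
# per story and feature space) is an assumption -- print the names to check.
with h5py.File("features/features_trn_NEW.hdf", "r") as f:
    f.visit(print)  # list every group/dataset name in the file
    # Hypothetical key: the semantic features of one story, shape (time, 985).
    # semantic = np.asarray(f["story_01/semantic"])

# The motion energy features are stored as a NumPy archive and are already
# trimmed and concatenated across training and test stories.
moten = np.load("features/moth_en_moten_20210928.npz")
print(moten.files)  # names of the arrays stored in the archive
```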

Data

In the responses folder, fMRI data for the six subjects in the experiment are provided as (time x voxels) arrays for each data collection run (10 stories of training data and 1 story repeated twice as validation data). The data have been preprocessed to account for the effects of subject motion, and voxel selection has been applied so that only cortical voxels are retained in each scan. Manually edited FreeSurfer segmentations of the cortex were used together with pycortex to produce masks of cortical voxels. Out of concern for subject privacy, we provide neither raw functional scans nor anatomical scans. Instead, we provide sparse matrices that can be used (1) to map the per-voxel data onto a flattened version of each subject's cortical surface and (2) to map each subject's brain onto the fsaverage surface in FreeSurfer (which itself is in MNI space). These sparse matrices can be found in the mappers folder.
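A minimal loading sketch is shown below; the file name and dataset keys are assumptions made for illustration, so check the actual contents of the responses folder (and code/example.py) before relying on them.

```python
import h5py
import numpy as np

# Hypothetical file name; list the responses/ folder for the actual names.
with h5py.File("responses/subject01_reading_fmri_data_trn.hdf", "r") as f:
    f.visit(print)  # print every dataset key so the run names can be identified
    # Hypothetical key for one training run; each run is a (time x voxels) array.
    # run = np.asarray(f["story_01"])
    # print(run.shape)  # (n_TRs, n_cortical_voxels)
```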

The sparse mapper matrices provide a way to assess the relationship between anatomy and function without compromising subjects' privacy. Full raw data may be provided for specifically defined research goals that require raw data, and only if the subjects consent to that specific use.

See example.py in the code/ directory for example code that loads the data and visualizes it on a subject's cortical surface.
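For orientation, the sketch below shows the general pattern of applying a sparse mapper matrix to per-voxel data; the file name, dataset keys, and storage layout (CSR components) are assumptions, and example.py documents the procedure actually used for this dataset.

```python
import h5py
import numpy as np
from scipy import sparse

# Hypothetical file/key names; consult the mappers/ folder and code/example.py
# for the actual layout.
with h5py.File("mappers/subject01_mappers.hdf", "r") as f:
    # Assume the voxel-to-flatmap matrix is stored as its CSR components.
    voxel_to_flatmap = sparse.csr_matrix(
        (f["flatmap_data"][...], f["flatmap_indices"][...], f["flatmap_indptr"][...])
    )

# One value per cortical voxel, e.g. the prediction accuracy of a model.
voxel_values = np.random.rand(voxel_to_flatmap.shape[1])

# Multiplying by the mapper projects the voxel data onto the flatmap vertices.
flatmap_values = voxel_to_flatmap @ voxel_values
print(flatmap_values.shape)
```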

Citation

Deniz, F., Nunez-Elizalde, A. O., Huth, A. G., & Gallant, J. L. (2019). The representation of semantic information across human cerebral cortex during listening versus reading is invariant to stimulus modality. Journal of Neuroscience, 39(39), 7722-7736.

This repository also contains data from other papers that used the dataset, such as Chen et al., 2024 in the chen2024_timescales folder.

datacite.yml
Title The Representation of Semantic Information Across Human Cerebral Cortex During Listening Versus Reading Is Invariant to Stimulus Modality
Authors Deniz,Fatma;University of California, Berkeley;ORCID:0000-0001-6051-7288
Nunez-Elizalde,Anwar O.;University of California, Berkeley
Huth,Alexander G.;University of California, Berkeley;ORCID:0000-0002-7590-3525
Gallant,Jack L.;University of California, Berkeley;ORCID:0000-0001-7273-1054
Description This folder contains stimuli, models, and fMRI data of subjects reading and listening to English narratives, originally collected for Deniz et al. 2019.
License Creative Commons CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
References Deniz F., Nunez-Elizalde A. O., Huth A. G., & Gallant J. L. The representation of semantic information across human cerebral cortex during listening versus reading is invariant to stimulus modality. Journal of Neuroscience, 39(39), 7722-7736 (2019). [doi:10.1523/JNEUROSCI.0675-19.2019] (IsSupplementTo)
Chen C., Dupré la Tour T., Gallant J. L., Klein D., & Deniz F. The cortical representation of language timescales is shared between reading and listening. bioRxiv preprint (2023). [doi:10.1101/2023.01.06.522601] (IsSupplementTo)
Lamarre M., Chen C., & Deniz F. Attention weights accurately predict language representations in the brain. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 4513-4529, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics (2022). [doi:10.18653/v1/2022.findings-emnlp.330] (IsSupplementTo)
Funding NSF; IIS1208203
NEI; EY019684
NEI; EY022454
IARPA; 86155-Carnegi-1990360-gallant
CSI; CCF-0939370
Keywords Neuroscience
BOLD
cross-modal representations
fMRI
listening
reading
semantics
Resource Type Dataset