
Nature Story Listening 3T fMRI Data

Summary

This dataset contains BOLD fMRI responses from human subjects listening to a set of natural autobiographic stories. The functional data were collected from eleven subjects, in two sessions on two separate days for each subject. Details of the experiment are described in the original publications [1], [2], [3], [4]. Source data used to generate all the figures in the publication [4] are included. Code used to analyze the data in the publication [4] is here.

[1] Huth, A. G., De Heer, W. A., Griffiths, T. L., Theunissen, F. E., & Gallant, J. L. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature 532, 453–458 (2016). https://doi.org/10.1038/nature17637

[2] de Heer, W. A., Huth, A. G., Griffiths, T. L., Gallant, J. L., & Theunissen, F. E. The hierarchical cortical organization of human speech processing. Journal of Neuroscience, 37(27), 6539-6557 (2017). DOI: https://doi.org/10.1523/JNEUROSCI.3267-16.2017

[3] Deniz, F., Nunez-Elizalde, A. O., Huth, A. G., & Gallant, J. L. The representation of semantic information across human cerebral cortex during listening versus reading is invariant to stimulus modality. Journal of Neuroscience, 39(39), 7722-7736 (2019). DOI: https://doi.org/10.1523/JNEUROSCI.0675-19.2019

[4] Gong, X., Huth, A. G., Deniz, F., Johnson, K., Gallant, J. L., & Theunissen, F. E. Phonemic segmentation of narrative speech in human cerebral cortex. Nature Communications, (2023). https://doi.org/

If you publish any work using the dataset, please cite the original publications [2] and [4], and cite the dataset [1b] in the following recommended format:

[1b] Huth, A. G., De Heer, W. A., Deniz, F., Gong, X., Gallant, J. L., & Theunissen, F. E. Nature Story Listening 3T fMRI Data.

How to get started

With git and git-annex

To download the data with git-annex, run the commands:

# clone the repository, without the data files
git clone https://gin.g-node.org/gallantlab/story_listening
cd story_listening
# download one file (e.g. features/features_matrix.hdf)
git annex get features/features_matrix.hdf --from wasabi
# download all files
git annex get . --from wasabi

Two remotes are available for downloading the data, so you can pick the faster one. The first remote is GIN (--from origin), whose bandwidth might be limited. The second remote is Wasabi (--from wasabi), which offers a larger bandwidth.

Dataset content

Data file organization

features/                    → feature spaces used for voxelwise modeling
    english1000.hdf          → semantic embeddings, as described in [1], [2], [3], [4]
    feature_basis.hdf        → all feature labels, as described in [1]
    feature_matrix.hdf       → all features, as described in [1]
mappers/                     → plotting mappers for each subject
    S01_mappers.hdf
    ...
    S11_mappers.hdf
responses/                   → functional responses for each subject
    S01_BOLD.hdf
    ...
    S11_BOLD.hdf
    simulation_BOLD.hdf      → simulated functional responses for simulation analysis
stimuli/                     → natural autobiographic story, for each fMRI run
    test.wav
    train_00.wav
    ...
    train_11.wav

Data format

All files except the stimuli (which are wav files) are hdf5 files, with multiple arrays stored inside. The name, shape, and description of each array are listed below.


Each file in `features` contains:
    X_train: array of shape (3737, n_features)
        Training features.
    X_test: array of shape (291, n_features)
        Testing features.

    where n_features is:
        448 for `spectral power`
        1 for `number of phonemes` and `number of words`
        39 for `single phoneme`
        858 for `diphone`
        4841 for `triphone`
        985 for `semantics`
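
To check which arrays a given file actually holds before loading anything large, the file can be walked with h5py. The following is a minimal sketch, assuming h5py is installed; the helper names `list_arrays` and `load_array` are hypothetical and are not part of this dataset or of the analysis code for [4].

```python
import h5py


def list_arrays(path):
    """Return {array_name: shape} for every dataset stored in an HDF5 file."""
    shapes = {}

    def record(name, obj):
        # visititems also visits groups; only datasets carry array data.
        if isinstance(obj, h5py.Dataset):
            shapes[name] = obj.shape

    with h5py.File(path, "r") as hdf:
        hdf.visititems(record)
    return shapes


def load_array(path, name):
    """Read one named array from an HDF5 file into memory as a NumPy array."""
    with h5py.File(path, "r") as hdf:
        return hdf[name][()]


# Example usage (the path is illustrative):
# print(list_arrays("features/english1000.hdf"))
# X_train = load_array("features/english1000.hdf", "X_train")
```

Listing shapes first is cheap because h5py reads only metadata until an array is indexed.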

Each file in `mappers` contains:
    voxel_to_flatmap: CSR sparse array of shape (n_pixels, n_voxels)
        Mapper from voxels to flatmap image. The sparse array is stored with
        four dense arrays: (data, indices, indptr, shape).
    voxel_to_fsaverage: CSR sparse array of shape (n_vertices, n_voxels)
        Mapper from voxels to FreeSurfer surface. The sparse array is stored
        with four dense arrays: (data, indices, indptr, shape).
    flatmap_mask: array of shape (width, height)
        Pixels of the flatmap image associated with a voxel.
    flatmap_rois: array of shape (width, height, 4)
        Transparent image with annotated ROIs (for subjects S01, S02, and S03).
    flatmap_curvature: array of shape (width, height)
        Transparent image with binarized curvature to locate sulci/gyri.
    roi_mask_xxx: array of shape (n_voxels, )
        Mask indicating which voxels are in the ROI `xxx`.
        The ROI list differs across subjects. S04 and S05 have no ROIs.
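
The four dense arrays (data, indices, indptr, shape) follow the standard CSR layout: row `i` of the mapper owns the entries `data[indptr[i]:indptr[i + 1]]`, and `indices` gives the voxel (column) each entry refers to. The sketch below spells this out in plain Python on a tiny made-up mapper; `csr_matvec` is a hypothetical helper written for illustration. If SciPy is available, `scipy.sparse.csr_matrix((data, indices, indptr), shape=shape)` builds the same matrix from these arrays, and that is likely the more practical route for full-size mappers.

```python
def csr_matvec(data, indices, indptr, shape, vector):
    """Multiply a CSR matrix, given by its raw arrays, with a dense vector."""
    n_rows, n_cols = shape
    if len(vector) != n_cols:
        raise ValueError("vector length must equal the number of columns")
    result = [0.0] * n_rows
    for row in range(n_rows):
        # Entries of this row live in data[indptr[row]:indptr[row + 1]].
        for k in range(indptr[row], indptr[row + 1]):
            result[row] += data[k] * vector[indices[k]]
    return result


# Toy mapper with 3 pixels (rows) and 4 voxels (columns):
#   pixel 0 = 0.5 * voxel 0 + 0.5 * voxel 1
#   pixel 1 = 1.0 * voxel 2
#   pixel 2 = 0.25 * voxel 2 + 0.75 * voxel 3
data = [0.5, 0.5, 1.0, 0.25, 0.75]
indices = [0, 1, 2, 2, 3]
indptr = [0, 2, 3, 5]
shape = (3, 4)

voxel_values = [2.0, 4.0, 6.0, 8.0]
pixel_values = csr_matvec(data, indices, indptr, shape, voxel_values)
print(pixel_values)  # [3.0, 6.0, 7.5]
```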

Each file in `responses` contains:
    Y_train: array of shape (3737, n_voxels)
        Training responses.
    Y_test: array of shape (291, n_voxels)
        Testing responses.
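
Because the feature and response arrays share their first dimension (3737 training samples, 291 testing samples), a feature space can be related to the responses with a linear model fit on the training set and scored on the testing set. The following is only a rough sketch of that idea using NumPy: it fits a plain ridge regression with a single, arbitrarily chosen regularization value `alpha`, with no intercept, delays, or cross-validation. It is not the analysis pipeline of [4], and the function names are hypothetical.

```python
import numpy as np


def fit_ridge(X_train, Y_train, alpha=1.0):
    """Closed-form ridge regression: returns weights of shape (n_features, n_voxels)."""
    n_features = X_train.shape[1]
    gram = X_train.T @ X_train + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X_train.T @ Y_train)


def voxelwise_correlation(Y_true, Y_pred):
    """Pearson correlation between measured and predicted responses, per voxel."""
    true_z = (Y_true - Y_true.mean(axis=0)) / Y_true.std(axis=0)
    pred_z = (Y_pred - Y_pred.mean(axis=0)) / Y_pred.std(axis=0)
    return (true_z * pred_z).mean(axis=0)


# Example usage, assuming X_train, X_test, Y_train, Y_test were already
# loaded from the hdf5 files as NumPy arrays:
# weights = fit_ridge(X_train, Y_train, alpha=1.0)
# scores = voxelwise_correlation(Y_test, X_test @ weights)  # shape (n_voxels,)
```

Voxels with constant responses would produce a division by zero in `voxelwise_correlation`, so such voxels may need to be masked out first.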

Each file in `stimuli` contains the raw audio waveform (wav format) of one story.
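
A quick way to check the sampling rate and duration of a stimulus file is Python's standard `wave` module. This is a minimal sketch that assumes the files are PCM-encoded wav files, which is the only encoding `wave` can read; `wav_summary` is a hypothetical helper name.

```python
import wave


def wav_summary(path):
    """Return basic properties of a PCM wav file without loading the samples."""
    with wave.open(path, "rb") as wav_file:
        n_channels = wav_file.getnchannels()
        sample_rate = wav_file.getframerate()
        sample_width = wav_file.getsampwidth()
        n_frames = wav_file.getnframes()
    return {
        "channels": n_channels,
        "sample_rate_hz": sample_rate,
        "sample_width_bytes": sample_width,
        "duration_s": n_frames / sample_rate,
    }


# Example usage (the path is illustrative):
# print(wav_summary("stimuli/test.wav"))
```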

Each file in `source_data` contains the data used to generate figures in the publication [4].
    `source_data_manuscript` contains data for generating figures 4, 5, 6, and 7(c, d) of the main paper.
        Data for figure 5 are also used to generate supplementary table 3.
        Data for figure 7c are also used to generate supplementary figure 14.
    `source_data_performance` contains data for generating all the flatmaps in both the main paper (figures 2, 3, 7(a, b)) and the supplements (supplementary figures 5, 6, 8, 11, 13).
    `source_data_supplements` contains data for generating the supplementary figures not covered by `source_data_performance`.