doi/Dataset_Levels: This repository contains the "Levels" dataset.


:floppy_disk: Data Repository for the "Levels" dataset

This repository contains the Levels data collected in support of the manuscript "Aligning Machine and Human Visual Representations across Abstraction Levels" (arXiv preprint arXiv:2409.06509). In brief, the Levels dataset contains the `sourcedata` and `processed_data` from the HumanEval Experiment, conducted in 2024 at the Max Planck Institute for Human Development (MPIB) in collaboration with BIFOLD at TU Berlin, by Frieda Born and colleagues.

🪁 What does the Levels dataset contain and why did we collect it? The Levels dataset is a new dataset of human similarity judgments spanning multiple levels of semantic abstraction.

In our main 'AligNet' project, the Levels dataset is used for evaluation. We used the human odd-one-out similarity judgments at multiple levels of semantic abstraction to evaluate whether (synthetically generated) similarity judgments of existing state-of-the-art human-aligned models (Muttenthaler et al., 2023) correspond to ground-truth human judgments. Please see the main manuscript (referenced below) for details on results and methods.

🗞 What is the broader background of this work? Human alignment is becoming central to representation learning (e.g., Muttenthaler et al., 2023; Sucholutsky et al., 2023). We need models that not only perform well on downstream machine learning tasks but also align with human perception and intentions. To this end, we used existing approaches for aligning neural network models with human object perception (Muttenthaler et al., 2023) to create a large-scale dataset of human-aligned similarity judgments. To test whether these judgments are in line with actual human similarity judgments, we needed to collect ground-truth human similarity judgments: 🚀 the Levels dataset 🚀. While this dataset was primarily designed for evaluation, it can also be used for other purposes, such as training models or other applications that require human similarity judgments.

:file_folder: Overview

This is a condensed overview of the data. For more detailed information, please see the materials linked below and feel free to contact the author(s) if needed.

The `sourcedata` folder contains the raw subject data as one *.json file per subject. Detailed information about each variable can be found in the variable codebook within the docs/ folder. The `processed_data` folder contains preprocessed data files, which offer a more convenient way to work with the dataset. For more details, please refer to the separate README file in that folder.
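As a quick sanity check after downloading, the one-file-per-subject layout can be inspected with a few lines of Python. This is only a minimal sketch: the helper name `list_subject_files` and the directory path in the usage comment are assumptions for illustration, and the code assumes the *.json files sit directly inside the folder rather than in subfolders.

```python
from pathlib import Path
from typing import List


def list_subject_files(sourcedata_dir: str) -> List[Path]:
    """Return the *.json files found directly inside sourcedata_dir, sorted by name."""
    return sorted(Path(sourcedata_dir).glob("*.json"))


# Hypothetical usage, assuming the data was downloaded into ./sourcedata:
# files = list_subject_files("sourcedata")
# print(f"Found {len(files)} subject files")
```

If the count does not match the number of subjects you expect, the file contents may not have been downloaded yet.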

🚨 Main variables

:page_facing_up: Key Data Columns:
- rt: Response times
- image1Path: Stimulus name of the triplet image 1
- image2Path: Stimulus name of the triplet image 2
- image3Path: Stimulus name of the triplet image 3
- selected_image: Name of the image selected as the odd-one-out in each trial
- exp_trial_type: Defines the type of trial (e.g., experiment or training trial)
- response: Demographic information of the participant
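
The columns above can be combined to recover which position in the triplet was chosen as the odd-one-out. The following is a rough sketch, not the authors' processing code: the helper names, the example file names, and the trial-type value `"experiment"` are hypothetical, and it assumes that `selected_image` refers to the same file name as one of the three `image*Path` fields (the codebook in docs/ is the authoritative reference for the actual values).

```python
import os
from typing import Any, Dict, List, Optional

IMAGE_KEYS = ("image1Path", "image2Path", "image3Path")


def odd_one_out_index(trial: Dict[str, Any]) -> Optional[int]:
    """Return 0, 1 or 2 for the position of the selected image, or None if no match."""
    # Compare base names so that "images/cat.jpg" and "cat.jpg" are treated alike
    # (an assumption about how names and paths are stored).
    selected = os.path.basename(str(trial.get("selected_image", "")))
    if not selected:
        return None
    for idx, key in enumerate(IMAGE_KEYS):
        if os.path.basename(str(trial.get(key, ""))) == selected:
            return idx
    return None


def filter_by_trial_type(trials: List[Dict[str, Any]], trial_type: str) -> List[Dict[str, Any]]:
    """Keep only the trials whose exp_trial_type equals trial_type."""
    return [t for t in trials if t.get("exp_trial_type") == trial_type]


# Hypothetical trial with made-up values, for illustration only.
example_trial = {
    "rt": 1234,
    "image1Path": "images/dog.jpg",
    "image2Path": "images/cat.jpg",
    "image3Path": "images/car.jpg",
    "selected_image": "cat.jpg",
    "exp_trial_type": "experiment",
}

chosen_position = odd_one_out_index(example_trial)
```

Returning `None` when nothing matches makes it easy to spot trials (for example, demographic questions) that carry no triplet choice.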

:floppy_disk: Download Instructions

You can download the full dataset using the following methods:

Via the GIN Client:
- Install: Follow GIN CLI Setup for installation instructions.
- Clone: Run the following command to clone the repository: `gin get fborn/Levels_dataset/src/main/sourcedata`
- Navigate to the root of the dataset: `cd sourcedata`
- Download all data: `gin download --content`
- If you wish to work on or edit the files, run: `gin unlock *`
Alternatively, click the small "download icon" on the right side above the list of files in the repository overview on GIN. This will show you how to download the data via the GIN documentation.

:file_folder: Loading Raw Data

If you are using Python, you can load the raw data like this:


import json
import os
from typing import Dict, List, Union


def load_response_data(path_to_responses: str) -> List[Dict[str, Union[float, int, str]]]:
    """Load human odd-one-out responses from disk."""
    trials = []
    # Each subject file is read line by line, one JSON object per line.
    for file in os.scandir(path_to_responses):
        if file.name.endswith(".json"):
            with open(file, "r") as f:
                for line in f:
                    trials.append(json.loads(line))
    return trials
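
Once loaded, the trials are plain dictionaries, so simple summaries need no extra dependencies. The sketch below groups response times by `exp_trial_type` and reports the median per group. The helper name `median_rt_by_trial_type`, the trial-type labels, and the toy values are assumptions for illustration; the result is in whatever unit `rt` is recorded in (see the codebook).

```python
from collections import defaultdict
from statistics import median
from typing import Any, Dict, List


def median_rt_by_trial_type(trials: List[Dict[str, Any]]) -> Dict[str, float]:
    """Return the median response time for each exp_trial_type value."""
    rts = defaultdict(list)
    for trial in trials:
        rt = trial.get("rt")
        trial_type = trial.get("exp_trial_type")
        # Skip trials without a numeric response time or without a trial type.
        if isinstance(rt, bool) or not isinstance(rt, (int, float)):
            continue
        if trial_type is None:
            continue
        rts[trial_type].append(float(rt))
    return {trial_type: median(values) for trial_type, values in rts.items()}


# Toy trials with made-up values, standing in for the loader's output.
toy_trials = [
    {"rt": 800, "exp_trial_type": "experiment"},
    {"rt": 1200, "exp_trial_type": "experiment"},
    {"rt": 500, "exp_trial_type": "training"},
    {"rt": None, "exp_trial_type": "experiment"},
]

summary = median_rt_by_trial_type(toy_trials)
```

The median is used here instead of the mean because response-time distributions tend to be skewed by occasional very slow trials.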

📄 Using this Dataset

If you use this dataset in your work, please consider citing the following paper:

@article{muttenthaler2024aligning,
  title={Aligning Machine and Human Visual Representations across Abstraction Levels},
  author={Muttenthaler, Lukas and Greff, Klaus and Born, Frieda and Spitzer, Bernhard
  and Kornblith, Simon and Mozer, Michael C and M{\"u}ller, Klaus-Robert and
  Unterthiner, Thomas and Lampinen, Andrew K},
  journal={arXiv preprint arXiv:2409.06509},
  year={2024}
}

The paper is available as arXiv preprint arXiv:2409.06509.


:warning: License

This data is made available under the Public Domain Dedication and License v1.0, whose full text can be found in the LICENSE file. A human-readable summary of the license is also available.

Please see the LICENSE file for details.

📬 Please do not hesitate to contact us (born[at]mpib-berlin.mpg.de) if you have questions about the data or wish to receive it in a different format.