This repository contains the "Levels" dataset.


:floppy_disk: Data Repository for the "Levels" dataset

This repository contains the Levels data collected in support of the manuscript Aligning Machine and Human Visual Representations across Abstraction Levels (https://arxiv.org/abs/2409.06509). In brief, the Levels dataset contains the `sourcedata` and `processed_data` from the HumanEval experiment, conducted in 2024 at the Max Planck Institute for Human Development (MPIB) in collaboration with BIFOLD at TU Berlin, by Frieda Born and colleagues.

🪁 What does the Levels dataset contain and why did we collect it? The Levels dataset is a new dataset of human similarity judgments spanning multiple levels of semantic abstraction.

In our main 'AligNet' project, the Levels dataset is used for evaluation. We used the human odd-one-out similarity judgments at multiple levels of semantic abstraction to evaluate whether (synthetically generated) similarity judgments of existing state-of-the-art human-aligned models (Muttenthaler et al., 2023) correspond to ground-truth human judgments. Please see the main manuscript (referenced below) for details on results and methods.
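
To make this evaluation concrete, the sketch below derives an odd-one-out choice from a model's image embeddings: the pair with the highest similarity is kept together, and the remaining image is the odd one out. This is a minimal illustration only; the function name and the use of cosine similarity are assumptions, not the exact procedure from the manuscript.

import numpy as np


def model_odd_one_out(embeddings: np.ndarray) -> int:
    """Return the index (0-2) of the odd-one-out in a triplet of embeddings (3 x d)."""
    # Normalize rows and compute the pairwise cosine-similarity matrix.
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = e @ e.T
    # The most similar pair "belongs together" ...
    pairs = [(0, 1), (0, 2), (1, 2)]
    i, j = max(pairs, key=lambda p: sim[p[0], p[1]])
    # ... and the remaining image is the odd-one-out.
    return ({0, 1, 2} - {i, j}).pop()

Agreement with the human data can then be computed as the fraction of triplets on which this choice matches the image the participant selected.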

🗞 What is the broader background of this work? Human alignment is becoming central to representation learning (e.g., Muttenthaler et al., 2023; Sucholutsky et al., 2023). We need models that not only perform well on downstream machine learning tasks but also align with human perception and intentions. To this end, we used existing approaches for aligning neural network models with human object perception (Muttenthaler et al., 2023) to create a large-scale dataset of human-aligned similarity judgments. To test whether these judgments are in line with actual human similarity judgments, we needed to collect ground-truth human similarity judgments: 🚀 the Levels dataset 🚀. While this dataset was primarily designed for evaluation, it can of course be used for a wide range of other purposes, including training models or other applications that require human similarity evaluations.


:file_folder: Overview

This is a condensed overview of the data. For more detailed information, please see the materials linked below and feel free to contact the author(s) if needed.

The `sourcedata` folder contains the raw subject data as one *.json file per subject. Detailed information about each variable can be found in the variable codebook in the docs/ folder. The `processed_data` folder contains preprocessed data files, offering a streamlined and efficient way to work with the dataset; for details, please refer to the separate README file in that folder.

🚨 Main variables

:page_facing_up: Key Data Columns (a loading sketch follows this list):
  • rt: Response time in each trial
  • image1Path: Stimulus name of the first triplet image
  • image2Path: Stimulus name of the second triplet image
  • image3Path: Stimulus name of the third triplet image
  • selected_image: Name of the image selected as the odd-one-out in each trial
  • exp_trial_type: Type of trial (e.g., experiment or training trial)
  • response: Demographic information about the participant
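
As a quick way to inspect these columns, here is a minimal sketch that loads all per-subject *.json files from sourcedata/ into a pandas DataFrame. pandas is an illustrative choice and not required by the dataset; the line-by-line JSON parsing mirrors the loading function further below.

import json
import os

import pandas as pd

# Collect all trials from the per-subject *.json files
# (one JSON object per line, as in load_response_data below).
records = []
for entry in os.scandir("sourcedata"):
    if entry.name.endswith(".json"):
        with open(entry.path, "r") as f:
            records.extend(json.loads(line) for line in f)

df = pd.DataFrame(records)

# Inspect the key data columns listed above.
print(df[["rt", "image1Path", "image2Path", "image3Path",
          "selected_image", "exp_trial_type"]].head())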

:floppy_disk: Download Instructions

You can download the full dataset using the following methods:

  1. Via the GIN Client:
    • Install: Follow the GIN CLI Setup for installation instructions.
    • Clone: Run the following command to clone the repository: gin get fborn/Levels_dataset
    • Navigate to the root of the dataset: cd Levels_dataset
    • Download all data: gin download --content
    • If you wish to work on or edit the files, run: gin unlock *
  2. Alternatively, click the small "download icon" on the right side above the list of files in the repository overview on GIN. This will show you how to download the data via the GIN documentation.

:file_folder: Loading Raw Data

If you are using Python, you can load the raw data like this:


import json
import os
from typing import Dict, List, Union


def load_response_data(path_to_responses: str) -> List[Dict[str, Union[float, int, str]]]:
    """Load human odd-one-out responses from disk.

    Each subject's *.json file contains one JSON object per line.
    """
    trials = []
    for file in os.scandir(path_to_responses):  # one file per subject
        if file.name.endswith(".json"):
            with open(file.path, "r") as f:
                for line in f:
                    trials.append(json.loads(line))
    return trials
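
A hypothetical usage example (the folder name sourcedata follows the repository layout above; the value "experiment" for exp_trial_type is an assumption, so check the variable codebook in docs/ for the actual labels):

trials = load_response_data("sourcedata")
print(f"Loaded {len(trials)} trials across all subjects.")

# Keep only experiment trials; "experiment" is an assumed label here.
experiment_trials = [t for t in trials if t.get("exp_trial_type") == "experiment"]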


📄 Using this Dataset

If you use this dataset in your work, please consider citing the following paper:

@article{muttenthaler2024aligning,
  title={Aligning Machine and Human Visual Representations across Abstraction Levels},
  author={Muttenthaler, Lukas and Greff, Klaus and Born, Frieda and Spitzer, Bernhard
  and Kornblith, Simon and Mozer, Michael C and M{\"u}ller, Klaus-Robert and
  Unterthiner, Thomas and Lampinen, Andrew K},
  journal={arXiv preprint arXiv:2409.06509},
  year={2024}
}

You can access the paper at https://arxiv.org/abs/2409.06509.

(back to top)

:warning: License

This data is made available under the Open Data Commons Public Domain Dedication and License (PDDL) v1.0. The full text can be found in the LICENSE file; see also the human-readable summary on the Open Data Commons website (https://opendatacommons.org/licenses/pddl/1-0/).

📬 Please do not hesitate to contact us (born[at]mpib-berlin.mpg.de) if you have questions about the data or wish to receive it in a different format.

(back to top)

datacite.yml

Title: The Levels Dataset
Authors:
  • Muttenthaler, Lukas; Google DeepMind; Machine Learning Group, Technische Universität Berlin; BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany; ORCID: 0000-0002-0804-4687
  • Greff, Klaus; Google DeepMind; ORCID: 0000-0001-6982-0937
  • Born, Frieda; Technische Universität Berlin; BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany; Adaptive Memory and Decision Making (AMD), Max Planck Institute for Human Development, Berlin, Germany; ORCID: 0009-0002-1214-4864
  • Spitzer, Bernhard; Adaptive Memory and Decision Making (AMD), Max Planck Institute for Human Development, Berlin, Germany; ORCID: 0000-0001-9752-932X
  • Kornblith, Simon; Anthropic; ORCID: 0000-0002-9088-2443
  • Mozer, Michael C.; Google DeepMind; ORCID: 0000-0002-9654-0575
  • Müller, Klaus-Robert; Google DeepMind; Machine Learning Group, Technische Universität Berlin; BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany; Department of Artificial Intelligence, Korea University, Seoul; Max Planck Institute for Informatics, Saarbrücken, Germany; ORCID: 0000-0002-3861-7685
  • Unterthiner, Thomas; Google DeepMind; ORCID: 0000-0001-5361-3087
  • Lampinen, Andrew K.; Google DeepMind; ORCID: 0000-0002-6988-8437
Description: To validate that AligNet can indeed help to increase the alignment between models and humans, we used crowd-sourcing to collect a novel evaluation dataset of human semantic judgments across multiple levels of abstraction that we call Levels.
License: Open Data Commons Public Domain Dedication and License (PDDL) v1.0 (https://opendatacommons.org/licenses/pddl/1-0/)
References:
  • Muttenthaler, L., Greff, K., Born, F., Spitzer, B., Kornblith, S., Mozer, M. C., Müller, K.-R., Unterthiner, T., Lampinen, A. K.: Aligning Machine and Human Visual Representations across Abstraction Levels [doi:10.48550/arXiv.2409.06509] (IsSupplementTo)
  • Born, Frieda (2024). Levels Collection Experiment Code (v1.0.0) [doi:10.5281/zenodo.13749102] (IsReferencedBy)
Funding:
Keywords: AI alignment; human cognition; representation learning; computer vision
Resource Type: Dataset