How research programs come apart: code and data

This repository contains the code and data needed to reproduce the figures and tables in "How research programs come apart: the case of supersymmetry and the disunity of physics". The latest version of the manuscript source can be found here.

The repository is structured as follows:

| Location | Contents |
| --- | --- |
| inspire-harvest | Subdataset including the data from which present analyses are derived |
| analyses | This folder contains the scripts which generate intermediate results for the plots and tables. |
| plots | This folder contains the plots included in the manuscript or supplementary materials as well as the scripts that produce them. |
| surveys | This folder contains the scripts to generate and compile the manual validation tasks that were performed, as well as the experts' reports. |
| tables | This folder contains the tables included in the manuscript or supplementary materials as well as the scripts that produce them. |
| output | This folder contains intermediate results used to produce the material included in the paper. |
| AbstractSemantics | This folder contains a basic python package for the multithreaded retrieval of n-grams matching certain patterns from large corpora. |
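
To give a sense of what "multithreaded retrieval of n-grams matching certain patterns" can look like, here is a minimal sketch. It is not the AbstractSemantics API: the function names (`ngrams`, `matching_ngrams`, `count_ngrams`), the tokenizer, the toy corpus and the regular expression are all assumptions made for illustration.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return zip(*(tokens[i:] for i in range(n)))


def matching_ngrams(text, n, pattern):
    """Count the n-grams of one document whose joined text matches pattern."""
    tokens = re.findall(r"[a-z]+", text.lower())  # naive tokenizer (assumption)
    joined = (" ".join(gram) for gram in ngrams(tokens, n))
    return Counter(g for g in joined if pattern.fullmatch(g))


def count_ngrams(corpus, n, pattern, workers=4):
    """Process documents in a thread pool and merge the per-document counts."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for counts in pool.map(lambda doc: matching_ngrams(doc, n, pattern), corpus):
            total.update(counts)
    return total


# Toy abstracts, invented for illustration only.
corpus = [
    "Supersymmetry breaking in dark matter models",
    "Constraints on dark matter from collider searches",
]
# Keep bigrams whose second word is "matter".
pattern = re.compile(r"\w+ matter")
print(count_ngrams(corpus, 2, pattern))
```

In CPython, threads mostly pay off when documents are read from disk or over the network; for purely CPU-bound matching, a process pool may be the better fit.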

Setup

Getting the data

This repository requires DataLad, a tool for reproducible science with large datasets. Installation instructions for various systems can be found here.

Once DataLad is installed on your system, we recommend creating an account on GIN, the data sharing platform where our data are hosted, and adding your SSH key in your account settings.

Install the dataset with:

datalad install -r git@gin.g-node.org:/lucasgautheron/trading_zones_material.git

Then retrieve the data with:

cd trading_zones_material
datalad get inspire-harvest/database -s s3
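
For scripted setups, the two steps above can be wrapped in a small Python helper. This is only a sketch: it shells out to the same `datalad` commands shown in this README, and the helper names (`datalad_commands`, `fetch_dataset`) and the `dry_run` switch are hypothetical conveniences, not part of DataLad or of this repository.

```python
import shutil
import subprocess

# URL copied from the install command in this README.
REPO_URL = "git@gin.g-node.org:/lucasgautheron/trading_zones_material.git"


def datalad_commands(repo_url=REPO_URL):
    """Return the two DataLad invocations from this README as (argv, cwd) pairs."""
    return [
        (["datalad", "install", "-r", repo_url], None),
        (
            ["datalad", "get", "inspire-harvest/database", "-s", "s3"],
            "trading_zones_material",  # directory created by the install step
        ),
    ]


def fetch_dataset(dry_run=True):
    """Print the commands, and run them only when dry_run is False."""
    if not dry_run and shutil.which("datalad") is None:
        raise RuntimeError("datalad was not found on PATH; install it first")
    for argv, cwd in datalad_commands():
        suffix = f"  (in {cwd})" if cwd else ""
        print(" ".join(argv) + suffix)
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)


# Dry run by default: nothing is downloaded, the commands are only printed.
fetch_dataset(dry_run=True)
```

Calling `fetch_dataset(dry_run=False)` would actually run both commands, which requires DataLad and, for the SSH URL, a configured GIN account.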

Running analyses

Before running the analyses, install the required Python packages:

pip install -r requirements.txt

This should cover the packages used in the present analyses. If not, please install missing dependencies manually and report them by creating a ticket.
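
To spot missing dependencies before an analysis fails halfway, one option is to compare a requirements list against the installed distributions. The sketch below assumes simple `name` or `name==version` lines; the helper names and the example lines are invented here, and the actual contents of `requirements.txt` are not reproduced.

```python
import re
from importlib import metadata


def requirement_names(lines):
    """Extract distribution names from simple requirements-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):
            continue  # skip blanks and pip options such as "-r other.txt"
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names for which no installed distribution is found."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Example lines, invented for illustration (not taken from requirements.txt).
example = ["numpy>=1.20", "# a comment", "", "pandas==1.3.0  # pinned"]
print(requirement_names(example))
```

From the root of the repository, `missing_packages(requirement_names(open("requirements.txt")))` would then list what still needs to be installed. URL-style or otherwise complex requirement lines are not handled by this sketch.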

Then, you may run any analysis by executing the corresponding script from the root of the repository:

python analyses/<name_of_the_analysis>.py

The scripts in plots and tables can be run the same way to reproduce the material included in the paper and in the supplementary materials.
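
A small helper can list the available scripts and run one with the repository root as working directory, as recommended above. This is a sketch under the assumption that the scripts sit directly inside `analyses`, `plots` and `tables`; `list_scripts` and `run_script` are hypothetical names, not part of this repository.

```python
import subprocess
import sys
from pathlib import Path


def list_scripts(repo_root, folders=("analyses", "plots", "tables")):
    """Return the Python scripts in each folder, as paths relative to repo_root."""
    root = Path(repo_root)
    return sorted(
        path.relative_to(root).as_posix()
        for folder in folders
        for path in (root / folder).glob("*.py")
    )


def run_script(repo_root, script):
    """Run one script with the repository root as working directory."""
    return subprocess.run([sys.executable, script], cwd=repo_root, check=True)


if __name__ == "__main__":
    # Print the scripts found under the current directory.
    for script in list_scripts("."):
        print(script)
```

Using `sys.executable` keeps the script in the same Python environment where the requirements were installed.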