How research programs come apart: code and data

This repository contains the code and data necessary to reproduce the figures and tables in "How research programs come apart: the case of supersymmetry and the disunity of physics" (the latest version of the manuscript source can be found here).

The repository is structured as follows:

| Location | Contents |
| --- | --- |
| `inspire-harvest` | Subdataset containing the data from which the present analyses are derived. |
| `analyses` | Scripts that generate intermediate results for the plots and tables. |
| `plots` | Plots included in the manuscript or supplementary materials, as well as the scripts that produce them. |
| `surveys` | Scripts to generate and compile the manual validation tasks that were performed, as well as the experts' reports. |
| `tables` | Tables included in the manuscript or supplementary materials, as well as the scripts that produce them. |
| `output` | Intermediate results used to produce the material included in the paper. |
| `AbstractSemantics` | A basic Python package for the multithreaded retrieval of n-grams matching certain patterns from large corpora. |

Setup

Getting the data

In order to use this repository, DataLad is required. DataLad enables reproducible science with large datasets, and instructions for its installation on various systems can be found here.
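Before installing the dataset, it can help to confirm that DataLad and the tools it relies on (Git and git-annex) are available on your `PATH`. The following is a minimal sketch of such a check; the function name `missing_tools` is illustrative and is not part of this repository.

```python
import shutil


def missing_tools(tools=("git", "git-annex", "datalad")):
    """Return the command-line tools from `tools` that are not found on PATH.

    The default list is an assumption: DataLad itself, plus Git and
    git-annex, which DataLad builds on.
    """
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling `missing_tools()` returns an empty list when all three commands are found; any name it returns should be installed before going further.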

Once DataLad is installed, we recommend creating an account on GIN, the data-sharing platform where our data are hosted, and adding your SSH key in your account settings.

Install the dataset, including its subdatasets (`-r`), with:

datalad install -r git@gin.g-node.org:/lucasgautheron/trading_zones_material.git

Then retrieve the data from the `s3` source with:

cd trading_zones_material
datalad get inspire-harvest/database -s s3

Running analyses

Before running the analyses, install the required Python packages:

pip install -r requirements.txt

This should cover the packages used in the present analyses. If not, please install missing dependencies manually and report them by creating a ticket.
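To see which dependencies are still missing before reporting them, a small check along the following lines may help. It is a sketch under assumptions: the helper names are hypothetical, the contents of `requirements.txt` are not known here, and the parsing only handles plain `name` or `name==version` style lines (not URLs or editable installs).

```python
import re
from importlib import metadata


def requirement_names(lines):
    """Extract bare distribution names from requirements-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as "-r"
            continue
        # Keep only the leading name, before any extras or version specifier.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(lines):
    """Return the requirement names that are not installed in this environment."""
    missing = []
    for name in requirement_names(lines):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

From the root of the repository, `missing_packages(open("requirements.txt"))` lists the entries of `requirements.txt` that are not installed. Packages imported by the scripts but absent from `requirements.txt` would not be detected this way.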

Then, you may run any analysis by executing the corresponding script from the root of the repository:

python analyses/<name_of_the_analysis>.py

The scripts in `plots` and `tables` can be run in the same way to reproduce the material included in the paper and in the supplementary materials.
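To run every script of a folder in one go, a small driver such as the one below could be used. This is only a sketch: the helper names are hypothetical, and running the scripts in alphabetical order is an assumption, since some scripts may depend on intermediate results produced by others.

```python
import subprocess
import sys
from pathlib import Path


def find_scripts(folder):
    """Return the Python scripts located directly inside `folder`, sorted by name."""
    return sorted(Path(folder).glob("*.py"))


def run_scripts(folder, repo_root="."):
    """Run each script of `folder` from `repo_root`; return {script name: exit code}."""
    root = Path(repo_root).resolve()
    results = {}
    for script in find_scripts(root / folder):
        # Use the current interpreter so the packages installed for it are visible,
        # and run from the repository root, as the scripts expect.
        completed = subprocess.run([sys.executable, str(script)], cwd=root)
        results[script.name] = completed.returncode
    return results
```

For example, `run_scripts("analyses")` called from the root of the repository runs each analysis in turn; a non-zero exit code in the returned dictionary points to a script that failed.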
