PRODCOM data converted with the PRObs ontology

PRODCOM data as PRObs Observations

This repository converts data from the PRODCOM database into a structure defined by the Physical Resources Observatory (PRObs) ontology.

See DEVELOPING.md for more information about using this repository.

Dataset structure

  • The repository is a DataLad dataset.
  • Input data files needing preprocessing are located in raw_data/.
  • Preprocessed data files ready for conversion are located in data/.
  • All custom code is located in scripts/.
  • Converted data is saved to outputs/.

Installation

Getting the code

To clone the DataLad dataset, run the following in a shell or command window (e.g. git-bash):

datalad clone https://github.com/probs-lab/prodcom-data.git

Setting up the virtual environment and installing dependencies

To create the virtual environment with conda/miniconda (conda env create reads the environment.yml file in the current directory):

cd prodcom-data
conda env create

Running the code

After installation:

  • Open a terminal / git-bash window.
  • Navigate to the prodcom-data folder, e.g. cd prodcom-data
  • Activate the environment with conda activate prodcom-data

To download the example output data files from the server, run:

datalad get outputs
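
Once the output files have been fetched, a quick look at their contents can confirm that the download worked. The sketch below assumes the outputs are gzip-compressed N-Triples text files, as the .nt.gz extension used later in this README suggests; the helper name head_triples is made up for illustration and is not part of this repository.

```python
import gzip


def head_triples(path, limit=5):
    """Return up to `limit` non-empty, non-comment lines from a gzipped text file.

    Assumption: the file is gzip-compressed N-Triples (one statement per line).
    """
    lines = []
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if len(lines) >= limit:
                break
            line = line.strip()
            # Skip blank lines and N-Triples comment lines.
            if not line or line.startswith("#"):
                continue
            lines.append(line)
    return lines
```

For example, head_triples("outputs/PRODCOM2016DATA.nt.gz") would return the first five statements of that file, if it exists locally.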

To preprocess the input data files, run the preprocess task:

doit run preprocess

To convert the preprocessed data in the data folder, run:

doit run convert_data

To run all necessary tasks (i.e. preprocessing and conversion), run:

doit
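
The doit commands above run tasks defined in dodo.py. For readers unfamiliar with doit, the following is a rough sketch of how two such tasks are typically declared. It is not the repository's actual dodo.py: the file names and the preprocess.py script are hypothetical, and only the task names and the convert_data.py call follow this README.

```python
# Hypothetical dodo.py sketch; the real dodo.py in this repository may differ.
# doit discovers functions named task_<name> and runs the actions they return.

RAW_FILE = "raw_data/example.csv"        # assumed input file name
PREPROCESSED_FILE = "data/example.csv"   # assumed preprocessed file name
OUTPUT_FILE = "outputs/example.nt.gz"    # assumed output file name


def task_preprocess():
    """Prepare a raw input file for conversion (script name is hypothetical)."""
    return {
        "actions": [f"scripts/preprocess.py {RAW_FILE} {PREPROCESSED_FILE}"],
        "file_dep": [RAW_FILE],
        "targets": [PREPROCESSED_FILE],
    }


def task_convert_data():
    """Convert the preprocessed file using the prodcom conversion type."""
    return {
        "actions": [
            f"scripts/convert_data.py prodcom {PREPROCESSED_FILE} {OUTPUT_FILE}"
        ],
        # Depending on the preprocess target lets doit order the two tasks.
        "file_dep": [PREPROCESSED_FILE],
        "targets": [OUTPUT_FILE],
    }
```

Because the conversion task lists the preprocessed file as a dependency and the preprocessing task lists it as a target, a plain doit call runs the two in the right order and skips tasks whose inputs have not changed.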

Individual files can be converted by running the convert_data.py script directly, passing the conversion type followed by the input and output filenames:

scripts/convert_data.py prodcom data/PRODCOM2016DATA.csv outputs/PRODCOM2016DATA.nt.gz

Use the type prodcom to convert the example PRODCOM data files. The types prodcom_list and prodcom_correspondence are also defined.
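
When several files need converting one at a time, the command line can be assembled programmatically. The sketch below is only an illustration: the helper build_command is not part of this repository, and the rule that the output name is the input name with a .nt.gz extension under outputs/ is an assumption based on the single example above.

```python
from pathlib import Path

# Conversion types named in this README.
KNOWN_TYPES = {"prodcom", "prodcom_list", "prodcom_correspondence"}


def build_command(conversion_type, input_path, output_dir="outputs"):
    """Build the argument list for one scripts/convert_data.py call.

    Assumption: the output file is <output_dir>/<input stem>.nt.gz.
    """
    if conversion_type not in KNOWN_TYPES:
        raise ValueError(f"Unknown conversion type: {conversion_type!r}")
    input_path = Path(input_path)
    output_path = Path(output_dir) / (input_path.stem + ".nt.gz")
    return [
        "scripts/convert_data.py",
        conversion_type,
        input_path.as_posix(),
        output_path.as_posix(),
    ]


# To actually run the conversion, the list could be passed to
# subprocess.run(command, check=True) from the repository root.
```

Validating the type before building the command catches typos early, before any script is launched.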

Converting new data

For conversion of new data files (possibly in a different format from the examples) see the DEVELOPING.md file.

Testing the code

To test the code, first install the software and run doit as described above, then run:

cd tests
pytest