PRODCOM data converted with the PRObs ontology

stephenjboyle 6a99e6ad59 Update DEVELOPING.md 7 months ago
.datalad e031539dc6 [DATALAD] new dataset 1 year ago
bulk_data d313e13296 Update sold production files 7 months ago
data 24c613ddd2 Update for ontology changes 8 months ago
ontology a2cfb9daef Delete copy of ontology file 10 months ago
outputs 4117cd4a8e Add rdf output files 7 months ago
raw_data d5c3f42535 Add RDFox scripts/rules to load PRODCOM bulk data 10 months ago
scripts 7da3095146 Update to fix test queries using object name 7 months ago
tests 7f2c76ece0 Modify bulk tests to use object codes 7 months ago
.gitattributes ef3f58e9a7 Git attributes for bulk data files 7 months ago
.gitignore e2e79c1544 Converted to datalad 1 year ago
DEVELOPING.md 6a99e6ad59 Update DEVELOPING.md 7 months ago
README.md a2d6ab5323 Remove COMTRADE references 1 year ago
dodo.py 08615d2a92 Add total production to bulk processing 8 months ago
environment.yml e899370c68 Update environment file 7 months ago

README.md

PRODCOM data as PRObs Observations

This repository converts data from the PRODCOM database into a structure defined by the Physical Resources Observatory (PRObs) ontology.

See DEVELOPING.md for more information about using this repository.

Dataset structure

  • The repository is a DataLad dataset.
  • Input data files needing preprocessing are located in raw_data/.
  • Preprocessed data files ready for conversion are located in data/.
  • All custom code is located in scripts/.
  • Converted data is saved to outputs/.

Installation

Getting the code

To clone the DataLad dataset, type the following in a shell or command window (e.g. Git Bash):

datalad clone https://github.com/probs-lab/prodcom-data.git

Setting up the virtual environment and installing dependencies

To create the virtual environment with conda or Miniconda (conda env create reads environment.yml from the current folder):

cd prodcom-data
conda env create

Running the code

After installation:

  • Open a terminal / git-bash window
  • Navigate to the prodcom-data folder, e.g. cd prodcom-data
  • Activate the environment using conda activate prodcom-data

To download the example output data files from the server, use:

datalad get outputs

To preprocess the input data files in raw_data/, run the preprocess task:

doit run preprocess

To convert the preprocessed data in the data/ folder, run:

doit run convert_data

To run all necessary tasks (i.e. preprocessing and conversion) simply run:

doit
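For readers new to doit: tasks are defined in dodo.py as functions whose names start with task_, and a task function can yield one dictionary per sub-task. The sketch below is a hypothetical illustration of how a convert_data task might create one sub-task per preprocessed CSV file. It is not the repository's actual dodo.py; the glob pattern, the CONVERTER constant and the helper convert_data_subtask are assumptions.

```python
from pathlib import Path

# Assumed layout, following the README: preprocessed CSVs in data/, RDF output in outputs/.
DATA_DIR = Path("data")
OUTPUT_DIR = Path("outputs")
CONVERTER = "scripts/convert_data.py"  # hypothetical: invoked via the python interpreter


def convert_data_subtask(csv_path: Path) -> dict:
    """Build one doit sub-task dict that converts a single PRODCOM CSV file."""
    target = OUTPUT_DIR / (csv_path.stem + ".nt.gz")
    return {
        "name": csv_path.stem,
        # doit reruns the action when a file_dep changes or a target is missing
        "file_dep": [str(csv_path), CONVERTER],
        "targets": [str(target)],
        "actions": [f"python {CONVERTER} prodcom {csv_path} {target}"],
    }


def task_convert_data():
    """doit collects task_* functions; each yielded dict becomes a sub-task."""
    # Hypothetical pattern: only files named like PRODCOM2016DATA.csv are picked up.
    for csv_path in sorted(DATA_DIR.glob("PRODCOM*DATA.csv")):
        yield convert_data_subtask(csv_path)
```

Declaring file_dep and targets this way is what lets a plain doit call skip conversions whose inputs have not changed.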

Individual files can be converted by running the convert_data.py script with appropriate parameters specifying the file type and the input and output filenames:

scripts/convert_data.py prodcom data/PRODCOM2016DATA.csv outputs/PRODCOM2016DATA.nt.gz

To convert the example PRODCOM data files, specify the type prodcom. The types prodcom_list and prodcom_correspondence are also defined.
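The positional interface described above (file type, input filename, output filename) can be sketched with Python's argparse. This is an illustrative assumption about how such a command line might be parsed, not the actual code of scripts/convert_data.py; the names build_parser and DATA_TYPES are hypothetical.

```python
import argparse

# The three file types the README says convert_data.py accepts.
DATA_TYPES = ("prodcom", "prodcom_list", "prodcom_correspondence")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring: convert_data.py TYPE INPUT OUTPUT."""
    parser = argparse.ArgumentParser(
        description="Convert a PRODCOM file to PRObs RDF (illustrative sketch)."
    )
    # argparse rejects any type outside DATA_TYPES and exits with an error message
    parser.add_argument("data_type", choices=DATA_TYPES, help="kind of input file")
    parser.add_argument("input", help="input CSV, e.g. data/PRODCOM2016DATA.csv")
    parser.add_argument("output", help="output file, e.g. outputs/PRODCOM2016DATA.nt.gz")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Would convert {args.input} ({args.data_type}) -> {args.output}")
```

Restricting the first argument with choices means a mistyped file type fails immediately rather than partway through a conversion.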

Converting new data

For conversion of new data files (possibly in a different format from the examples) see the DEVELOPING.md file.

Testing the code

To test the code, first install the software and run doit, then:

cd tests
pytest