PRODCOM data converted with the PRObs ontology


PRODCOM data as PRObs Observations

This repository converts data from the PRODCOM database into a structure defined by the Physical Resources Observatory (PRObs) ontology.

See DEVELOPING.md for more information about using this repository.

Dataset structure

  • The repository is a DataLad dataset.
  • Input data files needing preprocessing are located in raw_data/.
  • Preprocessed data files ready for conversion are located in data/.
  • All custom code is located in scripts/.
  • Converted data is saved to outputs/.

Installation

Getting the code

To clone the datalad dataset, in a shell/command window (e.g. git-bash) type:

datalad clone https://github.com/probs-lab/prodcom-data.git

Setting up the virtual environment and installing dependencies

To create a virtual environment with conda/miniconda (conda env create reads environment.yml from the current directory):

cd prodcom-data
conda env create

Running the code

After installation:

  • Open a terminal / git-bash window.
  • Navigate to the prodcom-data folder, e.g. cd prodcom-data
  • Activate the environment with conda activate prodcom-data

To download the example output data files from the server, run:

datalad get outputs
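
Once the output files are present locally, a quick sanity check can confirm they are readable. The sketch below assumes the outputs are gzip-compressed N-Triples files (as the .nt.gz extension used later in this README suggests), with one triple per line. The helper name count_triples is hypothetical and not part of the repository.

```python
import gzip


def count_triples(path):
    """Count triples in a gzip-compressed N-Triples file.

    Assumes one triple per line; blank lines and '#' comment lines are skipped.
    """
    count = 0
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                count += 1
    return count


# Example usage (the path is taken from the conversion example in this README):
# print(count_triples("outputs/PRODCOM2016DATA.nt.gz"))
```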

To preprocess the input data files in raw_data/, run the preprocess task:

doit run preprocess

To convert the preprocessed data in the data/ folder, run:

doit run convert_data

To run all necessary tasks (i.e. preprocessing and conversion), run:

doit
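
For readers unfamiliar with doit: it loads dodo.py, treats every function named task_<name> as a task, and, when invoked without arguments, runs the tasks listed in DOIT_CONFIG["default_tasks"] (or all tasks if none are listed). The sketch below illustrates how two tasks with the names used above could be wired together. It is a hypothetical illustration, not the repository's actual dodo.py: the preprocessing script name and the file lists are assumptions.

```python
# Hypothetical dodo.py sketch; the repository's real dodo.py may differ.

# Tasks that a bare `doit` invocation runs.
DOIT_CONFIG = {"default_tasks": ["preprocess", "convert_data"]}


def task_preprocess():
    """Prepare files from raw_data/ for conversion and write them to data/."""
    return {
        # Assumption: scripts/preprocess.py is a placeholder name.
        "actions": ["python scripts/preprocess.py raw_data data"],
        "targets": ["data/PRODCOM2016DATA.csv"],
    }


def task_convert_data():
    """Convert a preprocessed file in data/ into outputs/."""
    return {
        # Command taken from the single-file conversion example in this README.
        "actions": [
            "scripts/convert_data.py prodcom "
            "data/PRODCOM2016DATA.csv outputs/PRODCOM2016DATA.nt.gz"
        ],
        "file_dep": ["data/PRODCOM2016DATA.csv"],
        "targets": ["outputs/PRODCOM2016DATA.nt.gz"],
        # Make the ordering explicit: conversion runs after preprocessing.
        "task_dep": ["preprocess"],
    }
```

With a layout like this, doit run convert_data would run preprocess first because of the task_dep entry.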

Individual files can be converted by running the convert_data.py script with appropriate parameters specifying the file type and the input and output filenames:

scripts/convert_data.py prodcom data/PRODCOM2016DATA.csv outputs/PRODCOM2016DATA.nt.gz

The first argument is the file type. Use prodcom for the example PRODCOM data files; the types prodcom_list and prodcom_correspondence are also defined.
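
When several files need converting one at a time, the command line can be assembled programmatically. The following is a minimal sketch, assuming the argument order shown above (type, input file, output file) and an output name derived by replacing the input extension with .nt.gz. The helper build_convert_command and the KNOWN_TYPES set are hypothetical names introduced for illustration.

```python
from pathlib import Path

# File types named in this README.
KNOWN_TYPES = {"prodcom", "prodcom_list", "prodcom_correspondence"}


def build_convert_command(file_type, input_path, output_dir="outputs"):
    """Return the argument list for one scripts/convert_data.py invocation."""
    if file_type not in KNOWN_TYPES:
        raise ValueError(f"unknown file type: {file_type!r}")
    input_path = Path(input_path)
    # Assumption: output name is the input stem plus '.nt.gz'.
    output_path = Path(output_dir) / (input_path.stem + ".nt.gz")
    return [
        "scripts/convert_data.py",
        file_type,
        input_path.as_posix(),
        output_path.as_posix(),
    ]


# Example usage, run from the repository root:
# import subprocess
# cmd = build_convert_command("prodcom", "data/PRODCOM2016DATA.csv")
# subprocess.run(cmd, check=True)
```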

Converting new data

For conversion of new data files (possibly in a different format from the examples) see the DEVELOPING.md file.

Testing the code

To run the tests, first install the software and run doit as described above, then:

cd tests
pytest