PRODCOM data converted with the PRObs ontology

PRODCOM data as PRObs Observations

This repository converts data from the PRODCOM database into a structure defined by the Physical Resources Observatory (PRObs) ontology.

See DEVELOPING.md for more information about using this repository.

Dataset structure

  • The repository is a DataLad dataset.
  • Input data files needing preprocessing are located in raw_data/.
  • Preprocessed data files ready for conversion are located in data/.
  • All custom code is located in scripts/.
  • Converted data is saved to outputs/.
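
As a quick sanity check after cloning, the layout listed above can be verified with a few lines of Python. This is only a sketch: the helper name missing_dirs is hypothetical and not part of this repository; the directory names come from the list above.

```python
from pathlib import Path

# Directories named in the "Dataset structure" list of this README.
EXPECTED_DIRS = ("raw_data", "data", "scripts", "outputs")


def missing_dirs(root="."):
    """Return the expected directories that are absent under `root`.

    Hypothetical helper for illustration; not part of the repository.
    """
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

Calling missing_dirs("prodcom-data") from the parent folder of a complete clone should return an empty list.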

Installation

Getting the code

To clone the DataLad dataset, type the following in a shell or command window (e.g. git-bash):

datalad clone https://github.com/probs-lab/prodcom-data.git

Setting up the virtual environment and installing dependencies

To create the virtual environment with conda/miniconda (conda env create reads environment.yml from the current directory):

cd prodcom-data
conda env create

Running the code

After installation:

  • Open a terminal / git-bash window
  • Navigate to the prodcom-data folder, e.g. cd prodcom-data
  • Activate the environment using conda activate prodcom-data

To download the example output data files from the server, use:

datalad get outputs
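
Once the output files have been fetched, it can be useful to check that a file is readable. The sketch below assumes, based only on the .nt.gz extension, that the outputs are gzip-compressed N-Triples with one triple per line; the function name count_triples is hypothetical and not part of this repository.

```python
import gzip


def count_triples(path):
    """Count non-blank, non-comment lines in a gzipped N-Triples file.

    Assumes one triple per line, with '#' starting a comment line.
    """
    count = 0
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                count += 1
    return count
```

For example, count_triples("outputs/PRODCOM2016DATA.nt.gz") would report the number of statements in that file, provided its content has been retrieved with datalad get.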

To preprocess the input data files in raw_data/, run:

doit run preprocess

To convert the preprocessed data in the data/ folder, run:

doit run convert_data

To run all necessary tasks (i.e. preprocessing and conversion), simply run:

doit
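
For readers unfamiliar with doit: tasks are defined in dodo.py as functions whose names start with task_, each returning a dictionary describing the task. The sketch below shows how the two task names used above could be declared. It is an illustrative assumption, not the contents of this repository's dodo.py; the preprocessing action in particular is a placeholder.

```python
# Hypothetical dodo.py sketch; the repository's actual dodo.py may differ.

# Source and target paths taken from the conversion example in this README.
SOURCE = "data/PRODCOM2016DATA.csv"
TARGET = "outputs/PRODCOM2016DATA.nt.gz"


def task_preprocess():
    """Prepare files from raw_data/ for conversion (`doit run preprocess`)."""
    return {
        # Placeholder action: the real preprocessing command is not shown here.
        "actions": ["echo 'preprocess raw_data/ into data/'"],
    }


def task_convert_data():
    """Convert one preprocessed file (`doit run convert_data`)."""
    return {
        "actions": [f"scripts/convert_data.py prodcom {SOURCE} {TARGET}"],
        "file_dep": [SOURCE],          # rerun when the input file changes
        "targets": [TARGET],           # file produced by this task
        "task_dep": ["preprocess"],    # run preprocessing first
    }
```

With definitions of this shape, running doit without arguments executes every task in dependency order, which matches the behaviour described above.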

Individual files can be converted by running the convert_data.py script with appropriate parameters specifying the file type and the input and output filenames:

scripts/convert_data.py prodcom data/PRODCOM2016DATA.csv outputs/PRODCOM2016DATA.nt.gz

For the example PRODCOM data files, specify the type prodcom. The types prodcom_list and prodcom_correspondence are also defined.
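
When several files need converting one at a time, the command line shown above can be assembled programmatically. The following is a minimal sketch: the helper build_command is hypothetical, and the output naming (input stem plus .nt.gz in outputs/) is an assumption inferred from the single example in this README.

```python
from pathlib import Path

# File types named in this README.
KNOWN_TYPES = {"prodcom", "prodcom_list", "prodcom_correspondence"}


def build_command(file_type, src, out_dir="outputs"):
    """Build the argument list for one scripts/convert_data.py call.

    Hypothetical helper; the output name mirrors the README example
    (data/NAME.csv -> outputs/NAME.nt.gz), which is an assumption.
    """
    if file_type not in KNOWN_TYPES:
        raise ValueError(f"unknown file type: {file_type!r}")
    src = Path(src)
    target = Path(out_dir) / (src.stem + ".nt.gz")
    return ["scripts/convert_data.py", file_type, src.as_posix(), target.as_posix()]
```

The returned list can be passed to subprocess.run(..., check=True) from the repository root, so that a failed conversion raises an error instead of passing silently.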

Converting new data

For conversion of new data files (possibly in a different format from the examples) see the DEVELOPING.md file.

Testing the code

To test the code, after installing the software and running doit:

cd tests
pytest