PRODCOM data converted with the PRObs ontology

PRODCOM data as PRObs Observations

This repository converts data from the PRODCOM database into a structure defined by the Physical Resources Observatory (PRObs) ontology.

See DEVELOPING.md for more information about using this repository.

Dataset structure

  • The repository is a DataLad dataset.
  • Input data files needing preprocessing are located in raw_data/.
  • Preprocessed data files ready for conversion are located in data/.
  • All custom code is located in scripts/.
  • Converted data is saved to outputs/.

Installation

Getting the code

To clone the DataLad dataset, type the following in a shell or command window (e.g. git-bash):

datalad clone https://github.com/probs-lab/prodcom-data.git

Setting up the virtual environment and installing dependencies

To create a virtual environment using conda/miniconda (conda env create reads environment.yml from the current directory):

cd prodcom-data
conda env create

Running the code

After installation:

  • Open a terminal / git-bash window
  • Navigate to the prodcom-data folder, e.g. cd prodcom-data
  • Activate the environment using conda activate prodcom-data

To download the example output data files from the server, use:

datalad get outputs
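
Once the output files have been fetched, a quick sanity check is to count the statements in one of them. The sketch below assumes that the .nt.gz files are gzip-compressed N-Triples with one triple per line, which is what the extension suggests; count_triples is a hypothetical helper and not part of this repository.

```python
import gzip


def count_triples(path):
    """Count statements in a gzip-compressed N-Triples file.

    Assumes one triple per line; blank lines and '#' comment lines are skipped.
    """
    count = 0
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                count += 1
    return count
```

For example, count_triples("outputs/PRODCOM2016DATA.nt.gz") would return the number of statements in that file, provided it has been downloaded or generated.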

To preprocess the input data files, run the preprocess task:

doit run preprocess

To convert the preprocessed data in the data/ folder, run:

doit run convert_data

To run all necessary tasks (i.e. preprocessing and conversion), run:

doit

Individual files can be converted by running the convert_data.py script with parameters specifying the file type, the input filename and the output filename:

scripts/convert_data.py prodcom data/PRODCOM2016DATA.csv outputs/PRODCOM2016DATA.nt.gz

To convert the example PRODCOM data files, specify the type prodcom. The types prodcom_list and prodcom_correspondence are also defined.
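
To convert several files without going through doit, the same command can be assembled in a loop. The following is a minimal sketch, not the repository's own method: build_convert_command and convert_all are hypothetical helpers, the glob pattern PRODCOM*DATA.csv is an assumption based on the example filename above, and the script is invoked through the current Python interpreter rather than directly.

```python
import subprocess
import sys
from pathlib import Path


def build_convert_command(csv_path, file_type="prodcom", output_dir="outputs"):
    """Return the argument list that converts one CSV file.

    Mirrors the command shown above: <script> <type> <input> <output>.
    """
    csv_path = Path(csv_path)
    output_path = Path(output_dir) / (csv_path.stem + ".nt.gz")
    return [
        sys.executable,  # assumption: run the script with the active interpreter
        "scripts/convert_data.py",
        file_type,
        csv_path.as_posix(),
        output_path.as_posix(),
    ]


def convert_all(data_dir="data", pattern="PRODCOM*DATA.csv", file_type="prodcom"):
    """Convert every matching file in data_dir, stopping at the first failure."""
    for csv_path in sorted(Path(data_dir).glob(pattern)):
        subprocess.run(build_convert_command(csv_path, file_type), check=True)
```

Calling convert_all() from the repository root, with the conda environment active, would run one conversion per matching file. Files of the other types (prodcom_list, prodcom_correspondence) need their own pattern and file_type.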

Converting new data

To convert new data files (possibly in a different format from the examples), see DEVELOPING.md.

Testing the code

To run the tests, first install the software and run doit, then:

cd tests
pytest