This repository converts data from the PRODCOM database into a structure defined by the Physical Resources Observatory (PRObs) ontology.
See DEVELOPING.md for more information about using this repository.
raw_data/
data/
scripts/
outputs/
To clone the datalad dataset, type the following in a shell or command window (e.g. git-bash):
datalad clone https://github.com/probs-lab/prodcom-data.git
To create a virtual environment using conda/miniconda:
cd prodcom-data
conda env create
After installation, change to the prodcom-data folder and activate the environment:
cd prodcom-data
conda activate prodcom-data
To download the example RDF output files (outputs/sold_production and outputs/total_production) from the server, use:
datalad get outputs
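Once the outputs have been fetched, a quick sanity check is to count the triples in each file. The sketch below is illustrative only and is not part of the repository. It assumes the outputs are gzip-compressed N-Triples files (as the .nt.gz extension in the conversion example suggests), with one triple per line. The function names count_triples and summarise_outputs are hypothetical.

```python
import gzip
from pathlib import Path


def count_triples(path):
    """Count non-empty, non-comment lines in a gzipped N-Triples file."""
    count = 0
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            stripped = line.strip()
            # N-Triples holds one triple per line; '#' starts a comment line.
            if stripped and not stripped.startswith("#"):
                count += 1
    return count


def summarise_outputs(folder="outputs"):
    """Map each .nt.gz file under `folder` (relative path) to its triple count."""
    root = Path(folder)
    if not root.is_dir():
        # Nothing downloaded yet, or a wrong folder name.
        return {}
    return {
        str(path.relative_to(root)): count_triples(path)
        for path in sorted(root.rglob("*.nt.gz"))
    }
```

Calling summarise_outputs() from the repository root returns a dictionary that can be printed or compared between runs. An empty dictionary usually means the files have not been fetched yet.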
To download the input CSV files used to generate the output files, use:
datalad get bulk_data
These files have been generated from bulk CSV files downloaded from the Eurostat website, which have been split into files for each country and for each year (see DEVELOPING.md).
The dodo.py script can be used to preprocess the files in raw_data and convert the files in the data and bulk_data folders:
To preprocess the input data files, run:
doit run preprocess
To convert the preprocessed data in the data folder, run:
doit run convert_data
To convert all files in the bulk_data folder, run:
doit convert_bulk
To run all necessary tasks (i.e. preprocessing and conversion) simply run:
doit
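For readers new to doit: tasks are defined in dodo.py as functions named task_<name> that return or yield dictionaries describing actions, file dependencies and targets. The fragment below is a hypothetical illustration of that pattern, not the repository's actual dodo.py. The helper names (copy_unchanged, preprocess_subtasks) and the copy-only "preprocessing" step are placeholders chosen for the sketch.

```python
# Hypothetical dodo.py fragment; not the repository's real task definitions.
import shutil
from pathlib import Path

RAW_DIR = Path("raw_data")
DATA_DIR = Path("data")


def copy_unchanged(src, dst):
    """Placeholder for a real preprocessing step: copy src to dst as-is."""
    Path(dst).parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)


def preprocess_subtasks(raw_dir, data_dir):
    """Yield one doit sub-task dictionary per CSV file found in raw_dir."""
    for raw_file in sorted(Path(raw_dir).glob("*.csv")):
        target = Path(data_dir) / raw_file.name
        yield {
            "name": raw_file.name,          # sub-task name shown by doit
            "file_dep": [str(raw_file)],    # rerun only when the input changes
            "targets": [str(target)],       # file produced by the action
            "actions": [(copy_unchanged, [str(raw_file), str(target)])],
        }


def task_preprocess():
    """doit discovers this function by its task_ prefix."""
    yield from preprocess_subtasks(RAW_DIR, DATA_DIR)
```

Because each sub-task declares file_dep and targets, doit can skip work whose inputs have not changed, which is why running doit repeatedly is cheap once the outputs are up to date.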
Individual files can be converted by running the convert_data.py script with appropriate parameters specifying the file type and the input and output filenames:
scripts/convert_data.py prodcom data/PRODCOM2016DATA.csv outputs/PRODCOM2016DATA.nt.gz
For conversion of the example PRODCOM data files in folder raw_data, the type prodcom should be specified. Types prodcom_list and prodcom_correspondence are also defined, along with prodcom_bulk_sold and prodcom_bulk_total (for processing bulk files in folder bulk_data).
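To convert several files of the same type without writing a doit task, the command line shown above can be assembled in a small loop. This is a minimal sketch, not part of the repository. It assumes the script accepts exactly the three positional arguments shown in the example (type, input file, output file), that it can be launched through the current Python interpreter, and that outputs are named after the input file with a .nt.gz suffix. The function names build_convert_command and convert_folder are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_convert_command(file_type, input_path, output_dir="outputs"):
    """Return the argument list for one convert_data.py invocation."""
    input_path = Path(input_path)
    # Assumed naming rule: PRODCOM2016DATA.csv -> PRODCOM2016DATA.nt.gz
    output_path = Path(output_dir) / (input_path.stem + ".nt.gz")
    return [
        sys.executable,
        "scripts/convert_data.py",
        file_type,
        str(input_path),
        str(output_path),
    ]


def convert_folder(file_type, input_dir, output_dir="outputs", dry_run=True):
    """Build (and optionally run) one command per CSV file in input_dir."""
    commands = [
        build_convert_command(file_type, csv_file, output_dir)
        for csv_file in sorted(Path(input_dir).glob("*.csv"))
    ]
    for command in commands:
        if dry_run:
            # Only show what would be executed.
            print(" ".join(command))
        else:
            # check=True stops at the first failing conversion.
            subprocess.run(command, check=True)
    return commands
```

For example, convert_folder("prodcom", "data") prints the commands it would run. Pass dry_run=False from the repository root, with the conda environment active, to execute them.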
For conversion of new data files (possibly in a different format from the examples) see the DEVELOPING.md file.
To test the code, after installing the software and running doit:
cd tests
pytest