PRODCOM data converted with the PRObs ontology

Rick Lupton d5c3f42535 Add RDFox scripts/rules to load PRODCOM bulk data 10 months ago

PRODCOM data as PRObs Observations

This repository converts data from the PRODCOM database into a structure defined by the Physical Resources Observatory (PRObs) ontology.

See DEVELOPING.md for more information about using this repository.

Dataset structure

  • The repository is a DataLad dataset.
  • Input data files needing preprocessing are located in raw_data/.
  • Preprocessed data files ready for conversion are located in data/.
  • All custom code is located in scripts/.
  • Converted data is saved to outputs/.

Installation

Getting the code

To clone the DataLad dataset, run the following in a shell or command window (e.g. git-bash):

datalad clone https://github.com/probs-lab/prodcom-data.git

Setting up the virtual environment and installing dependencies

To create a virtual environment with conda or miniconda (conda env create reads environment.yml from the current directory):

cd prodcom-data
conda env create

Running the code

After installation:

  • Open a terminal or git-bash window
  • Navigate to the prodcom-data folder, e.g. cd prodcom-data
  • Activate the environment using conda activate prodcom-data

To download the example output data files from the server, use:

datalad get outputs
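
Once the files have been fetched, it can be useful to look at a few statements before loading them elsewhere. The following is a minimal sketch, assuming the outputs are gzipped N-Triples files as the .nt.gz extension in the conversion example suggests. The helper name peek_ntriples and the example file name are illustrative and are not part of this repository.

```python
import gzip
from itertools import islice
from pathlib import Path


def peek_ntriples(path, limit=5):
    """Return up to `limit` statements from a gzipped N-Triples file.

    Blank lines and comment lines (starting with '#') are skipped.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        statements = (
            line.strip()
            for line in handle
            if line.strip() and not line.lstrip().startswith("#")
        )
        # Consume the generator while the file is still open.
        return list(islice(statements, limit))


if __name__ == "__main__":
    # Assumed file name, taken from the conversion example in this README.
    example = Path("outputs/PRODCOM2016DATA.nt.gz")
    if example.exists():
        for statement in peek_ntriples(example):
            print(statement)
    else:
        print(f"{example} not found; run `datalad get outputs` first")
```

Reading the file line by line keeps memory use low even when an output file is large.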

To preprocess the input data files, run:

doit run preprocess

To convert the preprocessed data in the data folder, run:

doit run convert_data

To run all tasks (i.e. preprocessing and conversion), run doit with no arguments:

doit
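
For readers new to doit: it loads task definitions from dodo.py by collecting functions whose names start with task_. The sketch below shows how a convert_data task could be declared. It is not the contents of this repository's dodo.py; the DATA_FILES list and the output_path_for helper are hypothetical, and only the command line itself follows the conversion example in this README.

```python
from pathlib import Path

DATA_DIR = Path("data")
OUTPUT_DIR = Path("outputs")

# Illustrative input list; the real dodo.py may discover files differently.
DATA_FILES = ["PRODCOM2016DATA.csv"]


def output_path_for(data_file):
    """Map a CSV file name to the matching gzipped N-Triples output path."""
    return OUTPUT_DIR / (Path(data_file).stem + ".nt.gz")


def task_convert_data():
    """Yield one doit sub-task per preprocessed data file."""
    for name in DATA_FILES:
        source = DATA_DIR / name
        target = output_path_for(name)
        yield {
            "name": name,                 # sub-task name, required when yielding
            "file_dep": [str(source)],    # inputs doit checks for changes
            "targets": [str(target)],     # files the action is expected to create
            "actions": [f"scripts/convert_data.py prodcom {source} {target}"],
        }
```

Declaring file_dep and targets is what lets doit skip a conversion whose input has not changed since the last run.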

Individual files can be converted by running the convert_data.py script with appropriate parameters specifying the file type and the input and output filenames:

scripts/convert_data.py prodcom data/PRODCOM2016DATA.csv outputs/PRODCOM2016DATA.nt.gz

To convert the example PRODCOM data files, specify the type prodcom. The types prodcom_list and prodcom_correspondence are also defined.
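
When several files need converting outside of doit, the same command can be assembled from Python. This is a minimal sketch, assuming convert_data.py accepts exactly the three positional arguments shown in the example above (type, input file, output file). The function names build_convert_command and convert are hypothetical.

```python
import subprocess
import sys

# File types named in this README.
KNOWN_TYPES = ("prodcom", "prodcom_list", "prodcom_correspondence")


def build_convert_command(file_type, input_path, output_path):
    """Build the argument list for one call to scripts/convert_data.py."""
    if file_type not in KNOWN_TYPES:
        raise ValueError(
            f"unknown file type {file_type!r}; expected one of {KNOWN_TYPES}"
        )
    return [
        sys.executable,  # run the script with the active environment's Python
        "scripts/convert_data.py",
        file_type,
        str(input_path),
        str(output_path),
    ]


def convert(file_type, input_path, output_path):
    """Run the conversion and raise CalledProcessError if the script fails."""
    subprocess.run(
        build_convert_command(file_type, input_path, output_path),
        check=True,
    )
```

For example, convert("prodcom", "data/PRODCOM2016DATA.csv", "outputs/PRODCOM2016DATA.nt.gz") would run the same conversion as the command above, provided it is called from the repository root with the conda environment active.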

Converting new data

To convert new data files (possibly in a different format from the examples), see DEVELOPING.md.

Testing the code

To test the code, first install the software and run doit, then:

cd tests
pytest