# Running the PRObs system

The PRObs system can perform several different operations, each encoded in a separate RDFox master script.

## Modules

### Ontology conversion

Converts the Turtle ontology into Functional-Style OWL.

How to execute it:

```sh
RDFox sandbox <root> 'exec scripts/ontology-conversion/master'
```

where `<root>` is the path to the "Ontologies" folder (`.` if you are inside it).
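
For example, from inside the "Ontologies" folder itself:

```sh
RDFox sandbox . 'exec scripts/ontology-conversion/master'
```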

### Data pre-processing

Converts 'raw' data into CSV files for RDFox.

How to execute it:

```sh
python Ontologies/scripts/preprocess.py
```

### Data conversion

Reads the CSV files and converts them into RDF (`probs_original_data`).

How to execute it:

```sh
RDFox sandbox <root> 'exec scripts/data-conversion/master'
```

where `<root>` is the path to the "Ontologies" folder (`.` if you are inside it).

### Data validation

Reads the RDF file (`probs_original_data`) and checks whether a set of constraints is satisfied.

How to execute it:

```sh
RDFox sandbox <root> 'exec scripts/data-validation/master'
```

where `<root>` is the path to the "Ontologies" folder (`.` if you are inside it).

### Data enhancement

Reads the RDF file (`probs_original_data`), applies all the enhancement rules, and writes the result as RDF (`probs_enhanced_data`).

How to execute it:

```sh
RDFox sandbox <root> 'exec scripts/data-enhancement/master'
```

where `<root>` is the path to the "Ontologies" folder (`.` if you are inside it).

### Test queries

Reads the RDF file with the data (`probs_enhanced_data`) and answers some test queries.

How to execute it:

```sh
RDFox sandbox <root> 'exec scripts/test-queries/master'
```

where `<root>` is the path to the "Ontologies" folder (`.` if you are inside it).

### Reasoning

Reads the RDF file with the data (`probs_enhanced_data`), adds the reasoning rules, and opens the SPARQL endpoint.

How to execute it:

```sh
RDFox sandbox <root> 'exec scripts/reasoning/master'
```

where `<root>` is the path to the "Ontologies" folder (`.` if you are inside it).

Then go to http://localhost:12110/console/default to run your SPARQL queries.
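
As a quick check that the endpoint is up, a generic SPARQL query like the following (nothing PRObs-specific) should return a handful of triples:

```sparql
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
```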

## Operations

### Get an RDFox-friendly version of the ontology

Simply run the ontology conversion module.

### Convert data from CSV files (or other data sources supported by RDFox) to an RDF file compatible with the PRObs ontology

If you need only one load_data file and one map file (this should generally be the case):

  1. Overwrite the `load_data.rdfox` and `map.dlog` files in "data-conversion", keeping the same file names (see the sketch below)
  2. Run the data conversion module
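
As a rough illustration, a map file contains RDFox Datalog rules along the following lines; the prefixes and predicate names here are placeholders, not the actual PRObs vocabulary:

```datalog
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix probs: <https://example.org/probs/> .  # placeholder prefix
@prefix : <https://example.org/data/> .        # placeholder prefix

# Illustrative rule: anything with an object name is typed as a PRObs object.
[?obj, rdf:type, probs:Object] :- [?obj, :objectName, ?name] .
```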

If you need a more complex loading (it might happen, but we do not have any example of this at the moment):

  1. Overwrite the `input.rdfox` file in "data-conversion", keeping the same file name
  2. Run the data conversion module

### Check if the data are "valid"

  1. Run the data validation module
  2. Check that all queries have 0 answers

### Add another validation check

  1. Add a file `check_CUSTOM-NAME_rules` with the rules to check
  2. Add a file `check_CUSTOM-NAME_queries` with the ASK queries to check (note that the queries must return no answer if the check is passed)
  3. Add a new command `exec check CUSTOM-NAME` in the "validate" file (see the sketch below)
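
For example, a hypothetical check that every object has a name could use a query along these lines in the queries file (the `probs:` terms are illustrative placeholders, not the actual PRObs vocabulary); the check passes when it returns no answers:

```sparql
# Illustrative validation query: find objects that are missing a name.
SELECT ?obj WHERE {
    ?obj a probs:Object .
    FILTER NOT EXISTS { ?obj probs:objectName ?name }
}
```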

### Enhance an RDF file containing data compatible with the PRObs ontology

  1. Save the file as `probs_original_data.nt.gz` in the data folder (see the example below)
  2. Run the data enhancement module
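
If you start from an uncompressed N-Triples file, you can produce the expected gzipped file like this (`my_data.nt` and the `data/` path are placeholders for your file and the location of the data folder):

```sh
gzip -c my_data.nt > data/probs_original_data.nt.gz
```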

### Run the test queries on an RDF file containing data compatible with the PRObs ontology

  1. Save the file as `probs_enhanced_data.nt.gz` in the data folder
  2. Run the test queries module

### Open an endpoint to run SPARQL queries on an RDF file containing data compatible with the PRObs ontology

  1. Save the file as `probs_enhanced_data.nt.gz` in the data folder
  2. Run the reasoning module

### Execute everything all at once

You should (almost) never need this, but you could achieve it by simply running:

```sh
doit
```
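
To see which tasks doit defines here, or to run a single one together with its dependencies, the standard doit commands are (`TASK` is a placeholder for a task name from the list):

```sh
doit list    # show the available tasks
doit TASK    # run one specific task and whatever it depends on
```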

### Execute from one specific step onwards

We also provide master-pipeline scripts if you want to execute multiple steps without saving the intermediate files.

To achieve this, use the `master-pipeline` version of the module you are interested in instead of the `master` one.
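
For example, assuming the same layout as the `master` scripts, running the data enhancement module this way would look like:

```sh
RDFox sandbox <root> 'exec scripts/data-enhancement/master-pipeline'
```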

## Relationships

The DOT code for the relationships diagram:

```dot
digraph G {
    
  subgraph cluster_ontology_conversion {
    label = <<B>Ontology-conversion</B>>;
    colorscheme=paired10;
    color=1;
    
    original_ontology -> ontology_conversion;
    ontology_conversion -> ontology_fss;
  }
  
  subgraph cluster_data_pre_processing {
    label = <<B>Data pre-processing</B>>;
    colorscheme=paired10;
    color=2;
    
    raw_data -> data_pre_processing;
    data_pre_processing -> csv_files;
  }
  
  subgraph cluster_data_conversion {
    label = <<B>Data conversion</B>>;
    colorscheme=paired10;
    color=3;
    
    ontology_fss -> data_conversion;
    csv_files -> data_conversion;
    data_conversion -> probs_original_data;
  }
  
  subgraph cluster_data_enhancement {
    label = <<B>Data enhancement</B>>;
    colorscheme=paired10;
    color=4;
    
    ontology_fss -> data_enhancement;
    probs_original_data -> data_enhancement;
    enhancement_rules -> data_enhancement;
    data_enhancement -> probs_enhanced_data;
  }
  
  subgraph cluster_test_queries {
    label = <<B>Test queries</B>>;
    colorscheme=paired10;
    color=5;
    
    probs_enhanced_data -> test_queries;
    test_queries -> answers;
  }
  
  subgraph cluster_reasoning {
    label = <<B>Reasoning</B>>;
    colorscheme=paired10;
    color=6;
    
    probs_enhanced_data -> reasoning;
    reasoning_rules -> reasoning;
    reasoning -> endpoint;
  }

    original_ontology [label="original ontology", shape=tripleoctagon]
    ontology_conversion [label="ontology conversion", shape=rectangle]
    ontology_fss [label="Functional-Style OWL ontology", shape=tripleoctagon]

    raw_data [label="raw_data", shape=cylinder]
    data_pre_processing [label="data pre-processing", shape=rectangle]
    csv_files [label="data", shape=folder]

    data_conversion [label="data conversion", shape=rectangle]
    probs_original_data [label="probs_original_data", shape=tripleoctagon]

    enhancement_rules [label="enhancement rules", shape=hexagon]
    data_enhancement [label="data enhancement", shape=rectangle]
    probs_enhanced_data [label="probs_enhanced_data", shape=tripleoctagon]

    test_queries [label="test queries", shape=rectangle]
    answers [label="output", shape=folder]

    reasoning_rules [label="reasoning/rules", shape=hexagon]
    reasoning [label="reasoning", shape=rectangle]
    endpoint [label="endpoint", shape=component]

  subgraph cluster_legend {
    label = <<B>Legend</B>>;
    colorscheme=paired10;
    color=7;
    // rankdir=TB;
    // {rank = same; rdf_owl process }
    
    rdf_owl [label="RDF/OWL", shape=tripleoctagon]
    process [label="Process", shape=rectangle]
    datasource [label="Datasource", shape=cylinder]
    folder [label="Multiple files", shape=folder]
    datalog [label="Datalog", shape=hexagon]
    rdfox_endpoint [label="RDFox endpoint", shape=component]
    
    edge[style=invis];
    rdf_owl -> process -> datasource -> folder -> datalog -> rdfox_endpoint
  }

}
```

*Relationships diagram (rendered from the DOT code above).*