# Running the PRObs system

The PRObs system can perform several operations, each encoded in a separate RDFox master script.

## Modules

### Ontology conversion

Converts the Turtle ontology into Functional-Style OWL.

How to execute it:

```
RDFox sandbox <root> 'exec scripts/ontology-conversion/master'
```

where `<root>` is the path to the "Ontologies" folder (`.` if you are inside it).

### Data pre-processing

Converts 'raw' data into CSV files for RDFox.

How to execute it:

```
python Ontologies/scripts/preprocess.py
```

### Data conversion

Reads the CSV files and converts them into RDF (`probs_original_data`).

How to execute it:

```
RDFox sandbox <root> 'exec scripts/data-conversion/master'
```

where `<root>` is the path to the "Ontologies" folder (`.` if you are inside it).

### Data validation

Reads the RDF file (`probs_original_data`) and checks whether a set of constraints is satisfied.

How to execute it:

```
RDFox sandbox <root> 'exec scripts/data-validation/master'
```

where `<root>` is the path to the "Ontologies" folder (`.` if you are inside it).

### Data enhancement

Reads the RDF file (`probs_original_data`), runs all the enhancement rules, and writes the result as RDF (`probs_enhanced_data`).

How to execute it:

```
RDFox sandbox <root> 'exec scripts/data-enhancement/master'
```

where `<root>` is the path to the "Ontologies" folder (`.` if you are inside it).

### Test queries

Reads the RDF file with the data (`probs_enhanced_data`) and answers a set of test queries.

How to execute it:

```
RDFox sandbox <root> 'exec scripts/test-queries/master'
```

where `<root>` is the path to the "Ontologies" folder (`.` if you are inside it).

### Reasoning

Reads the RDF file with the data (`probs_enhanced_data`), adds the reasoning rules, and opens the SPARQL endpoint.

How to execute it:

```
RDFox sandbox <root> 'exec scripts/reasoning/master'
```

where `<root>` is the path to the "Ontologies" folder (`.` if you are inside it).

Then go to http://localhost:12110/console/default to run your SPARQL queries.
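As a quick sanity check of the endpoint, you can paste a generic query into the console; this one counts all loaded triples and does not depend on any particular vocabulary from the PRObs ontology:

```sparql
# Count all triples in the loaded data
SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }
```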

## Operations

### Get an RDFox-friendly version of the ontology

Simply run the ontology conversion module.

### Convert data from CSV files (or other data sources supported by RDFox) to an RDF file compatible with the PRObs ontology

If you only need one load_data file and one map file (this should generally be the case):

  1. Overwrite the `load_data.rdfox` and `map.dlog` files in "data-conversion" (keeping the same names)
  2. Run the data conversion module

If you need more complex loading (possible, but we have no example of this at the moment):

  1. Overwrite the `input.rdfox` file in "data-conversion" (keeping the same name)
  2. Run the data conversion module

### Check if the data are "valid"

  1. Run the data validation module
  2. Check that all queries have 0 answers

### Add another validation check

  1. Add a file `check_CUSTOM-NAME_rules` with the rules to check
  2. Add a file `check_CUSTOM-NAME_queries` with the ASK queries to check (note that the queries must return no answer if the check passes)
  3. Add a new command `exec check CUSTOM-NAME` in the "validate" file
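For example, a hypothetical check file `check_positive-measurement_queries` could contain a query like the following; the prefix and property name are illustrative, not from the repository:

```sparql
PREFIX : <https://example.org/probs#>  # illustrative prefix

# The check passes only if no negative measurement exists,
# i.e. this query finds nothing
ASK WHERE {
  ?obs :measurement ?m .
  FILTER(?m < 0)
}
```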

### Enhance an RDF file containing data compatible with the PRObs ontology

  1. Save the file as `probs_original_data.nt.gz` in the data folder
  2. Run the data enhancement module

### Run the test queries on an RDF file containing data compatible with the PRObs ontology

  1. Save the file as `probs_enhanced_data.nt.gz` in the data folder
  2. Run the test queries module

### Open an endpoint to run SPARQL queries on an RDF file containing data compatible with the PRObs ontology

  1. Save the file as `probs_enhanced_data.nt.gz` in the data folder
  2. Run the reasoning module

### Execute everything all at once

You should (almost) never need this, but you could achieve it by simply running:

```
doit
```

### Execute from one specific step onwards

We also provide `master-pipeline` scripts for executing multiple commands without saving intermediate files.

To do this, use the `master-pipeline` version of the module you are interested in instead of the `master` one.
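For example, to run the data-enhancement step and everything after it in one go (assuming the same invocation pattern as the `master` scripts):

```
RDFox sandbox <root> 'exec scripts/data-enhancement/master-pipeline'
```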

## Relationships

The DOT code for the relationships diagram:

```dot
digraph G {

  subgraph cluster_ontology_conversion {
    label = <<B>Ontology-conversion</B>>;
    colorscheme=paired10;
    color=1;

    original_ontology -> ontology_conversion;
    ontology_conversion -> ontology_fss;
  }

  subgraph cluster_data_pre_processing {
    label = <<B>Data pre-processing</B>>;
    colorscheme=paired10;
    color=2;

    raw_data -> data_pre_processing;
    data_pre_processing -> csv_files;
  }

  subgraph cluster_data_conversion {
    label = <<B>Data conversion</B>>;
    colorscheme=paired10;
    color=3;

    ontology_fss -> data_conversion;
    csv_files -> data_conversion;
    data_conversion -> probs_original_data;
  }

  subgraph cluster_data_enhancement {
    label = <<B>Data enhancement</B>>;
    colorscheme=paired10;
    color=4;

    ontology_fss -> data_enhancement;
    probs_original_data -> data_enhancement;
    enhancement_rules -> data_enhancement;
    data_enhancement -> probs_enhanced_data;
  }

  subgraph cluster_test_queries {
    label = <<B>Test queries</B>>;
    colorscheme=paired10;
    color=5;

    probs_enhanced_data -> test_queries;
    test_queries -> answers;
  }

  subgraph cluster_reasoning {
    label = <<B>Reasoning</B>>;
    colorscheme=paired10;
    color=6;

    probs_enhanced_data -> reasoning;
    reasoning_rules -> reasoning;
    reasoning -> endpoint;
  }

  original_ontology [label="original ontology", shape=tripleoctagon]
  ontology_conversion [label="ontology conversion", shape=rectangle]
  ontology_fss [label="Functional-Style OWL ontology", shape=tripleoctagon]

  raw_data [label="raw_data", shape=cylinder]
  data_pre_processing [label="data pre-processing", shape=rectangle]
  csv_files [label="data", shape=folder]

  data_conversion [label="data conversion", shape=rectangle]
  probs_original_data [label="probs_original_data", shape=tripleoctagon]

  enhancement_rules [label="enhancement rules", shape=hexagon]
  data_enhancement [label="data enhancement", shape=rectangle]
  probs_enhanced_data [label="probs_enhanced_data", shape=tripleoctagon]

  test_queries [label="test queries", shape=rectangle]
  answers [label="output", shape=folder]

  reasoning_rules [label="reasoning/rules", shape=hexagon]
  reasoning [label="reasoning", shape=rectangle]
  endpoint [label="endpoint", shape=component]

  subgraph cluster_legend {
    label = <<B>Legend</B>>;
    colorscheme=paired10;
    color=7;
    // rankdir=TB;
    // {rank = same; rdf_owl process }

    rdf_owl [label="RDF/OWL", shape=tripleoctagon]
    process [label="Process", shape=rectangle]
    datasource [label="Datasource", shape=cylinder]
    folder [label="Multiple files", shape=folder]
    datalog [label="Datalog", shape=hexagon]
    rdfox_endpoint [label="RDFox endpoint", shape=component]

    edge[style=invis];
    rdf_owl -> process -> datasource -> folder -> datalog -> rdfox_endpoint
  }

}
```