# Running the PRObs system

The PRObs system can perform several operations, each encoded in a separate RDFox master script.

## Modules

### Ontology conversion

Converts the Turtle ontology into Functional-Style OWL.

How to execute it:

```
RDFox sandbox <root> 'exec scripts/ontology-conversion/master'
```

where `<root>` is the path to the "Ontologies" folder (`.` if you are inside it).

### Data pre-processing

Converts 'raw' data into CSV files for RDFox.

How to execute it:

```
python Ontologies/scripts/preprocess.py
```

### Data conversion

Reads the CSV files and converts them into RDF (`probs_original_data`).

How to execute it:

```
RDFox sandbox <root> 'exec scripts/data-conversion/master'
```

where `<root>` is the path to the "Ontologies" folder (`.` if you are inside it).

### Data validation

Reads the RDF file (`probs_original_data`) and checks whether a set of constraints is satisfied.

How to execute it:

```
RDFox sandbox <root> 'exec scripts/data-validation/master'
```

where `<root>` is the path to the "Ontologies" folder (`.` if you are inside it).

### Data enhancement

Reads the RDF file (`probs_original_data`), runs all the enhancement rules, and writes the result as RDF (`probs_enhanced_data`).

How to execute it:

```
RDFox sandbox <root> 'exec scripts/data-enhancement/master'
```

where `<root>` is the path to the "Ontologies" folder (`.` if you are inside it).

### Test queries

Reads the RDF file with the data (`probs_enhanced_data`) and answers a set of test queries.

How to execute it:

```
RDFox sandbox <root> 'exec scripts/test-queries/master'
```

where `<root>` is the path to the "Ontologies" folder (`.` if you are inside it).

### Reasoning

Reads the RDF file with the data (`probs_enhanced_data`), adds the reasoning rules, and opens the SPARQL endpoint.

How to execute it:

```
RDFox sandbox <root> 'exec scripts/reasoning/master'
```

where `<root>` is the path to the "Ontologies" folder (`.` if you are inside it).

Then go to http://localhost:12110/console/default to run your SPARQL queries.
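As a quick sanity check of the endpoint, you can paste a generic query into the console; this one counts all loaded triples and does not depend on any particular vocabulary from the PRObs ontology:

```sparql
# Count all triples in the loaded data
SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }
```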

## Operations

### Get an RDFox-friendly version of the ontology

Simply run the ontology conversion module.

### Convert data from CSV files (or other data sources supported by RDFox) to an RDF file compatible with the PRObs ontology

If you only need one load_data file and one map file (this should generally be the case):

  1. Overwrite the `load_data.rdfox` and `map.dlog` files in "data-conversion" (keeping the same names)
  2. Run the data conversion module

If you need more complex loading (possible, but we have no example of this at the moment):

  1. Overwrite the `input.rdfox` file in "data-conversion" (keeping the same name)
  2. Run the data conversion module

### Check if the data are "valid"

  1. Run the data validation module
  2. Check that all queries have 0 answers

### Add another validation check

  1. Add a file `check_CUSTOM-NAME_rules` with the rules to check
  2. Add a file `check_CUSTOM-NAME_queries` with the ASK queries to check (note that the queries must return no answer if the check passes)
  3. Add a new command `exec check CUSTOM-NAME` in the "validate" file
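For example, a hypothetical check file `check_positive-measurement_queries` could contain a query like the following; the prefix and property name are illustrative, not from the repository:

```sparql
PREFIX : <https://example.org/probs#>  # illustrative prefix

# The check passes only if no negative measurement exists,
# i.e. this query finds nothing
ASK WHERE {
  ?obs :measurement ?m .
  FILTER(?m < 0)
}
```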

### Enhance an RDF file containing data compatible with the PRObs ontology

  1. Save the file as `probs_original_data.nt.gz` in the data folder
  2. Run the data enhancement module

### Run the test queries on an RDF file containing data compatible with the PRObs ontology

  1. Save the file as `probs_enhanced_data.nt.gz` in the data folder
  2. Run the test queries module

### Open an endpoint to run SPARQL queries on an RDF file containing data compatible with the PRObs ontology

  1. Save the file as `probs_enhanced_data.nt.gz` in the data folder
  2. Run the reasoning module

### Execute everything all at once

You should (almost) never need this, but you could achieve it by simply running:

```
doit
```

### Execute from one specific step onwards

We also provide `master-pipeline` scripts for executing multiple commands without saving intermediate files.

To do this, use the `master-pipeline` version of the module you are interested in instead of the `master` one.
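For example, to run the data-enhancement step and everything after it in one go (assuming the same invocation pattern as the `master` scripts):

```
RDFox sandbox <root> 'exec scripts/data-enhancement/master-pipeline'
```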

## Relationships

The DOT code for the relationships diagram:

```dot
digraph G {

  subgraph cluster_ontology_conversion {
    label = <<B>Ontology-conversion</B>>;
    colorscheme=paired10;
    color=1;

    original_ontology -> ontology_conversion;
    ontology_conversion -> ontology_fss;
  }

  subgraph cluster_data_pre_processing {
    label = <<B>Data pre-processing</B>>;
    colorscheme=paired10;
    color=2;

    raw_data -> data_pre_processing;
    data_pre_processing -> csv_files;
  }

  subgraph cluster_data_conversion {
    label = <<B>Data conversion</B>>;
    colorscheme=paired10;
    color=3;

    ontology_fss -> data_conversion;
    csv_files -> data_conversion;
    data_conversion -> probs_original_data;
  }

  subgraph cluster_data_enhancement {
    label = <<B>Data enhancement</B>>;
    colorscheme=paired10;
    color=4;

    ontology_fss -> data_enhancement;
    probs_original_data -> data_enhancement;
    enhancement_rules -> data_enhancement;
    data_enhancement -> probs_enhanced_data;
  }

  subgraph cluster_test_queries {
    label = <<B>Test queries</B>>;
    colorscheme=paired10;
    color=5;

    probs_enhanced_data -> test_queries;
    test_queries -> answers;
  }

  subgraph cluster_reasoning {
    label = <<B>Reasoning</B>>;
    colorscheme=paired10;
    color=6;

    probs_enhanced_data -> reasoning;
    reasoning_rules -> reasoning;
    reasoning -> endpoint;
  }

  original_ontology [label="original ontology", shape=tripleoctagon]
  ontology_conversion [label="ontology conversion", shape=rectangle]
  ontology_fss [label="Functional-Style OWL ontology", shape=tripleoctagon]

  raw_data [label="raw_data", shape=cylinder]
  data_pre_processing [label="data pre-processing", shape=rectangle]
  csv_files [label="data", shape=folder]

  data_conversion [label="data conversion", shape=rectangle]
  probs_original_data [label="probs_original_data", shape=tripleoctagon]

  enhancement_rules [label="enhancement rules", shape=hexagon]
  data_enhancement [label="data enhancement", shape=rectangle]
  probs_enhanced_data [label="probs_enhanced_data", shape=tripleoctagon]

  test_queries [label="test queries", shape=rectangle]
  answers [label="output", shape=folder]

  reasoning_rules [label="reasoning/rules", shape=hexagon]
  reasoning [label="reasoning", shape=rectangle]
  endpoint [label="endpoint", shape=component]

  subgraph cluster_legend {
    label = <<B>Legend</B>>;
    colorscheme=paired10;
    color=7;
    // rankdir=TB;
    // {rank = same; rdf_owl process }

    rdf_owl [label="RDF/OWL", shape=tripleoctagon]
    process [label="Process", shape=rectangle]
    datasource [label="Datasource", shape=cylinder]
    folder [label="Multiple files", shape=folder]
    datalog [label="Datalog", shape=hexagon]
    rdfox_endpoint [label="RDFox endpoint", shape=component]

    edge[style=invis];
    rdf_owl -> process -> datasource -> folder -> datalog -> rdfox_endpoint
  }

}
```