# What are the Semantic Web and RDF?

**RDF (Resource Description Framework)** is one of the three foundational [Semantic Web](https://en.wikipedia.org/wiki/Semantic_Web) technologies, the other two being SPARQL and OWL.

In particular, RDF is the data model of the Semantic Web. That means that all data in Semantic Web technologies is represented as RDF. If you store Semantic Web data, it's in RDF. If you query Semantic Web data (typically using SPARQL), it's RDF data. If you send Semantic Web data to your friend, it's RDF.

The RDF data model is based on the idea of making statements about resources (in particular web resources) in the form of *subject–predicate–object* expressions, known as [*triples*](https://en.wikipedia.org/wiki/Semantic_triple). The *subject* denotes the resource, while the *predicate* denotes a trait or aspect of the resource and expresses a relationship between the *subject* and the *object*.

For example, one way to represent the notion "The sky has the color blue" in RDF is as the triple: a **subject** denoting *"the sky"*, a **predicate** denoting *"has the color"*, and an **object** denoting *"blue"*. RDF thus uses the term *subject* where the entity–attribute–value model typical of object-oriented design would use *object* (or *entity*): entity (sky), attribute (color), and value (blue). <br>
(Resource Description Framework, Wikipedia, 2017)
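
The triple model can be pictured in a few lines of plain Python. The sketch below is only an illustration, not an RDF library: statements are stored as tuples, and the `match` helper as well as the extra `grass` and `isPartOf` statements are made up for this example.

```python
# Each statement is a (subject, predicate, object) triple.
# Only the first triple comes from the text; the others are made-up data.
triples = [
    ("sky", "hasColor", "blue"),
    ("grass", "hasColor", "green"),
    ("sky", "isPartOf", "atmosphere"),
]


def match(pattern, triples):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


# Everything that is stated about the sky
sky_statements = match(("sky", None, None), triples)
# Every subject that has the color blue
blue_things = [s for s, _, _ in match((None, "hasColor", "blue"), triples)]

print(sky_statements)
print(blue_things)
```

Querying with a pattern that leaves some positions open is the same basic idea that SPARQL applies to real RDF graphs.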

![RDF_example_graph.png](RDF_example_graph.png)

Find out more: <br>
- http://fast.wistia.net/embed/iframe/8nm9xf4jip?popover=true <br>
- https://en.wikipedia.org/wiki/Resource_Description_Framework <br>
- https://www.cambridgesemantics.com/semantic-university/rdf-101 <br>
- http://www.cambridgesemantics.com/semantic-university/introduction-semantic-web-0

# RDF<->odML converter

Here we will explore RDF to odML and odML to RDF conversion using the `odml/tools/rdf_converter.py` module.

If you are new to python-odml, please read the tutorial first:
https://g-node.github.io/python-odml/tutorial.html

Let's create the example odML document.

In [1]:
import os
os.chdir('..')

import odml
import datetime

doc = odml.Document(author="D. N. Adams",
                    date=datetime.date(1979, 10, 12))

# CREATE AND APPEND THE MAIN SECTION
doc.append(odml.Section(name="Arthur Philip Dent",
                        type="crew/person",
                        definition="Information on Arthur Dent"))

# SET NEW PARENT NODE
parent = doc['Arthur Philip Dent']


# APPEND PROPERTIES WITH VALUES
parent.append(odml.Property(name="Species",
                            value="Human",
                            dtype=odml.DType.string,
                            definition="Species to which subject belongs to"))

## RDFWriter class

The RDFWriter class converts odML documents to one of the supported RDF formats:<br>
'xml', 'pretty-xml', 'trix', 'n3', 'turtle', 'ttl', 'ntriples', 'nt', 'nt11', 'trig', 'json-ld'.<br>
Either a single document or a list of documents can be passed to the `RDFWriter()` constructor.

It's possible to get the output as a string.

In [2]:
from odml.tools.rdf_converter import RDFWriter

print(RDFWriter(doc).get_rdf_str('turtle'))

@prefix odml: <https://g-node.org/projects/odml-rdf#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

odml:Hub odml:hasDocument <https://g-node.org/projects/odml-rdf#02e1d29e-937d-4de7-a83e-3e756d954c92> .

<https://g-node.org/projects/odml-rdf#02e1d29e-937d-4de7-a83e-3e756d954c92> a odml:Document ;
    odml:hasAuthor "D. N. Adams" ;
    odml:hasDate "1979-10-12"^^xsd:date ;
    odml:hasSection odml:f3de1e21-f6f5-4eae-8f58-db94ee10f812 .

<https://g-node.org/projects/odml-rdf#8e59c55a-ac69-4b71-b101-61f3b8b8590f> a rdf:Bag ;
    rdf:li "Human" .

odml:c46a5ee8-811a-4947-8e4b-7f164fbf4c8a a odml:Property ;
    odml:hasDefinition "Species to which subject belongs to" ;
    odml:hasDtype "string" ;
    odml:hasName "Species" ;
    odml:hasValue <https://g-node.org/projects/odml-rdf#8e59c55a-ac69-4b71-b101-61f3b8b85

Alternatively, write the output to a specified file.

In [3]:
import tempfile
import os

# Create temporary file
f = tempfile.NamedTemporaryFile(mode='w', suffix=".ttl")
path = f.name

RDFWriter(doc).write_file(path, "turtle")

with open(path) as ff:
    data = ff.read()
    print(data)

f.close()

@prefix odml: <https://g-node.org/projects/odml-rdf#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

odml:Hub odml:hasDocument <https://g-node.org/projects/odml-rdf#02e1d29e-937d-4de7-a83e-3e756d954c92> .

<https://g-node.org/projects/odml-rdf#02e1d29e-937d-4de7-a83e-3e756d954c92> a odml:Document ;
    odml:hasAuthor "D. N. Adams" ;
    odml:hasDate "1979-10-12"^^xsd:date ;
    odml:hasSection odml:f3de1e21-f6f5-4eae-8f58-db94ee10f812 .

odml:c46a5ee8-811a-4947-8e4b-7f164fbf4c8a a odml:Property ;
    odml:hasDefinition "Species to which subject belongs to" ;
    odml:hasDtype "string" ;
    odml:hasName "Species" ;
    odml:hasValue odml:ddde531a-663a-46f5-b474-edbc73254077 .

odml:ddde531a-663a-46f5-b474-edbc73254077 a rdf:Bag ;
    rdf:li "Human" .

odml:f3de1e21-f6f5-4eae-8f58-db94ee10f812 a odml:Section ;
 

## RDFReader class

The RDFReader class enables RDF to odML conversion.

There are two ways to obtain the converted odML documents as objects:
- from **RDF file**  ( `RDFReader().from_file("/path_to_input_rdf", "rdf_format")` )
- from **RDF string**  ( `RDFReader().from_string("rdf file as a string", "rdf_format")` )

In [4]:
from odml.tools.rdf_converter import RDFReader

rdf_file = RDFWriter(doc).get_rdf_str('turtle')
odml_doc = RDFReader().from_string(rdf_file, "turtle")

print(odml_doc)

[<Doc None by D. N. Adams (1 sections)>]


In [5]:
# Create temporary file
rdf_file = tempfile.NamedTemporaryFile(mode='w', suffix=".ttl")
rdf_path = rdf_file.name
RDFWriter(doc).write_file(rdf_path, "turtle")

odml_doc = RDFReader().from_file(rdf_path, "turtle")

print(odml_doc)

[<Doc None by D. N. Adams (1 sections)>]


Another option is to write the converted documents to one or more files. <br>
`RDFReader().write_file("/input_path", "rdf_format", "/output_path_to_file")`

In [6]:
# If the RDF file contains one odML document, specify the output path as a file
odml_file = tempfile.NamedTemporaryFile(mode='w', suffix=".odml")
odml_path = odml_file.name

RDFReader().write_file(rdf_path, "turtle", odml_path)

with open(odml_path) as ff:
    data = ff.read()
    print(data)

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet  type="text/xsl" href="odmlTerms.xsl"?>
<?xml-stylesheet  type="text/xsl" href="odml.xsl"?>
<odML version="1.1">
  <section>
    <name>Arthur Philip Dent</name>
    <id>f3de1e21-f6f5-4eae-8f58-db94ee10f812</id>
    <property>
      <name>Species</name>
      <id>c46a5ee8-811a-4947-8e4b-7f164fbf4c8a</id>
      <value>[Human]</value>
      <definition>Species to which subject belongs to</definition>
      <type>string</type>
    </property>
    <definition>Information on Arthur Dent</definition>
    <type>crew/person</type>
  </section>
  <id>02e1d29e-937d-4de7-a83e-3e756d954c92</id>
  <date>1979-10-12</date>
  <author>D. N. Adams</author>
</odML>



If the RDF file contains several odML documents, specify the output path as a directory.<br>
`RDFReader().write_file("/input_path", "rdf_format", "/output_path_to_directory")`

The module creates files in the specified directory and writes the parsed documents to them.
Example of a created file: `/<dir_path>/doc_<id>.odml`
(`<id>` is the id of the document).
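
Once the documents have been written to a directory, the `doc_<id>.odml` naming pattern makes it easy to collect them again. The sketch below uses only the standard library; the helper `list_converted_docs` and the placeholder ids are assumptions made for illustration, and the empty files merely stand in for real converter output.

```python
import glob
import os
import tempfile


def list_converted_docs(dir_path):
    """Map document id -> file path for every `doc_<id>.odml` in dir_path."""
    found = {}
    for path in sorted(glob.glob(os.path.join(dir_path, "doc_*.odml"))):
        file_name = os.path.basename(path)
        # Strip the "doc_" prefix and the ".odml" suffix to recover the id
        doc_id = file_name[len("doc_"):-len(".odml")]
        found[doc_id] = path
    return found


# Simulate an output directory with empty placeholder files (made-up ids)
with tempfile.TemporaryDirectory() as out_dir:
    for fake_id in ("aaa-111", "bbb-222"):
        open(os.path.join(out_dir, "doc_{}.odml".format(fake_id)), "w").close()
    converted = list_converted_docs(out_dir)
    doc_ids = sorted(converted)

print(doc_ids)
```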

## Querying the data with rdflib and SPARQL

In [3]:
# Please run the first code snippet to change the working directory if you get
# [Errno 2] No such file or directory: '/home/rick/g-node/python-odml/doc/doc/example_rdfs/example_data/'
# or insert the line `os.chdir('..')` after `import os` below
from rdflib import Graph
import os

graph = Graph()
input_dir = os.path.join(os.getcwd(), 'doc/example_rdfs/example_data/')
for file_name in os.listdir(input_dir):
    f = os.path.join(input_dir, file_name)
    if os.path.isfile(f):
        graph.parse(f, format="turtle")
print('Total number of triples: ', len(graph))

Total number of triples:  3041


A quick video explaining SPARQL: https://www.youtube.com/watch?v=FvGndkpa4K0 <br> <br>
The example query below uses rdflib to find every section of type `Recording` that has a property named `Recording duration`, and prints the property's value:

In [3]:
from rdflib import Graph, Namespace, RDF
from rdflib.plugins.sparql import prepareQuery

q = prepareQuery("""SELECT ?d ?s ?p ?value WHERE {
    ?d odml:hasSection ?s .
    ?s rdf:type odml:Section .
    ?s odml:hasType "Recording" .
    ?s odml:hasProperty ?p .
    ?p rdf:type odml:Property .
    ?p odml:hasName "Recording duration" .
    ?p odml:hasValue ?v .
    ?v rdf:type rdf:Bag .
    ?v rdf:li ?value .}""", initNs={"odml": Namespace("https://g-node.org/projects/odml-rdf#"),
                          "rdf": RDF})

for row in graph.query(q):
    print("Doc: {0}, Sec: {1}, \n"
          "Prop: {2}, Val:{3}".format(row.d, row.s, row.p, row.value))

Doc: https://g-node.org/projects/odml-rdf#cd24b60f-1d5e-4040-9881-5e5a597baef7, Sec: https://g-node.org/projects/odml-rdf#782bd29d-e4b0-4c14-a417-1772a4851ffd, 
Prop: https://g-node.org/projects/odml-rdf#9aeede78-678c-4db8-acb5-fbd6d408b762, Val:13.9
Doc: https://g-node.org/projects/odml-rdf#537c6cc8-7dfe-4d53-a111-24b3ce0f3c1a, Sec: https://g-node.org/projects/odml-rdf#346773f2-abee-4892-b052-840ddcff35ee, 
Prop: https://g-node.org/projects/odml-rdf#1636af03-8e97-4ef2-9d7d-6c7db23dcd02, Val:11.88
Doc: https://g-node.org/projects/odml-rdf#24066355-1ee8-4eb5-a715-96bbb6231cd5, Sec: https://g-node.org/projects/odml-rdf#bbd44815-5016-49e0-9f4b-5b83778d00de, 
Prop: https://g-node.org/projects/odml-rdf#0ed215a2-5d20-48eb-b744-bf3b731459fc, Val:0.33
Doc: https://g-node.org/projects/odml-rdf#cc66e78a-3742-490a-9fdb-1c66761d7652, Sec: https://g-node.org/projects/odml-rdf#5365f7e5-603c-4154-a5ea-33bb1a07a956, 
Prop: https://g-node.org/projects/odml-rdf#41316903-80f1-45a3-9b06-400a02903531, Val:

## FuzzyFinder class

**FuzzyFinder** is a tool for querying a graph through *fuzzy* queries. The finder executes multiple queries to better match the input parameters and returns sets of triples, ordered from the most to the fewest matched parameters. <br>

The function `find()` accepts several optional parameters:
- `graph`: rdflib graph object
- `q_str`: fuzzy query string, explored below
- `q_params`: dict object with the parameters of a query
- `mode`: either 'fuzzy' (the default) or 'match'

Each mode works with a specific type of fuzzy query (`q_str`).
Let's look at the `match` mode in an example:

In [4]:
from odml.tools.fuzzy_finder import FuzzyFinder

query_string = 'prop(name:Date) section(name:Recording-2013-02-08-ak, type:Recording)'

f = FuzzyFinder(graph)
print(f.find(mode='match', q_str=query_string))

SELECT * WHERE {
?d odml:hasSection ?s .
?s rdf:type odml:Section .
?s odml:hasType "Recording" .
?s odml:hasProperty ?p .
?p rdf:type odml:Property .
?p odml:hasName "Date" .
}
Document: https://g-node.org/projects/odml-rdf#cc66e78a-3742-490a-9fdb-1c66761d7652
Property: https://g-node.org/projects/odml-rdf#f1699eb6-4cab-4dd0-9327-120eab2089ae
Section: https://g-node.org/projects/odml-rdf#5365f7e5-603c-4154-a5ea-33bb1a07a956
Document: https://g-node.org/projects/odml-rdf#537c6cc8-7dfe-4d53-a111-24b3ce0f3c1a
Property: https://g-node.org/projects/odml-rdf#138f08f7-23c7-4722-8577-85a6fa633ae1
Section: https://g-node.org/projects/odml-rdf#346773f2-abee-4892-b052-840ddcff35ee
Document: https://g-node.org/projects/odml-rdf#cd24b60f-1d5e-4040-9881-5e5a597baef7
Property: https://g-node.org/projects/odml-rdf#1d6db4ce-87f3-4e9c-b221-e76ba05b2759
Section: https://g-node.org/projects/odml-rdf#782bd29d-e4b0-4c14-a417-1772a4851ffd
Document: https://g-node.org/projects/odml-rdf#24066355-1ee8-4eb5-a71

As you can see from the output, the finder builds multiple SPARQL queries from 'match' queries, executes them and returns the matched results. The first result always represents the most specific query (the largest combination of input parameters that returned at least one triple).
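
The "most specific query first" idea can be sketched with plain Python. This is not FuzzyFinder's actual implementation: the helpers `prioritized_subsets` and `first_match`, the parameter names and the two sample records are assumptions made for illustration, with dictionaries standing in for graph content.

```python
from itertools import combinations


def prioritized_subsets(params):
    """Yield parameter subsets from the most to the least specific."""
    items = sorted(params.items())
    for size in range(len(items), 0, -1):
        for subset in combinations(items, size):
            yield dict(subset)


def first_match(params, records):
    """Return the most specific subset that matches at least one record."""
    for subset in prioritized_subsets(params):
        hits = [r for r in records
                if all(r.get(k) == v for k, v in subset.items())]
        if hits:
            return subset, hits
    return {}, []


# Made-up records standing in for sections and properties in a graph
records = [
    {"sec_type": "Recording", "prop_name": "Date"},
    {"sec_type": "Recording", "prop_name": "Recording duration"},
]
params = {
    "sec_name": "Recording-2013-02-08-ak",
    "sec_type": "Recording",
    "prop_name": "Date",
}

subset, hits = first_match(params, records)
print(subset)
print(hits)
```

In this toy data no record carries the requested section name, so the full parameter set fails and the first successful combination is the section type together with the property name.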

The query syntax is straightforward. Write the name of the entity, `property`, `section` or `document` (the shortened names `prop`, `sec` and `doc` also work), and add the attributes with their values inside the parentheses, each attribute separated from its value by a colon.

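Example from the code above: `prop(name:Date) section(name:Recording-2013-02-08-ak, type:Recording)`.
Here we search for sections and properties where the `property` has the attribute `name` with the value `Date`, and the `section` has the `name` `Recording-2013-02-08-ak` and the `type` `Recording`.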
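
To make the syntax concrete, here is a small regex-based sketch that splits such a query string into entities and attribute/value pairs. It is an assumption about how the string could be read, not the parser used by FuzzyFinder; `parse_match_query`, `ALIASES` and `ENTITY_RE` are hypothetical names.

```python
import re

# Shortened entity names mentioned in the text
ALIASES = {"prop": "property", "sec": "section", "doc": "document"}
# One group looks like: entity(attr:value, attr:value)
ENTITY_RE = re.compile(r"(\w+)\(([^)]*)\)")


def parse_match_query(q_str):
    """Parse 'entity(attr:value, ...)' groups into {entity: {attr: value}}."""
    parsed = {}
    for entity, body in ENTITY_RE.findall(q_str):
        entity = ALIASES.get(entity, entity)
        attrs = {}
        for pair in body.split(","):
            if ":" not in pair:
                continue  # skip anything that is not an attr:value pair
            attr, value = pair.split(":", 1)
            attrs[attr.strip()] = value.strip()
        parsed[entity] = attrs
    return parsed


query = "prop(name:Date) section(name:Recording-2013-02-08-ak, type:Recording)"
parsed = parse_match_query(query)
print(parsed)
```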

To build 'match' queries you need to know exactly which odML attribute each value belongs to. If you write `prop(name:Date) section(name:Recording, type:Recording-2013-02-08-ak)`, the `find()` method will not return any triples matching the section parameters, because it is unlikely that there is a section with the type `Recording-2013-02-08-ak`.

Attributes that do not belong to an odML entity are also ignored (e.g. only `id, author, date, version, repository, sections` can exist in the `Document` object).
In the example `section(not-odml-name:Recording-2013-02-08-ak, record:Recording)` the `find()` method returns nothing.
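
The effect of ignoring unknown attributes can be pictured as a simple filter. The sketch below uses the `Document` attribute names listed above; the helper `keep_known_attrs` and the sample input are hypothetical and do not reflect how FuzzyFinder performs the check internally.

```python
# Attribute names listed in the text for the Document object
DOCUMENT_ATTRS = {"id", "author", "date", "version", "repository", "sections"}


def keep_known_attrs(attrs, allowed):
    """Drop every attribute that is not part of the allowed set."""
    return {k: v for k, v in attrs.items() if k in allowed}


# One valid attribute and one that does not exist in odML
requested = {"author": "D. N. Adams", "not-odml-name": "Recording"}
kept = keep_known_attrs(requested, DOCUMENT_ATTRS)
print(kept)
```

If every requested attribute is unknown, nothing is left to query for, which is consistent with the empty result of the cell that follows.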

In [5]:
from odml.tools.fuzzy_finder import FuzzyFinder

query_string = 'section(not-odml-name:Recording-2013-02-08-ak, record:Recording)'

f = FuzzyFinder(graph)
print(f.find(mode='match', q_str=query_string))




This is often inconvenient if you do not know exactly what the information relates to in the graph. For situations like this, the *'fuzzy'* mode comes into play. It is also the default mode.

The output logic is similar to the previous mode, but here you can provide broader information; the finder will match the parameters and create meaningful queries based on the input.

The query string consists of two parts: *FIND* and *HAVING*.

In the *FIND* part the user specifies the set of odML objects and their attributes,
e.g. `FIND prop(name) section(name, type)`

In the *HAVING* part the user specifies a set of search values that could relate to the attributes in the *FIND* part,
e.g. `HAVING Recording, Recording-2012-04-04-ab, Date`

Finally, the complete query will look like this:
`FIND sec(name, type) prop(name) HAVING Recording, Recording-2012-04-04-ab, Date`

As the example shows, you do not need to know which attribute each search value in the *HAVING* part relates to; the finder works it out for you.
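
The two-part structure of a fuzzy query can be illustrated with a short sketch that separates the *FIND* entities from the *HAVING* values. It assumes both keywords are present and is not taken from the FuzzyFinder source; `parse_fuzzy_query` is a hypothetical helper.

```python
import re


def parse_fuzzy_query(q_str):
    """Split 'FIND ... HAVING ...' into entity attributes and search values."""
    # Assumes the string contains exactly one HAVING keyword
    find_part, having_part = re.split(r"\bHAVING\b", q_str, maxsplit=1)
    find_part = re.sub(r"^\s*FIND\b", "", find_part)

    entities = {}
    for entity, body in re.findall(r"(\w+)\(([^)]*)\)", find_part):
        entities[entity] = [a.strip() for a in body.split(",") if a.strip()]

    values = [v.strip() for v in having_part.split(",") if v.strip()]
    return entities, values


query = "FIND sec(name, type) prop(name) HAVING Recording, Recording-2012-04-04-ab, Date"
entities, values = parse_fuzzy_query(query)
print(entities)
print(values)
```

With the attributes and the values separated like this, a finder is free to try each value against each attribute, which is why the user does not have to pair them up by hand.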

In [6]:
from odml.tools.fuzzy_finder import FuzzyFinder

query_string = 'FIND sec(name, type) prop(name) HAVING Recording, Recording-2012-04-04-ab, Date, Some_value'

f = FuzzyFinder(graph)
print(f.find(mode='fuzzy', q_str=query_string))

SELECT * WHERE {
?d odml:hasSection ?s .
?s rdf:type odml:Section .
?s odml:hasType "Recording" .
?s odml:hasProperty ?p .
?p rdf:type odml:Property .
?p odml:hasName "Date" .
}
Document: https://g-node.org/projects/odml-rdf#cc66e78a-3742-490a-9fdb-1c66761d7652
Property: https://g-node.org/projects/odml-rdf#f1699eb6-4cab-4dd0-9327-120eab2089ae
Section: https://g-node.org/projects/odml-rdf#5365f7e5-603c-4154-a5ea-33bb1a07a956
Document: https://g-node.org/projects/odml-rdf#537c6cc8-7dfe-4d53-a111-24b3ce0f3c1a
Property: https://g-node.org/projects/odml-rdf#138f08f7-23c7-4722-8577-85a6fa633ae1
Section: https://g-node.org/projects/odml-rdf#346773f2-abee-4892-b052-840ddcff35ee
Document: https://g-node.org/projects/odml-rdf#cd24b60f-1d5e-4040-9881-5e5a597baef7
Property: https://g-node.org/projects/odml-rdf#1d6db4ce-87f3-4e9c-b221-e76ba05b2759
Section: https://g-node.org/projects/odml-rdf#782bd29d-e4b0-4c14-a417-1772a4851ffd
Document: https://g-node.org/projects/odml-rdf#24066355-1ee8-4eb5-a71