=============
odML Tutorial
=============
:Author:
    Lyuba Zehl;
    based on work by Hagen Fritsch
:Release:
    0.1
:License:
    Creative Commons Attribution-ShareAlike 4.0 International
    `License `_
-------------------------------------------------------------------------------
odML (open metadata Markup Language)
====================================
odML (open metadata Markup Language) is an XML-based file format,
proposed by [Grewe et al. (2011) Front Neuroinform 5:16], to provide
metadata in an organized, human- and machine-readable way.
In this tutorial we will illustrate the conceptual design of odML and
show hands-on how you can generate your own odML metadata collection.
In addition, we demonstrate the advantages of using odML to screen
large numbers of data sets according to selection criteria relevant for
subsequent analyses. Well organized metadata management is a key
component to guarantee reproducibility of experiments and to track
provenance of performed analyses.
What are metadata and why are they needed?
Metadata are data about data. They describe the conditions under which the
actual raw data of an experimental study were acquired. Organizing such
metadata and keeping them accessible may sound like a trivial task, and
most laboratories have developed home-made solutions to keep track of
their metadata. Most of these solutions, however, break down if data and
metadata need to be shared within a collaboration, because implicit
knowledge of what is important and how it is organized is often
underestimated.
While maintaining the relation to the actual raw data, odML can help to
collect all metadata, which are usually distributed over several files and
formats, and to store them in one unified place, which facilitates sharing
data and metadata.
Key features of odML
- Open, XML-based language to collect, store and share metadata
- Machine- and human-readable
- Interactive odML-Editor
- Python-odML library
-------------------------------------------------------------------------------
Structure of this tutorial
==========================
The scientific background of the possible user community of odML varies
enormously (e.g. physics, informatics, mathematics, biology, medicine,
psychology). Some users will be trained programmers, others probably have never
learned a programming language.
To cover the different demands of all users, we first provide a slow
introduction to odML that allows programming beginners to learn the basic
concepts. In a next step, we will demonstrate how to generate an odML file via
the Python-odML library. In later chapters we present more advanced possibilities
of the Python-odML library (e.g. how to search for certain metadata or how to
integrate existing terminologies or templates).
Although the structure of an odML file depends on the needs of each individual
user, we provide a few guidelines at the end of this tutorial.
The code for the example odML files that we use within this tutorial is part
of the documentation package (see doc/example_odMLs/).
A summary of available odML terminologies and templates can be found `here
`_.
-------------------------------------------------------------------------------
Download and Installation
=========================
The Python-odML library (including the odML-Editor) is available on
`GitHub `_. If you are not familiar with
the version control system **git**, but still want to use it, have a look at
the documentation available on the `git-scm website `_.
Dependencies
------------
The Python-odML library runs under Python 2.7.
Additionally, the Python-odML library depends on Enum (version 0.4.4).
Installation
------------
To download the Python-odML library, either use git to clone the
repository from GitHub::

    $ cd /home/usr/toolbox/
    $ git clone https://github.com/G-Node/python-odml.git

... or, if you don't want to use git, download the ZIP file also provided on
GitHub to your computer (e.g., as above, into a "toolbox" folder in your home
directory).
To install the Python-odML library, enter the corresponding directory and run::

    $ cd /home/usr/toolbox/python-odml/
    $ python setup.py install
Bugs & Questions
----------------
Should you find a behaviour that is likely a bug, please file a bug report at
`the github bug tracker `_.
If you have questions regarding the use of the library or the editor, ask
the question on `Stack Overflow `_. Be sure to tag
it with `odml` and we'll do our best to quickly solve the problem.
-------------------------------------------------------------------------------
Basic knowledge on odML
=======================
Before we start, it is important to know the basic structure of an odML
file. Within an odML file metadata are grouped and stored in a
hierarchical tree structure which consists of four different odML
objects.
Document
    - corresponds to the root of the tree (groups everything together)
    - *parent*: no parent
    - *children*: Section

Section
    - corresponds to (big) branches of the tree
    - *parent*: Section or Document
    - *children*: Section and/or Property

Property
    - corresponds to (small) branches of the tree (groups values)
    - *parent*: Section
    - *children*: at least one Value

Value
    - corresponds to a leaf of the tree (contains metadata)
    - *parent*: Property
    - *children*: no children
Each of these odML objects has a certain set of attributes where the
user can describe the object and its contents. Which attribute belongs
to which object and what the attributes are used for, is better explained
in an example odML file (e.g., "THGTTG.odml").
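The four-level hierarchy described above can be pictured with a few plain
Python classes. This is only an illustrative sketch and not the Python-odML
implementation: the class names ``SketchDocument``, ``SketchSection``,
``SketchProperty`` and ``SketchValue`` are hypothetical stand-ins, and only the
parent/child rules listed above are modelled.

```python
class SketchValue(object):
    """Leaf of the tree: holds the actual metadata, has no children."""
    def __init__(self, data):
        self.data = data
        self.parent = None


class SketchProperty(object):
    """Small branch: groups one or more Values."""
    def __init__(self, name, values):
        if not values:
            raise ValueError("a Property needs at least one Value")
        self.name = name
        self.parent = None
        self.values = list(values)
        for value in self.values:
            value.parent = self


class SketchSection(object):
    """Big branch: may hold sub-Sections and/or Properties."""
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.sections = []
        self.properties = []

    def append(self, child):
        # Sort the child into the matching list and remember its parent.
        if isinstance(child, SketchSection):
            self.sections.append(child)
        elif isinstance(child, SketchProperty):
            self.properties.append(child)
        else:
            raise TypeError("a Section holds only Sections or Properties")
        child.parent = self


class SketchDocument(object):
    """Root of the tree: has no parent and holds only Sections."""
    def __init__(self):
        self.sections = []

    def append(self, section):
        if not isinstance(section, SketchSection):
            raise TypeError("a Document holds only Sections")
        self.sections.append(section)
        section.parent = self


# Build Document -> Section -> Property -> Value, as in the list above.
doc = SketchDocument()
crew = SketchSection('TheCrew')
doc.append(crew)
crew.append(SketchProperty('NoCrewMembers', [SketchValue(4)]))
leaf = crew.properties[0].values[0]
```

Walking upwards from ``leaf`` via ``parent`` passes the Property and the
Section before it reaches the Document, which mirrors the tree picture used in
this chapter.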
A first look
============
If you want to get familiar with the concept behind an odML and how to handle
odML files in Python, you can have a first look at the example odML file
provided in the Python-odML library. For this you first need to run the python
code ("thgttg.py") to generate the example odML file ("THGTTG.odml")::
$ cd /home/usr/toolbox/python-odml/doc/example_odMLs/
$ ls
thgttg.py
$ python thgttg.py
$ ls
THGTTG.odml thgttg.py
Now open a Python shell within the Python-odML library directory, e.g. with
IPython::
$ cd /home/usr/toolbox/python-odml/
$ ipython
In the IPython shell, first import the odml package::
>>> import odml
Second, load the example odML file with the following command lines::
>>> to_load = '/home/usr/toolbox/python-odml/doc/example_odMLs/THGTTG.odml'
>>> odmlEX = odml.tools.xmlparser.load(to_load)
If you open a Python shell outside of the Python-odML library directory, please
adapt your Python-Path and the path to the "THGTTG.odml" file accordingly.
How you can access the different odML objects and their attributes once you
loaded an odML file and how you can make use of the attributes is described in
more detail in the following chapters for each odML object type (document,
section, property, value). Please note that some attributes are obligatory,
some are recommended and others are optional. The optional attributes are
important for the advanced odML possibilities and can for now be ignored by
odML beginners. You can find an example of their usage in later chapters.
The Document
------------
If you loaded the example odML file, you can have a first look at the Document
either by explicitly calling the odml object ...::

    >>> print odmlEX.document
    <Doc 42 by D. N. Adams (2 sections)>

... or by using the following short cut::

    >>> print odmlEX
    <Doc 42 by D. N. Adams (2 sections)>

As you can see, both commands print the same short summary of the
Document of the loaded example odML file. In the following we will only use the
short cut notation.
The printout already gives you the following information about the odML file:
- '<...>' indicates that you are looking at an object
- 'Doc' tells you that you are looking at an odML Document
- '42' is the version of the odML file
- 'by D. N. Adams' states the author of the odML file
- '(2 sections)' tells you that this odML Document has 2 Sections directly
  appended
Note that the Document printout tells you nothing about the depth of the
complete tree structure, because it is not displaying the children of its
directly attached Sections. It also does not display all Document attributes.
In total, a Document has the following 4 attributes:
author
    - recommended Document attribute
    - The author of this odML file.

date
    - recommended Document attribute
    - The date this odML file was created (yyyy-mm-dd format).

repository
    - optional Document attribute
    - The URL to the repository of terminologies used in this odML file.

version
    - recommended Document attribute
    - The version of this odML file.
Let's find out which attributes were defined for our example Document using the
following commands::

    >>> odmlEX.author
    'D. N. Adams'
    >>> odmlEX.date
    '1979-10-12'
    >>> odmlEX.version
    42
    >>> odmlEX.repository
As you learned in the beginning, Sections can be attached to a Document, as the
first hierarchy level of the odML file. Let's have a look which Sections were
attached to the Document of our example odML file using the following command::
>>> odmlEX.sections
[, ]
The printout of a Section is explained in the next chapter.
The Sections
------------
There are several ways to access Sections. You can call them either by name or
by index, using either the explicit function that returns the list of
Sections (see the last part of 'The Document' chapter) or, again, a short cut
notation. Let's test all the different ways to access a Section by having a
look at the first Section in the sections list attached to the Document in our
example odML file::

    >>> odmlEX.sections['TheCrew']
    <Section TheCrew[crew] (4)>
    >>> odmlEX.sections[0]
    <Section TheCrew[crew] (4)>
    >>> odmlEX['TheCrew']
    <Section TheCrew[crew] (4)>
    >>> odmlEX[0]
    <Section TheCrew[crew] (4)>

In the following we will use the short cut notation and call Sections
explicitly by their name.
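Access by name or by index can be understood as a list whose lookup also
accepts a string. The following is a minimal sketch of that idea in plain
Python. ``NamedList`` and ``Item`` are hypothetical names chosen for
illustration and are not part of the Python-odML library.

```python
class Item(object):
    """Hypothetical stand-in for any object that carries a name."""
    def __init__(self, name):
        self.name = name


class NamedList(list):
    """A list whose items can be fetched by position or by their name."""
    def __getitem__(self, key):
        if isinstance(key, str):
            for item in self:
                if item.name == key:
                    return item
            raise KeyError(key)
        # Fall back to ordinary list indexing for integers and slices.
        return list.__getitem__(self, key)


crew = NamedList([Item('Arthur Philip Dent'),
                  Item('Zaphod Beeblebrox'),
                  Item('Tricia Marie McMillan'),
                  Item('Ford Prefect')])

by_name = crew['Ford Prefect']
by_index = crew[3]
```

Both lookups return the very same object, which is why the two notations can
be used interchangeably.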
The printout of a Section is similar to the Document printout and already gives
you the following information:
- '<...>' indicates that you are looking at an object
- 'Section' tells you that you are looking at an odML Section
- 'TheCrew' tells you that the Section was named 'TheCrew'
- '[...]' highlights the type of the Section (here 'crew')
- '(4)' states that this Section has four sub-Sections directly attached to it
Note that the Section printout tells you nothing about the number of attached
Properties or about the depth of a possible sub-Section tree below the
directly attached ones. Besides the name, it shows only the type out of all
Section attributes. In total, a Section can be defined by the following 5
attributes:
name
    - obligatory Section attribute
    - The name of the Section. Should describe what kind of information can be
      found in this Section.

definition
    - recommended Section attribute
    - The definition of the content within this Section.

type
    - recommended Section attribute
    - The classification type, which allows related Sections to be connected
      through an overarching semantic context.

reference
    - optional Section attribute
    - The ?

repository
    - optional Section attribute
    - The URL to the repository of terminologies used in this odML file.
Let's have a look what attributes were defined for the Section "TheCrew" using
the following commands::
>>> odmlEX['TheCrew'].name
'TheCrew'
>>> odmlEX['TheCrew'].definition
'Information on the crew'
>>> odmlEX['TheCrew'].type
'crew'
>>> odmlEX['TheCrew'].reference
>>> odmlEX['TheCrew'].repository
To see which Sections are directly attached to the Section 'TheCrew' use again
the following command::
>>> odmlEX['TheCrew'].sections
[,
,
,
]
For accessing these sub-Sections you can use again all the following commands::
>>> odmlEX['TheCrew'].sections['Ford Prefect']
>>> odmlEX['TheCrew'].sections[3]
>>> odmlEX['TheCrew']['Ford Prefect']
>>> odmlEX['TheCrew'][3]
Besides sub-Sections a Section can also have Properties attached. To see if and
which Properties are attached to the Section 'TheCrew' you have to use the
following command::
>>> odmlEX['TheCrew'].properties
[, ]
The printout of a Property is explained in the next chapter.
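Because the printout of a Section shows only its direct sub-Sections, a small
recursive helper can be handy for seeing the whole tree at once. The sketch
below relies only on the ``name`` and ``sections`` attributes described in this
chapter. The function ``section_tree`` and the stand-in class ``Node`` are
hypothetical and exist only so that the example runs without an odML file.

```python
class Node(object):
    """Hypothetical stand-in offering the `name` and `sections` attributes."""
    def __init__(self, name, sections=()):
        self.name = name
        self.sections = list(sections)


def section_tree(parent, depth=0):
    """Return one indented line per Section found below `parent`."""
    lines = []
    for sec in parent.sections:
        lines.append('  ' * depth + sec.name)
        # Recurse into the sub-Sections of this Section.
        lines.extend(section_tree(sec, depth + 1))
    return lines


root = Node('root', [Node('TheCrew', [Node('Arthur Philip Dent'),
                                     Node('Ford Prefect')])])
tree_lines = section_tree(root)
```

With a loaded file, calling ``section_tree(odmlEX)`` should behave the same
way, assuming the Document and its Sections expose ``sections`` and ``name`` as
shown above.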
The Properties
--------------
Properties need to be called explicitly via the properties function of a
Section. You can then call a Property either by name or by index::

    >>> odmlEX['TheCrew'].properties['NoCrewMembers']
    <Property NoCrewMembers>
    >>> odmlEX['TheCrew'].properties[1]

In the following we will only call Properties explicitly by their name.
The Property printout is reduced and only gives you information about the
following:
- '<...>' indicates that you are looking at an object
- 'Property' tells you that you are looking at an odML Property
- 'NoCrewMembers' tells you that the Property was named 'NoCrewMembers'
Note that the Property printout tells you nothing about the number of Values,
and very little about the Property attributes. In total, a Property can be
defined by the following 6 attributes:
name
    - obligatory Property attribute
    - The name of the Property. Should describe what kind of Values can be
      found in this Property.

value
    - obligatory Property attribute
    - The value container of this Property. See 'The Values' chapter for
      details.

definition
    - recommended Property attribute
    - The definition of this Property.

dependency
    - optional Property attribute
    - The name of another Property within the same Section on which this
      Property depends.

dependency_value
    - optional Property attribute
    - The Value of the other Property, specified in the 'dependency' attribute,
      on which this Property depends.

mapping
    - optional Property attribute
    - The odML path within the same odML file (internal link) to another
      Section to which all children of this Section should be transferred if a
      conversion is requested, as long as the children do not define a mapping
      themselves.
Let's check which attributes were defined for the Property "NoCrewMembers"::
>>> odmlEX['TheCrew'].properties['NoCrewMembers'].name
'NoCrewMembers'
>>> odmlEX['TheCrew'].properties['NoCrewMembers'].definition
'Number of crew members'
>>> odmlEX['TheCrew'].properties['NoCrewMembers'].dependency
>>> odmlEX['TheCrew'].properties['NoCrewMembers'].dependency_value
>>> odmlEX['TheCrew'].properties['NoCrewMembers'].mapping
The Value or Values attached to a Property can be accessed via two different
commands. If only one value object was attached to the Property, the first
command returns directly a Value::
>>> odmlEX['TheCrew'].properties['NoCrewMembers'].value
If multiple Values were attached to the Property, a list of Values is
returned::
>>> odmlEX['TheCrew'].properties['NameCrewMembers'].value
[, ,
, ]
The second command will always return a list independent of the number of
Values attached::
>>> odmlEX['TheCrew'].properties['NoCrewMembers'].values
[]
>>> odmlEX['TheCrew'].properties['NameCrewMembers'].values
[, ,
, ]
The printout of the Value is explained in the next chapter.
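The difference between the two commands can be mimicked in a few lines of plain
Python. This sketch only models the behaviour described above and is not the
library's code. ``ValueHolder`` is a hypothetical class name.

```python
class ValueHolder(object):
    """Hypothetical container mimicking `value` versus `values`."""
    def __init__(self, values):
        self._values = list(values)

    @property
    def values(self):
        # Always a list, however many entries there are.
        return list(self._values)

    @property
    def value(self):
        # A single entry is returned directly, several as a list.
        if len(self._values) == 1:
            return self._values[0]
        return list(self._values)


single = ValueHolder([4])
multiple = ValueHolder(['Arthur Philip Dent', 'Ford Prefect'])
```

Code that has to handle both cases is usually simpler when it sticks to
``values``, because the result is then always a list.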
The Values
----------
Depending on how many Values are attached to a Property, they can be accessed
in two different ways. If you know that only one Value is attached, you can use
the following command::

    >>> odmlEX['TheCrew'].properties['NoCrewMembers'].value
    <int 4>

If you know that more than one Value is attached and you would like, for
example, to access the fourth one, you can use::

    >>> odmlEX['TheCrew'].properties['NameCrewMembers'].values[3]
The Value printout is reduced and only gives you information about the
following:
- '<...>' indicates that you are looking at an object
- 'int' tells you that the value has the odml data type (dtype) 'int'
- '4' is the actual data stored within the value object
In total, a Value can be defined by the following 9 attributes:
data
    - obligatory Value attribute
    - The actual metadata value.

dtype
    - recommended Value attribute
    - The odML data type of the given metadata value.

definition
    - recommended Value attribute
    - The definition of the given metadata value.

uncertainty
    - recommended Value attribute
    - Can be used to specify the uncertainty of the given metadata value.

unit
    - recommended Value attribute
    - The unit of the given metadata value, if it has a unit.

reference
    - optional Value attribute
    - The ?

filename
    - optional Value attribute
    - The ?

encoder
    - optional Value attribute
    - Name of the encoder used to encode a binary metadata value into
      ASCII.

checksum
    - optional Value attribute
    - Checksum and name of the algorithm that calculated the checksum of a
      given binary metadata value (algorithm$checksum format)
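To make the ``encoder`` and ``checksum`` attributes more concrete, the sketch
below encodes binary content as ASCII text and builds a string in the
algorithm$checksum format, using only the Python standard library. The choice
of Base64 and SHA-1 and the helper names ``encode_binary`` and
``make_checksum`` are assumptions made for illustration. Which encoders and
algorithms Python-odML itself supports is not covered here.

```python
import base64
import hashlib


def encode_binary(raw):
    """Encode raw bytes as ASCII text using Base64 (assumed encoder)."""
    return base64.b64encode(raw).decode('ascii')


def make_checksum(raw, algorithm='sha1'):
    """Return a string in the 'algorithm$checksum' format."""
    digest = hashlib.new(algorithm, raw).hexdigest()
    return '{0}${1}'.format(algorithm, digest)


raw = b'*'                    # a single byte, 00101010 in binary
encoded = encode_binary(raw)  # ASCII-safe representation of that byte
checksum = make_checksum(raw)
name, digest = checksum.split('$')
```

Splitting the checksum string at the ``$`` sign recovers the algorithm name,
so a reader of the file knows how to recompute and compare the digest.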
Let's see which attributes were defined for the Value of the Property
'NoCrewMembers' of the Section 'TheCrew'::
>>> odmlEX['TheCrew'].properties['NoCrewMembers'].value.data
4
>>> odmlEX['TheCrew'].properties['NoCrewMembers'].value.dtype
'int'
>>> odmlEX['TheCrew'].properties['NoCrewMembers'].value.definition
>>> odmlEX['TheCrew'].properties['NoCrewMembers'].value.uncertainty
>>> odmlEX['TheCrew'].properties['NoCrewMembers'].value.unit
>>> odmlEX['TheCrew'].properties['NoCrewMembers'].value.reference
>>> odmlEX['TheCrew'].properties['NoCrewMembers'].value.filename
>>> odmlEX['TheCrew'].properties['NoCrewMembers'].value.encoder
>>> odmlEX['TheCrew'].properties['NoCrewMembers'].value.checksum
Note that these commands are for Properties containing one Value. For
accessing attributes of a Value of a Property with multiple Values use::
>>> odmlEX['TheCrew'].properties['NameCrewMembers'].values[3].data
'Ford Prefect'
>>> odmlEX['TheCrew'].properties['NameCrewMembers'].values[3].dtype
'person'
>>> odmlEX['TheCrew'].properties['NameCrewMembers'].values[3].definition
>>> odmlEX['TheCrew'].properties['NameCrewMembers'].values[3].uncertainty
>>> odmlEX['TheCrew'].properties['NameCrewMembers'].values[3].unit
>>> odmlEX['TheCrew'].properties['NameCrewMembers'].values[3].reference
>>> odmlEX['TheCrew'].properties['NameCrewMembers'].values[3].filename
>>> odmlEX['TheCrew'].properties['NameCrewMembers'].values[3].encoder
>>> odmlEX['TheCrew'].properties['NameCrewMembers'].values[3].checksum
If you would like to get all the actual metadata values back from a Property
with multiple Values, iterate over the Values list::

    >>> all_metadata = []
    >>> for val in odmlEX['TheCrew'].properties['NameCrewMembers'].values:
    ...     all_metadata.append(val.data)
    ...
    >>> all_metadata
    ['Arthur Philip Dent', 'Zaphod Beeblebrox',
     'Tricia Marie McMillan', 'Ford Prefect']
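The same loop can be written more compactly as a list comprehension. The sketch
uses a hypothetical stand-in, ``FakeValue``, that offers only a ``data``
attribute, so that it runs without an odML file. With the loaded example, the
list would instead come from the ``values`` of the Property.

```python
from collections import namedtuple

# Hypothetical stand-in for odML Values, exposing only `data`.
FakeValue = namedtuple('FakeValue', ['data'])

values = [FakeValue('Arthur Philip Dent'),
          FakeValue('Zaphod Beeblebrox'),
          FakeValue('Tricia Marie McMillan'),
          FakeValue('Ford Prefect')]

# Collect the `data` of every Value in one expression.
all_metadata = [val.data for val in values]
```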
------------------------------------------------------------------------
Generating an odML-file
=======================
After getting familiar with the different odml objects and their attributes,
you will now learn how to generate your own odML file by reproducing some parts
of the example odml file we presented before.
We will show you first how to create the different odML objects with their
obligatory and recommended attributes. Please have a look at the tutorial part
describing the advanced possibilities of the Python odML library for the usage
of all other attributes.
If you opened a new IPython shell, please import the odml package again first::
>>> import odml
Create a document
-----------------
Let's start by creating the Document::

    >>> MYodML = odml.Document(author='Douglas Adams',
    ...                        version=42)

You can check whether your new Document actually contains what you created by
using some of the commands you learned before::
>>> MYodML
>>>
>>> MYodML.date
As you can see, we created a Document with the same attributes as the example,
except that we forgot to define the date. Note that you can always edit
attributes of generated odml objects. For this let's first import the Python
package datetime::
>>> import datetime as dt
Now we edit the date attribute of the Document::
>>> MYodML.date = dt.date(1979, 10, 12)
>>> MYodML.date
'1979-10-12'
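The string shown for the date follows the yyyy-mm-dd format mentioned in the
description of the Document attributes, which is also what Python's own
``datetime.date.isoformat()`` produces. The short sketch below uses only the
standard library and shows the conversion in both directions. It does not
involve the odml package.

```python
import datetime as dt

# From a date object to the yyyy-mm-dd string ...
as_text = dt.date(1979, 10, 12).isoformat()

# ... and back from the string to a date object.
as_date = dt.datetime.strptime(as_text, '%Y-%m-%d').date()
```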
Another part which is still missing is that so far we have no Sections attached
to our Document. Let's change this!
Create a section
----------------
We now create a Section by reproducing the Section "TheCrew" of the example
odML file from the beginning::

    >>> sec = odml.Section(name='TheCrew',
    ...                    definition='Information on the crew',
    ...                    type='crew')

Check whether your new Section actually contains what you created::
>>> sec.name
'TheCrew'
>>> sec.definition
'Information on the crew'
>>> sec.type
'crew'
Now we need to attach the Section to our previously generated Document::
>>> MYodML.append(sec)
>>> MYodML
>>> MYodML.sections
[]
We now repeat the procedure to create a second Section, which we will attach as
a sub-Section to the Section 'TheCrew'::

    >>> sec = odml.Section(name='Arthur Philip Dent',
    ...                    definition='Information on Arthur Dent',
    ...                    type='crew/person')
    >>> sec
    >>> MYodML['TheCrew'].append(sec)
    >>> MYodML.sections
    []
    >>> MYodML['TheCrew'].sections
    []
Note that all of our created Sections do not contain any Properties and Values,
yet. Let's see if we can change this...
Create a Property-Value(s) pair:
--------------------------------
The creation of a Property is not independent from creating a Value, because a
Property always needs at least one Value attached. Therefore we will demonstrate
the creation of Value and Property together.
Let's first create a Property with a single Value::

    >>> val = odml.Value(data="male",
    ...                  dtype=odml.DType.string)
    >>> val
    >>> prop = odml.Property(name='Gender',
    ...                      definition='Sex of the subject',
    ...                      value=val)
    >>> prop
    >>> prop.value
As you can see, we define an odML data type (dtype) for the Value. Generally,
you can use the following odML data types to describe the format of the stored
metadata:
+-----------------------------------+---------------------------------------+
| dtype | required data examples |
+===================================+=======================================+
| odml.DType.int or 'int' | 42 |
+-----------------------------------+---------------------------------------+
| odml.DType.float or 'float' | 42.0 |
+-----------------------------------+---------------------------------------+
| odml.DType.boolean or 'boolean' | True or False |
+-----------------------------------+---------------------------------------+
| odml.DType.string or 'string' | 'Earth' |
+-----------------------------------+---------------------------------------+
| odml.DType.date or 'date' | dt.date(1979, 10, 12) |
+-----------------------------------+---------------------------------------+
| odml.DType.datetime or 'datetime' | dt.datetime(1979, 10, 12, 11, 11, 11) |
+-----------------------------------+---------------------------------------+
| odml.DType.time or 'time' | dt.time(11, 11, 11) |
+-----------------------------------+---------------------------------------+
| odml.DType.person or 'person' | 'Zaphod Beeblebrox' |
+-----------------------------------+---------------------------------------+
| odml.DType.text or 'text' | |
+-----------------------------------+---------------------------------------+
| odml.DType.url or 'url' | "https://en.wikipedia.org/wiki/Earth" |
+-----------------------------------+---------------------------------------+
| odml.DType.binary or 'binary' | '00101010' |
+-----------------------------------+---------------------------------------+
The available types are implemented in the odml.types module.
After learning how to create a simple Property-Value pair, we need to know how
to attach it to a Section. As an exercise, we attach our first Property-Value
pair to the sub-Section 'Arthur Philip Dent'::
>>> MYodML['TheCrew']['Arthur Philip Dent'].append(prop)
>>> MYodML['TheCrew']['Arthur Philip Dent'].properties
[]
If the odML data type of a Value can be deduced unambiguously ('int', 'float',
'boolean', 'string', 'date', 'datetime', or 'time'), you can also use a short
cut to create a Property-Value pair::

    >>> prop = odml.Property(name='Gender',
    ...                      definition='Sex of the subject',
    ...                      value='male')
    >>> prop
    >>> prop.value
Note that this short cut will not work for the odML data types
'person', 'text', 'url', and 'binary', because they cannot be automatically
distinguished from the odML data type 'string'.
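One way to picture this automatic deduction is a lookup from Python type to
dtype name. The function ``guess_dtype`` below is a hypothetical sketch and not
the library's implementation. It also shows why the order of the checks
matters: in Python, ``bool`` is a subclass of ``int`` and ``datetime.datetime``
is a subclass of ``datetime.date``, so the more specific type has to be tested
first.

```python
import datetime as dt

# Ordered from specific to general: bool before int, datetime before date.
_TYPE_TO_DTYPE = [
    (bool, 'boolean'),
    (int, 'int'),
    (float, 'float'),
    (dt.datetime, 'datetime'),
    (dt.date, 'date'),
    (dt.time, 'time'),
    (str, 'string'),
]


def guess_dtype(data):
    """Return the dtype name for `data`, or raise TypeError if unknown."""
    for py_type, dtype in _TYPE_TO_DTYPE:
        if isinstance(data, py_type):
            return dtype
    raise TypeError('cannot deduce a dtype for %r' % (data,))


guessed = [guess_dtype(item) for item in
           [42, 42.0, True, "Don't Panic", dt.date(1979, 10, 12),
            dt.datetime(1979, 10, 12, 11, 11, 11), dt.time(11, 11, 11)]]
```

A name such as 'Zaphod Beeblebrox' is a plain Python string and therefore comes
out as 'string', which is why 'person', 'text', 'url', and 'binary' have to be
given explicitly.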
Next we learn how to create a Property with multiple Values attached to it::

    >>> vals = [odml.Value(data='Arthur Philip Dent',
    ...                    dtype=odml.DType.person),
    ...         odml.Value(data='Zaphod Beeblebrox',
    ...                    dtype=odml.DType.person),
    ...         odml.Value(data='Tricia Marie McMillan',
    ...                    dtype=odml.DType.person),
    ...         odml.Value(data='Ford Prefect',
    ...                    dtype=odml.DType.person)]
    >>> vals
    [, ,
    , ]
    >>> prop = odml.Property(name='NameCrewMembers',
    ...                      definition='List of crew members names',
    ...                      value=vals)
    >>> prop
    >>> prop.values
    [, ,
    , ]
To build up our odML file further, we attach this Property-Values pair to
the Section 'TheCrew'::
>>> MYodML['TheCrew'].append(prop)
>>> MYodML['TheCrew'].properties
[]
To illustrate this once more, we could also use the short cut
notation here, if we agreed to use the odML data type 'string' instead of
'person' for our Property-Values pair::

    >>> prop = odml.Property(name='NameCrewMembers',
    ...                      definition='List of crew members names',
    ...                      value=['Arthur Philip Dent',
    ...                             'Zaphod Beeblebrox',
    ...                             'Tricia Marie McMillan',
    ...                             'Ford Prefect'])
    >>> prop.value
    [, ,
    , ]
Note that this short cut also works for creating a Property with a list of
Values of different data types, e.g.::

    >>> prop = odml.Property(name='TestMultipleValueList',
    ...                      definition='List of Values with different '
    ...                                 'odML data types',
    ...                      value=[42,
    ...                             42.0,
    ...                             True,
    ...                             "Don't Panic",
    ...                             dt.date(1979, 10, 12),
    ...                             dt.datetime(1979, 10, 12, 11, 11, 11),
    ...                             dt.time(11, 11, 11)])
    >>> prop.values
    [,
    ,
    ,
    ,
    ,
    ,