=============
odML Tutorial
=============
:Author:
Lyuba Zehl;
based on work by Hagen Fritsch
:Release:
1.4
:License:
Creative Commons Attribution-ShareAlike 4.0 International
`License `_
-------------------------------------------------------------------------------
odML (open metadata Markup Language)
====================================
odML (open metadata Markup Language) is a framework, proposed by `Grewe et al.
(2011) `_,
to organize and store experimental metadata in a human- and machine-readable,
XML based format (odml). In this tutorial we will illustrate the conceptual
design of the odML framework and show hands-on how you can generate your own
odML metadata file collection. A well organized metadata management of your
experiment is a key component to guarantee the reproducibility of your research
and facilitate the provenance tracking of your analysis projects.
What are metadata and why are they needed?
Metadata are data about data. They describe the conditions under which the
actual raw-data of an experimental study were acquired. The organization of
such metadata and their accessibility may sound like a trivial task, and
most laboratories developed their home-made solutions to keep track of
their metadata. Most of these solutions, however, break down if data and
metadata need to be shared within a collaboration, because implicit
knowledge of what is important and how it is organized is often
underestimated.
While maintaining the relation to the actual raw-data, odML can help to
collect all metadata which are usually distributed over several files and
formats, and to store them unitedly which facilitates sharing data and
metadata.
Key features of odML
- open, XML based language, to collect, store and share metadata
- Machine- and human-readable
- Python-odML library
- Interactive odML-Editor
-------------------------------------------------------------------------------
Structure of this tutorial
==========================
The scientific background of the possible user community of odML varies
enormously (e.g. physics, informatics, mathematics, biology, medicine,
psychology). Some users will be trained programmers, others probably have never
learned a programming language.
To cover the different demands of all users, we provide a slow introduction to
the odML framework that even allows programming beginners to learn the basic
concepts. We will demonstrate how to generate an odML file and present more
advanced possibilities of the Python-odML library (e.g., how to search for
certain metadata or how to integrate existing terminologies).
At the end of this tutorial we will provide a few guidelines that will help you
to create an odML file structure that is optimised for your individual
experimental project and complements the special needs of your laboratory.
The code for the example odML files, which we use within this tutorial is part
of the documentation package (see doc/example_odMLs/).
A summary of available odML terminologies and templates can be found `here
`_.
-------------------------------------------------------------------------------
Download and Installation
=========================
The odML framework is an open source project of the German Neuroinformatics
Node (`G-Node `_, `odML project website
`_) of the International Neuroinformatics
Coordination Facility (`INCF `_). The source code for
the Python-odML library is available on `GitHub `_ under
the project name `python-odml `_.
Dependencies
------------
The Python-odML library (version 1.4) runs under Python 2.7 or 3.5.
Additionally, the Python-odML library depends on Enum, lxml, pyyaml and rdflib.
When the odML-Python library is installed via pip or the setup.py, these
packages will be automatically downloaded and installed. Alternatively, they
can be installed from the OS package manager.
On Ubuntu, the dependency packages are available as ``python-enum`` and
``python-lxml``.
Note that on Ubuntu 14.04, the latter package additionally requires the
installation of ``libxml2-dev``, ``libxslt1-dev``, and ``lib32z1-dev``.
Installation...
---------------
... via pip:
************
The simplest way to install the Python-odML library is from `PyPI
`_ using `pip `_::
$ pip install odml
The appropriate Python dependencies will be automatically
downloaded and installed.
If you are not familiar with PyPI and pip, please have a look at the available
online documentation.
Installation
------------
To download the Python-odML library please either use git and clone the
repository from GitHub::
$ cd /home/usr/toolbox/
$ git clone https://github.com/G-Node/python-odml.git
... or if you don't want to use git, download the ZIP file also provided on
GitHub to your computer (e.g. as above on your home directory under a "toolbox"
folder).
To install the Python-odML library, enter the corresponding directory and run::
$ cd /home/usr/toolbox/python-odml/
$ python setup.py install
Bugs & Questions
----------------
Should you find a behaviour that is likely a bug, please file a bug report at
`the github bug tracker `_.
If you have questions regarding the use of the library or the editor, ask
the question on `Stack Overflow `_, be sure to tag
it with `odml` and we'll do our best to quickly solve the problem.
-------------------------------------------------------------------------------
Basic knowledge on odML
=======================
Before we start, it is important to know the basic structure of an odML
file. Within an odML file metadata are grouped and stored in a
hierarchical tree structure which consists of three basic odML
objects.
Document:
- description: *root of the tree*
- parent: *no parent*
- children: *Section*
Section:
- description: *branches of the tree*
- parent: *Document or Section*
- children: *Section and/or Property*
Property:
- description: *leafs of the tree (contains metadata values)*
- parent: *Section*
- children: *none*
Each of these odML objects has a certain set of attributes where the
user can describe the object and its contents. Which attribute belongs
to which object and what the attributes are used for, is better explained
in an example odML file (e.g., "THGTTG.odml").
A first look
============
If you want to get familiar with the concept behind the odML framework and how
to handle odML files in Python, you can have a first look at the example odML
file provided in the Python-odML library. For this you first need to run the
python code ("thgttg.py") to generate the example odML file ("THGTTG.odml").
When using the following commands, make sure you adapt the paths to the
python-odml module to your owns!::
$ cd /home/usr/.../python-odml
$ ls doc/example_odMLs
thgttg.py
$ python doc/example_odMLs/example_odMLs.py "/home/usr/.../python-odml"
$ ls doc/example_odMLs
THGTTG.odml thgttg.py
Now open a Python shell within the Python-odML library directory, e.g. with
IPython::
$ ipython
In the IPython shell, first import the odml package::
>>> import odml
Second, load the example odML file with the following command lines::
>>> to_load = './doc/example_odMLs/THGTTG.odml'
>>> odmlEX = odml.load(to_load)
If you open a Python shell outside of the Python-odML library directory, please
adapt your Python-Path and the path to the "THGTTG.odml" file accordingly.
How you can access the different odML objects and their attributes once you
loaded an odML file and how you can make use of the attributes is described in
more detail in the following chapters for each odML object type (Document,
Section, Property).
How you can create the different odML objects on your own and how to connect
them to build your own metadata odML file will be described in later chapters.
Further advanced functions you can use to navigate through your odML files, or to
create an odML template file, or to make use of common odML terminologies
provided via `the G-Node repository
`_ can also
be found later on in this tutorial.
But now, let us first have a look at the example odML file (THGTTG.odml)!
The Document
------------
If you loaded the example odML file, let's have a first look at the Document::
>>> print odmlEX
Document 42 {author = D. N. Adams, 2 sections}
As you can see, the printout gives you a short summary of the Document of the
loaded example odML file.
The print out gives you already the follwing information about the odML file:
- ``Document`` tells you that you are looking at an odML Document
- ``42`` is the user defined version of this odML file
- ``{...}`` provides ``author`` and number of attached sections
- ``author`` states the author of the odML file, "D. N. Adams" in the example case
- ``2 sections`` tells you that this odML Document has 2 Section directly
appended
Note that the Document printout tells you nothing about the depth of the
complete tree structure, because it is not displaying the children of its
directly attached Sections. It also does not display all Document attributes.
In total, a Document has the following 4 attributes:
author
- Returns the author (returned as string) of this odML file.
date
- Returns ta user defined date. Could for example be used to state
the date of first creation or the date of last changes.
document
- Returns the current Document object.
parent
- Returns the parent object (which is ``None`` for a Document).
repository
- Returns the URL (returned as string) to a user defined repository of
terminologies used in this Document. Could be the URL to the G-Node
terminologies or to a user defined template.
version
- Returns the user defined version (returned as string) of this odML file.
id
- id is a UUID (universally unique identifiers) that uniquely identifies
the current document. If not otherwise specified, this id is automatically
created and assigned.
Let's check out all attributes with the following commands::
>>> print(odmlEX.author)
D. N. Adams
>>> print(odmlEX.date)
1979-10-12
>>> print(odmlEX.document)
Document 42 {author = D. N. Adams, 2 sections}
>>> print(odmlEX.parent)
None
>>> print(odmlEX.repository)
http://portal.g-node.org/odml/terminologies/v1.1/terminologies.xml
>>> print(odmlEX.version)
42
As expected for a Document, the attributes author and version match the
information given in the Document printout, the document attribute just returns
the Document, and the parent attribute is ``None``.
As you learned in the beginning, Sections can be attached to a Document. They
represent the next hierarchy level of an odML file. Let's have a look which
Sections were attached to the Document of our example odML file using the
following command::
>>> print(odmlEX.sections)
[Section[4|2] {name = TheCrew, type = crew, id = ...},
Section[1|7] {name = TheStarship, type = starship, id = ...}]
As expected from the Document printout our example contains two Sections. The
printout and attributes of a Section are explained in the next chapter.
The Sections
------------
There are several ways to access Sections. You can either call them by name or
by index using either explicitly the function that returns the list of
Sections (see last part of `The Document`_ chapter) or using again a short cut
notation. Let's test all the different ways to access a Section, by having a
look at the first Section in the sections list attached to the Document in our
example odML file::
>>> print(odmlEX.sections['TheCrew'])
Section[4|2] {name = TheCrew, type = crew, id = ...}
>>> print(odmlEX.sections[0])
Section[4|2] {name = TheCrew, type = crew, id = ...}
>>> print(odmlEX['TheCrew'])
Section[4|2] {name = TheCrew, type = crew, id = ...}
>>> print(odmlEX[0])
Section[4|2] {name = TheCrew, type = crew, id = ...}
In the following we will call Sections explicitly by their name using the
short cut notation.
The printout of a Section is similar to the Document printout and gives you
already the following information:
- ``Section`` tells you that you are looking at an odML Section
- ``[4|2]`` states that this Section has four Sections and two Properties directly attached to it
- ``{...}`` provides ``name``, ``type`` and ``id`` of the Section
- ``name`` is the name of this Section, 'TheCrew' in the example case
- ``type`` provides the type of the Section, 'crew' in the example case
- ``id`` provides the uuid of the Section, the actual value has been omitted in the example to improve readability.
Note that the Section printout tells you nothing about the depth of a possible
sub-Section tree below the directly attached ones. It also only list the type
of the Section as one of the Section attributes. In total, a Section can be
defined by the following 5 attributes:
name
- Returns the name of this Section. Should indicate what kind of
information can be found in this Section.
definition
- Returns the definition of the content within this Section. Should
describe what kind of information can be found in this Section.
document
- Returns the Document to which this Section belongs to. Note that this
attribute is set automatically for a Section and all its children when
it is attached to a Document.
parent
- Returns the parent to which this Section was directly attached to. Can be
either a Document or another Section.
type
- Returns the classification type which allows to connect related Sections
due to a superior semantic context.
reference
- Returns a reference that can be used to state the origin or source file
of the metadata stored in the Properties that are grouped by this
Section.
repository
- Returns the URL (returned as string) to a user defined repository of
terminologies used in this Document. Could be the URL to the G-Node
terminologies or to a user defined template.
id
- id is a UUID (universally unique identifiers) that uniquely identifies
the current section. If not otherwise specified, this id is automatically
created and assigned.
Let's have a look at the attributes for the Section 'TheCrew'::
>>> print(odmlEX['TheCrew'].name)
TheCrew
>>> print(odmlEX['TheCrew'].definition)
Information on the crew
>>> print(odmlEX['TheCrew'].document)
Document 42 {author = D. N. Adams, 2 sections}
>>> print(odmlEX['TheCrew'].parent)
Document 42 {author = D. N. Adams, 2 sections}
>>> print(odmlEX['TheCrew'].type)
crew
>>> print(odmlEX['TheCrew'].reference)
None
>>> print(odmlEX['TheCrew'].repository)
None
>>> print(odmlEX['TheCrew'].id)
None
As expected for this Section, the name and type attribute match the information
given in the Section printout, and the document and parent attribute return the
same object, namely our example Document.
To see which Sections are directly attached to the Section 'TheCrew' again use
the following command::
>>> print(odmlEX['TheCrew'].sections)
[Section[0|5] {name = Arthur Philip Dent, type = crew/person, id = ...},
Section[0|5] {name = Zaphod Beeblebrox, type = crew/person, id = ...},
Section[0|5] {name = Tricia Marie McMillan, type = crew/person, id = ...},
Section[0|5] {name = Ford Prefect, type = crew/person, id = ...}]
Or, for accessing these sub-Sections::
>>> print(odmlEX['TheCrew'].sections['Ford Prefect'])
Section[0|5] {name = Ford Prefect, type = crew/person, id = ...}
>>> print(odmlEX['TheCrew'].sections[3])
Section[0|5] {name = Ford Prefect, type = crew/person, id = ...}
>>> print(odmlEX['TheCrew']['Ford Prefect'])
Section[0|5] {name = Ford Prefect, type = crew/person, id = ...}
>>> print(odmlEX['TheCrew'][3])
Section[0|5] {name = Ford Prefect, type = crew/person, id = ...}
As you learned, besides sub-Sections, a Section can also have Properties
attached. Let's see which Properties are attached to the Section 'TheCrew'::
>>> print(odmlEX['TheCrew'].properties)
[Property: {name = NameCrewMembers},
Property: {name = NoCrewMembers}]
The printout and attributes of a Property are explained in the next chapter.
The Properties
--------------
Properties need to be called explicitly via the properties function of a
Section. You can then either call a Property by name or by index::
>>> print(odmlEX['TheCrew'].properties['NoCrewMembers'])
Property: {name = NoCrewMembers}
>>> print(odmlEX['TheCrew'].properties[1])
Property: {name = NoCrewMembers}
In the following we will only call Properties explicitly by their name.
The Property printout is reduced and only gives you information about the
following:
- ``Property`` tells you that you are looking at an odML Property
- ``{...}`` provides the ``name`` of the Property
- ``NoCrewMembers`` is the name of this Property
Note that the Property printout tells you nothing about the number of Values,
and very little about the Property attributes. In total, a Property can be
defined by the following 9 attributes:
name
- Returns the name of the Property. Should indicate what kind of metadata
are stored in this Property.
definition
- Returns the definition of this Property. Should describe what kind of
metadata are stored in this Property.
document
- Returns the Document to which the parent Section of this Property belongs
to. Note that this attribute is set automatically for a Section and all
its children when it is attached to a Document.
parent
- Returns the parent Section to which this Property was attached to.
values
- Returns the metadata of this Property. Can be either a single metadata or
multiple, but homogeneous metadata (all with same dtype and unit). For
this reason, the output is always provided as a list.
dtype
- Returns the odml data type of the stored metadata.
unit
- Returns the unit of the stored metadata.
uncertainty
- recommended
- Can be used to specify the uncertainty of the given metadata value.
reference
- Returns a reference that can be used to state an external definition
of the metadata value.
dependency
- optional
- A name of another Property within the same section, which this property
depends on.
dependency_value
- optional
- Value of the other Property specified in the 'dependency' attribute on
which this Property depends on.
value_origin
- A reference to state the origin of the metadata value e.g. a file name.
Let's check which attributes were defined for the Property 'NoCrewMembers'::
>>> print(odmlEX['TheCrew'].properties['NoCrewMembers'].name)
NoCrewMembers
>>> print(odmlEX['TheCrew'].properties['NoCrewMembers'].definition)
Number of crew members
>>> print(odmlEX['TheCrew'].properties['NoCrewMembers'].document)
Document 42 {author = D. N. Adams, 2 sections}
>>> print(odmlEX['TheCrew'].properties['NoCrewMembers'].values)
[4]
>>> print(odmlEX['TheCrew'].properties['NoCrewMembers'].dtype)
int
>>> print(odmlEX['TheCrew'].properties['NoCrewMembers'].unit)
None
>>> print(odmlEX['TheCrew'].properties['NoCrewMembers'].uncertainty)
1
>>> print(odmlEX['TheCrew'].properties['NoCrewMembers'].reference)
The Hitchhiker's guide to the Galaxy (novel)
>>> print(odmlEX['TheCrew'].properties['NoCrewMembers'].dependency)
None
>>> print(odmlEX['TheCrew'].properties['NoCrewMembers'].dependency_value)
None
As mentioned the values attribute of a Property can only contain multiple
metadata when they have the same ``dtype`` and ``unit``, as it is the case for
the Property 'NameCrewMembers'::
>>> print(odmlEX['TheCrew'].properties['NameCrewMembers'].values)
['Arthur Philip Dent',
'Zaphod Beeblebrox',
'Tricia Marie McMillan',
'Ford Prefect']
>>> print(odmlEX['TheCrew'].properties['NameCrewMembers'].dtype)
person
>>> print(odmlEX['TheCrew'].properties['NameCrewMembers'].unit)
None
NOTE: 'property.values' will always return a copy! Any direct changes to the
returned list will have no affect on the actual property values. If you want to
make changes to a property value, either use the 'append', 'extend' and 'remove'
methods or assign a new value list to the property.
-------------------------------------------------------------------------------
Generating an odML-file
=======================
After getting familiar with the different odML objects and their attributes,
you will now learn how to generate your own odML file by reproducing some parts
of the example THGTTG.odml.
We will show you first how to create the different odML objects with their
attributes. Please note that some attributes are obligatory, some are
recommended and others are optional when creating the corresponding odML
objects. A few are automatically generated in the process of creating an odML
file. Furthermore, all attributes of an odml object can be edited at any time.
If you opened a new IPython shell, please import first again the odml package::
>>> import odml
Create a document
-----------------
Let's start by creating the Document. Note that none of the Document attributes
are obligatory::
>>> MYodML = odml.Document()
You can check if your new Document contains actually what you created by using
some of the commands you learned before::
>>> MYodML
>>> Document None {author = None, 0 sections}
As you can see, we created an "empty" Document where the version and the author
attributes are not defined and no section is yet attached. You will learn how to create
and add a Section to a Document in the next chapter. Let's focus here on defining
the Document attributes::
>>> MYodML.author = 'D. N. Adams'
>>> MYodML.version = 42
For the date attribute you require a datetime object as entry. For this reason,
you need to first import the Python package datetime::
>>> import datetime as dt
Now, let's define the date attribute of the Document::
>>> MYodML.date = dt.date(1979, 10, 12)
Next, let us also add a repository attribute. Exemplary, we can import the
Python package os to extract the absolute path to our previously used example
odML file and add this as repository::
>>> import os
>>> url2odmlEX = 'file:///' + os.path.abspath(to_load)
>>> MYodML.repository = url2odmlEX
The document and parent attribute are automatically set and should not be
fiddled with.
Check if your new Document contains actually all attributes now::
>>> print(MYodML.author)
D. N. Adams
>>> print(MYodML.date)
1979-10-12
>>> print(MYodML.document)
Document 42 {author = D. N. Adams, 0 sections}
>>> print(MYodML.parent)
None
>>> print(MYodML.repository)
file:///home/usr/.../python-odml/doc/example_odMLs/THGTTG.odml
>>> print(MYodML.version)
42
Note that you can also define all attributes when first creating a Document::
>>> MYodML = odml.Document(author='D. N. Adams',
version=42,
date=dt.date(1979, 10, 12),
repository=url2odmlEX)
Our new created Document is, though, still "empty", because it does not contain
yet Sections. Let's change this!
Create a section
----------------
We now create a Section by reproducing the Section "TheCrew" of the example
odml file from the beginning::
>>> sec1 = odml.Section(name="TheCrew",
definition="Information on the crew",
type="crew")
Note that only the attribute name is obligatory. The attributes definition and
type are recommended, but could be either not defined at all or defined later
on.
Let us now attach this Section to our previously generated Document. With this,
the attribute document and parent of our new Section are automatically
updated::
>>> MYodML.append(sec1)
>>> print(MYodML)
Document 42 {author = D. N. Adams, 1 sections}
>>> print(MYodML.sections)
[Section[0|0] {name = TheCrew, type = crew, id = ...}]
>>> print(sec1.document)
Document 42 {author = D. N. Adams, 1 sections}
>>> print(sec1.parent)
Document 42 {author = D. N. Adams, 1 sections}
It is also possible to connect a Section directly to a parent object.
Let's try this with the next Section we create::
>>> sec2 = odml.Section(name="Arthur Philip Dent",
definition="Information on Arthur Dent",
type="crew/person",
parent=sec1)
>>> print(sec2)
Section[0|0] {name = Arthur Philip Dent, type = crew/person, id = ...}
>>> print(sec2.document)
Document 42 {author = D. N. Adams, 1 sections}
>>> print(sec2.parent)
[Section[1|0] {name = TheCrew, type = crew, id = ...}
Note that all of our created Sections do not contain any Properties yet. Let's
see if we can change this...
Create a Property:
------------------
Let's create our first Property::
>>> prop1 = odml.Property(name="Gender",
definition="Sex of the subject",
values="male")
Note that again, only the name attribute is obligatory for creating a Property.
The remaining attributes can be defined later on, or are automatically
generated in the process.
If a value is defined, but the dtype is not, as it is the case for our example
above, the dtype is deduced automatically::
>>> print(prop1.dtype)
string
Generally, you can use the following odML data types to describe the format of
the stored metadata:
+-----------------------------------+---------------------------------------+
| dtype | required data examples |
+===================================+=======================================+
| odml.DType.int or 'int' | 42 |
+-----------------------------------+---------------------------------------+
| odml.DType.float or 'float' | 42.0 |
+-----------------------------------+---------------------------------------+
| odml.DType.boolean or 'boolean' | True or False |
+-----------------------------------+---------------------------------------+
| odml.DType.string or 'string' | 'Earth' |
+-----------------------------------+---------------------------------------+
| odml.DType.date or 'date' | dt.date(1979, 10, 12) |
+-----------------------------------+---------------------------------------+
| odml.DType.datetime or 'datetime' | dt.datetime(1979, 10, 12, 11, 11, 11) |
+-----------------------------------+---------------------------------------+
| odml.DType.time or 'time' | dt.time(11, 11, 11) |
+-----------------------------------+---------------------------------------+
| odml.DType.person or 'person' | 'Zaphod Beeblebrox' |
+-----------------------------------+---------------------------------------+
| odml.DType.text or 'text' | 'any text containing \n linebreaks' |
+-----------------------------------+---------------------------------------+
| odml.DType.url or 'url' | "https://en.wikipedia.org/wiki/Earth" |
+-----------------------------------+---------------------------------------+
| odml.DType.tuple | "(39.12; 67.19)" |
+-----------------------------------+---------------------------------------+
The available types are implemented in the 'odml.dtypes' Module. Note that the
last four data types, if not defined, cannot be deduced, but are instead
always interpreted as string.
If we append now our new Property to the previously created sub-Section
'Arthur Philip Dent', the Property will also inherit the document attribute and
automatically update its parent attribute::
>>> MYodML['TheCrew']['Arthur Philip Dent'].append(prop1)
>>> print(prop1.document)
Document 42 {author = D. N. Adams, 1 sections}
>>> print(prop1.parent)
Section[0|1] {name = Arthur Philip Dent, type = crew/person, id = ...}
Next, let us create a Property with multiple metadata entries::
>>> prop2 = odml.Property(name="NameCrewMembers",
definition="List of crew members names",
values=["Arthur Philip Dent",
"Zaphod Beeblebrox",
"Tricia Marie McMillan",
"Ford Prefect"],
dtype=odml.DType.person)
As you learned before, in such a case, the metadata entries must be
homogeneous! That means they have to be of the same dtype, unit, and
uncertainty (here ``odml.DType.person``, None, and None, respectively).
To further build up our odML file, let us attach now this new Property to the
previously created Section 'TheCrew'::
>>> MYodML['TheCrew'].append(prop2)
Note that it is also possible to add a metadata entry later on::
>>> prop2.append("Blind Passenger")
>>> print(MYodML['TheCrew'].properties['NameCrewMembers'].values)
['Arthur Philip Dent',
'Zaphod Beeblebrox',
'Tricia Marie McMillan',
'Ford Prefect',
'Blind Passenger']
The tuple datatype you might have noticed in the dtype table above has to be
specially handled. It is intended to enforce a specific number of datapoints
for each value entry. This is useful in case of 2D or 3D data, where all
datapoints always have to be present for each entry.
The dtype itself has to contain the number corresponding to the required value
data points. For the value data points themselves, they have to be enclosed
by brackets and separated by a semicolon.
>>> pixel_prop = odml.Property(name="pixel map")
>>> pixel_prop.dtype = "2-tuple"
>>> pixel_prop.values = ["(1; 2)", "(3; 4)"]
>>> voxel_prop = odml.Property(name="voxel map")
>>> voxel_prop.dtype = "3-tuple"
>>> voxel_prop.values = "(1; 2; 3)"
Please note, that inconsistent tuple values will raise an error:
>>> tprop = odml.Property(name="tuple fail")
>>> tprop.dtype = "3-tuple"
>>> tprop.values = ["(1; 2)"]
Printing XML-representation of an odML file:
--------------------------------------------
Although the XML-representation of an odML file is a bit hard to read, it is
sometimes helpful to check, especially during a generation process, how the
hierarchical structure of the odML file looks like.
Let's have a look at the XML-representation of our small odML file we just
generated::
>>> print(odml.tools.xmlparser.XMLWriter(MYodML))
1979-10-12
Information on the crew
List of crew members names
NameCrewMembers
person
[Arthur Philip Dent,Zaphod Beeblebrox,Tricia Marie McMillan,Ford Prefect,Blind Passenger
]
TheCrew
Information on Arthur Dent
Sex of the subject
Gender
string
[male
]
Arthur Philip Dent
crew/person
crew
42
file:///home/zehl/Projects/toolbox/python-odml/doc/example_odMLs/THGTTG.odml
D. N. Adams
Saving an odML file:
--------------------
You can save your odML file using the following command::
>>> save_to = '/home/usr/toolbox/python-odml/doc/example_odMLs/myodml.odml'
>>> odml.save(MYodML, save_to)
Loading an odML file:
---------------------
You already learned how to load the example odML file. Here just as a reminder
you can try to reload your own saved odML file::
>>> my_reloaded_odml = odml.load(save_to)
-------------------------------------------------------------------------------
Advanced odML-Features
======================
Advanced knowledge on Values
----------------------------
Data type conversions
*********************
After creating a Property with metadata the data type can be changed and the
format of the corresponding entry will converted to the new data type, if the
new format is valid for the given metadata::
>>> test_dtype_conv = odml.Property('p', values=1.0)
>>> print(test_dtype_conv.values)
[1.0]
>>> print(test_dtype_conv.dtype)
float
>>> test_dtype_conv.dtype = odml.DType.int
>>> print(test_dtype_conv.values)
[1]
>>> print(test_dtype_conv.dtype)
int
If the conversion is invalid a ValueError is raised.
Also note, that during such a process metadata loss may occur if a float is
converted to an integer and then back to a float::
>>> test_dtype_conv = odml.Property('p', values=42.42)
>>> print(test_dtype_conv.values)
[42.42]
>>> test_dtype_conv.dtype = odml.DType.int
>>> test_dtype_conv.dtype = odml.DType.float
>>> print(test_dtype_conv.values)
[42.0]
Advanced knowledge on Properties
--------------------------------
Advanced knowledge on Sections
------------------------------
Links & Includes
****************
(DEPRECATED; new version coming soon)
Sections can be linked to other Sections, so that they include their defined
attributes. A link can be within the document (``link`` property) or to an
external one (``include`` property).
After parsing a document, these links are not yet resolved, but can be using
the :py:meth:`odml.doc.BaseDocument.finalize` method::
>>> d = xmlparser.load("sample.odml")
>>> d.finalize()
Note: Only the parser does not automatically resolve link properties, as the referenced
sections may not yet be available.
However, when manually setting the ``link`` (or ``include``) attribute, it will
be immediately resolved. To avoid this behaviour, set the ``_link`` (or ``_include``)
attribute instead.
The object remembers to which one it is linked in its ``_merged`` attribute.
The link can be unresolved manually using :py:meth:`odml.section.BaseSection.unmerge`
and merged again using :py:meth:`odml.section.BaseSection.merge`.
Unresolving means to remove sections and properties that do not differ from their
linked equivalents. This should be done globally before saving using the
:py:meth:`odml.doc.BaseDocument.clean` method::
>>> d.clean()
>>> xmlparser.XMLWriter(d).write_file('sample.odml')
Changing a ``link`` (or ``include``) attribute will first unmerge the section and
then set merge with the new object.
Terminologies
*************
(deprecated; new version coming soon)
odML supports terminologies that are data structure templates for typical use cases.
Sections can have a ``repository`` attribute. As repositories can be inherited,
the current applicable one can be obtained using the
:py:meth:`odml.section.BaseSection.get_repository` method.
To see whether an object has a terminology equivalent, use the
:py:meth:`odml.property.BaseProperty.get_terminology_equivalent`
method, which returns the corresponding object of the terminology.