=============
odML Tutorial
=============

:Author:
    Lyuba Zehl;
    based on work by Hagen Fritsch
:Release:
    0.1
:License:
    Creative Commons Attribution-ShareAlike 4.0 International
    `License <http://creativecommons.org/licenses/by-sa/4.0/>`_

-------------------------------------------------------------------------------

  13. odML (open metadata Markup Language)
  14. ====================================
  15. odML (open metadata Markup Language) is an XML based file format,
  16. proposed by [Grewe et al. (2011) Front Neuroinform 5:16], in order
  17. to provide metadata in an organized, human- and machine-readable way.
  18. In this tutorial we will illustrate the conceptual design of odML and
  19. show hands-on how you can generate your own odML metadata collection.
  20. In addition, we demonstrate the advantages of using odML to screen
  21. large numbers of data sets according to selection criteria relevant for
  22. subsequent analyses. Well organized metadata management is a key
  23. component to guarantee reproducibility of experiments and to track
  24. provenance of performed analyses.

What are metadata and why are they needed?
    Metadata are data about data. They describe the conditions under which the
    actual raw data of an experimental study were acquired. Organizing such
    metadata and keeping them accessible may sound like a trivial task, and
    most laboratories have developed home-made solutions to keep track of
    their metadata. Most of these solutions, however, break down when data and
    metadata need to be shared within a collaboration, because the amount of
    implicit knowledge about what is important and how it is organized is
    often underestimated.

    While maintaining the relation to the actual raw data, odML helps to
    collect all metadata, which are usually distributed over several files and
    formats, and to store them in a unified way, which facilitates sharing
    data and metadata.

Key features of odML
    - Open, XML-based language to collect, store and share metadata
    - Machine- and human-readable
    - Interactive odML-Editor
    - Python-odML library

  43. -------------------------------------------------------------------------------
  44. Structure of this tutorial
  45. ==========================
The scientific background of the potential odML user community varies
enormously (e.g. physics, informatics, mathematics, biology, medicine,
psychology). Some users will be trained programmers, others have probably never
learned a programming language.

To cover the different demands of all users, we first provide a slow
introduction to odML that allows programming beginners to learn the basic
concepts. In a next step, we demonstrate how to generate an odML file via
the Python-odML library. In later chapters we present more advanced
possibilities of the Python-odML library (e.g. how to search for certain
metadata or how to integrate existing terminologies or templates).

Although the structure of an odML file depends on the needs of each individual
user, we provide a few guidelines at the end of this tutorial.

The code for the example odML files used within this tutorial is part
of the documentation package (see doc/example_odMLs/).

A summary of available odML terminologies and templates can be found `here
<http://portal.g-node.org/odml/terminologies/v1.0/terminologies.xml>`_.

  62. -------------------------------------------------------------------------------
  63. Download and Installation
  64. =========================
The Python-odML library (including the odML-Editor) is available on
`GitHub <https://github.com/G-Node/python-odml>`_. If you are not familiar with
the version control system **git** but still want to use it, have a look at
the documentation available on the `git-scm website <https://git-scm.com/>`_.

  69. Dependencies
  70. ------------
  71. The Python-odML library runs under Python 2.7.
  72. Additionally, the Python-odML library depends on Enum (version 0.4.4).
  73. Installation
  74. ------------
To download the Python-odML library, either use git and clone the
repository from GitHub::

    $ cd /home/usr/toolbox/
    $ git clone https://github.com/G-Node/python-odml.git

or, if you do not want to use git, download the ZIP file that is also provided
on GitHub to your computer (e.g. as above, into a "toolbox" folder in your home
directory).

To install the Python-odML library, enter the corresponding directory and run::

    $ cd /home/usr/toolbox/python-odml/
    $ python setup.py install

  85. Bugs & Questions
  86. ----------------
Should you find a behaviour that is likely a bug, please file a bug report at
`the github bug tracker <https://github.com/G-Node/python-odml/issues>`_.

If you have questions regarding the use of the library or the editor, ask
them on `Stack Overflow <http://stackoverflow.com/>`_. Be sure to tag your
question with ``odml``, and we will do our best to solve the problem quickly.

  92. -------------------------------------------------------------------------------
  93. Basic knowledge on odML
  94. =======================
  95. Before we start, it is important to know the basic structure of an odML
  96. file. Within an odML file metadata are grouped and stored in a
  97. hierarchical tree structure which consists of four different odML
  98. objects.

Document
    - corresponds to the root of the tree (groups everything together)
    - *parent*: no parent
    - *children*: Section

Section
    - corresponds to (big) branches of the tree
    - *parent*: Section or Document
    - *children*: Section and/or Property

Property
    - corresponds to (small) branches of the tree (groups values)
    - *parent*: Section
    - *children*: at least one Value

Value
    - corresponds to a leaf of the tree (contains metadata)
    - *parent*: Property
    - *children*: no children

  115. Each of these odML objects has a certain set of attributes where the
  116. user can describe the object and its contents. Which attribute belongs
  117. to which object and what the attributes are used for, is better explained
  118. in an example odML file (e.g., "THGTTG.odml").
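
The hierarchy described above can be pictured with a few lines of plain Python.
This is only a sketch of the parent/children relation; the ``Node`` class and
the ``render`` helper are hypothetical stand-ins and not part of the
Python-odML library.

```python
# Hypothetical stand-in for the four odML object types (NOT the odml API).
class Node(object):
    """A tree node that remembers its parent and its children."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def append(self, child):
        # Attach the child and record the back-reference to its parent.
        child.parent = self
        self.children.append(child)
        return child

    def depth(self):
        # Number of steps from this node up to the root (the Document).
        return 0 if self.parent is None else 1 + self.parent.depth()


def render(node, indent=0):
    """Return the tree below `node` as a list of indented lines."""
    lines = ["  " * indent + node.name]
    for child in node.children:
        lines.extend(render(child, indent + 1))
    return lines


doc = Node("Document")
crew = doc.append(Node("Section: TheCrew"))
prop = crew.append(Node("Property: NoCrewMembers"))
value = prop.append(Node("Value: 4"))

print("\n".join(render(doc)))
```

The Document is the only node without a parent, and the Value sits three levels
below it, which mirrors the Document, Section, Property, Value order listed
above.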
  119. A first look
  120. ============
  121. If you want to get familiar with the concept behind an odML and how to handle
  122. odML files in Python, you can have a first look at the example odML file
  123. provided in the Python-odML library. For this you first need to run the python
  124. code ("thgttg.py") to generate the example odML file ("THGTTG.odml")::
  125. $ cd /home/usr/toolbox/python-odml/doc/example_odMLs/
  126. $ ls
  127. thgttg.py
  128. $ python thgttg.py
  129. $ ls
  130. THGTTG.odml thgttg.py
  131. Now open a Python shell within the Python-odML library directory, e.g. with
  132. IPython::
  133. $ cd /home/usr/toolbox/python-odml/
  134. $ ipython
  135. In the IPython shell, first import the odml package::
  136. >>> import odml
  137. Second, load the example odML file with the following command lines::
  138. >>> to_load = '/home/usr/toolbox/python-odml/doc/example_odMLs/THGTTG.odml'
  139. >>> odmlEX = odml.tools.xmlparser.load(to_load)
  140. If you open a Python shell outside of the Python-odML library directory, please
  141. adapt your Python-Path and the path to the "THGTTG.odml" file accordingly.
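
One possible way to adapt the Python path from inside the shell is sketched
below, using only the standard library. The directory is the example location
used in this tutorial and is an assumption; replace it with the place where you
put the library.

```python
import os
import sys

# Example location from this tutorial; adapt it to your own checkout.
library_dir = '/home/usr/toolbox/python-odml'

# Prepend the directory so that "import odml" can find the package there.
if library_dir not in sys.path:
    sys.path.insert(0, library_dir)

# Build the path to the example file relative to the library directory.
to_load = os.path.join(library_dir, 'doc', 'example_odMLs', 'THGTTG.odml')
print(to_load)
```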
  142. How you can access the different odML objects and their attributes once you
  143. loaded an odML file and how you can make use of the attributes is described in
  144. more detail in the following chapters for each odML object type (document,
  145. section, property, value). Please note that some attributes are obligatory,
  146. some are recommended and others are optional. The optional attributes are
  147. important for the advanced odML possibilities and can for now be ignored by
  148. odML beginners. You can find an example of their usage in later chapters.
  149. The Document
  150. ------------
If you loaded the example odML file, you can have a first look at the Document
either by explicitly calling the odml object::

    >>> print odmlEX.document
    <Doc 42 by Douglas Adams (2 sections)>

or by using the following short cut::

    >>> print odmlEX
    <Doc 42 by Douglas Adams (2 sections)>

As you can see, both commands print the same short summary of the
Document of the loaded example odML file. In the following we will only use the
short cut notation.

The printout already gives you the following information about the odML file:

- '<...>' indicates that you are looking at an object
- 'Doc' tells you that you are looking at an odML Document
- '42' is the version of the odML file
- 'by Douglas Adams' states the author of the odML file
- '(2 sections)' tells you that this odML Document has 2 Sections directly
  attached

  168. Note that the Document printout tells you nothing about the depth of the
  169. complete tree structure, because it is not displaying the children of its
  170. directly attached Sections. It also does not display all Document attributes.
  171. In total, a Document has the following 4 attributes:
  172. author
  173. - recommended Document attribute
  174. - The author of this odML file.
  175. date
  176. - recommended Document attribute
  177. - The date this odML file was created (yyyy-mm-dd format).
  178. repository
  179. - optional Document attribute
  180. - The URL to the repository of terminologies used in this odML file.
  181. version
  182. - recommended Document attribute
  183. - The version of this odML file.
Let's find out which attributes were defined for our example Document using the
following commands::

    >>> odmlEX.author
    'Douglas Adams'
    >>> odmlEX.date
    '1979-10-12'
    >>> odmlEX.version
    42
    >>> odmlEX.repository

  193. As you learned in the beginning, Sections can be attached to a Document, as the
  194. first hierarchy level of the odML file. Let's have a look which Sections were
  195. attached to the Document of our example odML file using the following command::
  196. >>> odmlEX.sections
  197. [<Section TheCrew[crew] (4)>, <Section TheStarship[crew] (1)>]
  198. The printout of a Section is explained in the next chapter.
  199. The Sections
  200. ------------
There are several ways to access Sections. You can call them either by name or
by index, using either the explicit call to the list of Sections (see the last
part of 'The Document' chapter) or, again, a short cut notation. Let's test all
the different ways to access a Section by having a look at the first Section in
the sections list attached to the Document in our example odML file::

    >>> odmlEX.sections['TheCrew']
    <Section TheCrew[crew] (4)>
    >>> odmlEX.sections[0]
    <Section TheCrew[crew] (4)>
    >>> odmlEX['TheCrew']
    <Section TheCrew[crew] (4)>
    >>> odmlEX[0]
    <Section TheCrew[crew] (4)>

In the following we will use the short cut notation and call Sections
explicitly by their name.

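
How one container can answer to both a name and an index, and how a short cut
can forward to it, is sketched below in plain Python. ``NamedList`` and
``SketchSection`` are hypothetical names chosen for this illustration; the
sketch does not claim to show how Python-odML implements the lookup.

```python
class NamedList(list):
    """A list whose items can also be looked up by their `name` attribute."""

    def __getitem__(self, key):
        if isinstance(key, str):
            for item in self:
                if item.name == key:
                    return item
            raise KeyError(key)
        # Integers fall back to the normal list behaviour.
        return list.__getitem__(self, key)


class SketchSection(object):
    """Hypothetical stand-in for a Section holding sub-Sections."""

    def __init__(self, name):
        self.name = name
        self.sections = NamedList()

    def __getitem__(self, key):
        # Short cut: section[key] is the same as section.sections[key].
        return self.sections[key]


crew = SketchSection('TheCrew')
for member in ['Arthur Philip Dent', 'Zaphod Beeblebrox',
               'Tricia Marie McMillan', 'Ford Prefect']:
    crew.sections.append(SketchSection(member))

# All four access styles end up at the same object.
print(crew['Ford Prefect'] is crew.sections[3])
```

Calling by name keeps code readable and stays valid when the order of the
Sections changes, which is one reason to prefer it over the index.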
  217. The printout of a Section is similar to the Document printout and gives you
  218. already the following information:
  219. - '<...>' indicates that you are looking at an object
  220. - 'Section' tells you that you are looking at an odML Section
  221. - 'TheCrew' tells you that the Section was named 'TheCrew'
  222. - '[...]' highlights the type of the Section (here 'crew')
  223. - '(4)' states that this Section has four sub-Sections directly attached to it
Note that the Section printout tells you nothing about the number of attached
Properties or, again, about the depth of a possible sub-Section tree below the
directly attached ones. It also lists only the type of the Section as one of
the Section attributes. In total, a Section can be defined by the following 5
attributes:
  229. name
  230. - obligatory Section attribute
  231. - The name of the section. Should describe what kind of information can be
  232. found in this section.
  233. definition
  234. - recommended Section attribute
  235. - The definition of the content within this section.
  236. type
  237. - recommended Section attribute
  238. - The classification type which allows to connect related Sections due to
  239. a superior semantic context.
  240. reference
  241. - optional Section attribute
  242. - The ?
  243. repository
  244. - optional Section attribute
  245. - The URL to the repository of terminologies used in this odML file.
  246. Let's have a look what attributes were defined for the Section "TheCrew" using
  247. the following commands::
  248. >>> odmlEX['TheCrew'].name
  249. 'TheCrew'
  250. >>> odmlEX['TheCrew'].definition
  251. 'Information on the crew'
  252. >>> odmlEX['TheCrew'].type
  253. 'crew'
  254. >>> odmlEX['TheCrew'].reference
  255. >>> odmlEX['TheCrew'].repository
  256. To see which Sections are directly attached to the Section 'TheCrew' use again
  257. the following command::
  258. >>> odmlEX['TheCrew'].sections
  259. [<Section Arthur Philip Dent[crew/person] (0)>,
  260. <Section Zaphod Beeblebrox[crew/person] (0)>,
  261. <Section Tricia Marie McMillan[crew/person] (0)>,
  262. <Section Ford Prefect[crew/person] (0)>]
  263. For accessing these sub-Sections you can use again all the following commands::
  264. >>> odmlEX['TheCrew'].sections['Ford Prefect']
  265. <Section Ford Prefect[crew/person] (0)>
  266. >>> odmlEX['TheCrew'].sections[3]
  267. <Section Ford Prefect[crew/person] (0)>
  268. >>> odmlEX['TheCrew']['Ford Prefect']
  269. <Section Ford Prefect[crew/person] (0)>
  270. >>> odmlEX['TheCrew'][3]
  271. <Section Ford Prefect[crew/person] (0)>
  272. Besides sub-Sections a Section can also have Properties attached. To see if and
  273. which Properties are attached to the Section 'TheCrew' you have to use the
  274. following command::
  275. >>> odmlEX['TheCrew'].properties
  276. [<Property NameCrewMembers>, <Property NoCrewMembers>]
  277. The printout of a Property is explained in the next chapter.
  278. The Properties
  279. --------------
Properties need to be called explicitly via the properties function of a
Section. You can then call a Property either by name or by index::

    >>> odmlEX['TheCrew'].properties['NoCrewMembers']
    <Property NoCrewMembers>
    >>> odmlEX['TheCrew'].properties[1]
    <Property NoCrewMembers>

In the following we will only call Properties explicitly by their name.

  287. The Property printout is reduced and only gives you information about the
  288. following:
  289. - '<...>' indicates that you are looking at an object
  290. - 'Property' tells you that you are looking at an odML Property
  291. - 'NoCrewMembers' tells you that the Property was named 'NoCrewMembers'
  292. Note that the Property printout tells you nothing about the number of Values,
  293. and very little about the Property attributes. In total, a Property can be
  294. defined by the following 6 attributes:
  295. name
  296. - obligatory Property attribute
  297. - The name of the Property. Should describe what kind of Values can be
  298. found in this Property.
  299. value
  300. - obligatory Property attribute
  301. - The value container of this property. See in 'The Value' chapter for
  302. details.
  303. definition
  304. - recommended Property attribute
  305. - The definition of this Property.
  306. dependency
  307. - optional Property attribute
  308. - A name of another Property within the same section, which this property
  309. depends on.

dependency_value
    - optional Property attribute
    - Value of the other Property specified in the 'dependency' attribute on
      which this Property depends.

mapping
    - optional Property attribute
    - The odML path within the same odML file (internal link) to another
      Section. If a conversion is requested, all children of this section
      are transferred to that Section, as long as the children do not
      define a mapping themselves.

  320. Let's check which attributes were defined for the Property "NoCrewMembers"::
  321. >>> odmlEX['TheCrew'].properties['NoCrewMembers'].name
  322. 'NoCrewMembers'
  323. >>> odmlEX['TheCrew'].properties['NoCrewMembers'].definition
  324. 'Number of crew members'
  325. >>> odmlEX['TheCrew'].properties['NoCrewMembers'].dependency
  326. >>> odmlEX['TheCrew'].properties['NoCrewMembers'].dependency_value
  327. >>> odmlEX['TheCrew'].properties['NoCrewMembers'].mapping
  328. The Value or Values attached to a Property can be accessed via two different
  329. commands. If only one value object was attached to the Property, the first
  330. command returns directly a Value::
  331. >>> odmlEX['TheCrew'].properties['NoCrewMembers'].value
  332. <int 4>
  333. If multiple Values were attached to the Property, a list of Values is
  334. returned::
  335. >>> odmlEX['TheCrew'].properties['NameCrewMembers'].value
  336. [<string Arthur Philip Dent>, <string Zaphod Beeblebrox>,
  337. <string Tricia Marie McMillan>, <string Ford Prefect>]
  338. The second command will always return a list independent of the number of
  339. Values attached::
  340. >>> odmlEX['TheCrew'].properties['NoCrewMembers'].values
  341. [<int 4>]
  342. >>> odmlEX['TheCrew'].properties['NameCrewMembers'].values
  343. [<string Arthur Philip Dent>, <string Zaphod Beeblebrox>,
  344. <string Tricia Marie McMillan>, <string Ford Prefect>]
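
The behaviour of the two commands can be mimicked in plain Python. The class
below is a hypothetical stand-in written for this illustration only; it is not
the Python-odML Property class and stores plain Python objects instead of
Values.

```python
class SketchProperty(object):
    """Hypothetical stand-in showing the value/values distinction."""

    def __init__(self, name, values):
        self.name = name
        self._values = list(values)

    @property
    def values(self):
        # Always a list, even for a single entry.
        return list(self._values)

    @property
    def value(self):
        # A single entry is returned directly, several entries as a list.
        if len(self._values) == 1:
            return self._values[0]
        return list(self._values)


single = SketchProperty('NoCrewMembers', [4])
multi = SketchProperty('NameCrewMembers',
                       ['Arthur Philip Dent', 'Ford Prefect'])

print(single.value)                 # 4
print(single.values)                # [4]
print(multi.value == multi.values)  # True
```

Code that has to cope with both cases is usually simpler when it relies on the
command that always returns a list.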
  345. The printout of the Value is explained in the next chapter.
  346. The Values
  347. ----------
Depending on how many Values are attached to a Property, they can be accessed
in two different ways. If you know that only one Value is attached, you can use
the following command::

    >>> odmlEX['TheCrew'].properties['NoCrewMembers'].value
    <int 4>

If you know that more than one Value is attached and you would like to access,
for example, the fourth one, you can use::
  355. >>> odmlEX['TheCrew'].properties['NameCrewMembers'].values[3]
  356. <string Ford Prefect>
  357. The Value printout is reduced and only gives you information about the
  358. following:
  359. - '<...>' indicates that you are looking at an object
  360. - 'int' tells you that the value has the odml data type (dtype) 'int'
  361. - '4' is the actual data stored within the value object
In total, a Value can be defined by the following 9 attributes:
  363. data
  364. - obligatory Value attribute
  365. - The actual metadata value.
  366. dtype
  367. - recommended Value attribute
  368. - The odml data type of the given metadata value.
  369. definition
  370. - recommended Value attribute
  371. - The definition of the given metadata value.
  372. uncertainty
  373. - recommended Value attribute
  374. - Can be used to specify the uncertainty of the given metadata value.
  375. unit
  376. - recommended Value attribute
  377. - The unit of the given metadata value, if it has a unit.
  378. reference
  379. - optional Value attribute
  380. - The ?
  381. filename
  382. - optional Value attribute
  383. - The ?
  384. encoder
  385. - optional Value attribute
  386. - Name of the applied encoder used to encode a binary metadata value into
  387. ascii.
  388. checksum
  389. - optional Value attribute
  390. - Checksum and name of the algorithm that calculated the checksum of a
  391. given binary metadata value (algorithm$checksum format)
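
To make the 'encoder' and 'checksum' descriptions more concrete, the following
sketch builds an ASCII encoding and a string in the ``algorithm$checksum``
format with the Python standard library. The choice of base64 as encoder and
sha256 as algorithm, and computing the checksum over the raw bytes, are
assumptions made for this illustration; this tutorial does not state which
encoders or algorithms Python-odML itself supports.

```python
import base64
import hashlib

raw = b'\x2a'  # one byte with the value 42, used as example binary data

# Encode the bytes into ASCII text (base64 is an assumed example encoder).
encoded = base64.b64encode(raw).decode('ascii')

# Build a string in the "algorithm$checksum" format described above
# (sha256 is an assumed example algorithm, applied to the raw bytes).
algorithm = 'sha256'
checksum = algorithm + '$' + hashlib.new(algorithm, raw).hexdigest()

print(encoded)
print(checksum)
```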
  392. Let's see which attributes were defined for the Value of the Property
  393. 'NoCrewMembers' of the Section 'TheCrew'::
  394. >>> odmlEX['TheCrew'].properties['NoCrewMembers'].value.data
  395. 4
  396. >>> odmlEX['TheCrew'].properties['NoCrewMembers'].value.dtype
  397. 'int'
  398. >>> odmlEX['TheCrew'].properties['NoCrewMembers'].value.definition
  399. >>> odmlEX['TheCrew'].properties['NoCrewMembers'].value.uncertainty
  400. >>> odmlEX['TheCrew'].properties['NoCrewMembers'].value.unit
  401. >>> odmlEX['TheCrew'].properties['NoCrewMembers'].value.reference
  402. >>> odmlEX['TheCrew'].properties['NoCrewMembers'].value.filename
  403. >>> odmlEX['TheCrew'].properties['NoCrewMembers'].value.encoder
  404. >>> odmlEX['TheCrew'].properties['NoCrewMembers'].value.checksum
Note that these commands are for Properties containing one Value. To access
the attributes of a Value of a Property with multiple Values use::

    >>> odmlEX['TheCrew'].properties['NameCrewMembers'].values[3].data
    'Ford Prefect'
    >>> odmlEX['TheCrew'].properties['NameCrewMembers'].values[3].dtype
    'string'
    >>> odmlEX['TheCrew'].properties['NameCrewMembers'].values[3].definition
    >>> odmlEX['TheCrew'].properties['NameCrewMembers'].values[3].uncertainty
    >>> odmlEX['TheCrew'].properties['NameCrewMembers'].values[3].unit
    >>> odmlEX['TheCrew'].properties['NameCrewMembers'].values[3].reference
    >>> odmlEX['TheCrew'].properties['NameCrewMembers'].values[3].filename
    >>> odmlEX['TheCrew'].properties['NameCrewMembers'].values[3].encoder
    >>> odmlEX['TheCrew'].properties['NameCrewMembers'].values[3].checksum

If you would like to get all the actual metadata values back from a Property
with multiple Values, iterate over the Values list::

    >>> all_metadata = []
    >>> for val in odmlEX['TheCrew'].properties['NameCrewMembers'].values:
    ...     all_metadata.append(val.data)
    ...
    >>> all_metadata
    ['Arthur Philip Dent', 'Zaphod Beeblebrox',
     'Tricia Marie McMillan', 'Ford Prefect']

  427. ------------------------------------------------------------------------
  428. Generating an odML-file
  429. =======================
  430. After getting familiar with the different odml objects and their attributes,
  431. you will now learn how to generate your own odML file by reproducing some parts
  432. of the example odml file we presented before.
  433. We will show you first how to create the different odML objects with their
  434. obligatory and recommended attributes. Please have a look at the tutorial part
  435. describing the advanced possibilities of the Python odML library for the usage
  436. of all other attributes.
  437. If you opened a new IPython shell, please import first again the odml package::
  438. >>> import odml
  439. Create a document
  440. -----------------
Let's start by creating the Document::

    >>> MYodML = odml.Document(author='Douglas Adams',
    ...                        version=42)

You can check whether your new Document actually contains what you created by
using some of the commands you learned before::

    >>> MYodML
    <Doc 42 by Douglas Adams (0 sections)>
    >>> MYodML.date

As you can see, we created a Document with the same attributes as the example,
except that we forgot to define the date. Note that you can always edit
attributes of generated odml objects. For this let's first import the Python
package datetime::

    >>> import datetime as dt

Now we edit the date attribute of the Document::

    >>> MYodML.date = dt.date(1979, 10, 12)
    >>> MYodML.date
    '1979-10-12'

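
The relation between a ``datetime.date`` object and the yyyy-mm-dd text form of
the date attribute can be checked with the standard library alone. The sketch
below does not use odml; it only shows the conversion in both directions.

```python
import datetime as dt

created = dt.date(1979, 10, 12)

# ISO 8601 formatting yields the yyyy-mm-dd form used for the date attribute.
as_text = created.isoformat()
print(as_text)  # 1979-10-12

# Parsing the string back gives an equal date object.
parsed = dt.datetime.strptime(as_text, '%Y-%m-%d').date()
print(parsed == created)  # True
```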
  458. Another part which is still missing is that so far we have no Sections attached
  459. to our Document. Let's change this!
  460. Create a section
  461. ----------------
  462. We now create a Section by reproducing the Section "TheCrew" of the example
  463. odml file from the beginning::
  464. >>> sec = odml.Section(name='TheCrew',
  465. definition='Information on the crew',
  466. type='crew')
  467. Check if your new Section contains actually what you created::
  468. >>> sec.name
  469. 'TheCrew'
  470. >>> sec.definition
  471. 'Information on the crew'
  472. >>> sec.type
  473. 'crew'
  474. Now we need to attach the Section to our previously generated Document::
  475. >>> MYodML.append(sec)
  476. >>> MYodML
  477. <Doc 42 by Douglas Adams (1 sections)>
  478. >>> MYodML.sections
  479. [<Section TheCrew[crew] (0)>]
  480. We repeat the procedure to create now a second Section which we will attach as
  481. a sub-Section to the Section 'TheCrew'::
  482. >>> sec = odml.Section(name='Arthur Philip Dent',
  483. definition='Information on Arthur Dent',
  484. type='crew/person')
  485. >>> sec
  486. <Section Arthur Philip Dent[crew/person] (0)>
  487. >>> MYodML['TheCrew'].append(sec)
  488. >>> MYodML.sections
  489. [<Section TheCrew[crew] (0)>]
  490. >>> MYodML['TheCrew'].sections
  491. [<Section Arthur Philip Dent[crew/person] (0)>]
  492. Note that all of our created Sections do not contain any Properties and Values,
  493. yet. Let's see if we can change this...
  494. Create a Property-Value(s) pair:
  495. --------------------------------
The creation of a Property is not independent of creating a Value, because a
Property always needs at least one Value attached. Therefore we will demonstrate
the creation of Value and Property together.
  499. Let's first create a Property with a single Value::
  500. >>> val = odml.Value(data="male",
  501. dtype=odml.DType.string)
  502. >>> val
  503. <string male>
  504. >>> prop = odml.Property(name='Gender',
  505. definition='Sex of the subject',
  506. value=val)
  507. >>> prop
  508. <Property Gender>
  509. >>> prop.value
  510. <string male>
As you can see, we define an odML data type (dtype) for the Value. Generally,
you can use the following odML data types to describe the format of the stored
metadata:

+-----------------------------------+---------------------------------------+
| dtype                             | required data examples                |
+===================================+=======================================+
| odml.DType.int or 'int'           | 42                                    |
+-----------------------------------+---------------------------------------+
| odml.DType.float or 'float'       | 42.0                                  |
+-----------------------------------+---------------------------------------+
| odml.DType.boolean or 'boolean'   | True or False                         |
+-----------------------------------+---------------------------------------+
| odml.DType.string or 'string'     | 'Earth'                               |
+-----------------------------------+---------------------------------------+
| odml.DType.date or 'date'         | dt.date(1979, 10, 12)                 |
+-----------------------------------+---------------------------------------+
| odml.DType.datetime or 'datetime' | dt.datetime(1979, 10, 12, 11, 11, 11) |
+-----------------------------------+---------------------------------------+
| odml.DType.time or 'time'         | dt.time(11, 11, 11)                   |
+-----------------------------------+---------------------------------------+
| odml.DType.person or 'person'     | 'Zaphod Beeblebrox'                   |
+-----------------------------------+---------------------------------------+
| odml.DType.text or 'text'         |                                       |
+-----------------------------------+---------------------------------------+
| odml.DType.url or 'url'           | "https://en.wikipedia.org/wiki/Earth" |
+-----------------------------------+---------------------------------------+
| odml.DType.binary or 'binary'     | '00101010'                            |
+-----------------------------------+---------------------------------------+

  539. The available types are implemented in the odml.types Module.
After learning how to create a simple Property-Value pair, we need to know how
to attach it to a Section. As an exercise, we attach our first Property-Value
pair to the sub-Section 'Arthur Philip Dent'::
  543. >>> MYodML['TheCrew']['Arthur Philip Dent'].append(prop)
  544. >>> MYodML['TheCrew']['Arthur Philip Dent'].properties
  545. [<Property Gender>]
  546. If the odML data type of a Value is distinctly deducible ('int', 'float',
  547. 'boolean', 'string', 'date', 'datetime', or 'time'), you can also use a short
  548. cut to create a Property-Value pair::
  549. >>> prop = odml.Property(name='Gender',
  550. definition='Sex of the subject',
  551. value='male')
  552. >>> prop
  553. <Property Gender>
  554. >>> prop.value
  555. <string male>
Note that this short cut will not work for the odML data types 'person',
'text', 'url', and 'binary', because they cannot be distinguished
automatically from the odML data type 'string'.

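
The deduction rule can be pictured as a lookup on the Python type of the data.
The function ``guess_dtype`` below is a hypothetical helper written for this
tutorial, not a Python-odML function; it only illustrates why some data types
can be deduced and others cannot.

```python
import datetime as dt


def guess_dtype(data):
    """Return the name of the odML data type suggested by a Python object.

    Hypothetical helper for illustration; it is not part of Python-odML.
    """
    # bool must be tested before int because bool is a subclass of int,
    # and datetime before date because datetime is a subclass of date.
    checks = [
        (bool, 'boolean'),
        (int, 'int'),
        (float, 'float'),
        (dt.datetime, 'datetime'),
        (dt.date, 'date'),
        (dt.time, 'time'),
        (str, 'string'),
    ]
    for python_type, dtype_name in checks:
        if isinstance(data, python_type):
            return dtype_name
    raise ValueError('cannot deduce a dtype for %r' % (data,))


for example in [42, 42.0, True, "Don't Panic", dt.date(1979, 10, 12),
                dt.datetime(1979, 10, 12, 11, 11, 11), dt.time(11, 11, 11)]:
    print(guess_dtype(example))
```

A person's name, a longer text and a URL are all ordinary Python strings, so
such a lookup can only ever answer 'string' for them. This is why these data
types have to be stated explicitly.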
  559. Next we learn how to create a Property with multiple Values attached to it::
  560. >>> vals = [odml.Value(data='Arthur Philip Dent',
  561. dtype=odml.DType.person),
  562. odml.Value(data='Zaphod Beeblebrox',
  563. dtype=odml.DType.person),
  564. odml.Value(data='Tricia Marie McMillan',
  565. dtype=odml.DType.person),
  566. odml.Value(data='Ford Prefect',
  567. dtype=odml.DType.person)]
  568. >>> vals
  569. [<person Arthur Philip Dent>, <person Zaphod Beeblebrox>,
  570. <person Tricia Marie McMillan>, <person Ford Prefect>]
  571. >>> prop = odml.Property(name = 'NameCrewMembers',
  572. definition = 'List of crew members names',
  573. value = vals)
  574. >>> prop
  575. <Property NameCrewMembers>
  576. >>> prop.values
  577. [<person Arthur Philip Dent>, <person Zaphod Beeblebrox>,
  578. <person Tricia Marie McMillan>, <person Ford Prefect>]
To build up our odML file further, we attach this Property-Values pair to
the Section 'TheCrew'::
  581. >>> MYodML['TheCrew'].append(prop)
  582. >>> MYodML['TheCrew'].properties
  583. [<Property NameCrewMembers>]
As a further illustration, we could again use the short cut notation if we
agreed to use the odML data type 'string' instead of 'person' for our
Property-Values pair::
  587. >>> prop = odml.Property(name = 'NameCrewMembers',
  588. definition = 'List of crew members names',
  589. value = ['Arthur Philip Dent',
  590. 'Zaphod Beeblebrox',
  591. 'Tricia Marie McMillan',
  592. 'Ford Prefect'])
  593. >>> prop.value
  594. [<string Arthur Philip Dent>, <string Zaphod Beeblebrox>,
  595. <string Tricia Marie McMillan>, <string Ford Prefect>]
Note that this short cut also works for creating a Property with a list of
Values of different data types, e.g.::

    >>> prop = odml.Property(name = 'TestMultipleValueList',
    ...                      definition = 'List of Values with different '
    ...                                   'odML data types',
    ...                      value = [42,
    ...                               42.0,
    ...                               True,
    ...                               "Don't Panic",
    ...                               dt.date(1979, 10, 12),
    ...                               dt.datetime(1979, 10, 12, 11, 11, 11),
    ...                               dt.time(11, 11, 11)])
    >>> prop.values
    [<int 42>,
     <float 42.0>,
     <boolean True>,
     <string Don't Panic>,
     <date 1979-10-12>,
     <datetime 1979-10-12 11:11:11>,
     <time 11:11:11>]

A third way to create a Property with multiple Values is to attach one Value
first and then append further Values later on::

    >>> val = odml.Value(data="Arthur Philip Dent",
    ...                  dtype=odml.DType.person)
    >>> prop = odml.Property(name = 'NameCrewMembers',
    ...                      definition = 'List of crew members names',
    ...                      value = val)
    >>> prop.values
    [<person Arthur Philip Dent>]
    >>> val = odml.Value(data="Zaphod Beeblebrox",
    ...                  dtype=odml.DType.person)
    >>> prop.append(val)
    >>> prop.values
    [<person Arthur Philip Dent>, <person Zaphod Beeblebrox>]
    >>> val = odml.Value(data="Tricia Marie McMillan",
    ...                  dtype=odml.DType.person)
    >>> prop.append(val)
    >>> prop.values
    [<person Arthur Philip Dent>, <person Zaphod Beeblebrox>,
     <person Tricia Marie McMillan>]
    >>> val = odml.Value(data="Ford Prefect",
    ...                  dtype=odml.DType.person)
    >>> prop.append(val)
    >>> prop.values
    [<person Arthur Philip Dent>, <person Zaphod Beeblebrox>,
     <person Tricia Marie McMillan>, <person Ford Prefect>]

  642. Printing XML-representation of an odML file:
  643. --------------------------------------------
  644. Although the XML-representation of an odML file is a bit hard to read, it is
  645. sometimes helpful to check, especially while generating a file, what the
  646. hierarchical structure of the odML file looks like.
  647. Let's have a look at the XML-representation of the small odML file we just
  648. generated::
  649. >>> print unicode(odml.tools.xmlparser.XMLWriter(MYodML))
  650. <odML version="1">
  651. <date>1979-10-12</date>
  652. <section>
  653. <definition>Information on the crew</definition>
  654. <property>
  655. <definition>List of crew members names</definition>
  656. <value>Arthur Philip Dent<type>person</type></value>
  657. <value>Zaphod Beeblebrox<type>person</type></value>
  658. <value>Tricia Marie McMillan<type>person</type></value>
  659. <value>Ford Prefect<type>person</type></value>
  660. <name>NameCrewMembers</name>
  661. </property>
  662. <name>TheCrew</name>
  663. <section>
  664. <definition>Information on Arthur Dent</definition>
  665. <property>
  666. <definition>Sex of the subject</definition>
  667. <value>male<type>string</type></value>
  668. <name>Gender</name>
  669. </property>
  670. <name>Arthur Philip Dent</name>
  671. <type>crew/person</type>
  672. </section>
  673. <type>crew</type>
  674. </section>
  675. <version>42</version>
  676. <author>Douglas Adams</author>
  677. </odML>
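If only the hierarchy is of interest, the XML text can be reduced to an outline. The following sketch uses Python's standard ``xml.etree.ElementTree`` module rather than the odML library; the function name ``outline`` is hypothetical, and ``xml_text`` is a shortened copy of the representation above that keeps only the names.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML-representation (names only).
xml_text = """
<odML version="1">
  <section>
    <name>TheCrew</name>
    <property><name>NameCrewMembers</name></property>
    <section>
      <name>Arthur Philip Dent</name>
      <property><name>Gender</name></property>
    </section>
  </section>
</odML>
"""


def outline(element, depth=0, lines=None):
    """Collect an indented outline of all sections and properties."""
    if lines is None:
        lines = []
    for child in element:
        if child.tag in ("section", "property"):
            label = child.findtext("name")
            lines.append("  " * depth + child.tag + ": " + label)
            # Sections may contain further sections and properties.
            outline(child, depth + 1, lines)
    return lines


tree_lines = outline(ET.fromstring(xml_text.strip()))
print("\n".join(tree_lines))
```

Each nesting level is indented by two spaces, so subsections appear below the Section that contains them.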
  678. Saving an odML file:
  679. --------------------
  680. You can save your odML file using the following command::
  681. >>> save_to = '/home/usr/toolbox/python-odml/doc/example_odMLs/myodml.odml'
  682. >>> odml.tools.xmlparser.XMLWriter(MYodML).write_file(save_to)
  683. Loading an odML file:
  684. ---------------------
  685. You have already learned how to load the example odML file. As a reminder,
  686. you can reload your own saved odML file in the same way::
  687. >>> to_load = '/home/usr/toolbox/python-odml/doc/example_odMLs/myodml.odml'
  688. >>> my_reloaded_odml = odml.tools.xmlparser.load(to_load)
  689. -------------------------------------------------------------------------------
  690. Advanced odML-Features
  691. ======================
  692. Advanced knowledge on Values
  693. ----------------------------
  694. Data type conversions
  695. *********************
  696. After creating a Value, its data type can be changed, and the stored data
  697. will be converted to the new data type if the new format is valid for the
  698. given metadata::
  699. >>> test_value = odml.Value(data=1.0)
  700. >>> test_value
  701. <float 1.0>
  702. >>> test_value.dtype = odml.DType.int
  703. >>> test_value
  704. <int 1>
  705. >>> test_value.dtype = odml.DType.boolean
  706. >>> test_value
  707. <boolean True>
  708. If the conversion is invalid, a ValueError is raised::
  709. >>> test_value.dtype = odml.DType.date
  710. Traceback (most recent call last):
  711. File "<stdin>", line 1, in <module>
  712. File "/home/zehl/Projects/toolbox/python-odml/odml/value.py", line 163, in dtype
  713. raise ValueError("cannot convert '%s' from '%s' to '%s'" % (self.value, old_type, new_type))
  714. ValueError: cannot convert 'True' from 'boolean' to 'date'
  715. Also note that such conversions can lose information: if a float is
  716. converted to an integer and then back to a float, the decimal places are lost::
  717. >>> test_value = odml.Value(data=42.42)
  718. >>> test_value
  719. <float 42.42>
  720. >>> test_value.dtype = odml.DType.int
  721. >>> test_value
  722. <int 42>
  723. >>> test_value.dtype = odml.DType.float
  724. >>> test_value
  725. <float 42.0>
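The same effect can be reproduced with Python's built-in conversions, which gives a simple way to check beforehand whether a number would survive the round trip. This is a plain Python sketch and makes no claim about how the odML library converts internally; the function name ``survives_int_round_trip`` is hypothetical.

```python
def survives_int_round_trip(number):
    """Return True if float -> int -> float gives back the same number."""
    return float(int(number)) == number


as_int = int(42.42)        # the decimal places are cut off
back = float(as_int)       # they do not come back
print(as_int, back)
print(survives_int_round_trip(42.42), survives_int_round_trip(42.0))
```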
  726. Binary metadata
  727. ***************
  728. For metadata of the binary data type you also need to specify the correct
  729. encoder. The following table lists all possible encoders of the odML library
  730. and their binary metadata representation:
  731. +------------------+--------------------------+
  732. | binary encoder   | binary metadata example  |
  733. +==================+==========================+
  734. | quoted-printable | Ford Prefect             |
  735. +------------------+--------------------------+
  736. | hexadecimal      | 466f72642050726566656374 |
  737. +------------------+--------------------------+
  738. | base64           | Rm9yZCBQcmVmZWN0         |
  739. +------------------+--------------------------+
  740. The encoder can also be edited later on::
  741. >>> test_value = odml.Value(data='Ford Prefect',
  742. dtype=odml.DType.binary,
  743. encoder='quoted-printable')
  744. >>> test_value
  745. <binary Ford Prefect>
  746. >>> test_value.encoder = 'hexadecimal'
  747. >>> test_value
  748. <binary 466f72642050726566656374>
  749. >>> test_value.encoder = 'base64'
  750. >>> test_value
  751. <binary Rm9yZCBQcmVmZWN0>
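The three representations can be reproduced with Python's standard library, which may help when checking binary metadata by hand. This sketch does not use the odML library; it only assumes that the encoder names correspond to the usual quoted-printable, hexadecimal and base64 encodings.

```python
import base64
import binascii
import quopri

raw = b"Ford Prefect"

# One standard-library call per encoder name from the table.
encoded = {
    "quoted-printable": quopri.encodestring(raw),
    "hexadecimal": binascii.hexlify(raw),
    "base64": base64.b64encode(raw),
}

for name in ("quoted-printable", "hexadecimal", "base64"):
    print(name, encoded[name].decode("ascii"))

# Every representation decodes back to the same bytes.
assert quopri.decodestring(encoded["quoted-printable"]) == raw
assert binascii.unhexlify(encoded["hexadecimal"]) == raw
assert base64.b64decode(encoded["base64"]) == raw
```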
  752. The checksum of binary metadata is automatically calculated with ``crc32`` as
  753. default checksum::
  754. >>> test_value.checksum
  755. 'crc32$10e6c0cf'
  756. Alternatively, ``md5`` can be used for the checksum calculation::
  757. >>> test_value.checksum = "md5"
  758. >>> test_value.checksum
  759. 'md5$c1282d5763e2249028047757b6209518'
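A checksum string of the form ``algorithm$digest`` can be built with the standard library as sketched below. The function name ``checksum_string`` is hypothetical, and the sketch is not the odML implementation: which bytes odML feeds into the checksum and how it formats the digest are assumptions here, so the results need not match the values printed above.

```python
import hashlib
import zlib


def checksum_string(data, algorithm="crc32"):
    """Return 'algorithm$hexdigest' for the given bytes (illustrative only)."""
    if algorithm == "crc32":
        # Mask to an unsigned 32 bit number; zero-padding is an assumption.
        digest = "%08x" % (zlib.crc32(data) & 0xFFFFFFFF)
    elif algorithm == "md5":
        digest = hashlib.md5(data).hexdigest()
    else:
        raise ValueError("unknown checksum algorithm: %r" % (algorithm,))
    return algorithm + "$" + digest


crc = checksum_string(b"Ford Prefect")
md5 = checksum_string(b"Ford Prefect", "md5")
print(crc)
print(md5)
```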
  760. Advanced knowledge on Properties
  761. --------------------------------
  762. Dependencies & dependency values
  763. ********************************
  764. (coming soon)
  765. Advanced knowledge on Sections
  766. ------------------------------
  767. Links & Includes
  768. ****************
  769. (deprecated; new version coming soon)
  770. Sections can be linked to other Sections, so that they include their defined
  771. attributes. A link can be within the document (``link`` property) or to an
  772. external one (``include`` property).
  773. After parsing a document, these links are not yet resolved, but they can be
  774. resolved using the :py:meth:`odml.doc.BaseDocument.finalize` method::
  775. >>> d = xmlparser.load("sample.odml")
  776. >>> d.finalize()
  777. Note: Only the parser leaves link properties unresolved, because the
  778. referenced sections may not yet be available while parsing.
  779. However, when manually setting the ``link`` (or ``include``) attribute, it will
  780. be immediately resolved. To avoid this behaviour, set the ``_link`` (or ``_include``)
  781. attribute instead.
  782. Each section remembers the object it is linked to in its ``_merged`` attribute.
  783. The link can be unresolved manually using :py:meth:`odml.section.BaseSection.unmerge`
  784. and merged again using :py:meth:`odml.section.BaseSection.merge`.
  785. Unresolving means to remove sections and properties that do not differ from their
  786. linked equivalents. This should be done globally before saving using the
  787. :py:meth:`odml.doc.BaseDocument.clean` method::
  788. >>> d.clean()
  789. >>> xmlparser.XMLWriter(d).write_file('sample.odml')
  790. Changing a ``link`` (or ``include``) attribute will first unmerge the section and
  791. then merge it with the new object.
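The idea behind merging and unmerging can be sketched with plain dictionaries standing in for Sections. This is a conceptual illustration only and does not use or reproduce the odML API; the functions ``merge`` and ``unmerge`` and the example data are hypothetical.

```python
def merge(section, linked):
    """Resolve a link: take over everything the section does not define."""
    merged = dict(linked)
    merged.update(section)      # the section's own entries win
    return merged


def unmerge(section, linked):
    """Unresolve a link: drop entries identical to the linked equivalent."""
    return {name: value for name, value in section.items()
            if name not in linked or linked[name] != value}


# Plain dictionaries stand in for a linked Section and a linking Section.
template = {"Species": "human", "Planet": "Earth"}
arthur = {"Planet": "none (demolished)"}

resolved = merge(arthur, template)
cleaned = unmerge(resolved, template)
print(resolved)
print(cleaned)
```

After unmerging, only the entries that differ from the linked object remain, which is what keeps a saved file free of duplicated content.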
  792. Terminologies
  793. *************
  794. (deprecated; new version coming soon)
  795. odML supports terminologies, which are data structure templates for typical use cases.
  796. Sections can have a ``repository`` attribute. As repositories can be inherited,
  797. the currently applicable one can be obtained using the :py:meth:`odml.section.BaseSection.get_repository`
  798. method.
  799. To see whether an object has a terminology equivalent, use the :py:meth:`odml.property.BaseProperty.get_terminology_equivalent`
  800. method, which returns the corresponding object of the terminology.
  801. Mappings
  802. ********
  803. (deprecated; new version coming soon)
  804. A sometimes obscure but very useful feature is the idea of mappings. They let
  805. you write documents in a user-defined terminology while supplying mapping
  806. information to a standard terminology, so that the document can also be viewed
  807. in the standard terminology (as long as the mapping information is adequate).
  808. See :py:class:`test.mapping.TestMapping` if you need to understand the
  809. mapping-process itself.
  810. Mappings are views on documents and are created as follows::
  811. >>> import odml
  812. >>> import odml.mapping as mapping
  813. >>> doc = odml.Document()
  814. >>> mdoc = mapping.create_mapping(doc)
  815. >>> mdoc
  816. P(<Doc None by None (0 sections)>)
  817. >>> mdoc.__class__
  818. <class 'odml.tools.proxy.DocumentProxy'>
  819. Creating a view has the advantage that changes made to a proxy object are
  820. propagated to the original document.
  821. This works quite well and is extensively used in the GUI.
  822. However, be aware that you are typically dealing with proxy objects only
  823. and not all API methods may be available.
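How a proxy can pass changes on to the object behind it may be sketched in a few lines of plain Python. The classes ``Proxy`` and ``Doc`` below are hypothetical stand-ins for illustration; they are not the classes from ``odml.tools.proxy``.

```python
class Doc(object):
    """Hypothetical stand-in for a document with one attribute."""

    def __init__(self):
        self.author = None


class Proxy(object):
    """Minimal view that forwards attribute access to its target."""

    def __init__(self, target):
        # Bypass our own __setattr__ so the target is stored on the proxy.
        object.__setattr__(self, "_target", target)

    def __getattr__(self, name):
        # Only called when the attribute is not found on the proxy itself.
        return getattr(self._target, name)

    def __setattr__(self, name, value):
        # Every change on the proxy is written to the original object.
        setattr(self._target, name, value)


doc = Doc()
view = Proxy(doc)
view.author = "Douglas Adams"
print(doc.author)
```

Because the proxy only forwards what it knows about, a real proxy layer may expose fewer methods than the wrapped object, as noted above.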