|
@@ -2,12 +2,13 @@
|
|
|
"cells": [
|
|
|
{
|
|
|
"cell_type": "markdown",
|
|
|
+ "metadata": {},
|
|
|
"source": [
|
|
|
"# Compression\n",
|
|
|
"\n",
|
|
|
"If we take a look at the file size of the \"radar_trap.nix\" file in its last version, it is greater than **80MB** (depending a bit on the number of images stored)!\n",
|
|
|
"\n",
|
|
|
- "The reason is the image data the individual images have a shape of 1024 * 768 * 4 * 4 byte (float32 values) which sums to about 12.5 MB per picture.\n",
|
|
|
+ "The reason is the image data: the individual images have a shape of 1024 * 768 * 4 and use 4 bytes per value (float32), which sums to about 12.5 MB per picture.\n",
|
|
|
"\n",
|
|
|
"An easy way to work around this is to enable dataset compression in the **HDF5** backend. Simply open a file with the ``DeflateNormal`` flag when creating it.\n",
|
|
|
"\n",
|
|
@@ -18,11 +19,11 @@
|
|
|
"## Exercise\n",
|
|
|
"\n",
|
|
|
" 1. Try it out and compare the file sizes with and without compression.\n"
|
|
|
- ],
|
|
|
- "metadata": {}
|
|
|
+ ]
|
|
|
},
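The ~12.5 MB-per-image estimate and the gain from deflate compression can be sanity-checked with the standard library alone, since HDF5's deflate filter is the same algorithm as ``zlib``. This is only an illustrative sketch with perfectly regular stand-in data (real camera frames compress far less well), not the nixio API:

```python
import struct
import zlib

# One simulated image: 1024 * 768 * 4 float32 values, 4 bytes each (~12.5 MB raw).
# The constant fill value is a hypothetical stand-in for real pixel data.
n_values = 1024 * 768 * 4
raw = struct.pack("f", 0.5) * n_values   # 4 bytes per float32 value
compressed = zlib.compress(raw, 6)       # deflate, the algorithm behind DeflateNormal

print(len(raw))          # 12582912 bytes, i.e. ~12.5 MB per picture
print(len(compressed))   # dramatically smaller for this regular buffer
```

The ratio here is unrealistically good because the buffer repeats a single value; the point is only that the per-image raw size matches the 12.5 MB figure in the text and that deflate operates on exactly such byte buffers.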
|
|
|
{
|
|
|
"cell_type": "markdown",
|
|
|
+ "metadata": {},
|
|
|
"source": [
|
|
|
"\n",
|
|
|
"## Compression comes at a price\n",
|
|
@@ -34,11 +35,11 @@
|
|
|
"``block.create_data_array(\"name\", \"type\", data=data, compression=nixio.Compression.No)``\n",
|
|
|
"\n",
|
|
|
"**Note:** Once a DataArray has been created, its compression cannot be changed."
|
|
|
- ],
|
|
|
- "metadata": {}
|
|
|
+ ]
|
|
|
},
|
|
|
{
|
|
|
"cell_type": "markdown",
|
|
|
+ "metadata": {},
|
|
|
"source": [
|
|
|
"# Chunking\n",
|
|
|
"\n",
|
|
@@ -57,20 +58,20 @@
|
|
|
"1. The read and write speed (large datasets can be read faster with larger chunks).\n",
|
|
|
"2. The resize performance and overhead.\n",
|
|
|
"3. The efficiency of the compression.\n"
|
|
|
- ],
|
|
|
- "metadata": {}
|
|
|
+ ]
|
|
|
},
|
|
|
{
|
|
|
"cell_type": "markdown",
|
|
|
+ "metadata": {},
|
|
|
"source": [
|
|
|
"\n",
|
|
|
"## Read/write performance\n",
|
|
|
"\n",
|
|
|
- "Generally one could think about large datasets can be written and read faster with large chunks. This is not wrong unless the usual access is in small pieces. Then the backend would need to read the full chunk to memory (probably decompress it) and teh return the small piece of data, the user requested.\n",
|
|
|
+ "Generally, one might think that large datasets can be written and read faster with large chunks. This is not wrong, unless the usual access is in small pieces: then the backend needs to read the full chunk into memory (and probably decompress it) before returning the small piece of data the user requested.\n",
|
|
|
"\n",
|
|
|
"## Resize performance\n",
|
|
|
"\n",
|
|
|
- "Let's assume that we have already filled the full 9 by 9 ckunk with data. Now we want to increase the dataset by another 3 by 3 bit of data. With the large chunks we would ask the backend to reserve the full 9 by 9 matrix, and write just 9 data points into it. Reserving large amount of memory takes more time, and if not filled up with meaningful data, creates larger files than strictly necessary.\n",
|
|
|
+ "Let's assume that we have already filled the full 9 by 9 chunk with data. Now we want to extend the dataset by another 3 by 3 piece of data. With large chunks we would ask the backend to reserve a full 9 by 9 matrix and write just 9 data points into it. Reserving large amounts of memory takes more time and, if the chunk is not filled up with meaningful data, creates larger files than strictly necessary.\n",
|
|
|
"\n",
|
|
|
"## Compression performance\n",
|
|
|
"\n",
|
|
@@ -81,7 +82,7 @@
|
|
|
"\n",
|
|
|
"``block.create_data_array(\"name\", \"type\", data=data)``\n",
|
|
|
"\n",
|
|
|
- "The **HDF5** backend will try to figure out the optimal chunk size depending on the shape of the data. If one wants to affect the chunking and has a good idea about the usual read and write access patterns (e.g. I know that I will always read one second of data at a time). One can crate the **DataArray** with a defined shape and later write the data.\n",
|
|
|
+ "The **HDF5** backend will try to figure out the optimal chunk size depending on the shape of the data. If one wants to influence the chunking and has a good idea about the usual read and write access patterns (e.g. \"I know that I will always read one second of data at a time\"), one can create the **DataArray** with a defined shape and write the data later.\n",
|
|
|
"\n",
|
|
|
"```python\n",
|
|
|
" data_array = block.create_data_array(\"name\", \"id\", dtype=nixio.DataType.Double,\n",
|
|
@@ -93,24 +94,33 @@
|
|
|
"```\n",
|
|
|
"\n",
|
|
|
"**Note:** If we do not provide the data at the time of **DataArray** creation, we need to provide the data type *dtype*.\n"
|
|
|
- ],
|
|
|
- "metadata": {}
|
|
|
+ ]
|
|
|
},
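The 9-by-9 versus 3-by-3 resize example above can be put into numbers. This is just the arithmetic from the text, not nixio code:

```python
# Numbers from the text: appending a 3-by-3 piece to a dataset stored
# in 9-by-9 chunks versus 3-by-3 chunks.
chunk_elements = 9 * 9        # elements reserved when a new 9x9 chunk is allocated
new_elements = 3 * 3          # the 9 data points we actually write
unused = chunk_elements - new_elements
print(unused)                 # 72 elements reserved but left unfilled

small_chunk_elements = 3 * 3  # with 3x3 chunks the new piece fills one chunk exactly
print(small_chunk_elements - new_elements)   # 0
```

So with the large chunks, 72 of the 81 reserved elements carry no data until further appends arrive, which is the file-size and allocation overhead the text describes.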
|
|
|
{
|
|
|
"cell_type": "markdown",
|
|
|
+ "metadata": {},
|
|
|
"source": [
|
|
|
"\n",
|
|
|
"## Exercise: Let's test the effects of chunking on the write performance.\n",
|
|
|
"\n",
|
|
|
- "1. Use the code below and extend it to test different chunk sizes (chunk_samples controls how many samples per channel ata a time). **Note:** make sure to have the same total number of samples written.\n",
|
|
|
+ "1. Use the code below and extend it to test different chunk sizes (``chunk_samples`` controls how many samples per channel are written at a time). **Note:** make sure that the same total number of samples is written in each test.\n",
|
|
|
"2. Compare the write performance with and without compression.\n",
|
|
|
"3. Select the \"optimal\" chunking strategy and test the read performance with slices of varying size.\n"
|
|
|
- ],
|
|
|
- "metadata": {}
|
|
|
+ ]
|
|
|
},
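Before running the nixio version below, the timing pattern of the exercise can be sketched with the standard library alone. This analogue writes the same total number of samples in chunks of different sizes; the file names are hypothetical, the parameter names ``chunk_samples`` and ``chunk_count`` are borrowed from the notebook, and raw byte writes stand in for HDF5 dataset appends:

```python
import time

def timed_chunked_write(path, chunk_samples, chunk_count, bytes_per_sample=8):
    """Write chunk_count chunks of chunk_samples samples each; return elapsed seconds."""
    chunk = b"\x00" * (chunk_samples * bytes_per_sample)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(chunk_count):
            f.write(chunk)
    return time.perf_counter() - start

# Same total number of samples (1,000,000), different chunk sizes:
few_large = timed_chunked_write("chunking_large.bin", chunk_samples=100000, chunk_count=10)
many_small = timed_chunked_write("chunking_small.bin", chunk_samples=1000, chunk_count=1000)
print(few_large, many_small)
```

Keeping the total amount of data constant while varying the chunk size, as the exercise instructs, is what makes the two timings comparable.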
|
|
|
{
|
|
|
"cell_type": "code",
|
|
|
"execution_count": 19,
|
|
|
+ "metadata": {},
|
|
|
+ "outputs": [
|
|
|
+ {
|
|
|
+ "name": "stdout",
|
|
|
+ "output_type": "stream",
|
|
|
+ "text": [
|
|
|
+ "0.16677212715148926\n"
|
|
|
+ ]
|
|
|
+ }
|
|
|
+ ],
|
|
|
"source": [
|
|
|
"import nixio\n",
|
|
|
"import time\n",
|
|
@@ -154,41 +164,31 @@
|
|
|
"\n",
|
|
|
"time_needed = write_nixfile(\"chunking_test.nix\", chunk_samples=100000, chunk_count=10)\n",
|
|
|
"print(time_needed)\n"
|
|
|
- ],
|
|
|
- "outputs": [
|
|
|
- {
|
|
|
- "output_type": "stream",
|
|
|
- "name": "stdout",
|
|
|
- "text": [
|
|
|
- "0.16677212715148926\n"
|
|
|
- ]
|
|
|
- }
|
|
|
- ],
|
|
|
- "metadata": {}
|
|
|
+ ]
|
|
|
}
|
|
|
],
|
|
|
"metadata": {
|
|
|
- "orig_nbformat": 4,
|
|
|
+ "interpreter": {
|
|
|
+ "hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49"
|
|
|
+ },
|
|
|
+ "kernelspec": {
|
|
|
+ "display_name": "Python 3",
|
|
|
+ "language": "python",
|
|
|
+ "name": "python3"
|
|
|
+ },
|
|
|
"language_info": {
|
|
|
- "name": "python",
|
|
|
- "version": "3.9.6",
|
|
|
- "mimetype": "text/x-python",
|
|
|
"codemirror_mode": {
|
|
|
"name": "ipython",
|
|
|
"version": 3
|
|
|
},
|
|
|
- "pygments_lexer": "ipython3",
|
|
|
+ "file_extension": ".py",
|
|
|
+ "mimetype": "text/x-python",
|
|
|
+ "name": "python",
|
|
|
"nbconvert_exporter": "python",
|
|
|
- "file_extension": ".py"
|
|
|
- },
|
|
|
- "kernelspec": {
|
|
|
- "name": "python3",
|
|
|
- "display_name": "Python 3.9.5 64-bit"
|
|
|
- },
|
|
|
- "interpreter": {
|
|
|
- "hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49"
|
|
|
+ "pygments_lexer": "ipython3",
|
|
|
+ "version": "3.9.5"
|
|
|
}
|
|
|
},
|
|
|
"nbformat": 4,
|
|
|
"nbformat_minor": 2
|
|
|
-}
|
|
|
+}
|