#12 Add nix version of the datasets

ادغام شده
sprenger 40 کامیت ادغام شده از INT/to_nix به INT/master 9 ماه پیش

This PR adds a version of the datasets in nix format, making the use of the ReachGraspIO obsolete for the user.

For further simplification additional changes could be added:

  • provide usage instructions of the nix file in the README
  • rewriting of the example scripts to use the nix file instead of BlackRock files

Warning: The nix files are relatively large (8GB, 12GB). See also #5.

This PR adds a version of the datasets in nix format, making the use of the ReachGraspIO obsolete for the user. For further simplification additional changes could be added: - [x] provide usage instructions of the nix file in the README - [x] rewriting of the example scripts to use the nix file instead of BlackRock files Warning: The nix files are relatively large (8GB, 12GB). See also #5.

I just experimented Alpaca (the provenance tracking tool) with the two new datasets. I have a few suggestions to improve the description of the objects and the structure of the files.

Here is the information for each .analogsignals in the segment (the tuple with information is (ansig.name, ansig.description, ansig.annotations['neural_signal'], ansig.sampling_rate, ansig.file_origin)):

i140703-001.nix

{0: ('Channels: (chan1 chan2 chan3 chan4 chan5 chan6 chan7 chan8 chan9 chan10 '
     'chan11 chan12 chan13 chan14 chan15 chan16 chan17 chan18 chan19 chan20 '
     'chan21 chan22 chan23 chan24 chan25 chan26 chan27 chan28 chan29 chan30 '
     'chan31 chan32 chan33 chan34 chan35 chan36 chan37 chan38 chan39 chan40 '
     'chan41 chan42 chan43 chan44 chan45 chan46 chan47 chan48 chan49 chan50 '
     'chan51 chan52 chan53 chan54 chan55 chan56 chan57 chan58 chan59 chan60 '
     'chan61 chan62 chan63 chan64 chan65 chan66 chan67 chan68 chan69 chan70 '
     'chan71 chan72 chan73 chan74 chan75 chan76 chan77 chan78 chan79 chan80 '
     'chan81 chan82 chan83 chan84 chan85 chan86 chan87 chan88 chan89 chan90 '
     'chan91 chan92 chan93 chan94 chan95 chan96)',
     'AnalogSignal from  nsx2',
     True,
     array(1000.) * Hz,
     None),
 1: ('Channels: (GFpr1 GFside1 GFpr2 GFside2 LF Displ)',
     'AnalogSignal from  nsx2',
     False,
     array(1000.) * Hz,
     None),
 2: ('nsx6', 'AnalogSignal from  nsx6', True, array(30000.) * Hz, None)}

l101210-001.nix

{0: ('nsx2', 'AnalogSignal from  nsx2', False, array(1000.) * Hz, None),
 1: ('nsx5', 'AnalogSignal from  nsx5', True, array(30000.) * Hz, None)}

The descriptions and names are not consistent (behavior and neural signal descriptions are the same for i140703-001.nix; the behavior signal names have the friendly names for one monkey, but just nsx2 for the other, etc.). Furthermore, the names and descriptions are not very telling. It would help to have something more specific, as this helps in inspecting the records.

Moreover, the file_origins are missing (both the attribute and the array annotations with the actual source file for the channels). These I don't know why they are missing as this seems to load if I use Reach2GraspIO.

Finally, it would be good to use a consistent order for the signals: 30 KHz neural signal, behavior, 1 KHz neural signal (as this one is available only for one monkey. A downsampled version from the NS5 could also be generated, as discussed with @denker).

I just experimented Alpaca (the provenance tracking tool) with the two new datasets. I have a few suggestions to improve the description of the objects and the structure of the files. Here is the information for each `.analogsignals` in the segment (the tuple with information is `(ansig.name, ansig.description, ansig.annotations['neural_signal'], ansig.sampling_rate, ansig.file_origin)`): `i140703-001.nix` ``` {0: ('Channels: (chan1 chan2 chan3 chan4 chan5 chan6 chan7 chan8 chan9 chan10 ' 'chan11 chan12 chan13 chan14 chan15 chan16 chan17 chan18 chan19 chan20 ' 'chan21 chan22 chan23 chan24 chan25 chan26 chan27 chan28 chan29 chan30 ' 'chan31 chan32 chan33 chan34 chan35 chan36 chan37 chan38 chan39 chan40 ' 'chan41 chan42 chan43 chan44 chan45 chan46 chan47 chan48 chan49 chan50 ' 'chan51 chan52 chan53 chan54 chan55 chan56 chan57 chan58 chan59 chan60 ' 'chan61 chan62 chan63 chan64 chan65 chan66 chan67 chan68 chan69 chan70 ' 'chan71 chan72 chan73 chan74 chan75 chan76 chan77 chan78 chan79 chan80 ' 'chan81 chan82 chan83 chan84 chan85 chan86 chan87 chan88 chan89 chan90 ' 'chan91 chan92 chan93 chan94 chan95 chan96)', 'AnalogSignal from nsx2', True, array(1000.) * Hz, None), 1: ('Channels: (GFpr1 GFside1 GFpr2 GFside2 LF Displ)', 'AnalogSignal from nsx2', False, array(1000.) * Hz, None), 2: ('nsx6', 'AnalogSignal from nsx6', True, array(30000.) * Hz, None)} ``` `l101210-001.nix` ``` {0: ('nsx2', 'AnalogSignal from nsx2', False, array(1000.) * Hz, None), 1: ('nsx5', 'AnalogSignal from nsx5', True, array(30000.) * Hz, None)} ``` The descriptions and names are not consistent (behavior and neural signal descriptions are the same for `i140703-001.nix`; the behavior signal names have the friendly names for one monkey, but just nsx2 for the other, etc.). Furthermore, the names and descriptions are not very telling. It would help to have something more specific, as this helps in inspecting the records. Moreover, the `file_origin`s are missing (both the attribute and the array annotations with the actual source file for the channels). These I don't know why they are missing as this seems to load if I use Reach2GraspIO. Finally, it would be good to use a consistent order for the signals: 30 KHz neural signal, behavior, 1 KHz neural signal (as this one is available only for one monkey. A downsampled version from the NS5 could also be generated, as discussed with @denker).
sprenger نظر 2 سال پیش
همكار

Hi @c.koehler! Thanks for having a look. I think many of your comments are actually addressing the ReachGraspIO itself as the nix files are just a direct transformation of what the ReachGraspIO provided. So maybe it would be good to improve the IO first and then do the Nix file generation.

The file_origin attribute is an issue on the nix side, as here I would expect the filename of the nix file to be listed.

Hi @c.koehler! Thanks for having a look. I think many of your comments are actually addressing the ReachGraspIO itself as the nix files are just a direct transformation of what the ReachGraspIO provided. So maybe it would be good to improve the IO first and then do the Nix file generation. The `file_origin` attribute is an issue on the nix side, as here I would expect the filename of the nix file to be listed.

Looking a bit deeper, these issues actually stem from the BlackrockIO, i.e., they are values copied directly from the source data files. I will work on a fix directly in ReachGraspIO by overwriting the corresponding fields with more telling names. This will lead to a change in the data also for those people using ReachGraspIO to load data once they update, but I think these changes may be worth the breaking nature.

The alternative would be to intercept in the script that converts to nix, such that only the NIX files become nicer. This however would lead to different Neo objects based on whether NIX or nsX was used for loading within the same release, which I find more troublesome than the suggested solution.

Let me know if you think otherwise.

Looking a bit deeper, these issues actually stem from the BlackrockIO, i.e., they are values copied directly from the source data files. I will work on a fix directly in ReachGraspIO by overwriting the corresponding fields with more telling names. This will lead to a change in the data also for those people using ReachGraspIO to load data once they update, but I think these changes may be worth the breaking nature. The alternative would be to intercept in the script that converts to nix, such that only the NIX files become nicer. This however would lead to different Neo objects based on whether NIX or nsX was used for loading within the same release, which I find more troublesome than the suggested solution. Let me know if you think otherwise.

I (finally) got around to working on the NIX files. I pushed two file versions (one with raw signals, one without) in addition to the script creating the files (convert_to_nix.py) and a new example_nix.py that mimicks the old example, but with the nix files.

The two nix files should contain two Blocks for the two files that are very similar, most changes I implemented in the BlackrockIO, there is only one or two minor things that are equalized in the conversion script -- mainly these are things that are already different in the original blackrock file (string names of the behavior channels).

The conversion script also creates an artificial LFP for Lilou, based on a sequence of butterworth filtering and downsampling. From the example_nix.py one can see that this is a decent fit.

I (finally) got around to working on the NIX files. I pushed two file versions (one with raw signals, one without) in addition to the script creating the files (`convert_to_nix.py`) and a new `example_nix.py` that mimicks the old example, but with the nix files. The two nix files should contain two Blocks for the two files that are very similar, most changes I implemented in the BlackrockIO, there is only one or two minor things that are equalized in the conversion script -- mainly these are things that are already different in the original blackrock file (string names of the behavior channels). The conversion script also creates an artificial LFP for Lilou, based on a sequence of butterworth filtering and downsampling. From the `example_nix.py` one can see that this is a decent fit.

This shows a comparison of data (Lilou and Nikos) in the Neo Blocks of NIX and Blackrock, as produced by the two example scripts.

This shows a comparison of data (Lilou and Nikos) in the Neo Blocks of NIX and Blackrock, as produced by the two example scripts.
sprenger نظر 1 سال پیش
همكار

Hi @denker! Thanks for continuing with this PR. The files look good, I just noticed that the examples codes we provide were not updated yet. I fixed the paths for the data_overview_*.py scripts and some issues in the reachtograspio itself, but the tests for the io are still not all passing. To keep everything consistent I would suggest to add in addition to the requirements.txt file also a lock file so we have a fix set of working package dependencies.

I suggest to postpone the rewriting of the data_overview_*.py scripts to another PR, as this one is already getting quite lengthy.

Hi @denker! Thanks for continuing with this PR. The files look good, I just noticed that the examples codes we provide were not updated yet. I fixed the paths for the `data_overview_*.py` scripts and some issues in the reachtograspio itself, but the tests for the io are still not all passing. To keep everything consistent I would suggest to add in addition to the `requirements.txt` file also a `lock` file so we have a fix set of working package dependencies. I suggest to postpone the rewriting of the `data_overview_*.py` scripts to another PR, as this one is already getting quite lengthy.

Hi @sprenger, thanks for the detailed checking. True, I completely forgot about the data overviews, thanks for fixing the paths. I agree, transforming those scripts to use nix should not enter this PR.

Hi @sprenger, thanks for the detailed checking. True, I completely forgot about the data overviews, thanks for fixing the paths. I agree, transforming those scripts to use nix should not enter this PR.

Regarding the failing test_rejection_annotations_present, I replaced this with an updated test_data_annotations_present that tests for those annotations currently used -- I think this was simply heavily outdated.

Regarding the failing `test_rejection_annotations_present`, I replaced this with an updated `test_data_annotations_present` that tests for those annotations currently used -- I think this was simply heavily outdated.

Added a fix for the unit tests regarding consecutive trial IDs. For Nikos, indeed there were a few wrongly labeled events in trials 158-161, which should have been 157-160. These events were the analog events extracted from the force signals. The fix provides a new odML and xls, where the analog events were moved "one trial back", and also the corresponding periods (like the movement period, which is obtained by subtracting two analog events) are fixed.

For Lilou, there is one rogue trial where for some reason, the object is returned while the new trial is already starting. This is not wrong, but still leads to non-consecutive trial IDs. This instance has been accounted for in the test with a special if-clause.

Added a fix for the unit tests regarding consecutive trial IDs. For Nikos, indeed there were a few wrongly labeled events in trials 158-161, which should have been 157-160. These events were the analog events extracted from the force signals. The fix provides a new odML and xls, where the analog events were moved "one trial back", and also the corresponding periods (like the movement period, which is obtained by subtracting two analog events) are fixed. For Lilou, there is one rogue trial where for some reason, the object is returned while the new trial is already starting. This is not wrong, but still leads to non-consecutive trial IDs. This instance has been accounted for in the test with a special if-clause.

Remaining TODO: update corresponding NIX file

Remaining TODO: update corresponding NIX file

@sprenger, your commit Fix r2gio annotation dtypes breaks the convert_2_nix.py script, which now errors on writing the nix file:

  File "[...]/miniconda3/envs/multielectrode_grasp_test/lib/python3.8/site-packages/nixio/section.py", line 144, in create_property
    raise TypeError("Please provide either a non empty value or a DataType.")
TypeError: Please provide either a non empty value or a DataType.

I tested this also with the latest Neo (0.12.0), and the error persists. I have not yet tested, which of the changes in your commit leads to the problem. I was not quite sure how important those changes were, could we revert them, or do you have an idea how to fix this?

@sprenger, your commit `Fix r2gio annotation dtypes` breaks the `convert_2_nix.py` script, which now errors on writing the nix file: ``` File "[...]/miniconda3/envs/multielectrode_grasp_test/lib/python3.8/site-packages/nixio/section.py", line 144, in create_property raise TypeError("Please provide either a non empty value or a DataType.") TypeError: Please provide either a non empty value or a DataType. ``` I tested this also with the latest Neo (0.12.0), and the error persists. I have not yet tested, which of the changes in your commit leads to the problem. I was not quite sure how important those changes were, could we revert them, or do you have an idea how to fix this?

Appart from this, I suppose we could update the requirements to

neo>=0.9.0, <0.13
elephant>=0.9.0, <0.13

?

Also, in this context I was not quite sure what you menant by a "lock file".

Appart from this, I suppose we could update the requirements to ``` neo>=0.9.0, <0.13 elephant>=0.9.0, <0.13 ``` ? Also, in this context I was not quite sure what you menant by a "lock file".

Hi @sprenger, I added a fix that inverts your last change: Instead of making all channel_id and channel_ids (array) annotations of dtype str (which is what BlackrockIO outputs for analogsignals), it converts all channel_id and channel_ids (array) annotations to dtype int (which is what BlackrockIO outputs for Spiketrains).

This makes more sense to me given that channel ids are numeric, and also it solves issues saving the nix files. In either way, I fully support your aim to make these annotations consistent. Please let me know if you disagree with this latest change.

Hi @sprenger, I added a fix that inverts your last change: Instead of making all `channel_id` and `channel_ids` (array) annotations of dtype `str` (which is what `BlackrockIO` outputs for analogsignals), it converts all `channel_id` and `channel_ids` (array) annotations to dtype `int` (which is what `BlackrockIO` outputs for Spiketrains). This makes more sense to me given that channel ids are numeric, and also it solves issues saving the nix files. In either way, I fully support your aim to make these annotations consistent. Please let me know if you disagree with this latest change.

@sprenger I would suggest to merge this PR soon. From our side it looks very good and has also been tested in a small analysis scenario. Would you be ok with this?

@sprenger I would suggest to merge this PR soon. From our side it looks very good and has also been tested in a small analysis scenario. Would you be ok with this?
sprenger نظر 9 ماه پیش
همكار

Hi @denker! Sorry for the delay again.

Regarding the type of the channel_id, handling those as str is more generic than int, so latest when using the nixrawio those will be presented as str. But for now within this project the data types are consistent, so we can keep it like this.

Regarding the lock file I was thinking about a file that gives exact version numbers with which this code worked instead of providing only ranges or lower borders. See also e.g. poetry lock files or conda lock files I added some pipenv lock files as these are closest to pip and don't add a lot of poetry / conda overhead.

I will merge this now as it has been already way too long.

Hi @denker! Sorry for the delay again. Regarding the type of the channel_id, handling those as str is more generic than int, so latest when using the nixrawio those will be presented as str. But for now within this project the data types are consistent, so we can keep it like this. Regarding the `lock` file I was thinking about a file that gives exact version numbers with which this code worked instead of providing only ranges or lower borders. See also e.g. [poetry lock files](https://realpython.com/dependency-management-python-poetry/#pin-dependencies-in-poetrylock) or [conda lock files](https://github.com/conda/conda-lock) I added some pipenv lock files as these are closest to pip and don't add a lot of poetry / conda overhead. I will merge this now as it has been already way too long.
sprenger 9 ماه پیش بسته شد

Ok, great!

Ok, great!
درخواست pull request با موفقیت ادغام شد!
برای پیوستن به گفتگو، وارد شودید.
بدون نقطه عطف
بدون مسئول رسیدگی
3 مشارکت کننده
درحال بارگذاری...
لغو
ذخيره
هنوز محتوایی ایجاد نشده.