#78 reduce filesize of large test files

Merged
samuelgarcia merged 30 commits from NeuralEnsemble/reduce_filesize into NeuralEnsemble/master 1 year ago
sprenger commented 2 years ago

This PR reduces the size of the files stored in git-annex by reducing the recording time of large files. This concerns the following formats and files:

  • [x] Neuralynx/Cheetah_v5.6.3/original_data/*.nvt
  • [x] Neuralynx/Cheetah_v5.6.3/plain_data/CSC*.txt
  • [x] Neuralynx/Cheetah_v5.7.4/plain_data/CSC*.txt
  • [x] Neuralynx/Cheetah_v5.5.1/plain_data/Tet*.txt
  • [x] neuroshare/Multichannel_fil_1.mcd
  • [ ] tdt/aep_05/Block-1/aep_05_Block-1.tev
  • [ ] tdt/aep_05/Block-1/aep_05_Block-1.tsq
  • [ ] tdt/aep_05/Block-2/aep_05_Block-2.tev
  • [ ] tdt/aep_05/Block-2/aep_05_Block-2.tsq
  • [x] spikeglx/Noise4Sam_g0/Noise4Sam_g0_imec0/Noise4Sam_g0_t0.imec0.ap.bin
  • [x] neuroscope/test1/test1.dat (see https://gin.g-node.org/NeuralEnsemble/ephy_testing_data/pulls/80)

Update

The current changes reduce the repo size from 1.3 GB to 1.1 GB. More potential files for shortening:

  • [x] asciisignal/File_asciisignal_1.txt
  • [x] asciisignal/File_asciisignal_2.txt
  • [x] asciisignal/File_asciisignal_3.txt
  • [x] axona/axona_raw.bin
  • [ ] axona/dataset_multi_modal/axona_sample.bin
  • [ ] axona/dataset_multi_modal/axona_sample.egf{,2,3,4}
  • [ ] blackrock/FileSpec2.3001.ns5
  • [ ] blackrock/FileSpec2.3001.mat
  • [ ] blackrock/blackrock_2_1/l101210-001.ns5
  • [ ] blackrock/blackrock_2_1/l101210-001_nev-02_ns5.mat
  • [ ] elan/File_elan_1.eeg
  • [ ] mearec/mearec_test_10s.h5
  • [ ] phy/phy_example_0/temp_wh.dat
  • [ ] spike2/Two-mice-bigfile-test000.smr
  • [ ] tridesclous/tdc_example0/channel_group_0/segment_0/processed_signals.raw

Update

On the Neo side this PR depends on https://github.com/NeuralEnsemble/python-neo/pull/1122
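For anyone looking for further candidates, a rough sketch of how large files in a local checkout could be listed is given here. The function name `find_large_files` and the threshold are hypothetical and not part of this PR; the sketch also assumes the annexed content has already been fetched locally, since otherwise the files are not readable and are skipped.

```python
from pathlib import Path


def find_large_files(root, min_bytes):
    """Return (size_in_bytes, relative_path) pairs for files of at least
    min_bytes under root, largest first. Hypothetical helper, not part of the PR."""
    root = Path(root)
    hits = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        # Skip repository internals such as .git (and .git/annex).
        if ".git" in rel.parts:
            continue
        # is_file() follows symlinks and is False for broken ones,
        # so annexed files without local content are ignored.
        if path.is_file():
            size = path.stat().st_size
            if size >= min_bytes:
                hits.append((size, rel.as_posix()))
    return sorted(hits, reverse=True)


# Example call (path and threshold are placeholders):
# for size, name in find_large_files("ephy_testing_data", 10 * 1024**2):
#     print(f"{size / 1024**2:8.1f} MB  {name}")
```

The 10 MB threshold in the commented example simply mirrors the "< 10 MB" target mentioned in the discussion below; any other cut-off works the same way.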
Cody Baker commented 2 years ago
Collaborator

@sprenger Can you also add https://gin.g-node.org/NeuralEnsemble/ephy_testing_data/src/master/neuroscope/test1/test1.dat to the list? It is 37 MB instead of < 10 MB, it is a super simple binary blob format, and the number of frames is not encoded in the header .xml.
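Because the frame count is not stored in the header, shortening such a file amounts to cutting the binary blob at a frame boundary. A minimal sketch of that idea follows, assuming interleaved samples of fixed width (2 bytes, i.e. int16, is an assumption here) and a channel count taken from the accompanying .xml; the function name and parameters are hypothetical and this is not necessarily how the file was shortened in pull request 80.

```python
def truncate_flat_binary(src, dst, n_channels, n_frames, bytes_per_sample=2):
    """Write the first n_frames frames of a flat, interleaved binary file to dst.

    One frame is one sample for every channel. Returns the number of whole
    frames actually written (fewer than n_frames if src is shorter).
    Hypothetical helper; sample width and layout are assumptions.
    """
    if n_channels <= 0 or n_frames < 0 or bytes_per_sample <= 0:
        raise ValueError("n_channels and bytes_per_sample must be positive, n_frames non-negative")
    frame_size = n_channels * bytes_per_sample
    with open(src, "rb") as f:
        # Read only as many bytes as the requested frames need.
        chunk = f.read(n_frames * frame_size)
    # Drop a trailing partial frame so the output ends on a frame boundary.
    kept = len(chunk) // frame_size
    with open(dst, "wb") as f:
        f.write(chunk[: kept * frame_size])
    return kept


# Example call (file names, channel count and frame count are placeholders):
# truncate_flat_binary("test1.dat", "test1_short.dat", n_channels=6, n_frames=20000)
```

Since no header field has to be updated in this case, the accompanying .xml can stay unchanged, which is what makes this format a convenient candidate.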
sprenger commented 2 years ago
Owner

Progress update: The neuroshare and tdt files cannot be easily shortened: the former has no specification and is read via a DLL, and the latter have an interdependency between the tsq and tev files. I will try to replace these with smaller files from other datasets.

Update: The neuroshare file has been replaced by another, much smaller file.
sprenger commented 1 year ago
Owner

@samuelgarcia, I moved the remaining files to be reduced to a new issue (#83), so we can merge this PR now. Currently this reduces the size of the repository from about 1.3 GB to 900 MB.

I ran the Neo test suite locally with the new set of files and everything seems to work when using https://github.com/NeuralEnsemble/python-neo/pull/1122. Could you merge https://github.com/NeuralEnsemble/python-neo/pull/1122 and then this PR here?
sprenger commented 1 year ago
Owner

Hi @samuelgarcia, thanks for merging the linked Neo PR. Can you review and merge this one next? Not all files in the list have been reduced, but I think we should merge this sooner rather than later. All other files have been listed in #83.
This pull request has been merged successfully!