#82 more complex spikeglx folder

Merged
sprenger merged 8 commits from NeuralEnsemble/spikeglx_extended into NeuralEnsemble/master 1 year ago

New dataset from Graham Findlay. See issue #81.

See also https://github.com/SpikeInterface/spikeinterface/issues/628

This is needed for a patch in neo.

sprenger commented 2 years ago
Owner

Hi @samuelgarcia, thanks for taking care of the upload. Some open questions / comments are:

  • spikeglx/README.txt: the sentence "* have a more in the sub folder complete README.md" needs to be completed
  • If SpikeGLX has internal format versioning, it would be good to use that as the main folder label instead of sample_data_v2 (or maybe the SpikeGLX version, v.20201103)
  • This dataset contains a lot (264) of tiny files, summing up to 33 MB in total. Many of the tiny files seem to be duplicates, e.g. /SpikeGLX/5-19-2022-CI5/5-19-2022-CI5_g0/ contains 8 .bin files. Is this duplication essential for the features you want to test, or would it be possible to:
    • remove some of the duplicate files
    • make the files duplicates on the git-annex level (keeping different filenames that all link to the same content)
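
Whether the tiny files really are byte-for-byte duplicates can be checked before deciding. The following is a minimal sketch, assuming the file content is present locally; `group_by_content` is a hypothetical helper name, not part of gin-cli, git-annex, or neo.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_by_content(root, pattern="*.bin"):
    """Return {sha256 digest: [paths]} for files under `root` that share content.

    Only digests shared by more than one file are returned, so an empty
    dict means no two matching files are byte-for-byte identical.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob(pattern)):
        # is_file() is False for dangling symlinks (annexed content not fetched).
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Files of identical size that do not show up in the result differ in their sample values, even if they look alike in a directory listing.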
Samuel Garcia commented 2 years ago
Owner

Hi Julia, I will fix the naming and readme.

These little bin files are not duplicates. They are 10 ms recordings covering several cases of the acquisition system: single or multiple gates and single or multiple triggers, with overlapping or non-overlapping chunks.

In neo, this will make the segment index a bit more complicated; I am working on it.

I know that it increases the size of the dataset, but I think this is needed.

@grahamfindlay: any comments?

sprenger commented 2 years ago
Owner

@samuelgarcia Ok, but for the bin files that have exactly the same size you don't really care about the values of the samples in there as these only contain signal samples and no metadata, right? So I could replace the content of all bin files of identical size with the content of a single file.

sprenger commented 2 years ago
Owner

Note: I added a commit to lock the files.

Samuel Garcia commented 2 years ago
Owner

You mean with a symbolic link?

sprenger commented 2 years ago
Owner

With a symbolic link when the files are locked, but when unlocked the files will be independent, just containing the identical content.

Samuel Garcia commented 2 years ago
Owner

How can we do that in gin?


Yes, if you don't care about the content of the .bin files, it would be fine to replace their values with the content of a single file.

Caveats:

  • .meta files cannot be consolidated in this way.
  • You may care about the contents of the .bin files if you wish to write tests confirming that they were concatenated/loaded properly, especially in the case of overlapping t-segments.
  • .meta files contain information like hashes for the .bin files, which will obviously no longer be accurate.
  • Although I requested that the acquisition system give me files of consistent duration, there may be some variability in the actual number of samples per file. If you truly make all these .bin filenames point to the same underlying data, meta fields like fileTimeSecs and fileSizeBytes may be inaccurate.
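
The hash and size caveats can be checked per file. The following is a minimal sketch, assuming that each `.meta` file sits next to its `.bin` file with the same stem, consists of `key=value` lines, and stores the size and hash under the keys `fileSizeBytes` and `fileSHA1` (the key names are an assumption about the SpikeGLX metadata and should be verified against the actual files). `read_meta` and `check_bin_against_meta` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def read_meta(meta_path):
    """Parse a .meta file made of `key=value` lines into a dict of strings."""
    meta = {}
    for line in Path(meta_path).read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' sign
            meta[key.strip()] = value.strip()
    return meta


def check_bin_against_meta(bin_path):
    """Return the names of meta fields that no longer match the .bin file."""
    bin_path = Path(bin_path)
    # Assumed layout: foo.ap.bin is described by foo.ap.meta in the same folder.
    meta = read_meta(bin_path.with_suffix(".meta"))
    data = bin_path.read_bytes()
    mismatches = []
    if "fileSizeBytes" in meta and int(meta["fileSizeBytes"]) != len(data):
        mismatches.append("fileSizeBytes")
    if "fileSHA1" in meta and meta["fileSHA1"].lower() != hashlib.sha1(data).hexdigest():
        mismatches.append("fileSHA1")
    return mismatches
```

Running such a check over the dataset after any consolidation would show which `.meta` entries have become stale.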
sprenger commented 2 years ago
Owner

@samuelgarcia: if two files have identical content, git-annex will automatically store that content only once. So you could (e.g. using gin-cli):

  • unlock all bin files
  • replace the content of all files with identical size by only a single version
  • commit the files again
  • lock the files again
  • upload the locked version of the files
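
The content-replacement step in the list above can be sketched in Python. This is a minimal sketch, assuming the `.bin` files are already unlocked (regular files rather than annex symlinks); `consolidate_by_size` is a hypothetical helper, not part of gin-cli or git-annex, and unlocking, committing, locking and uploading still have to be done separately.

```python
import shutil
from collections import defaultdict
from pathlib import Path


def consolidate_by_size(root, pattern="*.bin"):
    """Give all files of identical size under `root` identical content.

    Within each size group, the alphabetically first file is kept as the
    reference and its content is copied over the others. Assumes the files
    are unlocked regular files; locked annex symlinks are not handled.

    Returns {reference path: [overwritten paths]}.
    """
    by_size = defaultdict(list)
    for path in sorted(Path(root).rglob(pattern)):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    replaced = {}
    for paths in by_size.values():
        reference, others = paths[0], paths[1:]
        for other in others:
            # Overwrites the sample values of `other`; filenames stay unchanged.
            shutil.copyfile(reference, other)
        if others:
            replaced[reference] = others
    return replaced
```

After this step the filenames are all still present, but files of equal size share one content, which git-annex then stores once. The original sample values of the overwritten files are lost, so this only suits tests that do not depend on them.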
Samuel Garcia commented 1 year ago
Owner

@sprenger: can we merge this? I already merged your PR into that branch.

sprenger commented 1 year ago
Owner

@samuelgarcia: It's merged. Can you confirm again the merged version works for your tests?

Samuel Garcia commented 1 year ago
Owner

The tests seem to pass! https://github.com/NeuralEnsemble/python-neo/pull/1125
This pull request has been merged successfully!