#82 more complex spikeglx folder

Merged
sprenger merged 8 commits from NeuralEnsemble/spikeglx_extended into NeuralEnsemble/master 1 year ago
Samuel Garcia commented 2 years ago

New dataset from Graham Findlay. See issue #81.

See also https://github.com/SpikeInterface/spikeinterface/issues/628

This is needed for a patch in neo.

sprenger commented 2 years ago
Owner

Hi @samuelgarcia, thanks for taking care of the upload. Some open questions / comments are:

  • `spikeglx/README.txt`: the sentence `* have a more in the sub folder complete README.md` needs to be completed.
  • If SpikeGLX has internal format versioning, it would be good to use that as the main folder label instead of `sample_data_v2` (or maybe the SpikeGLX version, v.20201103).
  • This dataset contains many (264) tiny files, totaling 33 MB. Many of the tiny files seem to be duplicates, e.g. `/SpikeGLX/5-19-2022-CI5/5-19-2022-CI5_g0/` contains 8 `.bin` files. Is this duplication essential for the features you want to test, or would it be possible to:
    • remove some of the duplicate files, or
    • make the files duplicates at the git-annex level (keeping different filenames that all link to the same content)?
Samuel Garcia commented 2 years ago
Owner

Hi Julia, I will fix the naming and readme.

These small `.bin` files are not duplicates. They are 10 ms recordings covering several configurations of the acquisition system: single or multiple gates and single or multiple triggers, with overlapping or non-overlapping chunks.

In neo, this will make the segment indexing a bit more complicated; I am working on it.

I know that it increases the size of the dataset, but I think it is needed.

@grahamfindlay: any comments?
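
To make the gate/trigger cases concrete, here is a minimal sketch of how files could be grouped by gate and trigger index. It assumes the usual SpikeGLX naming scheme `<run>_g<gate>_t<trigger>.<stream>.bin`; the helper names and the example file names are hypothetical and are not taken from neo or from this dataset.

```python
import re
from collections import defaultdict

# Assumed SpikeGLX naming scheme: <run>_g<gate>_t<trigger>.<stream>.(bin|meta)
_NAME_PATTERN = re.compile(
    r"^(?P<run>.+)_g(?P<gate>\d+)_t(?P<trigger>\d+)\.(?P<stream>.+)\.(?:bin|meta)$"
)


def parse_spikeglx_name(filename):
    """Return (run, gate, trigger, stream), or None if the name does not match."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        return None
    return match["run"], int(match["gate"]), int(match["trigger"]), match["stream"]


def group_by_segment(filenames):
    """Group file names by (gate, trigger), sorted by gate and then trigger."""
    segments = defaultdict(list)
    for name in filenames:
        parsed = parse_spikeglx_name(name)
        if parsed is None:
            continue  # not a SpikeGLX data or meta file
        _, gate, trigger, _ = parsed
        segments[(gate, trigger)].append(name)
    return dict(sorted(segments.items()))
```

Sorting by (gate, trigger) is only one plausible ordering for a segment index; whether neo uses exactly this ordering is not stated in this thread.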

sprenger commented 2 years ago
Owner

@samuelgarcia OK, but for the `.bin` files that have exactly the same size, you don't really care about the sample values, since these files contain only signal samples and no metadata, right? If so, I could replace the content of all `.bin` files of identical size with the content of a single file.
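
As a rough illustration of how such candidates could be found, the sketch below groups `.bin` files by size. The function name is hypothetical, and it assumes the file content is actually present on disk (unlocked or already fetched), since a locked annex file without content is a dangling symlink.

```python
from collections import defaultdict
from pathlib import Path


def group_bin_files_by_size(root):
    """Map file size in bytes to the list of .bin files under `root` with that size.

    Only sizes shared by at least two files are returned, since only those
    are candidates for sharing a single content.
    """
    by_size = defaultdict(list)
    for path in sorted(Path(root).rglob("*.bin")):
        # stat() follows symlinks, so the annexed content must be present.
        by_size[path.stat().st_size].append(path)
    return {size: paths for size, paths in by_size.items() if len(paths) > 1}
```

Equal size does not imply equal content, so this only identifies files whose content could be swapped without changing their length.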

sprenger commented 2 years ago
Owner

Note: I added a commit to lock the files.

Samuel Garcia commented 2 years ago
Owner

You mean with a symbolic link?

sprenger commented 2 years ago
Owner

With a symbolic link when the files are locked. When unlocked, the files will be independent copies that simply contain identical content.

Samuel Garcia commented 2 years ago
Owner

How can we do that in GIN?

Graham Findlay commented 2 years ago

Yes, if you don't care about the content of the .bin files, it would be fine to replace their values with the content of a single file.

Caveats:

  • .meta files cannot be consolidated in this way.
  • You may care about the contents of the .bin files if you wish to write tests confirming that they were concatenated/loaded properly, especially in the case of overlapping t-segments.
  • .meta files contain information like hashes for the .bin files, which will obviously no longer be accurate.
  • Although I requested that the acquisition system give me files of consistent duration, there may be some variability in the actual number of samples per file. If you truly make all these .bin filenames point to the same underlying data, meta fields like `fileTimeSecs` and `fileSizeBytes` may be inaccurate.
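
The last two caveats can be checked mechanically. The following is a minimal sketch, assuming `.meta` files are plain `key=value` text and that the hash is stored under a key named `fileSHA1` (the thread only says "hashes", so that key name is an assumption); the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def read_meta(meta_path):
    """Parse a key=value .meta file into a dict of strings."""
    meta = {}
    for line in Path(meta_path).read_text().splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            meta[key.strip()] = value.strip()
    return meta


def check_bin_against_meta(bin_path, meta_path):
    """Compare a .bin file with the size and hash recorded in its .meta file.

    Returns a dict with `size_ok` and/or `sha1_ok`, depending on which
    fields are present in the .meta file.
    """
    meta = read_meta(meta_path)
    data = Path(bin_path).read_bytes()
    report = {}
    if "fileSizeBytes" in meta:
        report["size_ok"] = int(meta["fileSizeBytes"]) == len(data)
    if "fileSHA1" in meta:  # assumed key name for the stored hash
        report["sha1_ok"] = meta["fileSHA1"].lower() == hashlib.sha1(data).hexdigest()
    return report
```

After consolidating `.bin` content, a check like this would be expected to report mismatches for every file whose original content was replaced, which is worth knowing before writing tests that rely on those fields.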
sprenger commented 2 years ago
Owner

@samuelgarcia: if two files have identical content, git-annex will automatically store that content only once. So you could (e.g. using `gin-cli`):

  • unlock all bin files
  • replace the content of all files with identical size by only a single version
  • commit the files again
  • lock the files again
  • upload the locked version of the files
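
The content-replacement step in the list above could be sketched as follows. This is only an illustration with hypothetical function names: it assumes the files have already been unlocked (otherwise the targets are symlinks into the read-only annex store) and that `paths` is one group of same-size files.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's content."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def unify_content(paths):
    """Overwrite every file in `paths` with the content of the first one.

    Returns the shared digest. Files with the same digest are the ones
    git-annex can store as a single object.
    """
    paths = [Path(p) for p in paths]
    reference, others = paths[0], paths[1:]
    for other in others:
        shutil.copyfile(reference, other)  # keeps the file name, replaces the content
    digests = {sha256_of(p) for p in paths}
    if len(digests) != 1:
        raise RuntimeError("files still differ after copying")
    return digests.pop()
```

The surrounding steps would then use the `gin unlock`, `gin commit`, `gin lock` and `gin upload` subcommands; check `gin help` for the exact arguments rather than relying on this note.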
Samuel Garcia commented 1 year ago
Owner

@sprenger: can we merge this? I have already merged your PR into that branch.

sprenger commented 1 year ago
Owner

@samuelgarcia: It's merged. Can you confirm once more that the merged version works for your tests?

Samuel Garcia commented 1 year ago
Owner
The tests seem to pass! https://github.com/NeuralEnsemble/python-neo/pull/1125
This pull request has been merged successfully!