#82 more complex spikeglx folder

Merged
sprenger merged 8 commits from NeuralEnsemble/spikeglx_extended into NeuralEnsemble/master 1 year ago

New dataset from Graham Findlay. See issue #81.

See also https://github.com/SpikeInterface/spikeinterface/issues/628

This is needed for a patch in neo.

sprenger commented 2 years ago
Owner

Hi @samuelgarcia, thanks for taking care of the upload. Some open questions / comments are:

  • `spikeglx/README.txt`: the sentence `* have a more in the sub folder complete README.md` needs to be completed.
  • If SpikeGLX has internal format versioning, it would be good to use that as the main folder label instead of `sample_data_v2` (or maybe the SpikeGLX version, v.20201103).
  • This dataset contains a lot (264) of tiny files summing to 33 MB in total. Many of the tiny files seem to be duplicates, e.g. `/SpikeGLX/5-19-2022-CI5/5-19-2022-CI5_g0/` containing 8 `.bin` files. Is this duplication essential for the features you want to test, or would it be possible to:
    • remove some of the duplicate files
    • deduplicate the files at the git-annex level (keeping different filenames that all link to the same content)
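Whether the tiny files really are byte-for-byte duplicates can be checked before deciding anything. The following is a minimal sketch, not part of the original discussion: the function name `find_identical_files` is hypothetical, and it assumes the file content is actually present on disk (for locked git-annex files whose content has not been fetched, the files are skipped).

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_identical_files(root, pattern="*.bin"):
    """Group files under `root` by SHA-256 digest.

    Returns a dict mapping digest -> list of paths, keeping only
    digests shared by more than one file (i.e. true duplicates).
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob(pattern)):
        # is_file() is False for broken symlinks (annex content not fetched)
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    return {d: paths for d, paths in groups.items() if len(paths) > 1}
```

An empty result means that no two matching files share the same content, even if their sizes are equal.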
Samuel Garcia commented 2 years ago
Owner

Hi Julia, I will fix the naming and readme.

These small bin files are not duplicates. They are 10 ms recordings covering several cases of the acquisition system: single/multiple gates and single/multiple triggers, with overlapping or non-overlapping chunks.

In neo, this will make the segment index a bit more complicated; I am working on it.

I know that it increases the size of the dataset, but I think this is needed.

@grahamfindlay: any comments?
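To illustrate why several gates and triggers complicate segment indexing, here is a minimal sketch that extracts the gate and trigger indices from a SpikeGLX-style filename. It assumes the usual `<run>_g<gate>_t<trigger>.<stream>.bin` naming; the helper `parse_glx_name` and the example filename are hypothetical, and this is not how neo implements it.

```python
import re
from typing import NamedTuple, Optional


class GlxName(NamedTuple):
    run: str
    gate: int
    trigger: int
    stream: str


# Assumed naming scheme: <run>_g<gate>_t<trigger>.<stream>.bin (or .meta)
_PATTERN = re.compile(
    r"^(?P<run>.+)_g(?P<gate>\d+)_t(?P<trigger>\d+)\.(?P<stream>.+)\.(?:bin|meta)$"
)


def parse_glx_name(filename: str) -> Optional[GlxName]:
    """Return run name, gate index, trigger index and stream, or None if no match."""
    m = _PATTERN.match(filename)
    if m is None:
        return None
    return GlxName(m["run"], int(m["gate"]), int(m["trigger"]), m["stream"])
```

Sorting the parsed names by `(gate, trigger)` gives one candidate ordering for segments; files that do not follow the numeric `_t<N>` pattern are returned as `None` and would need separate handling.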

sprenger commented 2 years ago
Owner

@samuelgarcia Ok, but for the bin files that have exactly the same size you don't really care about the values of the samples in there as these only contain signal samples and no metadata, right? So I could replace the content of all bin files of identical size with the content of a single file.

sprenger commented 2 years ago
Owner

Note: I added a commit to lock the files.

Samuel Garcia commented 2 years ago
Owner

You mean with a symbolic link?

sprenger commented 1 year ago
Owner

With a symbolic link when the files are locked, but when unlocked the files will be independent, just containing the identical content.

Samuel Garcia commented 1 year ago
Owner

How can we do that in gin?


Yes, if you don't care about the content of the .bin files, it would be fine to replace their values with the content of a single file.

Caveats:

  • .meta files cannot be consolidated in this way.
  • You may care about the contents of the .bin files if you wish to write tests confirming that they were concatenated/loaded properly, especially in the case of overlapping t-segments.
  • .meta files contain information like hashes for the .bin files, which will obviously no longer be accurate.
  • Although I requested that the acquisition system give me files of consistent duration, there may be some variability in the actual number of samples per file. If you truly make all these .bin filenames point to the same underlying data, meta fields like fileTimeSecs and fileSizeBytes may be inaccurate.
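The last caveat can be checked mechanically. Below is a minimal sketch that compares the size recorded in a `.meta` file with the actual size of the matching `.bin` file. It assumes the `.meta` file is plain `key=value` text and that the size field is spelled `fileSizeBytes`; the helper names `read_meta` and `size_matches_meta` are hypothetical.

```python
from pathlib import Path


def read_meta(meta_path):
    """Parse a key=value text file into a dict of strings."""
    meta = {}
    for line in Path(meta_path).read_text().splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            # Some keys carry a leading '~'; drop it for easier lookup
            meta[key.lstrip("~")] = value
    return meta


def size_matches_meta(bin_path, meta_path):
    """True if the recorded fileSizeBytes equals the .bin size on disk."""
    meta = read_meta(meta_path)
    return int(meta["fileSizeBytes"]) == Path(bin_path).stat().st_size
```

Running this over every `.bin`/`.meta` pair after consolidating content would show which metadata entries no longer describe the data they point to.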
sprenger commented 1 year ago
Owner

@samuelgarcia: if two files have identical content, git-annex will automatically store the content only once. So you could (e.g. using `gin-cli`):

  • unlock all `bin` files
  • replace the content of all files of identical size with a single version
  • commit the files again
  • lock the files again
  • upload the locked version of the files
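The second step in the list above (replacing the content of same-size files) could be scripted roughly as follows. This is a minimal sketch under stated assumptions: the function name `unify_same_size_files` is hypothetical, the files are assumed to be already unlocked (regular files, not symlinks), and the unlock, commit, lock and upload steps are left to gin-cli or git-annex and are not shown.

```python
import shutil
from collections import defaultdict
from pathlib import Path


def unify_same_size_files(root, pattern="*.bin"):
    """Overwrite every file with the content of the first file of the same size.

    Symlinks (locked annex files) are skipped on purpose so that the
    annex object store is never written to. Returns the overwritten paths.
    """
    by_size = defaultdict(list)
    for path in sorted(Path(root).rglob(pattern)):
        if path.is_file() and not path.is_symlink():
            by_size[path.stat().st_size].append(path)

    replaced = []
    for paths in by_size.values():
        reference = paths[0]  # first file (in sorted order) of this size
        for other in paths[1:]:
            shutil.copyfile(reference, other)
            replaced.append(other)
    return replaced
```

After this runs, all files of a given size are byte-identical, so a content-addressed store needs to keep only one copy per size. Note that the original sample values are lost, which matters if tests depend on them.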
Samuel Garcia commented 1 year ago
Owner

@sprenger: can we merge this? I already merged your PR into that branch.

sprenger commented 1 year ago
Owner

@samuelgarcia: It's merged. Can you confirm again the merged version works for your tests?

Samuel Garcia commented 1 year ago
Owner

The tests seem to pass! https://github.com/NeuralEnsemble/python-neo/pull/1125

This pull request has been merged successfully!