#29 Add .gitattributes / rewrite (redo) to minimize .git/objects

Closed
yarikoptic opened this issue 3 years ago · 25 comments

I am not yet sure how this would interact with the gin client etc., but I see that .git/objects is currently about 300MB, which is the reason for the slow clone. This is because many binary files were added directly to git instead of git-annex.

To simplify/automate the decision of which files go to git and which to git-annex, annotations in .gitattributes could be used. E.g. datalad create -c text2git ... would create a new git/git-annex repository with the following "generic" rule coded in .gitattributes:

* annex.largefiles=((mimeencoding=binary)and(largerthan=0))

which would instruct all binary files larger than 0 bytes to go into git-annex.
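One way to see which rule applies to a given path, without adding anything to a repository, is git check-attr, which reports the attribute value that git-annex would read. A minimal sketch in a throwaway repository; the file name recording.dat is made up for illustration, and git-annex itself only evaluates the expression when a file is actually added:

```shell
# Throwaway repository; nothing here touches real data.
repo=$(mktemp -d)
cd "$repo"
git init -q

# The generic rule quoted above, as it would appear in .gitattributes.
echo '* annex.largefiles=((mimeencoding=binary)and(largerthan=0))' > .gitattributes

# Ask git which annex.largefiles expression is attached to a path.
# The path does not need to exist; only the pattern is matched.
rule=$(git check-attr annex.largefiles -- recording.dat)
echo "$rule"
```

The same command can be pointed at any path in a real clone to check whether a more specific pattern overrides the generic rule.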

Since this repository is intended to be used for a variety of testing scenarios, where fetching all the files now residing under .git/objects is very suboptimal, I would advise "rewriting" the history (or, if history is not important, just redoing it from the current state) so that all binary files go under git-annex. While at it, I would also migrate to the MD5E backend instead of the default SHA256E -- the long paths SHA256E produces could trip some Windows setups that limit path length. That is why DataLad uses the MD5E backend by default (we do not really worry about "security-grade checksumming").
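To make the path-length argument concrete, the sketch below builds one key per backend for the same small payload and compares the lengths. It assumes GNU coreutils (md5sum, sha256sum); the .dat extension and the XX/YY hash directories are placeholders rather than values git-annex would really compute:

```shell
# Hash a small sample payload with both algorithms.
payload='example payload'
size=${#payload}
md5=$(printf '%s' "$payload" | md5sum | cut -d' ' -f1)
sha=$(printf '%s' "$payload" | sha256sum | cut -d' ' -f1)

# git-annex keys have the shape BACKEND-sSIZE--HASH.EXT.
md5_key="MD5E-s${size}--${md5}.dat"
sha_key="SHA256E-s${size}--${sha}.dat"

# A locked file is a symlink whose target contains the key twice,
# so every extra character in the key counts double in the path.
md5_path=".git/annex/objects/XX/YY/${md5_key}/${md5_key}"
sha_path=".git/annex/objects/XX/YY/${sha_key}/${sha_key}"

echo "MD5E    key: ${#md5_key} chars, symlink target: ${#md5_path} chars"
echo "SHA256E key: ${#sha_key} chars, symlink target: ${#sha_path} chars"
```

The difference per file is fixed by the hash and backend-name lengths, so it adds to whatever directory depth the dataset already has.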

So commands could be something like

git annex get -J5 .  # to get all data files
git annex unlock *
chmod +w -R .git  # to make next rm possible
rm -rf .git
datalad create -c text2git -f .  # could be a few of git init; git annex init; then populate .gitattributes
datalad save -m "Repopulated dataset using new backend and .gitattributes rule for what is going to annex"  # could be git annex add *; git commit -m "..."
git gc --prune=now  # make it nice and tidy

and then "force push" and "git annex copy --to" (or datalad push) to transfer the data back to this repo.

I ran the steps listed above and got a .git/objects of 50MB, still a bit too large for my liking. I guess some data formats are "text" files (e.g. asciisignal/File_asciisignal_) and so stay in git. Since git-annex is used anyway, we could make everything go to the annex by default and keep only specific files in git, so the following recipe could be used instead
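The force push is needed because the repopulated repository shares no history with the hosted one. The sketch below reproduces that situation with a local bare repository standing in for the hosted repo; all paths, commit messages and the make_repo helper are hypothetical, and the git-annex transfer is only shown as a comment because it needs an initialized annex:

```shell
work=$(mktemp -d)
# Stand-in for the hosted repository (a local bare repo, for illustration).
git init -q --bare "$work/remote.git"

# Helper: create a one-commit repository with its own, unrelated history.
make_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "$2" > "$1/README"
    git -C "$1" add README
    git -C "$1" -c user.name=demo -c user.email=demo@example.com \
        commit -q -m "$2"
}

make_repo "$work/old" "old history"
git -C "$work/old" push -q "$work/remote.git" master

make_repo "$work/new" "repopulated dataset"
git -C "$work/new" remote add origin "$work/remote.git"

# A plain push would be rejected as non-fast-forward;
# --force replaces the remote branch with the new history.
git -C "$work/new" push -q --force origin master

# In a real git-annex repository the file content would follow with:
#   git annex copy --to origin .

remote_head=$(git -C "$work/remote.git" rev-parse master)
local_head=$(git -C "$work/new" rev-parse master)
echo "remote: $remote_head"
echo "local:  $local_head"
```

After the force push both hashes match, and the old commits are no longer reachable from the remote branch.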

git annex get -J5 .  # to get all data files
git annex unlock *
chmod +w -R .git  # to make next rm possible
rm -rf .git
datalad create -f .
echo '
LICENSE* annex.largefiles=nothing
README* annex.largefiles=nothing
config* annex.largefiles=nothing' >> .gitattributes
datalad save -m "Repopulated dataset using new backend and .gitattributes rule for what is going to annex"
git gc --prune=now

This ends up with a 160kB .git/objects, with only the useful text files kept under git.

Let me know if you need further assistance should you decide to go this route.
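The sizes quoted here can be checked with standard tools. A minimal sketch on a throwaway repository, where sample.bin is a made-up, incompressible stand-in for a data file committed straight into git:

```shell
repo=$(mktemp -d)
git init -q "$repo"
head -c 65536 /dev/urandom > "$repo/sample.bin"
git -C "$repo" add sample.bin
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "binary committed directly to git"

# Loose and packed object sizes in KiB (the size and size-pack fields).
git -C "$repo" count-objects -v

# Total on-disk size of the object store, roughly what a clone has to fetch.
objects_kb=$(du -sk "$repo/.git/objects" | cut -f1)
echo ".git/objects: ${objects_kb} KiB"
```

Running the two commands before and after a conversion gives a direct before/after comparison.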

PS We will probably look into mirroring this data also in our DANDI bucket for faster access for testing etc

Samuel Garcia commented 3 years ago
Owner

Julia, Mihael, Andrew. What do you think of Yarik's proposal?

sprenger commented 3 years ago
Owner

Hi @yarikoptic, sorry for the late reply. This sounds like a really useful proposal. Currently the majority of files are not annexed, as we had issues with annexed content not being moved from forks to this repository when merging PRs. Therefore we configured gin to basically use only git for all files in this repo (see the gin configuration: https://gin.g-node.org/NeuralEnsemble/ephy_testing_data/src/master/config.yml).

I would still prefer to use the gin-client for the interaction with gin, as I don't know what kind of issues using datalad and gin-cli in combination could cause. Do you have any experience with this?

Samuel Garcia commented 3 years ago
Owner

@sprenger From what I understand, the recipe here with datalad is for moving everything to the annex. The gin client will still continue to work.

sprenger commented 3 years ago
Owner

@samuelgarcia Yes, for setting up the repository this will probably work, unless there are version conflicts between the git annex versions used by datalad and gin. However, the configuration @yarikoptic proposed is, as far as I know, only recognized by datalad and not by the gin-cli. So when working on the clean repository we might encounter inconsistencies again. We should make sure that the gin-cli git annex configuration is identical to the one used by datalad (i.e. the git annex configuration as specified in .gitattributes).

Yaroslav Halchenko commented 3 years ago
Collaborator

> unless there are version conflicts between git annex versions used by datalad and gin.

What conflicts do you have in mind?

> However, the configuration as @yarikoptic is as far as I know only recognized by datalad and not by the gin-cli.

This configuration is recognized by git-annex itself, not datalad. AFAIK gin-cli would just upload those files to the repository and then use git-annex add, so they would be recognized. But you could just use git/git-annex or datalad to manage it "properly" locally, and then git push && git annex copy --to ... or datalad push or datalad publish to it here, and everything should be splendid.

> We should make sure that the gin-cli git annex configuration is identical to the one used by datalad (so the git annex configuration, as specified in .gitattributes)

I have recommended (https://github.com/G-Node/gogs/issues/14) that gin initiate such a file, but am not sure if that is happening now. @achilleas-k could shed some light.

anyways -- you could easily test everything related on some throw away repository on gin and see how it all works out for you.

Yaroslav Halchenko commented 3 years ago
Collaborator

BTW, here is a relevant section in the datalad handbook on interoperability with Gin: http://handbook.datalad.org/en/latest/basics/101-139-gin.html

Samuel Garcia commented 3 years ago
Owner

@yarikoptic would you be interested in having direct write access to this repo?

If I understand correctly, your solution doesn't lose the history, it just rewrites it, no?

sprenger commented 3 years ago
Owner

@yarikoptic Did you push the resulting repositories of your two recipes already somewhere on gin, so we could run some tests before converting the main repo?

Yaroslav Halchenko commented 3 years ago
Collaborator

> @yarikoptic would you be interested in having direct write access to this repo?

Up to you ;) I am not looking for more responsibilities, but should be able to help if needed.

> If I understand correctly, your solution doesn't lose the history, it just rewrites it, no?

The "quick&dirty" way I proposed loses the history. If you think it is worth the effort to keep it, then an alternative solution based on git filter-branch should be worked out. Some references: https://git-annex.branchable.com/tips/How_to_retroactively_annex_a_file_already_in_a_git_repo/ , https://github.com/datalad/datalad/issues/4701 (on a helper we might eventually work out in datalad)
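Before deciding whether the history is worth keeping, it can help to see which blobs in it actually take up the space. The following is a common git idiom rather than anything specific to this repository; the demo repository and file names are made up, and paths containing spaces would need more careful parsing than this awk call does:

```shell
repo=$(mktemp -d)
git init -q "$repo"
commit() {
    git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
        commit -q -m "$1"
}

head -c 20000 /dev/urandom > "$repo/big.dat"
echo "small text" > "$repo/README"
git -C "$repo" add . && commit "add files"
git -C "$repo" rm -q big.dat && commit "remove big file"

# big.dat is gone from the work tree but still stored in history.
# List every blob reachable from any ref, largest first.
largest=$(git -C "$repo" rev-list --objects --all |
    git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk '$1 == "blob" { print $2, $3 }' |
    sort -rn | head -n 5)
echo "$largest"
```

The files at the top of such a list are the candidates for retroactive annexing with the filter-branch approach from the references above.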

Yaroslav Halchenko commented 3 years ago
Collaborator

> @yarikoptic Did you push the resulting repositories of your two recipes already somewhere on gin, so we could run some tests before converting the main repo?

I do not think so (or I hope that I would have just shared the url ;)) -- since it has been a while, I forgot where I could have potentially done this, but I do not see it on my laptop ;)

sprenger commented 3 years ago
Owner

Hi @all, I had a look at the code using datalad and set up a comparable transformation using the gin-cli, trying to keep the gin-cli and git (annex) configuration as close as possible. The result of this transformation can currently be found here: https://gin.g-node.org/sprenger/ephy_testing_data_annexed

@yarikoptic: Is this version also compatible with your datalad workflow? I used your suggestion and pruned the git store; however, this still seems to grow relatively quickly when uploading via gin (e.g. from hundreds of kilobytes to a couple of MB). Do you have further suggestions on how to improve this situation?

Here's the rough workflow I used for the transformation:

# downloading file content
gin get NeuralEnsemble/ephy_testing_data
cd ephy_testing_data
gin get-content .
gin unlock *

# remove history
chmod +w -R .git
rm -r .git

# clean up vcs configuration
echo \
'annex:
    minsize: 0
    exclude: ["*README*", "*LICENSE*", "config*", "**/.git*"]' > config.yml

echo \
'LICENSE* annex.largefiles=nothing
README* annex.largefiles=nothing
config* annex.largefiles=nothing

* annex.backend=MD5E
**/.git* annex.largefiles=nothing' > .gitattributes

# clean up file modes
chmod -x openephys/OpenEphys_SampleData_3/*
chmod -x neuralynx/Cheetah_v6.3.2/incomplete_blocks/CSC1_reduced.ncs
chmod -x neuralynx/Cheetah_v6.3.2/incomplete_blocks/Events.nev

# clean up broken README file
rm openephys/readme.txt
echo \
'OpenEphys_SampleData_1 is provided by josh.siegle@gmail.com from open ephys project.
OpenEphys_SampleData_2_(multiple_starts) is provided by josh.siegle@gmail.com from open ephys project.
OpenEphys_SampleData_3 is provided by Cristian Tatarau. Have multi semgent and have a smaller continuous file (CH32)' > openephys/README.txt

# reinitialize history
gin init
gin commit .

gin lock *

git gc --prune=now

gin add-remote origin gin:sprenger/ephy_testing_data_annexed
gin upload .

@samuelgarcia: I would like to run some tests on the new repository, to make sure this is still working for the Neo unittests. Also we should come up with an updated contribution guideline for the new annexed-format. I proposed to use feature branches in the main repository. Can you test this in the new test repo as you don't have advanced rights there? Otherwise I can also create a test account for this purpose. What do you think?

Yaroslav Halchenko commented 3 years ago
Collaborator

> ... this still seems to grow relatively quickly when uploading via gin (e.g. from hundreds of kilobytes to a couple of MB)

Hm... I see that you create config.yml with some annex configuration -- is that for gin? If so, it smells like gin might be ignoring the .gitattributes configuration and going by "its own" config.yml. Just add a sample binary file, gin upload, get a fresh clone, and see what you get for that file -- a symlink or a full file committed in git -- that would answer it.

If gin ignores .gitattributes -- we might need to pester the gin folks to make it "right" ;)

If it is all kosher, i.e. you do get a symlink and not a file committed to git, then I guess all is good, and maybe an eventual git gc would release some .git/objects
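The check described above can be done from the committed tree alone, without fetching any content. A sketch with stand-in files; the key in the symlink target is made up, and core.symlinks is assumed to be enabled (the default on Linux and macOS):

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Stand-ins: a symlink shaped like a locked annexed file and a
# regular file committed straight into git.
ln -s .git/annex/objects/XX/YY/MD5E-s4--0000.dat/MD5E-s4--0000.dat annexed.dat
printf 'data' > in_git.dat
git add annexed.dat in_git.dat
git -c user.name=demo -c user.email=demo@example.com commit -q -m "demo"

# Mode 120000 is a symlink (annexed, locked). Mode 100644 is a regular
# blob: either a full file in git or a small unlocked-annex pointer.
git ls-tree HEAD

# To tell those two apart, look at the blob itself: a pointer would
# start with /annex/objects/, a full file shows its real content.
git cat-file -p HEAD:in_git.dat
```

In a fresh clone of the converted repository, data files showing up as 100644 blobs with their full content would indicate that they went into git rather than the annex.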

Benjamin K Dichter commented 3 years ago
Collaborator

I tried to clone the repo and test this myself but I do not have the permissions:

$ gin get sprenger/ephy_testing_data_annexed
 Downloading repository Repository download failed. Internal git command returned: Cloning into 'ephy_testing_data_annexed'...
git@gin.g-node.org: Permission denied (publickey,keyboard-interactive).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
 
[error] 1 operation failed
sprenger commented 3 years ago
Owner

Hi @yarikoptic. Yes, I added a config.yml, as the gin-cli ignores the .gitattributes. So to make the behaviour consistent, I tried to keep the configurations in config.yml and .gitattributes as identical as possible. The gin-cli developers are aware that the gin-cli ignores .gitattributes; it is discussed e.g. in this issue: https://github.com/G-Node/gin-cli/issues/302

@bendichter: That is strange. I get the same error, but only when not logged in to the gin-cli. Does it work for you once you have run gin login?

sprenger commented 3 years ago
Owner

@bendichter: Indeed, I think logging in should resolve the issue, since it is not possible to clone public repos anonymously. I opened a gin-cli issue (https://github.com/G-Node/gin-cli/issues/304) to improve the error message for these cases.

Benjamin K Dichter commented 3 years ago
Collaborator

@sprenger thanks. The following worked

$ gin login
...
$ gin get sprenger/ephy_testing_data_annexed  # this was very fast
$ cd nix
$ ls -l # everything is small
$ datalad get nixio_fr.nix  # launches nice tqdm progress bar for download
$ ls -l # nixio_fr.nix is now big

Everything seems to work for me! @yarikoptic, anything else we should check before moving forward?

Yaroslav Halchenko commented 3 years ago
Collaborator

The files are committed to the annex as "unlocked". Pros: upon annex get (or datalad get) they will appear as real files and not as symlinks. Cons: you would have two copies, one in the work tree and another under .git/annex/objects (thus consuming twice the space), and git status would be slower since files might need to go through git smudge filtering to decide whether they are modified, etc. Because of that, some older versions of annex (e.g. as of 2019/09) might have difficulties (not getting files), but with more recent ones it seems to work quite fine

$> datalad install -g -J4 https://gin.g-node.org/sprenger/ephy_testing_data_annexed
[INFO   ] Scanning for unlocked files (this may take some time)                                                                  
install(ok): /mnt/scrap/tmp/ephy_testing_data_annexed (dataset)                                                                  
Total (24 ok out of 373):   6%|███▋                                                         | 69.0M/1.15G [00:52<11:10, 1.61MB/s]
axon/File_axon_4.abf:  99%|█████████████████████████████████████████████████████████████████ | 4.30M/4.36M [00:13<00:00, 251kB/s]
axon/File_axon_4.abf:  79%|████████████████████████████████████████████████████▏             | 3.45M/4.36M [00:10<00:04, 193kB/s]
axon/File_axon_6.abf:  98%|████████████████████████████████████████████████████████████████▍ | 2.51M/2.57M [00:06<00:00, 313kB/s]
blackrock/blackrock_2_1/l101210-001-02.nev:  70%|██████████████████████████████▊             | 1.04M/1.48M [00:01<00:00, 876kB/s]
blackrock/FileSpec2.3001.ccf:  92%|█████████████████████████████████████████████████████▎    | 3.88M/4.22M [00:12<00:01, 257kB/s]
blackrock/FileSpec2.3001.mat:  86%|█████████████████████████████████████████████████▉        | 8.01M/9.30M [00:10<00:02, 557kB/s]
bci2000/eeg1_3.dat:  98%|██████████████████████████████████████████████████████████████████▋ | 2.69M/2.75M [00:07<00:00, 221kB/s]
blackrock/FileSpec2.3001.ns5:  13%|███████▋                                                  | 2.38M/18.0M [00:05<00:50, 309kB/s]
blackrock/blackrock_2_1/l101210-001.mat:  37%|██████████████████                               | 323k/877k [00:00<00:00, 645kB/s]

So, unless you would like to move away from "unlocked" mode, which by default doubles the local storage requirement (there is a "thin" mode which could help), to the regular git-annex mode where each file is a symlink (and is committed as such, pointing to .git/annex/objects/.../KEY/KEY, rather than as a "git link" pointing to /annex/objects/KEY) -- then all is set, I guess.
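For reference, the "thin" mode mentioned above is a per-repository git configuration value, annex.thin. The sketch below only sets and reads it in an empty throwaway repository; the git-annex commands are left as comments because they need an initialized annex with content, and whether hard links actually save space depends on the filesystem:

```shell
repo=$(mktemp -d)
git init -q "$repo"

# Ask git-annex to keep unlocked files as hard links to the annex
# object instead of a second copy.
git -C "$repo" config annex.thin true

# In a real clone, existing work tree files would then be re-linked with:
#   git annex fix
# The alternative is to switch to symlinks and commit that change:
#   git annex lock .

thin=$(git -C "$repo" config --get annex.thin)
echo "annex.thin=$thin"
```

Note that annex.thin is local configuration and is not shared through the repository, whereas locking files is recorded in a commit and therefore applies to every clone.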

Benjamin K Dichter commented 3 years ago
Collaborator

While this is much improved, it sounds to me like having one copy of large files would be even better.

@yarikoptic if using git-annex in this "locked" mode, what would be the most convenient way to check if a file has been downloaded?

@sprenger do you know if GIN supports "locked" mode?
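On the question above of how to check whether a locked file has been downloaded: a locked file is a symlink into the annex, and the link dangles until the content is fetched, so a plain shell test is enough. A sketch with stand-in files; the objects directory, KEY1/KEY2 names and the has_content helper are made up and do not reflect the real annex layout:

```shell
dir=$(mktemp -d)
cd "$dir"

# Stand-ins for two locked files: one whose annex object is present,
# one whose object has not been fetched (a dangling symlink).
mkdir -p objects
printf 'payload' > objects/KEY1
ln -s objects/KEY1 present.dat
ln -s objects/KEY2 missing.dat

# -L tests the link itself; -e follows it and fails if the target is absent.
has_content() {
    [ -L "$1" ] && [ -e "$1" ]
}

for f in present.dat missing.dat; do
    if has_content "$f"; then
        echo "$f: content available"
    else
        echo "$f: not downloaded"
    fi
done
```

Inside an actual git-annex repository, git annex find lists the annexed files whose content is present locally, which answers the same question without inspecting symlinks by hand.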

sprenger commented 3 years ago
Owner

@bendichter Yes, gin supports locking. I wasn't aware of this issue. Thanks @yarikoptic for pointing this out. I updated the script above and added a commit in the test repository (https://gin.g-node.org/sprenger/ephy_testing_data_annexed) that locks all files. Can you confirm that it behaves as you expect?

Yaroslav Halchenko commented 3 years ago
Collaborator

seems to be good to me.

Benjamin K Dichter commented 3 years ago
Collaborator

OK, so this works with:

$ gin login
...
$ gin get sprenger/ephy_testing_data_annexed
 Downloading repository OK 
 Initialising local storage OK 
$ cd ephy_testing_data_annexed
$ datalad get nix/nixio_fr.nix
get(ok): nix/nixio_fr.nix (file) [from origin...]   
$ ls -lL nix
total 6272
-r--r--r--  1 bendichter  staff  2992692 Jan 13 09:53 nixio_fr.nix

Seems good to me! @sprenger we'd like to do something similar with ophys_testing_data (https://gin.g-node.org/CatalystNeuro/ophys_testing_data). What settings did you change to accomplish this?

sprenger commented 3 years ago
Owner

Thanks for the feedback! If @samuelgarcia and @apdavison agree, I would force push this to https://gin.g-node.org/NeuralEnsemble/ephy_testing_data (of course keeping a copy of the old version locally, just for safety)

@bendichter: I added the line gin lock * to the script I posted above. This should work anywhere between gin commit and gin upload

Samuel Garcia commented 3 years ago
Owner

Hi Julia. I agree with this change. My knowledge of git-annex and of layers on top of it like gin is very limited. I totally trust your proposal and Yarik's checks.

sprenger commented 3 years ago
Owner

IT'S DONE! And at first glance it looks like it worked. Let's see if it stays like this.

sprenger commented 3 years ago
Owner

First issue detected: The history rewrite broke all currently open PRs. I will try to recreate these with the new repo version.

Update: Solved via #34, #35 and #37
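Why the open PRs broke can be reproduced in a few lines: after the history is redone from scratch, branches forked from the old history share no commit with the new one. A sketch using two made-up branch names in one throwaway repository, with an orphan branch standing in for the re-created repository:

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
commit() {
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}

git symbolic-ref HEAD refs/heads/old-history
commit "original history"

# An orphan branch starts from a new root commit, like the rewritten repo.
git checkout -q --orphan new-history
commit "repopulated dataset"

# merge-base exits non-zero when two branches share no commit, which is
# why a branch forked from the old history cannot be merged as it is.
if git merge-base old-history new-history >/dev/null; then
    related=yes
else
    related=no
fi
echo "shared ancestor: $related"
```

Re-creating such a PR means re-applying its changes on top of the new history, for example by copying the changed files over into a fresh branch.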
