Hi! First and foremost a huge thank you for Gin! It is an immeasurably useful infrastructure for science.
I've recently noticed what I presume to be a corruption of the git-annex branch after pushing to Gin, and reported it originally at https://github.com/datalad/datalad-gooey/issues/349.
The issue presents as follows:
At the moment, pushing a DataLad dataset/git annex repo causes a severance of the git-annex branch, and complete divergence of my local and the remote git-annex branch on Gin. This happens with datasets I previously pushed successfully (small datasets I often use for demonstrations or ad-hoc testing).
An example is [this dataset](https://gin.g-node.org/adswa/mlbooksmoretests) (you might see different Gin repos in the errors below because I tried to pin this down to parametrization or operating system, but the errors were identical across scenarios). It's originally from https://github.com/datalad-datasets/machinelearning-books, and contains PDFs that have a web special remote registered (i.e., the files came from a `git annex addurl` call). If I add a new Gin repository as a remote and push to it using `datalad push`, the push succeeds for the default branch, but fails with a non-fast-forward error for the `git-annex` branch, similar to the one below:
```
* refs/heads/master:refs/heads/master [new branch]
! refs/heads/git-annex:refs/heads/git-annex [rejected] (non-fast-forward)
Done'] [err: 'Delta compression using up to 16 threads
Total 422 (delta 198), reused 149 (delta 33), pack-reused 0 error: failed to push some refs to 'gin.g-node.org:/adswa/ml-books-only-ssh.git'
hint: Updates were rejected because a pushed branch tip is behind its remote
hint: counterpart. Check out this branch and integrate the remote changes
hint: (e.g. 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.']
```
Investigating the remote git-annex branch on Gin shows that the git-annex branch has been re-created from scratch (it seems), by a committer ID called "Gogs": https://gin.g-node.org/adswa/mlbooksmoretests/src/git-annex.
The local git-annex branch shows commits indicating that the branch was rewritten or otherwise vastly changed:
```
(gooyey) C:\Users\adina\Desktop\ml-books2>git log git-annex
commit 4e226892a69de8989b56cef5f41c49f138aee09e (git-annex)
Author: Adina Wagner <adina.wagner@t-online.de>
Date: Fri Oct 14 09:22:57 2022 +0200
continuing transition ["forget git history"]
commit 38be5a7d07b019e2a7e42c8dff0734926c276f7d
Author: Adina Wagner <adina.wagner@t-online.de>
Date: Fri Oct 14 09:17:56 2022 +0200
update
commit 72cd967f9648209aab5c55aebf5b60f1aea41099 (origin/git-annex)
Author: Adina Wagner <adina.wagner@t-online.de>
Date: Tue Apr 19 13:29:07 2022 +0200
update
```
A manual pull fails locally:
```
❱ git pull gin git-annex
From https://gin.g-node.org/adswa/mlbooksmoretests
* branch git-annex -> FETCH_HEAD
fatal: refusing to merge unrelated histories
```
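As a side note, the "unrelated histories" state can presumably be confirmed without attempting a merge. The following is only a minimal sketch: the helper name `shares_history` is hypothetical, and the commented usage assumes the Gin remote is named `gin`, as in the pull above. It relies on `git merge-base` exiting with a non-zero status when two commits have no common ancestor.

```shell
# Hypothetical helper (not part of git, git-annex, or DataLad):
# report whether two refs share any common ancestor.
# Run it from inside the repository in question.
shares_history() {
    # `git merge-base` exits non-zero when there is no common ancestor.
    # Note: it also fails if one of the refs does not exist.
    if git merge-base "$1" "$2" >/dev/null 2>&1; then
        echo "related"
    else
        echo "unrelated"
    fi
}

# Example usage (assumes a remote named "gin"):
#   git fetch gin git-annex
#   shares_history git-annex FETCH_HEAD
```

If this prints `unrelated` for the local `git-annex` branch and the fetched one, the two branches have no commit in common, which would match the `refusing to merge unrelated histories` error.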
And annexed data that should be readily available from the web special remote can't be retrieved after cloning the repository.
```
(gooey) adina@muninn in /tmp/mlbooksmoretests on git:master
❱ git-annex whereis A.Shashua-Introduction_to_Machine_Learning.pdf 1 !
whereis A.Shashua-Introduction_to_Machine_Learning.pdf (0 copies) failed
whereis: 1 failed
(gooey) adina@muninn in /tmp/mlbooksmoretests on git:master
❱ git annex get A.Shashua-Introduction_to_Machine_Learning.pdf 130 !
get A.Shashua-Introduction_to_Machine_Learning.pdf (not available)
No other repository is known to contain the file.
failed
get: 1 failed
(gooey) adina@mun
```
I have seen this on Linux and Windows-based operating systems with different versions of git-annex, both when using DataLad and when using only plain `git push` and `git annex sync` commands. I also reproduced this with several datasets I previously pushed successfully, with data available from web special remotes, other types of special remotes, or purely local availability. Can you advise what might be wrong?