FAQ

DOI questions

General GIN questions

GIN Web questions

GIN Client questions

Client setup questions

Client usage questions

General troubleshooting


DOI questions

How can I get a DOI for my data?

See details here.

How do I publish data supplementing a research paper?

For a dataset that is connected with a research paper, preprint, code repository, or other resource, you should get a DOI, and you should get it before the paper is published. The DOI registration creates a permanent record of the dataset and makes it citable.

The procedure to get a DOI is described under Obtaining a DOI. For data supplements to paper publications, the following should be considered specifically:

  1. The DOI of the dataset should be included in the paper publication. The best place to put it is usually the "Data availability" statement. Alternatively, the dataset should be mentioned in the text and the citation included in the references.
    It is important to reference the dataset by its DOI, not by the URL of the repository. While the URL may change, the DOI will always resolve to the location of the dataset.
  2. The DOI of the paper should be included in the repository. This is done by adding a reference to the paper in the publication metadata file (datacite.yml). If you do not know the DOI of the paper when you register the DOI for the dataset, enter as much citation information as is available and update the information once the paper is published.
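As an illustration of point 2, the sketch below prints a reference entry that could be pasted into datacite.yml. The field names (id, reftype, citation) and the helper name print_reference_entry are assumptions based on the GIN datacite.yml template, not something confirmed on this page, so compare the output with the template in your repository. The DOI and citation are placeholders.

```shell
#!/bin/sh
# Print a "references" entry for datacite.yml.
# ASSUMPTION: the field names (id, reftype, citation) follow the GIN
# datacite.yml template; check them against the template before use.
print_reference_entry() {
    doi=$1
    citation=$2
    cat <<EOF
references:
  -
    id: "doi:$doi"
    reftype: "IsSupplementTo"
    citation: "$citation"
EOF
}

# Placeholder values; replace with the DOI and citation of the actual paper.
entry=$(print_reference_entry "10.xxx/zzzz" "Author A, Author B (year) Title. Journal.")
printf '%s\n' "$entry"
```

If the paper DOI is not known yet, the citation field can be filled in first and the id added once the paper is published.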

For further details see here.

How can I modify a dataset published with GIN-DOI?

Once a DOI is issued for a dataset, can this dataset be modified, for example, can new data be added?

The original GIN repository that was used to create the registered dataset can be changed after it has been used for a DOI publication. However, changes in the original GIN repository do not automatically introduce changes in the published DOI dataset. If substantial changes have been made to a repository, a new version of the dataset can be published by requesting a new DOI.


General GIN questions

How can I access the data?

At GIN we believe that there should ideally be several ways to achieve the same thing. So to access the data you can:

  • Download the data file by file using the web interface
  • Download the data using git and git-annex
  • Download the data using the GIN client
  • Download the data using WebDAV
  • Download the data as a zip, tar or gin archive (large files are not included and can be fetched later using one of the methods above)

Is there a list of all GIN client commands?

Of course there is! Here!

Can I invite other collaborators who are not registered with GIN?

Yes, you can! Here is how:

  1. Navigate to the main page of your repository.
  2. Click on the Settings button in the top right-hand corner.
  3. Within your repository settings, click on the Collaboration tab on the left side.
  4. This will leave you with two options:
    • Add a new collaborator: This is for users who are registered with GIN
    • Invite a collaborator: This is for non-registered users. Enter the email address of the person you want to invite. An account will be created automatically and the user will receive an invitation email.

Can I share a repository with a collaborator or journal using a private link or access token?

At the moment, it is not possible to share a private repository with someone who does not have a GIN account. There are, however, several options you can use to grant access without requesting that others sign up themselves.

  • Create a new GIN account for the purpose of sharing your repository. You can add that account as a collaborator of your repository and provide the password to your collaborator or along with your submission to a journal. The account may be deleted afterwards if needed.
  • Using the collaboration page of your repository settings, you can automatically set up such a sharing account and send an invitation email.
  • If the visibility of a repository is set to public and the Listed option is not selected, only those who know the repository link will be able to view its contents. Please be aware that this option may leave your data vulnerable, as it cannot be ruled out that others gain access to the link without your knowledge.

Do I have to use git-annex?

Strictly speaking, no. A better answer, however, would be: it depends on the size of your data.

  • If you have only small files, just use gin as a normal git hosting provider.
  • If you have big files, download the GIN client for your operating system. You can then access and upload your data (unlimited file size) using the GIN command line client. As an alternative to the GIN client, you can use git and git-annex directly to upload and manage large files. Be sure to check the usage notes.

Can I use GIN as a data provider for my research consortium/institute/department/working group with many collaborators and potentially a lot of terabytes of data?

Sure! However, if you need guaranteed access to a large amount of storage, we expect you to cover the additional costs associated with it. Please get in touch with us about the details. Also, keep in mind that transferring large amounts of data takes time. You can of course also set up your own GIN server in-house. We are happy to be of assistance, if necessary.


GIN Web questions

An upload via the webpage shows no progress or has stopped

Uploads through the website should work without issue, but large, long-running uploads carry a greater risk of the connection timing out or being interrupted, in which case the upload has to start over. To reduce the chance of timeouts when using the web form, we recommend uploading large repositories in small chunks with few files each.

As an alternative to the web upload, the GIN command line client (GIN CLI) resumes an interrupted upload and makes it easier to keep track of what has been uploaded and what remains to be sent, which is important when uploading multiple files in multiple directories. For this reason we recommend using the GIN client instead of the web interface for large uploads.

How to delete a branch

Deleting branches through the GIN web interface is currently not supported. A branch can be deleted by using the GIN command line client to send the git remote branch deletion command. If you are unfamiliar with git or the GIN client, you can find installation instructions for the client here and a basic usage tutorial here.

To delete the branch, you will need to have the repository cloned to a local directory, then from within the repository run the following command:

gin git push -d origin branchname

where branchname is of course the name of the branch you want to delete.
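The deletion can be tried out on a throwaway repository before running it against GIN. The sketch below uses plain git, on the assumption that gin git hands its arguments to git as in the command above. The temporary directories and the local bare repository standing in for the server are illustrative only.

```shell
#!/bin/sh
# Throwaway demo: a bare "server" repository and a local clone of it.
demo=$(mktemp -d)
git init --quiet --bare "$demo/server.git"
git clone --quiet "$demo/server.git" "$demo/work"
cd "$demo/work" || exit 1

# Create one commit so that branches can exist at all.
git -c user.name=demo -c user.email=demo@example.org \
    commit --quiet --allow-empty -m "initial commit"
git push --quiet origin HEAD

# Create a branch and publish it to the remote.
git branch branchname
git push --quiet origin branchname

# Delete the remote branch; this is the step described above.
git push --quiet -d origin branchname

# List the branches left on the remote; "branchname" should no longer appear.
remaining=$(git ls-remote --heads origin)
printf '%s\n' "$remaining"
```

The local branch is not removed by this; it can be deleted separately with git branch -d branchname.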


GIN Client questions

GIN Client setup

GIN CLI says nothing to do but there are files missing on the server

Sometimes due to a misconfiguration or a network issue, git-annex might fail to contact the GIN server, while git commands still work. This can make the application incorrectly assume that the GIN server does not support git-annex and disable annex support for it. If this is the case, then the configuration file for the repository, which can be found at .git/config inside the local repository directory, will have a line that reads annex-ignore=true.

Git-annex support can be re-enabled for the remote by removing this line. Alternatively, the following command, when run from inside the repository, sets the option to the correct value.

git config remote.origin.annex-ignore false

If your remote is not named origin, make sure to use the correct name in the command.
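To check the current value before and after the change, the setting can be read back with git config --get. The following sketch runs in a temporary repository so that no real configuration is touched. The remote name origin is taken from the command above and the simulated starting state is an assumption made for the demonstration.

```shell
#!/bin/sh
# Demo in a throwaway repository so that no real configuration is touched.
repo=$(mktemp -d)
git init --quiet "$repo"
cd "$repo" || exit 1

# Simulate the state described above: annex support disabled for "origin".
git config remote.origin.annex-ignore true

# Inspect the current value (prints "true" while annex is being ignored).
git config --get remote.origin.annex-ignore

# Re-enable annex support for the remote, then confirm the change.
git config remote.origin.annex-ignore false
current=$(git config --get remote.origin.annex-ignore)
echo "annex-ignore is now: $current"
```

In a real repository only the two git config lines at the end are needed; git remote lists the remote names if origin is not the right one.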

GIN CLI fails to upload with general error

Under certain circumstances, the GIN CLI can fail to upload (or download) data without giving clear errors about what's going on. A few common scenarios for this happening are described below.

Outdated dependencies

GIN CLI uses git and git-annex to manage repositories. These applications also require a number of common utilities such as SSH and OpenSSL. If old versions of these are already installed on the system, they may behave differently from what GIN CLI expects. This is more common on Windows. The GIN CLI Bundle provides all these programs; however, other versions of the same programs may exist on the system as well.

If on Windows you used the set-global.bat file to make GIN CLI available everywhere, you can try using the gin-shell.bat instead. If the problem does not occur when running gin upload or gin download --content in gin-shell.bat, then the problem may be fixed permanently by editing the set-global.bat.

Open the file in a text editor and find the following line:

echo %path%|find /I "%curdir%">nul || setx path "%path%;%ginbinpath%;%gitpaths%"

Change it to the following:

echo %path%|find /I "%curdir%">nul || setx path "%ginbinpath%;%gitpaths%;%path%"

Then double click the file to fix the system path. This change will make the GIN CLI version of the programs have higher priority than the other versions.

Diagnosing other issues

Some errors don't provide clear error messages through GIN CLI, but can point towards the root of the issue if the underlying git and git-annex commands are used on their own. If you are having trouble uploading or downloading files and the above solution didn't help, the following command might provide more information:

gin annex sync

To upload and download all annexed data as well, run:

gin annex sync --content

Feel free to contact us for further assistance in one of the following ways: Open an issue on the GIN issue tracker, or send an email to gin@g-node.org.

Using the GIN client behind a proxy

GIN CLI communicates with the GIN server in two ways:

  1. HTTP(S) for API calls to perform actions such as login, creating repositories on the server, listing repositories, etc.
  2. Git over SSH for downloading data (pulling) from and uploading data (pushing) to the repository.

Each method requires separate settings for working with proxies. For git, the proxy must support SSH communication, which isn't always the case with web proxies.

Web proxy

Since GIN CLI runs on the command line, the system proxy settings, which are typically meant for web browsing, don't apply. Instead, the environment variables HTTP_PROXY and HTTPS_PROXY need to be set. The method varies based on operating system and command shell:

  • Windows:
    • Temporarily for cmd.exe: set HTTP_PROXY=proxy.host:port and set HTTPS_PROXY=proxy.host:port (where proxy.host is the address of your proxy server and port is the port).
    • Temporarily for PowerShell: $Env:HTTP_PROXY = "proxy.host:port" and $Env:HTTPS_PROXY = "proxy.host:port" (where proxy.host is the address of your proxy server and port is the port).
    • Globally and permanently: see this guide.
  • Linux and macOS:
    • Temporarily: export HTTP_PROXY=proxy.host:port and export HTTPS_PROXY=proxy.host:port (where proxy.host is the address of your proxy server and port is the port).
    • Permanently: This depends on the shell you are using. The commands export HTTP_PROXY=proxy.host:port and export HTTPS_PROXY=proxy.host:port (where proxy.host is the address of your proxy server and port is the port) should be added to your shell's startup script, e.g., ~/.bashrc for bash, ~/.zshrc for ZSH.

Git/SSH

For git to use the proxy server through SSH, the SSH configuration settings need to be edited. There is no single, straightforward configuration for using SSH through a proxy. Please consult your lab or institution administrator about how to configure SSH to work through the proxy.


GIN Client usage

Slow upload speed

I experience slow uploads. Can you help increase the upload speed?

Because we have no control over how the data are routed outside the GIN infrastructure, there is not really anything we can do with respect to upload speed from our end. The service provides upload speeds of up to 100MiB/s depending on the connection.

Files with a specific file ending are not uploaded

I am trying to upload files to a GIN repository, but files with a specific file ending e.g. "tif" or "nii" are not uploaded.

GIN is based on git and will respect if files have been excluded from git. Check if there is a .gitignore file at the root of your repository where these files have been excluded.
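git can report which ignore rule, if any, excludes a given file. The sketch below demonstrates this with git check-ignore in a temporary repository; the file names and the *.tif pattern are made up for the example.

```shell
#!/bin/sh
# Throwaway repository with a .gitignore that excludes TIFF files.
repo=$(mktemp -d)
git init --quiet "$repo"
cd "$repo" || exit 1
printf '*.tif\n' > .gitignore
touch image.tif notes.txt

# For an ignored file, print the ignore file, line number and pattern responsible.
rule=$(git check-ignore -v image.tif)
echo "$rule"

# For a file that is not ignored, check-ignore prints nothing and exits non-zero.
if git check-ignore -q notes.txt; then
    notes_status="ignored"
else
    notes_status="not ignored"
fi
echo "notes.txt is $notes_status"
```

Ignore rules can also come from .gitignore files in subdirectories or from a global ignore file, which is why the reported source file is worth checking.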

How to unannex files

I committed one file too many. How do I get the file out of the annex before uploading?

An annexed file can be removed from the annex and from gin tracking using the following command.

gin git annex unannex [path/filename]

Note that a commit is required to fully remove a file from gin tracking. Also note that if the file content is not locally available, there will be no message at all and the commit will not change the status. If the content is only available remotely, make sure to run gin get-content [path/filename] first. You can check which files have no local content by running gin ls.

Large local directory size after file deletion

I removed large files from my project that I did not need any longer. Still my directory requires too much disk space.

When files are deleted from the project, they still remain in the history. If, for example, a file got deleted by mistake in the past, you can go back and restore it.
The deleted file will always remain on the server, but if you want to free up the space locally, there are two ways to achieve this:

  • If you have not deleted a large file yet, first remove the file content from your local gin store by using the gin remove-content [large_file] command. Now you can safely delete the file without any leftover space occupied in your local history. You can still checkout an earlier commit and retrieve this file from the server.
  • If you have already deleted this file without removing the file content first, you can free up this space by deleting your local copy of the repository and cloning it again from the server. Just make sure you commit and upload any unsaved changes before doing this.

Dropping file content removes content from more than the specified file

I dropped the file content of one file and suddenly the content of multiple files got removed!

git-annex stores the content of a file only once. If identical copies of a file exist in several places within the same repository, dropping the content of one of these files drops the content of all of them.

A file upload has failed

A gin upload of files has failed with an unspecified message, e.g.

gin upload data_directory/*
:: Adding file changes
"data_directory/file_one.tif" failed
"data_directory/two.tif" failed

Run gin sync and check if this resolves the current upload issue. This will download changes from the remote repositories and then upload any local changes to the remotes.

If it does not help, run gin --version and check the current GIN client version number. If it is below 1.12, it might be helpful to upgrade to the latest version of the client and run the upload again. You can also try to update the git binary on the local machine.
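When comparing version numbers, note that 1.9 is older than 1.12 even though it is the larger decimal number. The sketch below compares the major and minor components separately. The function name version_older_than is hypothetical, and the installed version is hard-coded here because the exact output format of gin --version is not shown on this page.

```shell
#!/bin/sh
# Return success (0) if version $1 is older than version $2.
# Only the "major.minor" part is compared, component by component,
# so that 1.9 is correctly treated as older than 1.12.
version_older_than() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, "."); split(b, y, ".")
        if (x[1] + 0 != y[1] + 0) exit !(x[1] + 0 < y[1] + 0)
        exit !(x[2] + 0 < y[2] + 0)
    }'
}

# Hypothetical value; in practice it would be read off the output of `gin --version`.
installed="1.9"
if version_older_than "$installed" "1.12"; then
    echo "GIN client $installed is older than 1.12; consider upgrading"
fi
```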

Further check the client logfile; it can be found at the following locations depending on the operating system:

  • Windows: C:\Users\<User>\AppData\Local\g-node\gin\gin.log
  • macOS: /Users/<User>/Library/Caches/g-node/gin/gin.log
  • Linux: /home/<User>/.cache/g-node/gin/gin.log

Before checking the log, run the upload command again to make sure that the failure and any pertinent information are included as the last entries in the logfile.
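On Linux and macOS, the end of the logfile can be displayed with a few lines of shell. This is a minimal sketch that assumes the home directory in $HOME matches the locations listed above. The function names and the choice of 40 lines are arbitrary, and Windows is not covered.

```shell
#!/bin/sh
# Print the expected location of the GIN client log for the current system.
# Only Linux and macOS are handled; other systems yield a non-zero exit status.
gin_log_path() {
    case "$(uname -s)" in
        Linux)  echo "$HOME/.cache/g-node/gin/gin.log" ;;
        Darwin) echo "$HOME/Library/Caches/g-node/gin/gin.log" ;;
        *)      return 1 ;;
    esac
}

# Show the last 40 lines of the log, or report that the file does not exist.
show_gin_log_tail() {
    log=$(gin_log_path) || { echo "unsupported system"; return 1; }
    if [ -f "$log" ]; then
        tail -n 40 "$log"
    else
        echo "no log file found at $log"
    fi
}

show_gin_log_tail
```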

Broken pipe upload issue

When uploading to GIN, we encounter the following error. What is the issue and how can it be resolved?

:: Uploading
Compressing OK
Connection to gin.g-node.org closed by remote host.
fatal: the remote end hung up unexpectedly
fatal: sha1 file '<stdout>' write error: Broken pipe
fatal: the remote end hung up unexpectedly
Pushing to origin failed.
git-annex: sync: 1 failed

[error] 1 operation failed

This error can occur when many small files (each < 10 MB) with a total size of more than 4 GiB have been added in a single commit. Try splitting such a commit into multiple smaller ones so that the total size of the files in each commit stays below 4 GiB.

The reason behind this issue is that by default only files larger than 10 MB are checked into git-annex. Smaller files are checked into git itself, which does not handle many or large files nearly as well as git-annex does. In the described case, git can no longer handle the total size of the files and fails on upload.
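One way to plan smaller commits is to group the files into batches that each stay under a size limit. The sketch below is an illustration only: the function name batch_by_size is hypothetical, and the demonstration uses tiny files with a 1000-byte limit instead of real data.

```shell
#!/bin/sh
# Read file paths from standard input (one per line) and print
# "<batch number><TAB><path>", starting a new batch whenever adding the
# next file would push the batch above the given size limit in bytes.
batch_by_size() {
    limit=$1
    batch=1
    total=0
    while IFS= read -r path; do
        size=$(wc -c < "$path")
        if [ "$total" -gt 0 ] && [ $((total + size)) -gt "$limit" ]; then
            batch=$((batch + 1))
            total=0
        fi
        total=$((total + size))
        printf '%s\t%s\n' "$batch" "$path"
    done
}

# Small demonstration with three 600-byte files and a 1000-byte limit.
demo=$(mktemp -d)
for name in a b c; do
    head -c 600 /dev/zero > "$demo/$name.dat"
done
plan=$(printf '%s\n' "$demo/a.dat" "$demo/b.dat" "$demo/c.dat" | batch_by_size 1000)
printf '%s\n' "$plan"
```

In practice the limit would be set somewhat below 4 GiB, and the files of each batch would be committed separately before uploading.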

Unspecified client error message "error 1 operation failed"

When trying to upload data to the GIN server, the GIN client prompts "[error] 1 operation failed". What went wrong and how do we fix it?

This error can occur under certain unusual circumstances. One is when trying to upload many small files (individual size < 10MB) with a total size of >4GiB that have been added in one single commit, which is something git does not handle well (see above). If this is the case, please try splitting the commit into a couple of commits with fewer files in each commit and try uploading again.

If this is not the case, please check the client logfile; the logfile contains more detailed information. Depending on the operating system the logfile can be found at:

  • Windows: C:\Users\<User>\AppData\Local\g-node\gin\gin.log
  • Linux: /home/<User>/.cache/g-node/gin/gin.log
  • macOS: /Users/<User>/Library/Caches/g-node/gin/gin.log

If the log shows an error after git annex metadata --json --key=MD5-<hash> you can try to manually upload again using the command gin annex copy --to=origin <filename> with the file that caused the issue.

Disconnect reading sideband packet upload issue

I cannot clone a repository or upload from my machine to the GIN repository. I constantly get lots of error lines ending in:

# Example repository clone
gin get myusername/tmp
Downloading repository Repository download failed. Internal git command returned: Cloning into 'tmp'...
remote: Enumerating objects: 516, done.
remote: Counting objects: 100% (516/516), done.
remote: Compressing objects: 100% (224/224), done.
client_loop: send disconnect: Broken pipeiB | 1006.00 KiB/s
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: index-pack failed
 
[error] 1 operation failed
# Example repository upload
[stderr]
fatal: There is no merge to abort (MERGE_HEAD missing).
2022/04/21 20:16:13 The following error occured:
Connection to gin.g-node.org closed by remote host.
send-pack: unexpected disconnect while reading sideband packet
fatal: sha1 file '<stdout>' write error: Broken pipe
fatal: the remote end hung up unexpectedly
Connection to gin.g-node.org closed by remote host.send-pack: unexpected disconnect while reading sideband packetfatal: sha1 file '<stdout>' write error: Broken pipefatal: the remote end hung up unexpectedly  Pushing to origin failed.
git-annex: sync: 1 failed
2022/04/21 20:16:13 Exiting with ERROR message: 1 operation failed

This error can occur when the connection cannot handle a large upload. There is no easy way to resolve this issue on the machine where the error occurs, but the following measures may help:

  • use a wired connection (LAN) instead of a Wi-Fi connection
  • if feasible, try working from a different machine
  • if the suggestions above are not an option, try to limit the amount of data you upload in one go. This also includes adding and uploading data in chunks:
    • create a clean repository or clone an existing repository from the gin server; at this point it is important to locally start with a repository that does not have a large amount of data waiting to be uploaded.
    • add only one smaller file to this repository, commit and upload; if the upload succeeds, you can be sure that the issue is the chunk size of the upload. Increase the size of uploaded chunks until you hit the amount of data where an upload ends with the error described above and stay below this limit when uploading data from your machine.

At the core of this issue is that git cannot prepare and provide the data to be uploaded within a reasonable time frame; the server, ready to receive the content, has to wait too long and closes the connection.

Check the following threads from users experiencing this or a similar issue on a multitude of git services and potential solutions


Troubleshooting

I found a bug, something is not working, or "I don't know what to do"

You can ask questions, report problems or ask for general help by opening an issue here.

Why do I see files with some strange text like "annex" or "WORM" instead of file content?

GIN is using git-annex to manage large files. Read more here. The fact that the content of these files is not shown, but instead a link to the location where it is supposed to be is shown, might mean either that

  • the real content of these files has not been uploaded by the original authors or that
  • the repo is a fork and the "real" file can be found in the "mother-repository" accessible below the repository name.
Thomas Wachtler edited this page 1 year ago