#13 Question for the FAQ

Open
opened 5 years ago by jcolomb · 7 comments
  1. Why GIN and not GitHub/GitLab?

  2. What about metadata? Is there a standard to follow (on top of DataCite metadata), and can one implement new ones?

  3. Can one use GIN with RStudio (via SSH?)? (The answer is yes; maybe give a step-by-step intro?)

  4. What happens to the DOI if new data is pushed to the repository?

Achilleas Koutsou commented 5 years ago
Owner

Hi Julien. Thanks for the feedback, these are great questions.

Just to clarify, are these questions to be added to the FAQ or are you seeking answers to them directly?

Regardless, here are some quick answers:

  1. The simple answer here is git-annex. GitLab supports Git LFS, but repositories on gitlab.com are limited to 10 GB, whereas GIN has no size restriction as long as it is used for research data.

  2. There's no standard enforced for metadata; any file can be checked in and uploaded. That said, the system recognises certain formats, including XML, odML, JSON and PDF, and indexes their contents so that they are searchable. Furthermore, we are developing a validator microservice for common metadata formats such as BIDS and odML.

  3. How exactly would you want to use it with RStudio? If RStudio supports working on remote machines via SSH, that is outside the domain of GIN. At its core, GIN is a Git and git-annex hosting service. If RStudio has Git integration, it can be used with GIN the same way it is used with any other Git server such as GitHub or GitLab, but I don't know whether RStudio has git-annex integration for versioning big data files. If you had something else in mind, please don't hesitate to ask.

  4. I don't quite understand what you mean. Can you explain?
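The indexing described in answer 2 happens server-side on GIN. Purely to illustrate the idea that every file is stored but only some formats are content-indexed, here is a minimal local sketch. The extension-to-format mapping is an assumption (the comment's list ends with "etc."), and `classify_for_indexing` is a hypothetical helper, not part of GIN.

```python
from pathlib import Path

# Formats named in the comment; the real list is longer ("etc."), and
# mapping them to these file extensions is an assumption of this sketch.
INDEXED_EXTENSIONS = {
    ".xml": "XML",
    ".odml": "odML",
    ".json": "JSON",
    ".pdf": "PDF",
}


def classify_for_indexing(root: Path) -> dict[str, list[str]]:
    """Group the files in a working copy by recognised format.

    Unrecognised files are still versioned and uploaded as usual;
    they are only excluded from content indexing in this model.
    """
    groups: dict[str, list[str]] = {"unindexed": []}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or ".git" in rel.parts:
            continue  # skip directories and Git's own metadata
        fmt = INDEXED_EXTENSIONS.get(path.suffix.lower(), "unindexed")
        groups.setdefault(fmt, []).append(rel.as_posix())
    return groups
```

For example, `classify_for_indexing(Path("."))` run in a clone lists which files fall into the formats named above.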

julien colomb commented 5 years ago
Poster

If it goes in the FAQ, I will get my answers :)

  3. When I find the time I will write a short walkthrough, or send a link to an existing one. SSH works, but it is not my usual workflow (I normally use HTTPS with GitHub and GitLab, which is easier for me).

  4. The question is what the DOI will link to: the version of the repository at the time the DOI was requested, or the latest version? Versioning and DOIs are a complex issue... I think the GitHub-Zenodo solution is pretty nice (one DOI linking to the latest version, and each release gets its own DOI). Of course, there is no release function in GIN (yet) or GitLab...
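The GitHub-Zenodo pattern mentioned here can be modelled as one "concept" DOI that always resolves to the newest release, plus one pinned DOI per release. The sketch below is a toy model under that assumption; the class, method names and DOI strings are hypothetical placeholders, not Zenodo's API.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptRecord:
    """Toy model: a concept DOI that follows the latest release, plus one
    fixed DOI per release. All DOI strings are placeholders."""
    concept_doi: str
    versions: list[tuple[str, str]] = field(default_factory=list)  # (tag, version DOI)

    def add_release(self, tag: str, version_doi: str) -> None:
        # Each release gets its own DOI; earlier entries are never changed.
        self.versions.append((tag, version_doi))

    def resolve(self, doi: str) -> str:
        """Return the release tag that a DOI currently points to."""
        if doi == self.concept_doi:
            if not self.versions:
                raise LookupError("no release published yet")
            return self.versions[-1][0]  # concept DOI follows the latest release
        for tag, version_doi in self.versions:
            if version_doi == doi:
                return tag  # a version DOI stays pinned to its release
        raise LookupError(f"unknown DOI: {doi}")
```

With two releases `v1.0` and `v1.1`, the concept DOI resolves to `v1.1`, while the first version DOI still resolves to `v1.0`.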

Achilleas Koutsou commented 5 years ago
Owner

> The question is what the DOI will link to: the version of the repository at the time the DOI was requested, or the latest version? Versioning and DOIs are a complex issue... I think the GitHub-Zenodo solution is pretty nice (one DOI linking to the latest version, and each release gets its own DOI). Of course, there is no release function in GIN (yet) or GitLab...

I see. Yes, that's a good point. We have a different approach to this in GIN. When a dataset is registered, the landing page for the DOI has three links:

  1. A zip archive of the registered data as it was at the time of registration.
  2. A link to a fork of the repository on GIN, owned by a user called DOI, which is never changed or updated.
  3. A link to the original repository with a warning that the state of the repository might be different from the state of the registered dataset.

See the most recently registered repository for an example: https://doid.gin.g-node.org/d315b3db0cee15869b3d9ed164f88cfa/

This way, if the owner changes or even deletes their original repository, the dataset is always available as a zip archive and as a browsable repository in its state at the time of registration.

Unfortunately, this means that the same repository cannot be registered more than once (at different revisions); however, we have been considering a solution like the one you described for Zenodo, which involves releases.
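The registration model in this comment (a frozen snapshot with three links, and only one registration per repository) can be sketched as a small data structure. Class and field names and the placeholder strings are hypothetical; only the three-link layout and the one-registration rule come from the comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    """One DOI registration; frozen because the registered state never changes."""
    doi: str
    archive_url: str    # zip of the data as it was at registration time
    doi_fork_url: str   # fork owned by the DOI user, never updated
    original_url: str   # live repository; may have changed or been deleted
    commit: str         # revision that was registered


class Registry:
    """Hypothetical registry enforcing one registration per repository."""

    def __init__(self) -> None:
        self._by_repo: dict[str, Registration] = {}

    def register(self, repo: str, registration: Registration) -> Registration:
        # Current limitation: a later revision cannot get a second DOI.
        if repo in self._by_repo:
            raise ValueError(f"{repo} is already registered as {self._by_repo[repo].doi}")
        self._by_repo[repo] = registration
        return registration

    def lookup(self, repo: str) -> Registration:
        return self._by_repo[repo]
```

A release-based scheme like Zenodo's would replace the single entry per repository with a list of registrations, one per release.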

Achilleas Koutsou commented 5 years ago
Owner

> When I find the time I will write a short walkthrough, or send a link to an existing one. SSH works, but it is not my usual workflow (I normally use HTTPS with GitHub and GitLab, which is easier for me).

HTTPS Git operations are supported on GIN, but there are limitations:

  • Cloning and pulling from public repositories is supported, but downloading annexed data won't work.
  • Accounts secured with two-factor authentication (2FA) can't use HTTPS to clone or pull private repositories, or to push at all. In other words, HTTPS Git operations that require authentication are not available to accounts with 2FA enabled.
  • Data versioned with git-annex can't be transferred to GIN over HTTPS; that always requires SSH.
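The three limitations above combine into a simple rule for choosing between HTTPS and SSH. The function below is a hypothetical helper that encodes them as stated in this comment (server behaviour may have changed since); the operation labels are assumptions.

```python
def https_works(operation: str, private_repo: bool, two_factor: bool, annexed: bool) -> bool:
    """Return True if the operation can be done over HTTPS, False if SSH is needed.

    operation: "clone", "pull" or "push" (labels chosen for this sketch).
    """
    if operation not in {"clone", "pull", "push"}:
        raise ValueError(f"unknown operation: {operation}")
    # Annexed content never moves over HTTPS, in either direction.
    if annexed:
        return False
    # Cloning/pulling private repositories and any push need authentication.
    needs_auth = private_repo or operation == "push"
    # Authenticated HTTPS is unavailable to accounts with 2FA enabled.
    if needs_auth and two_factor:
        return False
    return True
```

For instance, cloning a public repository without its annexed data works over HTTPS even with 2FA, but pushing from a 2FA account does not.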
julien colomb commented 5 years ago
Poster

That is really cool! SSH is fine; I just need a walkthrough to make it easier. The DOI strategy is perfectly fine for data. The Zenodo approach makes sense for other types of outputs.

I am very interested in the validator approach. I am using goodtables.io and just saw that Bielefeld University has been developing a similar tool for GitLab. I do think it is a nice way to enforce standards and good practices, and badges are always cool, too! See #16

Achilleas Koutsou commented 5 years ago
Owner

> SSH is fine; I just need a walkthrough to make it easier.

Good point. I'll make sure we add a section to our help pages on setting up GIN with popular development environments and IDEs. As a quick guide, adding GIN repositories should work the same way as adding GitHub or GitLab repositories for code and smaller files. For big files, which should preferably be annexed, I'd have to look into whether RStudio supports git-annex integration, or whether it supports defining general VCS integration.
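Until such a help page exists, the command-line steps behind this quick guide can be sketched as below: the Git clone is what an IDE's Git integration would do, and the git-annex steps fetch the large files separately. The SSH URL pattern `git@gin.g-node.org:/<owner>/<repo>.git` is an assumption, as are the owner/repository names; running it for real requires `git`, `git-annex`, and an SSH key registered with the GIN account. `run` defaults to a dry run that only prints the commands.

```python
import subprocess


def gin_clone_commands(owner: str, repo: str, dest: str) -> list[list[str]]:
    """Build commands to clone a GIN repository over SSH and fetch annexed files.

    The URL pattern is an assumption of this sketch.
    """
    url = f"git@gin.g-node.org:/{owner}/{repo}.git"
    return [
        ["git", "clone", url, dest],                # code and small files, as with GitHub/GitLab
        ["git", "-C", dest, "annex", "init"],       # register this clone with git-annex
        ["git", "-C", dest, "annex", "get", "."],   # download the annexed (large) file content
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
```

Calling `run(gin_clone_commands("someuser", "somerepo", "somerepo"))` prints the three steps. Passing `dry_run=False` runs them.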

julien colomb commented 4 years ago
Poster

It seems there is a straightforward way to create add-ins in RStudio that call `gin update .` in the shell: https://jozef.io/r101-addin-reproducibility/

I will look into that.

As for the FAQ, we are gathering that kind of information here: https://gin.g-node.org/larkumlab/Dealing_with_Gin, and I am putting it into a Moodle course format... see for instance https://www.youtube.com/watch?v=DZuiCqz1KIw
