Obtaining a DOI (Digital Object Identifier)

When a digital resource is published with a DOI, its content and description is registered and archived. Certain requirements have to be met to ensure persistence and accessibility. Please read the specifications and instructions on this page carefully so that the publication process will run smoothly and quickly.

Contents

  1. Introduction
  2. Preparing your repository
  3. Creating a DataCite metadata file
  4. Structuring the DataCite file
  5. Finalizing the DOI request
  6. Citing your publication

1. Introduction

A DOI (Digital Object Identifier) permanently identifies a digital resource, allowing datasets or files to be citable and accessible through this identifier.

Registering a DOI constitutes a publication and therefore the repository must meet certain requirements, as described below. Furthermore, specific metadata to create the DOI record must be provided. When using the GIN DOI service, this information must be in a file called datacite.yml at the root of the repository to be published. The following sections describe how this file can be created and which information it should contain. You can find datasets already published with the GIN-DOI service at https://doi.gin.g-node.org.

Publishing supplemental data

A frequent use cases for obtaining a DOI is publishing a dataset as supplement to a research paper.

For this purpose, it is important that the DOI of the dataset is obtained before the paper is published and is included in the paper. You should always use the DOI, never the URL of the repository, to reference the dataset, since only the DOI is guaranteed not to change. Good places for referencing a supplementing datset are the Data Availability section or the References of the paper. Usually it is possible to include the DOI of the dataset in the paper even as late as with the galley proof corrections.

In addition, a reference to the paper with its DOI should be included in the datacite.yml metadata file in the repository (see below). Reciprocal DOI references result in data-paper links and will increase the findability of your research.

2. Preparing your repository

Before submitting your DOI request, ensure that all files have been uploaded and that the repository contains a comprehensive description of the contents, so that other researchers can understand and access your data or code.

The repository name

Choose a name for your repository that is brief, but specific. If a user downloads the repository or its archive from the server, its contents will be put in a folder named after the repository name. Therefore, the repository name should be unique and descriptive of your data even when it does not appear under your username on GIN. Avoid generic names (like "my_data", "plos_paper", or the like). If you create a repository specifically for a supplement to a paper publication, a name of the form <first_author>_et_al_<year>_<journal> may be useful to consider.

The README file

The usual place to present information about the repository for the user is the README.md file at the root of your repository. Depending on the structure and contents of the repository, it may be useful to include further README files in subfolders.

The description of your repository should include:

  • A title and summary of what the dataset is about.
  • Information about the repository content and structure.
  • Information on how to access and use the data and/or code, including what data formats are used, how metadata is provided, what code is available and how to use it, etc.

A good example can be found here.

It is useful to also include in the README file general information, like authors, contact information, acknowledgments, links and references, etc. Some of that information will also be provided in the DataCite file (see below). Since the DataCite information is for automated creation and registration of the DOI record, while the README file is intended for human readers, some redundancy between these files is not a problem.

3. Creating a DataCite metadata file

To create the DataCite file, click the button labelled "Add DataCite File" at the top of the repository page (see image below). This will add the datacite.yml template file into your repository. You can then start editing the template with the information about your data.

After clicking on the "Add DataCite File" button, you will be taken to an edit page where you can fill in the appropriate metadata for your dataset. The Graphical Editor tab at the top of the file text will switch to a simpler representation of the file's content. In this view, comments are removed, fields names are shown in black text, and field values are shown in green. In the original Edit file view, the comments (lines beginning with # symbol) explain the meaning and format of each section. You may remove these comments.

You can also find a description of the needed entries in the next section of this tutorial.

Clicking on the arrow next to an entry category will expand that section.

To change the order of entries, click on the dotted icon and drag to the desired position.

If you need more or fewer entries for any category, you can duplicate or delete the ones in the example file by clicking on the square icon.

Modify the example entries by replacing the green text with the information that fits your repository data. Please remove any lines with example entries from the template that do not apply to your dataset.

After adding an appropriate commit message, click "Commit Changes" at the bottom of the page to save. Once saved or uploaded to the root of a repository, a valid file will be rendered below the Readme section in the repository overview.

4. Structuring the DataCite file

The entries of the datacite.yml file represent a subset of the DataCite Metadata Schema.

Required information

The datacite.yml file must contain at least the following entries:

authors
title
description
keywords
license

Optional information

These entries should be provided if applicable

references
funding

When the dataset is supplement to a paper publication, a reference to the paper is required.

Entries are specified by providing the corresponding lowercase field name (authors, title, etc.) followed by a colon and contents formatted as described here and shown in the example file.

Authors

Authors are the main researchers involved in the underlying research. As for any publication, all authors should be aware of and have agreed to the publication of the repository.

For each author, first and last name and affiliation are required. Additionally, an ID (digital identifier, e.g. ORCID) is highly recommended. Including an ORCID will automatically include this publication in your ORCID record. Please enter the authors as list items, each item indented and prefixed with -, each of the author keywords indented as shown below.

For the ID, include a prefix followed by a colon : to indicate the type of identifier, e.g.: ORCID:.
Please note that since the proprietary ResearcherID identifier is not openly accessible, we do not consider it a valid persistent identifier for authors.

authors:
  -
    firstname: "GivenName1"
    lastname: "FamilyName1"
    affiliation: "Affiliation1"
    id: "ORCID:0000-0001-2345-6789"
  -
    firstname: "GivenName2"
    lastname: "FamilyName2"
    affiliation: "Affiliation2"

Title

The title is the descriptive name of the resource to be published.

Please use sentence case. Line breaks may not be used in the title field.

title: "Example Title"

Description

The description contains extended information about your dataset. It has a similar role as the abstract of an article. We recommend this to be at least a few sentences long.

Line breaks may be used as long as indentation is maintained. The pipe symbol | after the section heading indicates that this is a multi-line field.

description: |
 Example description
 that can contain linebreaks
 but has to maintain indentation.

Keywords

The keywords section should be used to list terms the dataset is associated with. We recommend using as many keywords as possible. Keywords will be included in the Datacite record and will be exposed to web search engines, increasing the findability of the dataset.

Each keyword should be entered on a line on its own, indented and prefixed with -.

keywords:
 - Neuroscience
 - Electrophysiology

License

The license entry specifies the license under which the dataset will be published.

Please provide both a license name and a URL to the original license text, both indented as shown in the example below. In addition, a LICENSE file with the text of the corresponding license needs to be present in the repository. The license file can either be selected when creating your repository or uploaded afterwards.

license:
 name: "CC0"
 url: "http://creativecommons.org/publicdomain/zero/1.0"

References

References should include resources associated with the dataset, such as a research article that is based on the data, a preprint, or a repository with code used for analysis.

In addition to the human-readable citation or name of the resource, please provide the relation to the dataset and, if possible, a digital identifier.
If your dataset refers to a manuscript before publication and citation information or DOI are not known yet, enter any information that you have at the time (see examples below).

For id, include a prefix (followed by a colon :) to indicate the source/type of the identifier. Supported sources are DOI, arXiv, PMID, and URL (see examples below).

If the identifier is not known yet, tba should be used as a placeholder, according to the DataCite schema. Once you know citation and DOI of the publication, please update the information in the datacite.yml file so that dataset and paper can be linked in the DataCite registry, which will increase the findability of your publications.

For reftype, the following relations may be used:

IsSupplementTo
IsDescribedBy
IsReferencedBy

The most common situation is referring to a manuscript, preprint or research paper that describes a study based on the data. For these cases the appropriate reftype to use is IsSupplementTo.
The reftype IsDescribedBy is used to reference a data descriptor paper in the case of a dataset publication.
A reference to code related to the dataset should have the reftype IsReferencedBy.
There are further options for reftype which can be found in the DataCite Metadata documentation. Those are however usually not relevant for the GIN use cases.

For citation, please enter the full citation information (authors, year, title, journal, volume, pages). While the id with the digital identifier will enable linking paper and data, the citation will greatly help human readers to recognize your publication.

Please enter the references as list items with each item indented and prefixed with a dash (-) and each of the fields indented, as shown below.

references:
  -
    id: "doi:10.123/abc123"
    reftype: "IsSupplementTo"
    citation: "Author A, Author B (2010) A study based on the data. Neuroscience Journal 12(3):1234-1245. https://doi.org/10.123/abc123"
  -
    id: "doi:tba"
    reftype: "IsSupplementTo"
    citation: "Author A, Author C: Title of a manuscript that is not published yet. Neuro Research Journal, submitted."
  -
    id: "url:https://github.com/ab-cd/greatdataanalysis"
    reftype: "IsReferencedBy"
    citation: "GreatDataAnalysis: Analysis of the dataset. Github. https://github.com/abc/greatdataanalysis"
  -
    id: "arxiv:2201.23456"
    reftype: "IsSupplementTo"
    citation: "Author A, Author D (2022) Preprint of a study based on the data. arXiv. https://doi.org/10.48550/arXiv.2201.23456.
"

Funding

Funding is a list of items indicating funding sources related to the dataset. For each item, funder name and grant number should be specified and be separated by a semicolon. Each item should be on a new line, indented and prefixed with -.

funding:
 - "DFG; AB1234/5-1"
 - "EU; 123456"
 - "NIH; R01AB123456-01"

Resource Type

The default resource type is Dataset. In the case of a code publication, this should be changed to Software. Other options are Text or Preprint. The DataCite Metadata schema specifies further types, which however are rarely relevant for GIN use cases.

resourcetype: Dataset

5. Finalizing the DOI request

Once a valid DataCite file named datacite.yml is present at the root of a public repository, a preview of the contents will be rendered below the README section on the repository main page. A DOI may now be requested by clicking the "Request DOI" button. Before requesting the DOI, please make sure that the email address used in your GIN account is still valid and that you receive emails from gin@g-node.org (you may want to whitelist the domain in your spam filter).

If the "Request DOI" button does not appear, this indicates that your repository does not fulfill the necessary requirements. Make sure that your datacite.yml file is correct, a LICENSE is present, and that your repository is publicly accessible.

The DataCite file will be automatically checked for formatting and encoding errors. A preview of the metadata to be registered is also displayed, allowing a final check for any mistakes. Once you have ensured that the information in your DataCite file is correct, click the "Request DOI Now" button to submit your request.

Please note that all data contained in your repository will be archived for DOI registration. Any private information should thus be removed before submitting a DOI request.

The DOI registration process includes automated as well as manual checks to ensure that the repository meets the minimal requirements to make the data useful for other researchers. You will be notified at the email address associated with your account on GIN if changes should be necessary, or when the publication is completed.

6. Citing your publication

When the publication has been successfully completed, you will receive an email with the registered DOI and the citation for the published resource. When you cite your publication, it is important that you cite it with the DOI. Do not use the URL linking to your repository on GIN, since this is not a persistent identifier and is not guaranteed to be valid. In contrast, the DOI will always redirect to your publication, making sure that your work can be found.

Thomas Wachtler edited this page 1 year ago