Managing Research Data: Archiving Datasets

Advantages of a Data Repository or Archive

A data repository allows researchers to upload and publish their data, thereby making the data available for other researchers to re-use. Similarly, a data archive allows users to deposit and publish data, but it generally offers greater levels of curation to community standards, has specific guidelines on what data can be deposited, and is more likely to offer long-term preservation as a service. The terms "data repository" and "data archive" are sometimes used interchangeably.

A data repository or archive will typically:

  • Assign a persistent identifier such as a “digital object identifier” (DOI); a DOI facilitates discoverability and citeability
  • Assist with metadata provision, e.g. through the use of a template
  • Allow you to apply a licence to your data
  • Aid compliance with the FAIR data principles (data that are Findable, Accessible, Interoperable, and Reusable), as data are published online with appropriate metadata and are assigned a persistent identifier
  • Accept a wide range of data types
  • Provide long-term access and, in some cases, long-term preservation
  • Offer useful search, navigation and visualisation functionality
  • Help your data reach a wider audience of potential users
  • Manage requests for data on your behalf
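To illustrate how a persistent identifier supports citeability, the sketch below builds a dataset citation from a few metadata fields and a DOI. It is a minimal example under stated assumptions: the `DatasetRecord` class, its field names, the citation pattern, and the sample values (including the DOI `10.1234/example.5678`) are all hypothetical. Your repository may prescribe its own citation format.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Minimal descriptive metadata for a deposited dataset (illustrative fields only)."""
    creators: list[str]
    year: int
    title: str
    publisher: str  # the repository or archive holding the data
    doi: str        # bare DOI without a URL prefix (hypothetical value below)


def doi_url(doi: str) -> str:
    """Return the resolvable link for a DOI using the doi.org resolver."""
    return f"https://doi.org/{doi.strip()}"


def format_citation(record: DatasetRecord) -> str:
    """Build a citation as: Creators (Year). Title. Publisher. DOI link."""
    creators = "; ".join(record.creators)
    return (
        f"{creators} ({record.year}). {record.title}. "
        f"{record.publisher}. {doi_url(record.doi)}"
    )


# Hypothetical record, for illustration only.
example = DatasetRecord(
    creators=["Researcher, A.", "Researcher, B."],
    year=2020,
    title="Example survey dataset",
    publisher="Example Data Archive",
    doi="10.1234/example.5678",
)
print(format_citation(example))
```

Because the DOI resolves through `https://doi.org/`, the citation should continue to work even if the repository later changes the dataset's landing-page address.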

When to Select a Data Repository

Choose a repository early so that you can familiarise yourself with its requirements. These may include depositing in certain file formats, using a specific metadata standard, and including documentation that helps to describe your data. Understanding such requirements will enable you to design your data collection materials so that metadata and documentation are easier to create.

Initial Questions

  • Has a data repository been specified by my funder? E.g.

NERC Data Centre (for research funded by the UK’s Natural Environment Research Council): http://www.nerc.ac.uk/research/sites/data/

  • Has a data repository been specified by my publisher? E.g.

SpringerNature Recommended Repositories: http://www.springernature.com/gp/authors/research-datapolicy/repositories/12327124

PLOS Recommended Data Repositories: http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories

Scientific Data Recommended Data Repositories: https://www.nature.com/sdata/policies/repositories

  • Is there a discipline-specific, community-recognised data repository I can submit my data to, thereby helping to preserve my data according to the recognised standards of my discipline? E.g.

Irish Social Science Data Archive: www.issda.ie

Cancer Imaging Archive: http://www.cancerimagingarchive.net/

PubChem: https://pubchem.ncbi.nlm.nih.gov/

PANGAEA: https://www.pangaea.de/

How to Select a Data Repository

Ask:

  • Is it reputable? Is it listed in Re3data, and does it therefore meet Re3data’s conditions of inclusion?
  • Is it appropriate to your discipline?
  • Will it take the data you want to deposit?
  • Is there a size limit?
  • Does it provide a DOI / persistent identifier?
  • Does it provide guidance on how the data should be cited?
  • Does it provide access control, where necessary, for your research data?
  • Does it ensure long-term preservation / curation?
  • Does it provide expert help with e.g. metadata provision, curation?
  • Is there a charge?
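The questions above can be recorded as a simple checklist when comparing candidate repositories. The sketch below is one possible way to do this. The short keys, the `unmet_criteria` helper, and the sample answers are hypothetical, and which questions count as essential is an assumption you would set for your own project.

```python
# Checklist questions keyed by short, hypothetical labels.
# Each question is phrased so that True is the desirable answer.
CHECKLIST = {
    "reputable": "Is it reputable and listed in Re3data?",
    "discipline": "Is it appropriate to my discipline?",
    "accepts_data": "Will it take the data I want to deposit?",
    "size_ok": "Is my data within any size limit?",
    "persistent_id": "Does it provide a DOI / persistent identifier?",
    "citation_guidance": "Does it provide guidance on how the data should be cited?",
    "access_control": "Does it provide access control where necessary?",
    "preservation": "Does it ensure long-term preservation / curation?",
    "expert_help": "Does it provide expert help, e.g. with metadata or curation?",
    "no_charge": "Is it free of charge?",
}


def unmet_criteria(answers: dict[str, bool], required: list[str]) -> list[str]:
    """Return the required questions that are unanswered or answered 'no'."""
    return [CHECKLIST[key] for key in required if not answers.get(key, False)]


# Hypothetical answers for one candidate repository.
candidate = {
    "reputable": True,
    "discipline": True,
    "accepts_data": True,
    "persistent_id": False,
    "no_charge": True,
}

# Assumption: these three questions are essential for this project.
essential = ["reputable", "accepts_data", "persistent_id"]

for question in unmet_criteria(candidate, essential):
    print("Not met:", question)
```

Treating an unanswered question as unmet is a deliberate choice here: it flags items you still need to check with the repository before committing to it.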

Other questions may apply, depending on your requirements. For more information see the checklist from the UK’s Digital Curation Centre: http://www.dcc.ac.uk/resources/how-guides-checklists/where-keep-research-data/where-keep-research-data

Re3data.org

Re3data.org is the primary place to locate a data repository. Search by research discipline and then filter by criteria such as access category, data usage licence, and whether the repository gives the data a persistent identifier. Re3data uses a series of symbols to indicate these key services.

To be registered in re3data.org a research data repository must:

  • Be run by a legal entity, such as a sustainable institution (e.g. library, university)
  • Clarify access conditions to the data and repository as well as the terms of use
  • Have a focus on research data

https://www.re3data.org/suggest

Multidisciplinary Data Repositories

If there is no discipline-specific repository in your area, select a general repository. These can handle a variety of data types. Charges may apply but can be included in a funding application. Key general repositories are listed in the table below. This list is for information purposes only and is not exhaustive:

Computer Code

GitHub is a widely used platform for hosting and reviewing code.

GitHub does not issue DOIs itself, but it integrates with repositories such as Zenodo and Figshare, which can archive a snapshot of your GitHub repository and assign it a DOI. This facilitates discoverability and citeability, and enables your code to be cited in academic literature.

Contact Details

Contact us:
doras@dcu.ie

Acknowledgements

Thanks to CONUL