DOI Registrations for Software

Software is important in research, and some of us in the scholarly communications community (for example, in FORCE11) have been promoting software citation as a way for software developers and maintainers to get academic credit for their work. Software releases are published and assigned DOIs, and software users then cite these releases when they publish research that uses the software.

DataCite recently examined the DOIs that have been created for software and found that the number of new software DOIs is growing roughly exponentially. It now reaches about 2000 software DOIs per month, with spikes of around 4000 per month in some months of 2017. The data and results are shown here. The source code for the R script used to generate the data and figures is also available (Fenner, Katz, Smith, & Nielsen (2018)).
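
One way to make "roughly exponential" concrete is to fit a straight line to the logarithm of the monthly counts. The slope r is then the monthly growth rate, and ln 2 / r is the doubling time in months. The Python sketch below shows this under stated assumptions: the function name and its input (a plain list of positive monthly DOI counts) are hypothetical, and this is not the R script used for the DataCite analysis, only an illustration of the technique.

```python
import math


def fit_exponential_growth(monthly_counts):
    """Fit log(count) = a + r * month by ordinary least squares.

    Hypothetical helper for illustration. Returns (r, doubling_time_months).
    Assumes every count is positive (log of zero is undefined).
    """
    n = len(monthly_counts)
    if n < 2:
        raise ValueError("need at least two months of data")
    xs = list(range(n))                          # month index 0, 1, 2, ...
    ys = [math.log(c) for c in monthly_counts]   # counts on the log scale
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    r = sxy / sxx  # slope: monthly growth rate on the log scale
    # A non-positive rate means no growth, so there is no finite doubling time.
    doubling = math.log(2) / r if r > 0 else math.inf
    return r, doubling
```

For example, on a synthetic series that doubles every 12 months, such as `[100 * 2 ** (m / 12) for m in range(36)]`, the function returns a doubling time of approximately 12 months. Real monthly counts are noisier, so the fitted rate summarizes the trend rather than any single month.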

As of May 16, 2018, 58,301 DOIs have been registered for software. We can break this number down by the repository where the software source code is hosted; most DOIs for software have been registered at Zenodo.

Client ID | Repository | Software DOIs
CERN.ZENODO | ZENODO - Research. Shared. | 41,346
FIGSHARE.ARS | figshare Academic Research System | 4,226
PURDUE.NCIB | National Cancer Institute, Bioconductor | 2,769
PURDUE.EZID | Purdue University | 2,463
OSTI.DOE | DOE Generic | 736
INIST.INRA | Institut National de Recherche Agronomique | 223
OCEAN.OCEAN | Code Ocean | 206
CRUI.INFNCNAF | Istituto Nazionale di Fisica Nucleare. Centro Nazionale Analisi Fotogrammi | 190
CDL.UCI | UC Irvine Library | 120
ETHZ.DA-RD | ETHZ Data Archive - Research Data | 88
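
The counts above can be turned into shares of the 58,301 software DOIs with a few lines of Python. The sketch below uses only the numbers from this post; the dictionary and function names are our own, chosen for illustration.

```python
# Software DOI counts per DataCite client as of May 16, 2018 (from the list above).
SOFTWARE_DOIS_BY_CLIENT = {
    "CERN.ZENODO": 41346,
    "FIGSHARE.ARS": 4226,
    "PURDUE.NCIB": 2769,
    "PURDUE.EZID": 2463,
    "OSTI.DOE": 736,
    "INIST.INRA": 223,
    "OCEAN.OCEAN": 206,
    "CRUI.INFNCNAF": 190,
    "CDL.UCI": 120,
    "ETHZ.DA-RD": 88,
}
TOTAL_SOFTWARE_DOIS = 58301  # all software DOIs, not just the ten clients listed


def shares(counts, total):
    """Return each client's share of all software DOIs, in percent."""
    return {client: 100 * n / total for client, n in counts.items()}


if __name__ == "__main__":
    top_ten = sum(SOFTWARE_DOIS_BY_CLIENT.values())
    print(f"Top ten clients: {top_ten} DOIs "
          f"({100 * top_ten / TOTAL_SOFTWARE_DOIS:.1f}% of all software DOIs)")
    for client, pct in shares(SOFTWARE_DOIS_BY_CLIENT, TOTAL_SOFTWARE_DOIS).items():
        print(f"{client:15s} {pct:5.1f}%")
```

Run on these numbers, Zenodo alone accounts for about 70.9% of all software DOIs, and the ten clients listed together hold 52,367 DOIs, about 89.8% of the total.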

Changes over Time

How did these numbers change over time? The first DataCite DOI for software was registered on September 7, 2011, by the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) in Germany (Colmsee, Flemming, Klapperstück, Lange, & Scholz (2011)).

We can start by looking at the Zenodo/GitHub integration, which lets users archive a GitHub repository in the Zenodo data repository. The integration was launched in February 2014, and the growth in software DOIs correlates nicely with this launch and with a May 2014 post by Arfon Smith on the GitHub blog describing (and advertising) the integration.

In September 2016, the FORCE11 Software Citation Principles (A. M. Smith, Katz, Niemeyer, & FORCE11 Software Citation Working Group (2016)) were published and the Zenodo/GitHub integration was upgraded, and in October 2016 the GitHub Guide to Making Your Code Citable was updated. The rate of growth appears to change around this time as well.

Looking forward

We see steady exponential growth in the number of DOIs for software, and we don't expect this to change in 2018 and beyond. The FORCE11 Software Citation Implementation Working Group is working on implementation and adoption of the Software Citation Principles, and for a number of use cases, e.g., citation in a journal article, DOIs play an important role. The working group is also addressing the challenges that still exist in using DOIs as identifiers for software, through work that includes pre-registration APIs to smooth automated push-style deposit, better semantic linkage supported by extensions to the DataCite schema, and the use of group/collective/microcitation DOIs.

We expect initiatives such as the Citation File Format and Software Heritage to increase the number of DOIs for software. A paper on persistent identification and citation of software using DOIs by Jones et al. (C. M. Jones, Matthews, Gent, Griffin, & Tedds (2017)) was published in July 2017, based on earlier work from 2015 (Gent, Jones, & Matthews (2015)), and the DataCite Metadata 4.1 schema, focusing on software citation, was released in September 2017 (DataCite Metadata Working Group (2017), Starr (2017)).

CodeMeta (Boettiger (2017), M. B. Jones et al. (2017)) is particularly relevant. This new standard for software metadata simplifies the crosswalk between the wide variety of metadata standards for software and is increasingly integrated into DOI registration workflows: the CaltechDATA repository has supported it since March 2018 and the DataCite DOI registration service since May 2018 (Fenner (2018), Dasler (2018)), and support is planned for the Zenodo/GitHub integration in autumn 2018. CodeMeta libraries are currently available for R (codemetar; Boettiger et al. (2018)), Ruby (Bolognese; Fenner (2017)) and Python (CodeMetaPy).
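
The idea of a crosswalk can be illustrated with a deliberately small example. The Python sketch below takes a hypothetical codemeta.json record (the package name, author, and repository URL are made up) and maps a handful of its terms onto DataCite-style attributes such as titles, creators, publicationYear, and types.resourceTypeGeneral. It is a simplified illustration under those assumptions, not the official CodeMeta crosswalk and not the mapping implemented by codemetar, Bolognese, or the DataCite DOI registration service.

```python
import json

# A minimal, hypothetical codemeta.json record using the CodeMeta 2.0 context.
CODEMETA = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "example-package",
    "version": "1.2.0",
    "datePublished": "2018-05-16",
    "author": [
        {"@type": "Person", "givenName": "Jane", "familyName": "Doe"},
    ],
    "codeRepository": "https://github.com/example/example-package",  # not mapped below
}


def codemeta_to_datacite(cm):
    """Map a few CodeMeta terms onto DataCite-style attributes (simplified sketch)."""
    creators = [
        {
            "name": f'{a["familyName"]}, {a["givenName"]}',
            "givenName": a["givenName"],
            "familyName": a["familyName"],
        }
        for a in cm.get("author", [])
    ]
    return {
        "titles": [{"title": cm["name"]}],
        "creators": creators,
        "version": cm.get("version"),
        # DataCite records a publication year; take it from the ISO date.
        "publicationYear": int(cm["datePublished"][:4]),
        # Every record in this sketch is typed as software.
        "types": {"resourceTypeGeneral": "Software"},
    }


if __name__ == "__main__":
    print(json.dumps(codemeta_to_datacite(CODEMETA), indent=2))
```

A real crosswalk covers many more terms (licenses, funding, related identifiers) and has to handle missing or differently structured fields; the point here is only that one shared vocabulary lets each tool write a single mapping instead of one per pair of metadata standards.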

This blog post was originally published on the DataCite Blog.

References

Boettiger C. Codemeta: A Rosetta Stone for Software Metadata. figshare; 2017:6508668 Bytes. doi:10.6084/M9.FIGSHARE.4490588

Boettiger C, Salmon M, Smith A, Ross N, Leinweber K, Krystalli A. ropensci/codemetar: codemetar: Generate CodeMeta Metadata for R Packages. Published online May 5, 2018. doi:10.5281/ZENODO.1241346

Colmsee C, Flemming S, Klapperstück M, Lange M, Scholz U. A case study for efficient management of high throughput primary lab data: source code. Published online 2011:2.2 MB. doi:10.5447/IPK/2011/0

Dasler R. DOI Fabrica 1.0 is Here! Published online May 9, 2018. doi:10.5438/0YK5-B755

DataCite Metadata Working Group. DataCite Metadata Schema Documentation for the Publication and Citation of Research Data v4.1. Published online 2017:72 pages. doi:10.5438/0014

Fenner M. Bolognese: a Ruby library for conversion of DOI Metadata. Published online February 25, 2017. doi:10.5438/N138-Z3MK

Fenner M. Frontend for the DataCite DOI Fabrica service. Published online May 9, 2018. doi:10.5438/CXE5-RG55

Fenner M, Katz DS, Smith A, Nielsen LH. DOI Registrations for Software. Published online May 17, 2018. doi:10.5438/WR0X-E194

Gent I, Jones C, Matthews B. Guidelines for Persistently Identifying Software Using DataCite. 2015. Accessed July 2, 2023. https://epubs.stfc.ac.uk/work/24058274

Jones CM, Matthews B, Gent I, Griffin T, Tedds J. Persistent Identification and Citation of Software. IJDC. 2017;11(2):104-114. doi:10.2218/ijdc.v11i2.422

Jones MB, Boettiger C, Mayes AC, et al. CodeMeta: an exchange schema for software metadata. Published online 2017. doi:10.5063/SCHEMA/CODEMETA-2.0

Smith AM, Katz DS, Niemeyer KE, FORCE11 Software Citation Working Group. Software citation principles. PeerJ Computer Science. 2016;2:e86. doi:10.7717/peerj-cs.86

Starr J. New DataCite Metadata Updates Support Software Citation. Published online October 23, 2017. doi:10.5438/NZHX-XX96