We know that software is important in research, and some of us in the scholarly communications community, for example, in FORCE11, have been pushing the concept of software citation as a method to allow software developers and maintainers to get academic credit for their work: software releases are published and assigned DOIs, and software users then cite these releases when they publish research that uses the software.
DataCite recently examined the DOIs that have been created for software, and found that the number of new DOIs created for software is growing roughly exponentially, now reaching about 2000 software DOIs per month, with spikes of around 4000 per month in some months of 2017. The data and results are shown here. The source code for the R script used to generate the data and figures is available (Fenner, Katz, Smith, & Nielsen (2018)).
As of May 16, 2018, 58,301 DOIs have been registered for software. We can break down this number by repository where the software source code is hosted – most DOIs for software have been registered at Zenodo.
|Data center|Software DOIs|
|---|---|
|CERN.ZENODO - ZENODO - Research. Shared.|41346|
|FIGSHARE.ARS - figshare Academic Research System|4226|
|PURDUE.NCIB - National Cancer Institute, Bioconductor|2769|
|PURDUE.EZID - Purdue University|2463|
|OSTI.DOE - DOE Generic|736|
|INIST.INRA - Institut National de Recherche Agronomique|223|
|OCEAN.OCEAN - Code Ocean|206|
|CRUI.INFNCNAF - Istituto Nazionale di Fisica Nucleare. Centro Nazionale Analisi Fotogrammi|190|
|CDL.UCI - UC Irvine Library|120|
|ETHZ.DA-RD - ETHZ Data Archive - Research Data|88|
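To put the breakdown above in proportion, here is a minimal Python sketch that computes each data center's share of the 58,301 software DOIs. The counts are copied from the table. The variable and function names are ours, chosen for illustration, and this is not the R script referenced earlier.

```python
# Software DOI counts per data center, copied from the table above.
top_ten = {
    "CERN.ZENODO": 41346,
    "FIGSHARE.ARS": 4226,
    "PURDUE.NCIB": 2769,
    "PURDUE.EZID": 2463,
    "OSTI.DOE": 736,
    "INIST.INRA": 223,
    "OCEAN.OCEAN": 206,
    "CRUI.INFNCNAF": 190,
    "CDL.UCI": 120,
    "ETHZ.DA-RD": 88,
}

# Total number of software DOIs as of May 16, 2018, as stated in the text.
TOTAL_SOFTWARE_DOIS = 58301


def share(count, total=TOTAL_SOFTWARE_DOIS):
    """Return count as a percentage of total."""
    return 100.0 * count / total


top_ten_sum = sum(top_ten.values())
# DOIs registered at data centers that are not listed in the table.
remainder = TOTAL_SOFTWARE_DOIS - top_ten_sum

for name, count in sorted(top_ten.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:15s} {count:6d} {share(count):5.1f}%")
print(f"{'(all others)':15s} {remainder:6d} {share(remainder):5.1f}%")
```

Zenodo alone accounts for roughly 71% of all software DOIs, and the ten data centers listed together account for about 90%.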
How did these numbers change over time since the first DataCite DOI for software was registered on September 7th, 2011, by the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) in Germany (Colmsee, Flemming, Klapperstück, Lange, & Scholz (2011))?
We can start by looking at the Zenodo/GitHub integration, where users can archive a GitHub repository in the Zenodo data repository. The integration was launched in February 2014, and the growth in software DOIs correlates nicely with this launch and with a May 2014 post by Arfon Smith on the GitHub blog describing (and advertising) the integration work.
In September 2016, the FORCE11 Software Citation Principles (A. M. Smith, Katz, Niemeyer, & FORCE11 Software Citation Working Group (2016)) were published and the Zenodo/GitHub integration was upgraded, and in October 2016 the GitHub Guide to Making your Code Citable was updated. There appears to be a change in the rate of growth around this time as well.
We see a nice exponential growth in the number of DOIs for software, and we don't expect this to change in 2018 and beyond. The FORCE11 Software Citation Implementation Working Group is working on implementation and adoption of the Software Citation Principles, and for a number of use cases, e.g., citation in a journal article, DOIs play an important role. The working group is also trying to address the challenges that still exist in using DOIs as identifiers for software, and to document what is being done to resolve them, including pre-registration APIs to smooth automated push-style deposit, better semantic linkage supported by extensions to the DataCite schema, and group, collective, and microcitation DOI use.
We expect initiatives such as Citation File Format and Software Heritage to have a positive impact on the number of DOIs for software. A paper on persistent identification and citation of software using DOIs by Jones et al. (C. M. Jones, Matthews, Gent, Griffin, & Tedds (2017)) was published in July 2017, based on earlier work from 2015 (Gent, Jones, & Matthews (2015)), and the DataCite Metadata 4.1 schema, which focusses on software citation, was released in September 2017 (DataCite Metadata Working Group (2017), Starr (2017)).
CodeMeta (Boettiger (2017), M. B. Jones et al. (2017)) is particularly relevant; this new standard for software metadata simplifies the crosswalk between the wide variety of metadata standards for software, and is increasingly integrated into DOI registration workflows. The CaltechDATA repository has supported it since March 2018 and the DataCite DOI registration service since May 2018 (Fenner (2018), Dasler (2018)), and support is planned for the Zenodo/GitHub integration in autumn 2018. CodeMeta libraries are currently available for R (Codemetar, Boettiger et al. (2018)), Ruby (Bolognese, Fenner (2017)) and Python (CodeMetaPy).
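To give a feel for what CodeMeta metadata looks like, here is a minimal sketch that builds a `codemeta.json` document with Python's standard library alone, not with any of the libraries named above. The `@context` is the CodeMeta 2.0 schema cited in the references. Every other value (software name, author, repository URL, identifier) is a hypothetical placeholder, and a real record would typically carry more properties.

```python
import json

# A minimal CodeMeta 2.0 record. All values below are hypothetical
# placeholders for illustration; only the @context and @type are fixed.
codemeta = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "example-analysis-tool",
    "version": "1.0.0",
    "description": "Hypothetical research software used as an example.",
    # Placeholder DOI, not a registered identifier.
    "identifier": "https://doi.org/10.5072/example-placeholder",
    "codeRepository": "https://github.com/example-org/example-analysis-tool",
    "license": "https://spdx.org/licenses/MIT",
    "programmingLanguage": "Python",
    "author": [
        {"@type": "Person", "givenName": "Ada", "familyName": "Example"}
    ],
}

# Serialize to the JSON text that would be saved as codemeta.json.
codemeta_json = json.dumps(codemeta, indent=2)
print(codemeta_json)
```

Because CodeMeta is expressed as JSON-LD using schema.org terms, a file like this can serve as the common starting point from which other metadata formats are derived.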
This blog post was originally published on the DataCite Blog.
Boettiger, C. (2017, January). Codemeta: A rosetta stone for software metadata. figshare. https://doi.org/10.6084/m9.figshare.4490588
Boettiger, C., Salmon, M., Arfon Smith, Ross, N., Leinweber, K., & Krystalli, A. (2018). Ropensci/codemetar: Codemetar: Generate codemeta metadata for r packages. Zenodo. https://doi.org/10.5281/zenodo.1241346
Colmsee, C., Flemming, S., Klapperstück, M., Lange, M., & Scholz, U. (2011). A case study for efficient management of high throughput primary lab data. Leibniz Institute of Plant Genetics; Crop Plant Research (IPK). https://doi.org/10.5447/ipk/2011/0
Dasler, R. (2018). DOI fabrica 1.0 is here! DataCite. https://doi.org/10.5438/0yk5-b755
DataCite Metadata Working Group. (2017). DataCite metadata schema for the publication and citation of research data v4.1. DataCite. https://doi.org/10.5438/0014
Fenner, M. (2017). Bolognese: A ruby library for conversion of doi metadata. DataCite. https://doi.org/10.5438/n138-z3mk
Fenner, M. (2018). Frontend for the datacite doi fabrica service. DataCite. https://doi.org/10.5438/CXE5-RG55
Fenner, M., Katz, D. S., Smith, A., & Nielsen, L. H. (2018). DOI registrations for software. DataCite. https://doi.org/10.5438/wr0x-e194
Gent, I., Jones, C., & Matthews, B. (2015). Guidelines for persistently identifying software using datacite. Retrieved from http://purl.org/net/epubs/work/24058274
Jones, C. M., Matthews, B., Gent, I., Griffin, T., & Tedds, J. (2017). Persistent identification and citation of software. International Journal of Digital Curation, 11(2), 104–114. https://doi.org/10.2218/ijdc.v11i2.422
Jones, M. B., Boettiger, C., Mayes, A. C., Smith, A., Slaughter, P., Niemeyer, K., … Goble, C. (2017). CodeMeta: An exchange schema for software metadata. KNB Data Repository. https://doi.org/10.5063/schema/codemeta-2.0
Smith, A. M., Katz, D. S., Niemeyer, K. E., & FORCE11 Software Citation Working Group. (2016). Software citation principles. PeerJ Computer Science, 2, e86. https://doi.org/10.7717/peerj-cs.86
Starr, J. (2017). New datacite metadata updates support software citation. https://doi.org/10.5438/NZHX-XX96