A step forward for software citation: GitHub's enhanced software citation support

On August 19, GitHub announced software citation support in GitHub repositories. Citation information provided by users (in a CITATION.cff YAML file in the root directory of the default branch) is parsed and made available as a BibTeX file or as a formatted citation, currently in APA style only. APA is a good start, as it is the only popular citation style that labels software with the string [Computer software], but we hope to see support for more citation styles, including those popular in computer science. We also hope to see support for biblatex, and potentially for @software as the BibTeX entry type.

Cite this repository GitHub example
@misc{Haines_Ruby_CFF_Library_2021,
  author = {Haines, Robert and {The Ruby Citation File Format Developers}},
  doi = {10.5281/zenodo.1184077},
  month = {8},
  title = {{Ruby CFF Library}},
  url = {https://github.com/citation-file-format/ruby-cff},
  year = {2021}
}
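
To make the mapping from CITATION.cff metadata to the entry above concrete, here is a minimal Python sketch. It is not GitHub's implementation: the helper names `format_author` and `cff_to_bibtex` are hypothetical, the key-naming rule is inferred from the example output, and the day in `date-released` is a placeholder (the entry above only shows month and year). The dictionary keys (`authors`, `family-names`, `given-names`, `name`, `title`, `doi`, `date-released`, `repository-code`) are CFF keys; a real workflow would load them from the YAML file rather than write them inline.

```python
def format_author(author: dict) -> str:
    """Render one CFF author as a BibTeX name."""
    if "name" in author:
        # CFF entity (e.g. a team): brace it so BibTeX keeps the name whole.
        return "{" + author["name"] + "}"
    return f"{author['family-names']}, {author['given-names']}"


def cff_to_bibtex(cff: dict) -> str:
    """Build a BibTeX @misc entry from parsed CITATION.cff metadata (sketch)."""
    authors = cff["authors"]
    year, month, _day = str(cff["date-released"]).split("-")
    first = authors[0].get("family-names") or authors[0]["name"]
    # Assumed key rule: first author, title words, and year joined by underscores.
    key = "_".join([*first.split(), *cff["title"].split(), year])
    fields = {
        "author": " and ".join(format_author(a) for a in authors),
        "doi": cff.get("doi"),
        "month": str(int(month)),
        "title": "{" + cff["title"] + "}",
        "url": cff.get("repository-code") or cff.get("url"),
        "year": year,
    }
    # Skip fields that are missing from the CFF metadata.
    lines = [f"  {name} = {{{value}}}" for name, value in fields.items() if value]
    return "@misc{" + key + ",\n" + ",\n".join(lines) + "\n}"


ruby_cff = {
    "title": "Ruby CFF Library",
    "authors": [
        {"family-names": "Haines", "given-names": "Robert"},
        {"name": "The Ruby Citation File Format Developers"},
    ],
    "doi": "10.5281/zenodo.1184077",
    "date-released": "2021-08-01",  # placeholder day; only month and year are known
    "repository-code": "https://github.com/citation-file-format/ruby-cff",
}

print(cff_to_bibtex(ruby_cff))
```

Entity authors (those with a `name` rather than personal name parts) are wrapped in an extra pair of braces so that BibTeX does not try to split them into given and family names.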

This is an important step forward for wider software citation adoption, as it allows software authors to provide the information needed to cite their software directly in a GitHub code repository, and allows tools and workflows to integrate with this information. Within days of the initial GitHub announcement via a tweet by GitHub CEO Nat Friedman, we saw support for this new workflow by the scholarly repository Zenodo and the reference manager Zotero (see below). The swh-indexer by Software Heritage (SWH) already indexed and supported searching over the CITATION.cff files available on the HEAD/master branch of a repository archived in SWH.

We have also seen hundreds of repositories add CITATION.cff files every week since the initial announcement; you can track the adoption via this GitHub query.

Tracking CITATION.cff files via GitHub query

While a citation pointing to the GitHub code repository is a great start, ideally the software author performs an important additional step: archiving the source code in a long-term archive. This can be the Software Heritage universal software archive (detailed instructions here, and automatable from the GitHub repository with a GitHub Action), and/or the scholarly repository Zenodo via the Making Your Code Citable workflow described here, which will use the metadata provided in a CITATION.cff file.

One particular challenge with citing software is versioning, where multiple use cases need to be supported, including citing a specific version and aggregating the citations of all versions in a single place. More work is needed to link GitHub releases to the version information provided in this new GitHub feature, and to support the generic citation without a specific version, which Zenodo calls a concept DOI and the Functional Requirements for Bibliographic Records (FRBR) call a work.

Many software authors also want to link to a publication describing their software from the GitHub repository, and the Citation File Format (CFF) supports this via a ‘preferred-citation’ field. Additionally, authors can cite the software (and other works) their software builds on in a ‘references’ section in CFF files. Software authors are expected to provide the relevant information in a CITATION.cff file in YAML format that follows the Citation File Format specification. An example CITATION.cff file can be found here.
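
As a rough illustration of how a tool might use these two fields, the Python sketch below works on CITATION.cff metadata that has already been parsed into a dictionary. The software, paper, and dependency shown are entirely hypothetical, and `citation_target` and `reference_titles` are illustrative helper names, not part of any CFF tooling; only the keys (`cff-version`, `message`, `title`, `authors`, `preferred-citation`, `references`, `type`, `year`) come from the Citation File Format.

```python
# Hypothetical CITATION.cff content, shown as the dictionary a YAML parser would return.
cff = {
    "cff-version": "1.2.0",
    "message": "If you use this software, please cite the paper below.",
    "title": "Example Tool",
    "authors": [{"family-names": "Doe", "given-names": "Jane"}],
    # The publication the authors would like to be cited instead of the software itself.
    "preferred-citation": {
        "type": "article",
        "title": "Example Tool: a hypothetical paper",
        "authors": [{"family-names": "Doe", "given-names": "Jane"}],
        "year": 2021,
    },
    # Works the software builds on.
    "references": [
        {
            "type": "software",
            "title": "Some Dependency",
            "authors": [{"name": "The Dependency Developers"}],
        },
    ],
}


def citation_target(cff: dict) -> dict:
    """Return the preferred citation if one is given, else the software metadata."""
    return cff.get("preferred-citation", cff)


def reference_titles(cff: dict) -> list:
    """List the titles of the works named in the 'references' section."""
    return [ref["title"] for ref in cff.get("references", [])]


print(citation_target(cff)["title"])
print(reference_titles(cff))
```

Falling back to the top-level metadata when no `preferred-citation` is present means the software itself remains citable even when no accompanying paper exists.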

Some software authors might want to use tools to help generate the CITATION.cff file. A starting point is the CFF Initializer available here, and a list of available tools for working with CFF files is here; we expect more tools to appear over time. There are many standards for describing software (we already mentioned BibTeX and DOI metadata), and CodeMeta plays a particularly important role by providing crosswalks and tools for converting between the various metadata standards for software. Going forward we expect to see more metadata conversion workflows, in particular via GitHub Actions, adding to the existing cffconvert and CodeMeta2CFF GitHub Actions. We also hope to see similar software citation support appear in the GitLab platform.

Cross-posted from the FORCE11 blog. Authors: Martin Fenner, Stephan Druskat, Neil Chue Hong, Daniel S. Katz, Morane Gruenpeter, Arfon Smith, Tom Morell and Robert Haines

Copyright © 2021 Martin Fenner. Distributed under the terms of the Creative Commons Attribution 4.0 License.