Improving software metadata conversion by adding CFF support
In August, GitHub added enhanced support for the Citation File Format (CFF) to all GitHub repositories. As the chart below shows (kindly provided by Stephan Druskat and based on GitHub queries for CFF files), this has led to a significant increase in the number of repositories using CFF files and thus exposing software metadata that go beyond what GitHub provides via other means.
CFF support in GitHub provides an important building block that can be enhanced by a) making it easier to generate CFF files (text files using the YAML format and stored in the repository root folder), and b) converting the CFF into other metadata formats to make reuse easier elsewhere.
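Because a CFF file is plain YAML, generating a minimal one takes little code. The following is a sketch using only Ruby's standard library, not briard or ruby-cff; the helper name minimal_cff is hypothetical, and the four keys it writes are the ones CFF 1.2.0 requires (cff-version, message, title, authors).

```ruby
require "yaml"

# Hypothetical helper (not part of briard or ruby-cff): build a minimal
# CFF 1.2.0 document from a title and a list of author hashes.
def minimal_cff(title:, authors:)
  {
    "cff-version" => "1.2.0",
    "message" => "If you use #{title} in your work, please cite it using the following metadata",
    "title" => title,
    "authors" => authors
  }.to_yaml
end

cff = minimal_cff(
  title: "Ruby CFF Library",
  authors: [{ "given-names" => "Robert", "family-names" => "Haines" }]
)

# Print the YAML; in a real repository this text would be saved
# as CITATION.cff in the root folder.
puts cff
```

Anything beyond these four keys (DOI, version, keywords, and so on) is optional and can be added to the hash in the same way.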
In 2017, I started the metadata conversion library bolognese, which is heavily used internally by DataCite and focusses on the conversion of DOI metadata. During the Force2021 hackathon last week, I expanded the bolognese library to support CFF conversion and to allow the writing of DOI metadata in the Crossref XML format (not used for software metadata, but to register Crossref DOIs for this blog). As these two changes are currently not a priority for the DataCite development team, it was easier to fork the bolognese software, so I started the briard Ruby gem. In naming the new Ruby gem, I have continued the tradition I started almost 10 years ago of using dog breed names for software libraries.
You can install briard via the command line with

    gem install briard

This enables reading and writing of CFF metadata via Ruby code or the command line.
Convert CFF metadata for the ruby-cff library into a formatted citation using the Vancouver citation style and the German language locale:
    briard https://github.com/citation-file-format/ruby-cff -t citation --style vancouver --locale de

    Haines R, The Ruby Citation File Format Developers. Ruby CFF Library [Internet]. GitHub. GitHub; 2021. Verfügbar unter: https://github.com/citation-file-format/ruby-cff
Generate a CFF file from a GitHub repository archived in Zenodo with DOI metadata:
    briard 10.5281/zenodo.5217599 -t cff

    ---
    cff-version: 1.2.0
    message: If you use Ruby CFF Library in your work, please cite it using the following metadata
    doi: https://doi.org/10.5281/zenodo.5217599
    repository-code: https://zenodo.org/record/5217599
    title: Ruby CFF Library
    authors:
    - given-names: Robert
      family-names: Haines
      orcid: https://orcid.org/0000-0002-9538-7919
      affiliation: The University of Manchester, UK
    - name: The Ruby Citation File Format Developers
    abstract: This library provides a Ruby interface to manipulate Citation File Format files
    version: 0.9.0
    keywords:
    - ruby
    - credit
    - software citation
    - research software
    - software sustainability
    - metadata
    - citation file format
    - CFF
    date-released: '2021-08-18'
    references:
      identifiers:
      - type: url
        value: https://github.com/citation-file-format/ruby-cff/tree/v0.9.0
      - type: doi
        value: 10.5281/zenodo.1184077
      - type: url
        value: https://zenodo.org/communities/zenodo
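To illustrate how CFF output like this can be reused, here is a minimal sketch that reads a subset of the metadata above with Ruby's standard YAML library and assembles a simple citation string. It does not use briard and is not how briard formats citations; the helpers author_label and simple_citation and the citation layout are assumptions made for illustration.

```ruby
require "yaml"

# A subset of the CFF output shown above.
cff_text = <<~CFF
  cff-version: 1.2.0
  title: Ruby CFF Library
  doi: https://doi.org/10.5281/zenodo.5217599
  authors:
  - given-names: Robert
    family-names: Haines
  - name: The Ruby Citation File Format Developers
  version: 0.9.0
  date-released: '2021-08-18'
CFF

# Hypothetical helper: person authors carry given/family names,
# entity authors carry a single "name" key.
def author_label(author)
  return author["name"] if author.key?("name")

  [author["given-names"], author["family-names"]].compact.join(" ")
end

# Hypothetical helper: a simple, ad hoc citation layout (not a real citation style).
def simple_citation(cff)
  authors = cff.fetch("authors", []).map { |a| author_label(a) }.join(", ")
  year = cff["date-released"].to_s[0, 4]
  "#{authors} (#{year}). #{cff['title']} (version #{cff['version']}). #{cff['doi']}"
end

cff = YAML.safe_load(cff_text)
citation = simple_citation(cff)
puts citation
```

For real citation styles and locales, the briard command shown earlier (with -t citation) is the better route; this sketch only shows that the CFF file is easy to consume with ordinary YAML tooling.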
As a command-line tool, briard can be integrated with a number of workflows, including GitHub Actions used in GitHub software development workflows.
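One way such a workflow step might look is a small check that fails when a CFF file lacks the keys CFF 1.2.0 requires. The sketch below uses only Ruby's standard library and is an assumption about what a check could do, not something briard provides; the helper name missing_cff_keys is hypothetical.

```ruby
require "yaml"

# Keys that CFF 1.2.0 requires in every file.
REQUIRED_KEYS = %w[cff-version message title authors].freeze

# Hypothetical helper: return the required keys that are absent from the CFF text.
def missing_cff_keys(cff_text)
  data = YAML.safe_load(cff_text)
  return REQUIRED_KEYS.dup unless data.is_a?(Hash)

  REQUIRED_KEYS.reject { |key| data.key?(key) }
end

# An intentionally incomplete CFF document for demonstration.
incomplete = "title: Example\nauthors:\n- name: Example Developers\n"
missing = missing_cff_keys(incomplete)
puts "missing keys: #{missing.join(', ')}"

# In a CI step, the script could read the repository's CFF file instead and
# call `abort` when `missing` is not empty, so that the workflow run fails.
```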
Ultimately, the goal is to increase the adoption of richer metadata for open source software, which in turn will enable important use cases from discovery to academic credit. As of today, 251,633 DataCite DOIs have been registered for software, including 218,810 DOIs (87.0%) via the Zenodo repository. The briard library makes it straightforward to convert the metadata for this software into CFF files that can then be stored in the source code repository. A good starting point would be the GitHub/Zenodo integration, which could add a CFF file to the GitHub repository (if none exists yet) in addition to writing DOI metadata.
Of course, not all source code for software DOIs is stored in GitHub. For this reason, Stephan Druskat and I started to look into how to add similar software citation functionality to GitLab during the Force2021 hackathon last week.