Improving software metadata conversion by adding CFF support

In August 2021 GitHub added enhanced support for the Citation File Format (CFF) to all GitHub repositories. As you can see in the chart below (kindly provided by Stephan Druskat and based on GitHub queries for CFF files), this has led to a significant increase in the number of repositories using CFF files and thus exposing software metadata that go beyond what GitHub provides via other means.

CFF support in GitHub provides an important building block that can be enhanced by a) making it easier to generate CFF files (text files using the YAML format and stored in the repository root folder), and b) converting the CFF into other metadata formats to make reuse easier elsewhere.
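To illustrate point a), here is a minimal sketch of generating a CFF file with nothing but Ruby's standard YAML library. It does not use briard. The helper name build_cff and all metadata values are hypothetical placeholders. The keys are the ones CFF 1.2.0 requires (cff-version, message, title, authors) plus two optional ones.

```ruby
require "yaml"

# Hypothetical helper (not part of briard): build the text of a minimal
# CFF 1.2.0 file from a handful of values.
def build_cff(title:, authors:, version:, date_released:)
  metadata = {
    "cff-version" => "1.2.0",
    "message" => "If you use #{title} in your work, please cite it using the following metadata",
    "title" => title,
    "authors" => authors,
    "version" => version,
    "date-released" => date_released
  }
  metadata.to_yaml
end

# Placeholder values for an imaginary project.
cff_text = build_cff(
  title: "Example Library",
  authors: [{ "given-names" => "Ada", "family-names" => "Example" }],
  version: "0.1.0",
  date_released: "2021-12-16"
)

puts cff_text
# To store the file in the repository root, uncomment:
# File.write("CITATION.cff", cff_text)
```

GitHub looks for the file under the name CITATION.cff in the repository root, which is why the commented-out line writes to that path.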

In 2017 I started the metadata conversion library bolognese, which is heavily used internally by DataCite and focuses on the conversion of DOI metadata. During the Force2021 hackathon last week I expanded the bolognese library to support CFF conversion and to allow the writing of DOI metadata in the Crossref XML format (not used for software metadata, but used to register Crossref DOIs for this blog). As these two changes are currently not a priority for the DataCite development team, it was easier to fork the bolognese software, so I started the briard Ruby gem. In naming the new Ruby gem, I have continued the tradition I started almost 10 years ago of naming software libraries after dog breeds.

Briard dog. From https://commons.wikimedia.org/wiki/File:Briard_R_01_Puppy.jpg, used under a CC BY-SA license.

You can install briard from the command line with gem install briard. This enables reading and writing of CFF metadata via Ruby code or the command line.

Convert CFF metadata for the ruby-cff library into a formatted citation using the Vancouver citation style and the German language locale:

briard https://github.com/citation-file-format/ruby-cff -t citation --style vancouver --locale de

Haines R, The Ruby Citation File Format Developers. Ruby CFF Library [Internet]. GitHub. GitHub; 2021. Verfügbar unter: https://github.com/citation-file-format/ruby-cff

Generate a CFF file from a GitHub repository archived in Zenodo with DOI metadata:

briard 10.5281/zenodo.5217599 -t cff

---
cff-version: 1.2.0
message: If you use Ruby CFF Library in your work, please cite it using the following
  metadata
doi: https://doi.org/10.5281/zenodo.5217599
repository-code: https://zenodo.org/record/5217599
title: Ruby CFF Library
authors:
- given-names: Robert
  family-names: Haines
  orcid: https://orcid.org/0000-0002-9538-7919
  affiliation: The University of Manchester, UK
- name: The Ruby Citation File Format Developers
abstract: This library provides a Ruby interface to manipulate Citation File Format
  files
version: 0.9.0
keywords:
- ruby
- credit
- software citation
- research software
- software sustainability
- metadata
- citation file format
- CFF
date-released: '2021-08-18'
references:
  identifiers:
  - type: url
    value: https://github.com/citation-file-format/ruby-cff/tree/v0.9.0
  - type: doi
    value: 10.5281/zenodo.1184077
  - type: url
    value: https://zenodo.org/communities/zenodo
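Because CFF is plain YAML, the generated file can also be read back without any special tooling. The following is a minimal sketch using only Ruby's standard YAML library, not briard's own Ruby API. The embedded text is a shortened excerpt of the output above, and the helper author_name is a hypothetical name introduced for illustration.

```ruby
require "yaml"

# Shortened excerpt of the CFF output shown above.
cff_text = <<~CFF
  cff-version: 1.2.0
  title: Ruby CFF Library
  doi: https://doi.org/10.5281/zenodo.5217599
  authors:
  - given-names: Robert
    family-names: Haines
  - name: The Ruby Citation File Format Developers
  version: 0.9.0
  date-released: '2021-08-18'
CFF

cff = YAML.safe_load(cff_text)

# Hypothetical helper: authors are either persons (given-names and
# family-names) or entities (name), so handle both shapes.
def author_name(author)
  author["name"] || [author["given-names"], author["family-names"]].compact.join(" ")
end

names = cff.fetch("authors", []).map { |author| author_name(author) }
year = cff["date-released"][0, 4]

# A rough one-line summary, not a properly formatted citation style.
summary = "#{names.join(', ')} (#{year}). #{cff['title']}, version #{cff['version']}. #{cff['doi']}"
puts summary
```

For properly formatted citations in a given style and locale, the briard command with -t citation shown earlier is the better route. This sketch only shows how accessible the underlying metadata are.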

As a command-line tool, briard can be integrated with a number of workflows, including GitHub Actions used in GitHub software development workflows.
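As one way such an integration might look, here is a small Ruby wrapper that a workflow step could call. It is a sketch under assumptions: it expects the briard executable to be installed and on the PATH, it uses only the -t flag shown in the examples above, and the helper names briard_command and convert are hypothetical.

```ruby
require "open3"

# Build the argument list for the briard CLI, using only the
# "briard <identifier> -t <format>" form shown above.
def briard_command(identifier, to: "cff")
  ["briard", identifier, "-t", to]
end

# Run briard and return its standard output; raise if the command fails.
# Assumes briard is installed (gem install briard) and on the PATH.
def convert(identifier, to: "cff")
  stdout, stderr, status = Open3.capture3(*briard_command(identifier, to: to))
  raise "briard failed: #{stderr}" unless status.success?
  stdout
end

# Only runs when called as a script with an identifier, for example:
#   ruby write_cff.rb 10.5281/zenodo.5217599
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  File.write("CITATION.cff", convert(ARGV.first))
end
```

Passing the arguments as an array rather than a single string avoids shell interpolation of the identifier, which matters if the identifier comes from workflow input.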

Ultimately the goal is to increase the adoption of richer metadata for open source software, which in turn will enable important use cases from discovery to academic credit. As of today, 251,633 DataCite DOIs have been registered for software, including 218,810 DOIs (87.0%) via the Zenodo repository. The briard library makes it straightforward to convert the metadata for these software libraries into CFF files that can then be stored with the source code repository. A good starting point would be the GitHub/Zenodo integration, where we could add a CFF file to the GitHub repository (where none exists yet) in addition to writing DOI metadata.

Obviously not all source code for software DOIs is stored in GitHub. For this reason, Stephan Druskat and I started to look into how to add similar software citation functionality to GitLab during the Force2021 hackathon last week.

References

Fenner M. A step forward for software citation: GitHub’s enhanced software citation support. Published online August 24, 2021. doi:10.53731/r9531p1-97aq74v-ag78v

Fenner M. Join us for the Force2021 Hackathon. Published online November 16, 2021. doi:10.53731/rckvde5-tzg61kj-7zvc1

Fenner M. Registering content with Crossref or DataCite. Published online October 22, 2021. doi:10.53731/rbjgna1-97aq74v-ag811

Fenner M. Briard. Published online December 16, 2021. doi:10.5281/ZENODO.5785519

Copyright © 2021 Martin Fenner. Distributed under the terms of the Creative Commons Attribution 4.0 License.