Going beyond GitHub Actions for Rogue Scholar

The science blog archive Rogue Scholar depends heavily on GitHub Actions. They are used to trigger content and metadata extraction of new blog posts and to register DOIs for these posts with Crossref. More recently they have also been used to push this content and metadata to the new InvenioRDM-based Rogue Scholar platform.

GitHub Actions are workflows that typically operate on the command line. For more complex tasks such as DOI metadata generation, they can also install software to do the task. Since May 2024, Rogue Scholar has used the commonmeta Go library for Crossref DOI metadata generation, a package I developed in the spring to complement the Python and Ruby versions of commonmeta. The main advantage of Go over Python in this context is ease of installation: the Go version ships as a single binary with no external dependencies.

Last week Rogue Scholar reached another big milestone: 20K blog posts archived in the new InvenioRDM instance and registered with a Crossref DOI and useful metadata, available via full-text search. The continued growth of the Rogue Scholar platform highlighted the limitations of the current GitHub Actions workflow:

  • only one new blog post can be registered every 10 min,
  • information flow between the legacy Rogue Scholar API, Crossref, and the new InvenioRDM API is complicated.

To overcome these limitations I have expanded the commonmeta Go library. The new version (v0.6.5 and higher) not only converts metadata between different formats but can now also register metadata directly with Crossref and InvenioRDM. This means less Bash scripting and jq-based JSON parsing in GitHub Actions, a more robust information flow (which involves lots of looping and conditional branching), and better performance, as DOIs can be registered or updated in bulk (currently 10 DOIs every 10 min). The bottleneck is no longer GitHub Actions but the speed at which the Rogue Scholar InvenioRDM instance can handle updates via its API. As the InvenioRDM software is known to scale well in much bigger instances such as Zenodo, this is mainly a matter of adding enough sufficiently powerful virtual machines and databases to run the Rogue Scholar service.
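To illustrate what bulk registration means for throughput, here is a minimal Python sketch. It is not code from commonmeta or from the Rogue Scholar workflows; the function names are hypothetical, and the only inputs taken from this post are the batch size (10 DOIs) and the run interval (10 min).

```python
from itertools import islice
from typing import Iterable, Iterator, List

BATCH_SIZE = 10         # DOIs handled per workflow run (figure from this post)
RUN_INTERVAL_MIN = 10   # minutes between workflow runs (figure from this post)


def batches(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def daily_capacity(batch_size: int, interval_min: int) -> int:
    """Upper bound on records processed per day for a given schedule."""
    runs_per_day = (24 * 60) // interval_min
    return runs_per_day * batch_size


# 25 pending DOIs (placeholder strings) are split into runs of 10, 10 and 5.
pending = [f"doi-{i}" for i in range(25)]
run_sizes = [len(chunk) for chunk in batches(pending, BATCH_SIZE)]
```

Under these assumptions, a schedule of one post every 10 min tops out at `daily_capacity(1, 10)` records per day, while ten posts per run raise that ceiling tenfold.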

A welcome side effect of this commonmeta Go library update is that it makes it easy to use commonmeta to manage Crossref DOIs and InvenioRDM instances. If, for example, a Crossref member wants to update all their DOI metadata to include the ROR identifier of their organization, they can fetch all their metadata in commonmeta format (using the Crossref annual data dump or the Crossref REST API), update the respective commonmeta JSON, and then register the updated metadata with Crossref.
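The "update the respective commonmeta JSON" step could look roughly like the following Python sketch. The record layout used here (`contributors`, `affiliations`, `name`, `id`) is an assumption made for illustration and should be checked against the actual commonmeta schema; the organization name and ROR identifier are placeholders, and `add_ror` is a hypothetical helper, not a commonmeta function.

```python
import copy
from typing import Any, Dict


def add_ror(record: Dict[str, Any], org_name: str, ror_id: str) -> Dict[str, Any]:
    """Return a copy of `record` with `ror_id` set on matching affiliations.

    Assumes (hypothetically) that affiliations live under
    record["contributors"][i]["affiliations"][j] with "name" and "id" keys.
    Affiliations that already carry an identifier are left untouched.
    """
    updated = copy.deepcopy(record)
    for contributor in updated.get("contributors", []):
        for affiliation in contributor.get("affiliations", []):
            if affiliation.get("name") == org_name and not affiliation.get("id"):
                affiliation["id"] = ror_id
    return updated


# Placeholder record; field names and values are illustrative only.
record = {
    "id": "https://doi.org/10.1234/example",
    "contributors": [
        {"name": "Jane Doe", "affiliations": [{"name": "Example University"}]},
        {"name": "John Roe", "affiliations": [{"name": "Other Institute"}]},
    ],
}

updated = add_ror(record, "Example University", "https://ror.org/EXAMPLE")
```

Working on a copy keeps the original record available for comparison, which is handy when checking a bulk update before registering anything.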

A typical use case for InvenioRDM is an institutional repository hosting all publications from the institution that were originally published elsewhere. Commonmeta can convert DataCite and Crossref metadata (from the annual data dumps or API calls) into the InvenioRDM format and push all metadata into the InvenioRDM instance.

Another use case is sample data for a fresh InvenioRDM instance. Instead of the fake demo data provided by the InvenioRDM software, commonmeta allows you to easily bulk import metadata in various formats. One feature is particularly useful for this: the Crossref sample API, which returns a random set of DOIs with metadata. To tell commonmeta to fetch 100 DOI records of content type posted-content (i.e. preprints from bioRxiv or medRxiv) from Crossref member 246 (Cold Spring Harbor Laboratory) and upload them to the InvenioRDM developer instance on your local computer, issue the following command:

commonmeta push --sample --member 246 --type posted-content -n 100 -f crossref -t inveniordm --host localhost --token xxx
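For readers curious about what such a sample request looks like at the Crossref REST API level, the following Python sketch builds the corresponding query URL. It illustrates the public API's `filter` and `sample` parameters and is not necessarily how commonmeta issues the request internally; `sample_url` is a hypothetical helper, and the limit of 100 reflects my understanding of the maximum sample size Crossref allows.

```python
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"
MAX_SAMPLE = 100  # assumed upper limit for the `sample` parameter


def sample_url(member: int, content_type: str, n: int) -> str:
    """Build a Crossref REST API URL for a random sample of works."""
    if not 1 <= n <= MAX_SAMPLE:
        raise ValueError(f"sample size must be between 1 and {MAX_SAMPLE}")
    query = {
        "filter": f"member:{member},type:{content_type}",
        "sample": n,
    }
    # Keep ':' and ',' unescaped so the filter stays readable.
    return f"{CROSSREF_WORKS}?{urlencode(query, safe=':,')}"


url = sample_url(246, "posted-content", 100)
print(url)
# prints https://api.crossref.org/works?filter=member:246,type:posted-content&sample=100
```

Fetching that URL returns a JSON response containing the sampled works, which would then need converting to the InvenioRDM format before upload.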

The DataCite sample API unfortunately has severe performance issues, but in principle you can do the same with DataCite metadata.

Please reach out if you have questions or issues with the commonmeta Go library, e.g. if you want to use the commonmeta Go library in combination with the InvenioRDM Starter.

References

Fenner, M. (2024, May 13). Going for DOI registration. Front Matter. https://doi.org/10.53731/43qt9-x6p52

Fenner, M. (2024, June 17). Announcing InvenioRDM Starter Beta. Front Matter. https://doi.org/10.53731/jxecm-0me48