Rogue Scholar migrates to InvenioRDM

In August, v12.0 of InvenioRDM, the turn-key research data management repository, was released. It is the first long-term support (LTS) release of the open source software since January 2023. This release enabled the migration of the Rogue Scholar infrastructure to the InvenioRDM platform, a process that will take the next four months.

Deployment

The first stage of the migration was setting up the InvenioRDM production infrastructure. The software depends heavily on Docker containers, which can be run on public cloud infrastructure or on virtual machines on premises. The Rogue Scholar API already runs on the fly.io cloud provider, whose services are developer-friendly and competitively priced compared to the hyperscalers AWS, Google Cloud, and Microsoft Azure.

Persistent data is stored in a Postgres database and S3-compatible object storage. Rogue Scholar uses the managed services of Supabase and Tigris for these, both integrated with the fly.io platform. The initial setup of redundant production infrastructure has been completed, and you can see the beta version of Rogue Scholar powered by InvenioRDM at https://beta.rogue-scholar.org. The initial release uses InvenioRDM demo data and has some minor issues. Please provide feedback in the comments, via email, or on Mastodon.

Customization

The current Rogue Scholar infrastructure aligns well with the InvenioRDM data model: blog posts are records with metadata and downloadable content (e.g. in markdown or PDF formats), and blogs can be represented by communities that group blog posts and may provide additional functionality, e.g. funding, membership, and reviews. I've started setting up the Rogue Scholar communities, but with more than 100 participating blogs this will take some time. Blog posts (records) can be included in multiple communities, so it would be possible to add communities by subject area (e.g. chemistry), by broader topic (e.g. metascience), or by group of bloggers (e.g. German-language science bloggers).
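As a rough illustration of this mapping, here is a minimal Python sketch of blog posts as records that can belong to several communities. The class names, the `category` values, and the example slugs are hypothetical and chosen for illustration only; they are not the InvenioRDM data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Community:
    """A grouping of records, e.g. one blog or one subject area (hypothetical model)."""
    slug: str
    title: str
    category: str  # assumed values for this sketch: "blog", "subject", "language"


@dataclass
class Record:
    """A blog post with the slugs of all communities it is included in."""
    title: str
    communities: set[str] = field(default_factory=set)


def records_in(community: Community, records: list[Record]) -> list[Record]:
    """Return all records that are included in the given community."""
    return [r for r in records if community.slug in r.communities]


# One blog community and one subject community
blog = Community("example-blog", "Example Blog", "blog")
chemistry = Community("chemistry", "Chemistry", "subject")

posts = [
    Record("A post about catalysis", {"example-blog", "chemistry"}),
    Record("A post about peer review", {"example-blog"}),
]

# The first post appears in both communities, the second only in the blog
chemistry_titles = [r.title for r in records_in(chemistry, posts)]
blog_titles = [r.title for r in records_in(blog, posts)]
```

The point of the sketch is that community membership is a many-to-many relation: adding a subject or language community does not require moving a post out of its blog community.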

InvenioRDM can be customized in many ways, and I started doing this for Rogue Scholar by providing a custom logo, color scheme, and footer.

A big topic is customizing metadata vocabularies, and that work has started with adding the community category blog. Future work will include customizing community metadata to allow for blog RSS feeds and platforms (e.g. WordPress or Jekyll). For record metadata, most of the standard elements are also useful for blog posts, but one important metadata element for blog posts needs to be added to InvenioRDM: the feature image.

InvenioRDM supports translations of the user interface into other languages, and translations for the v12.0 release are planned for the October v12.1 release. Rogue Scholar currently supports seven languages and more work will be needed to translate the Rogue Scholar customizations into other languages.

InvenioRDM modules

One key difference between Rogue Scholar and the InvenioRDM platform is the automatic extraction of metadata and content via blog RSS feeds. Implementing this in the InvenioRDM platform would require a custom InvenioRDM module, written in Python. Luckily this work can be based on the rogue-scholar-api package, also written in Python and integrated with the Quart framework (very similar to the Flask framework that InvenioRDM is using). The main question to answer is whether there is interest in the broader InvenioRDM community for this functionality, as RSS feeds can be used not only for blogs but also for other content, e.g. journals or conference proceedings.
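To make the idea of feed-based metadata extraction more concrete, here is a minimal sketch using only the Python standard library. It is not the rogue-scholar-api implementation: the sample feed, the function name `feed_to_records`, and the top-level `blog` and `url` keys are assumptions for illustration, and the `metadata` keys only loosely follow the shape of InvenioRDM record metadata.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical RSS 2.0 feed with a single post
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Science Blog</title>
    <link>https://blog.example.org</link>
    <item>
      <title>Hello InvenioRDM</title>
      <link>https://blog.example.org/hello-inveniordm</link>
      <pubDate>Mon, 02 Sep 2024 10:00:00 +0000</pubDate>
      <description>A short summary of the post.</description>
    </item>
  </channel>
</rss>"""


def feed_to_records(feed_xml: str) -> list[dict]:
    """Convert the items of an RSS 2.0 feed into record-like dictionaries."""
    channel = ET.fromstring(feed_xml).find("channel")
    if channel is None:
        return []
    blog_title = channel.findtext("title", default="")
    records = []
    for item in channel.findall("item"):
        pub_date = item.findtext("pubDate")
        records.append(
            {
                "metadata": {
                    "title": item.findtext("title", default="").strip(),
                    # RSS uses RFC 822 dates; store an ISO 8601 date instead
                    "publication_date": (
                        parsedate_to_datetime(pub_date).date().isoformat()
                        if pub_date
                        else None
                    ),
                    "description": item.findtext("description", default=""),
                },
                "blog": blog_title,  # assumed key, would map to a community
                "url": item.findtext("link"),  # assumed key
            }
        )
    return records


records = feed_to_records(SAMPLE_FEED)
```

A real module would also need to handle Atom and JSON Feed formats, fetch full-text content, and deduplicate posts that were already imported, none of which is shown here.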

Another area where Rogue Scholar differs from InvenioRDM is DOI registration: Rogue Scholar uses Crossref, whereas InvenioRDM has built-in registration for DataCite DOIs. InvenioRDM also accepts metadata from externally registered DOIs, but making Crossref DOI registration part of the InvenioRDM platform (e.g. for a preprint server powered by InvenioRDM) would require yet another InvenioRDM module.

Another big change in migrating Rogue Scholar to the InvenioRDM platform is full-text search. InvenioRDM uses OpenSearch (an open source fork of Elasticsearch), whereas Rogue Scholar currently uses Typesense, an easier-to-use alternative to Elasticsearch. InvenioRDM currently does not support full-text search of content and only indexes metadata. Both Elasticsearch/OpenSearch and Typesense require synchronization with the Postgres database, which adds complexity, overhead, and potential delays. An alternative would be to implement full-text search directly in Postgres via extensions such as pg_search. This is a complex topic that needs more discussion with the broader InvenioRDM community, and it will not be part of the initial migration of Rogue Scholar to InvenioRDM by the end of the year.

Migrating to InvenioRDM means that Rogue Scholar will become part of the InvenioRDM community, an active and growing international community started in 2019 by CERN based on their Zenodo repository. This starts the next chapter in the short history of Rogue Scholar, and I am excited by the opportunities that lie ahead. If you have questions or ideas regarding this migration to InvenioRDM, please reach out via the comments, email, or Mastodon.

Copyright © 2024 Martin Fenner. Distributed under the terms of the Creative Commons Attribution 4.0 License.