Rogue Scholar learns about communities

Photo by Mario Purisic / Unsplash

Rogue Scholar started migrating to InvenioRDM infrastructure a few weeks ago. This first phase of the migration will conclude on November 4, when the Rogue Scholar frontend (rogue-scholar.org) switches to the InvenioRDM instance currently hosted at beta.rogue-scholar.org.

InvenioRDM record

For the most part, and not by coincidence, InvenioRDM is a very good fit for Rogue Scholar. The metadata and content of Rogue Scholar blog posts can be shown with more or less the default InvenioRDM record view, which you can see for example in Zenodo. We show optional metadata for the journal title (i.e. the blog name) and ISSN. We display previews of the Markdown and PDF versions of the blog post, automatically generated by the Rogue Scholar API. The Rogue Scholar DOI is displayed as the identifier, and clicking the DOI link brings you to the original blog post hosted on the blog participating in Rogue Scholar (here Science in the Open).

In contrast to Zenodo and most other InvenioRDM instances, Rogue Scholar DOIs point to content hosted elsewhere (by the participating science blog), as Rogue Scholar is an archive and not the primary place where content is hosted and metadata generated. Further work is needed to change what is displayed prominently, including the Rogue Scholar DOI, language, and subject areas (the latter two are also used as search filters/facets). In contrast, the resource-type and license status don't need to be displayed prominently as they are the same – resource-type preprint and license open (either CC-BY or CC0) – for all blog posts.

The only Rogue Scholar metadata that are not easy to show in InvenioRDM are feature images of blog posts. Showing them requires more work on the workflow for storing the images themselves (not just references to them) in Rogue Scholar, and then displaying them as custom metadata.

Again in contrast to most other InvenioRDM instances, the record content and metadata are not generated by users filling out a web form, but are automatically generated from blog RSS feeds by the Rogue Scholar API. In the coming weeks, I will focus on integrating the Rogue Scholar API with the InvenioRDM API and bring all 100+ participating science blogs with more than 15,000 blog posts into the Rogue Scholar InvenioRDM instance.
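To make the idea of generating records from feeds more concrete, here is a minimal sketch that maps a single RSS item to an InvenioRDM-style draft record. It is not the Rogue Scholar API's actual implementation: the function name `rss_item_to_draft` is hypothetical, the `journal:journal` custom field for the blog name and the `url` identifier scheme are assumptions about how the instance is configured, and required fields such as creators are left out for brevity.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def rss_item_to_draft(item_xml: str, blog_title: str) -> dict:
    """Map one RSS <item> to an InvenioRDM-style draft record (illustrative only)."""
    item = ET.fromstring(item_xml)
    title = (item.findtext("title") or "").strip()
    link = (item.findtext("link") or "").strip()
    pub_date = item.findtext("pubDate")

    metadata = {
        # All Rogue Scholar blog posts share the resource type preprint.
        "resource_type": {"id": "publication-preprint"},
        "title": title,
        # Assumption: the original blog post URL is kept as an identifier.
        "identifiers": [{"scheme": "url", "identifier": link}],
    }
    if pub_date:
        # RSS uses RFC 822 dates; InvenioRDM expects an ISO 8601 date.
        published = parsedate_to_datetime(pub_date.strip())
        metadata["publication_date"] = published.date().isoformat()

    return {
        "metadata": metadata,
        # Assumption: the blog name is stored in the optional journal field.
        "custom_fields": {"journal:journal": {"title": blog_title}},
    }
```

A real integration would also need to handle Atom and JSON Feed, authors, language, and subject areas, and would send the resulting draft to the InvenioRDM API.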

Communities

One core functionality of InvenioRDM that enables exciting new features for Rogue Scholar is communities. Communities allow the grouping of blog posts into collections, and the most obvious use case is the grouping of blog posts by blog.

This enables functionality similar to what Rogue Scholar currently provides with blog-specific views. More work is needed to customize the blog community metadata to include the RSS feed URL and blogging platform (both needed for automated content and metadata extraction).

These blog communities – like everything else in InvenioRDM – can be managed by the InvenioRDM API, and integration with the Rogue Scholar API is a top priority for October.

One important task for blogs participating in Rogue Scholar is integrating the DOIs registered by Rogue Scholar into the blog itself. For blogs that publish infrequently, this can be done manually, but for blogs publishing multiple times per month, the process should be automated. With InvenioRDM, this can now be done with an API that is much more widely adopted, for the example above via the endpoint https://beta.rogue-scholar.org/api/communities/ropensci/records.
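As a rough illustration of how a blog could automate this, the sketch below fetches the records of a blog community and extracts the DOI and title of each record. The helper names `fetch_community_records` and `doi_titles` are hypothetical, and the sketch assumes the DOI is exposed under `pids.doi.identifier` in the standard InvenioRDM search response; the beta instance may store it differently.

```python
import json
from urllib.request import urlopen

# Endpoint mentioned above for the rOpenSci blog community.
COMMUNITY_RECORDS_URL = "https://beta.rogue-scholar.org/api/communities/ropensci/records"


def fetch_community_records(url: str = COMMUNITY_RECORDS_URL, size: int = 25) -> dict:
    """Fetch one page of community records as parsed JSON."""
    with urlopen(f"{url}?size={size}") as response:
        return json.load(response)


def doi_titles(search_result: dict) -> list:
    """Return (doi, title) pairs for every record that has a DOI."""
    pairs = []
    for hit in search_result.get("hits", {}).get("hits", []):
        # Assumption: the DOI lives under pids.doi.identifier.
        doi = hit.get("pids", {}).get("doi", {}).get("identifier")
        title = hit.get("metadata", {}).get("title")
        if doi:
            pairs.append((doi, title))
    return pairs
```

Calling `doi_titles(fetch_community_records())` would give a blog a list it can match against its own posts, for example by title, to add the DOI to each post.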

InvenioRDM records can be included in more than one community, and the obvious communities to use in addition to blogs are topics. The Rogue Scholar beta instance has started with 12 topic communities, six of which are Open Science communities.

In addition, there are six other topic communities covering topics that are popular with multiple Rogue Scholar blogs.

These topic communities have just started and only include a few blog posts each. Below is a screenshot of part of the Open Infrastructure community, where you can see that several Rogue Scholar blog posts talk about adopting the Principles of Open Scholarly Infrastructure, itself originally a blog post and included in Rogue Scholar (see also screenshot above).
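Because a record can belong to a blog community and to one or more topic communities, assigning topics could be scripted. The following sketch only builds the HTTP request and assumes the InvenioRDM endpoint `POST /api/records/{id}/communities` with a `communities` list in the body; the function name, record ID, community IDs, and token are placeholders, and the exact payload should be checked against the InvenioRDM version in use.

```python
import json
from urllib.request import Request


def add_to_communities_request(base_url: str, record_id: str,
                               community_ids: list, token: str) -> Request:
    """Build (but do not send) a request adding a record to several communities."""
    # Assumption: the API accepts a list of community IDs or slugs in this shape.
    payload = {"communities": [{"id": cid} for cid in community_ids]}
    return Request(
        f"{base_url}/api/records/{record_id}/communities",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Depending on community settings, the record may be included directly or go through a review request first.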

A topic that has been very popular with Rogue Scholar bloggers since the launch of ChatGPT 3.5 two years ago is Artificial Intelligence, covering aspects such as practical advice, background research, and legal implications.

More experience and feedback regarding topic communities are needed. Please reach out with comments, or if you want to help curate a new or existing community. Because Rogue Scholar generates all metadata automatically from blog metadata, the question of whether and how communities can change blog post metadata also needs more discussion.

As of today, 78 Rogue Scholar blog posts have been added to the beta version of the InvenioRDM-based service. In the coming weeks, all 17,468 current blog posts will be added via the API and assigned to appropriate topic communities. Until November 4, I will also work on interface glitches and bugs I encounter, and make sure the InvenioRDM Rogue Scholar instance scales appropriately with traffic. Please reach out via email if you encounter bugs or performance issues.

The next InvenioRDM release (v12.1, planned for the next few weeks) will improve language support. Rogue Scholar will benefit, as it supports seven languages and 31% of blog posts are written in languages other than English (predominantly German and Spanish). Some work will be needed to translate the Rogue Scholar customizations.

Future Work

After re-launching the Rogue Scholar frontend on November 4, the next phase of development work will focus on consolidating duplicated resources, including two separate Postgres databases, search indexes, and APIs. Rogue Scholar currently uses Typesense for search, whereas InvenioRDM uses OpenSearch. In contrast to the Rogue Scholar Typesense integration, InvenioRDM currently does not offer full-text search of content, so substantial development work is needed, and it will not be completed before 2025.

Both the Rogue Scholar and InvenioRDM APIs are written in Python and build on the similar Quart and Flask frameworks, respectively. Consolidating them is mostly about deciding which parts of the Rogue Scholar functionality are of broader interest to the InvenioRDM community, e.g. automatic metadata extraction from RSS feeds or automatic generation of Markdown, PDF, and other document formats using the Pandoc document converter.

Rogue Scholar DOIs and their metadata are registered with Crossref. InvenioRDM supports DOIs registered externally, but a deeper integration with fewer dependencies needs more work. Full-text search, support for Crossref DOI registration, and RSS support are features of interest to other members of the InvenioRDM community, so this work will be coordinated.

References

Fenner, M. (2024, September 2). Rogue Scholar migrates to InvenioRDM. Front Matter. https://doi.org/10.53731/sdazp-kzn55

Fenner, M. (2023, January 16). RSS, Atom, JSON Feed. Front Matter. https://doi.org/10.53731/d6vdvbt-tffmezj

Tay, A. (2024, September 16). Primo Research Assistant launches- a first look and some things you should know. Aaron Tay’s Musings about Librarianship. https://doi.org/10.59350/66xvc-kjy06

Research Graph. (2024, August 2). What is Gemma2? Stories by Research Graph on Medium. https://doi.org/10.59350/bgk5r-gaw57

Pooley, J. (2024, January 2). Large Language Publishing. Upstream. https://doi.org/10.54900/zg929-e9595

Copyright © 2024 Martin Fenner. Distributed under the terms of the Creative Commons Attribution 4.0 License.