Tracking references of Rogue Scholar blog posts

Photo by Simon Berger / Unsplash

The Rogue Scholar science blog archive has been collecting the references of blog posts since June 2023, and as of today has registered Crossref DOIs for 1,114 blog posts with references. RSS feeds, which Rogue Scholar uses to fetch content and metadata, have not standardized how reference metadata should be described. Rogue Scholar therefore extracts references from the full text of posts, taking advantage of the fact that there is a standard format: a section called References, followed by a list of metadata (author, title, publication year, etc.) that includes a DOI or URL link. Crossref Director of Technology Geoff Bilder makes a similar point in presentations with this image, which scholarly audiences immediately recognize as the references section of a publication:

Image by Geoff Bilder
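A minimal sketch of the extraction step, assuming the post is available as HTML and that the reference list follows a heading (`h1` to `h6`) whose text is exactly "References". It uses only Python's standard `html.parser`; the class and function names are hypothetical and this is not Rogue Scholar's actual code. It collects every link after that heading until the next heading starts:

```python
from html.parser import HTMLParser


class ReferenceLinkExtractor(HTMLParser):
    """Collect href values of links that appear after a "References" heading."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.in_heading = False
        self.heading_text = ""
        self.in_references = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.in_heading = True
            self.heading_text = ""
        elif tag == "a" and self.in_references:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self.in_heading:
            self.in_heading = False
            # A "References" heading opens the list; any other heading closes it.
            self.in_references = self.heading_text.strip().lower() == "references"

    def handle_data(self, data):
        if self.in_heading:
            self.heading_text += data


def extract_reference_links(html: str) -> list:
    """Return the links found in the References section of a post."""
    parser = ReferenceLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

A reference list marked up differently, for example with a bold paragraph instead of a heading, would be missed by this sketch and would need an extra rule.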

The references at the end of this post are an example. The citation style (in this case APA) doesn't matter, as Rogue Scholar only looks for the links. If a link is a DOI, Rogue Scholar looks up the DOI metadata via DOI content negotiation and stores the DOI, title, and publication year. This information is then included in the metadata registered with Crossref.
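The DOI step can be sketched with the Python standard library, assuming the links have already been extracted. DOI content negotiation means requesting `https://doi.org/<DOI>` with an `Accept` header; here the CSL-JSON media type `application/vnd.citationstyles.csl+json` is assumed. The function names and the shape of the returned dictionary are illustrative, not how Rogue Scholar or commonmeta-py implement this. Only `lookup` touches the network:

```python
import json
import re
import urllib.request
from typing import Optional

# Matches DOI links such as https://doi.org/10.5555/12345678 (http and dx.doi.org too).
DOI_URL = re.compile(r"^https?://(?:dx\.)?doi\.org/(10\.\d{4,9}/\S+)$", re.IGNORECASE)

# Media type assumed for content negotiation (CSL-JSON).
CSL_JSON = "application/vnd.citationstyles.csl+json"


def doi_from_url(url: str) -> Optional[str]:
    """Return the DOI (lowercased, as DOIs are case-insensitive) or None for other links."""
    match = DOI_URL.match(url.strip())
    return match.group(1).lower() if match else None


def build_request(doi: str) -> urllib.request.Request:
    """Build a request that asks for CSL-JSON via DOI content negotiation."""
    return urllib.request.Request(f"https://doi.org/{doi}", headers={"Accept": CSL_JSON})


def summarize(doi: str, csl: dict) -> dict:
    """Keep only DOI, title and publication year from a CSL-JSON record."""
    title = csl.get("title")
    if isinstance(title, list):  # tolerate a list-valued title
        title = title[0] if title else None
    parts = (csl.get("issued") or {}).get("date-parts") or [[]]
    year = parts[0][0] if parts[0] else None
    return {"doi": doi, "title": title, "publication_year": year}


def lookup(doi: str) -> dict:
    """Fetch and summarize CSL-JSON for one DOI (needs network access)."""
    with urllib.request.urlopen(build_request(doi), timeout=10) as response:
        return summarize(doi, json.load(response))
```

Links for which `doi_from_url` returns `None` are ordinary URLs and would go down a separate path, such as reading the web page's own metadata.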

Improvements to the references workflow

In the past few weeks I have started to improve this workflow. The first step was to store all references in the Rogue Scholar API, using a new /works API endpoint and the commonmeta-py JSON schema. Commonmeta-py can import metadata in Crossref and DataCite formats, and its latest version has improved support for extracting metadata from web pages using HTML meta tags. This allows Rogue Scholar to find DOI metadata for web pages such as journal article landing pages (similar to what Zotero and other reference managers do), and to store additional metadata about web links, mainly the title and publication year. This information is now included in the references registered with Crossref.
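For web pages, a common source of embedded metadata is the Highwire Press `citation_*` meta tags that many journal landing pages include for Google Scholar and reference managers. The sketch below assumes those tags, plus an Open Graph `og:title` fallback for the title; it is not commonmeta-py's implementation, and the names are hypothetical:

```python
import re
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs (or property=...) from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or attributes.get("property") or "").lower()
        content = attributes.get("content")
        if name and content and name not in self.meta:  # keep the first occurrence
            self.meta[name] = content.strip()


def metadata_from_landing_page(html: str) -> dict:
    """Return DOI, title and publication year found in the page's meta tags."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    meta = parser.meta
    # citation_doi is sometimes written as "doi:10.…"; search for the DOI itself.
    doi_match = re.search(r"10\.\d{4,9}/\S+", meta.get("citation_doi", ""))
    # Dates appear as "2020/05/01" or "2020-05-01"; keep only the year.
    date = meta.get("citation_publication_date") or meta.get("citation_date") or ""
    year_match = re.search(r"\b(\d{4})\b", date)
    return {
        "doi": doi_match.group(0).lower() if doi_match else None,
        "title": meta.get("citation_title") or meta.get("og:title"),
        "publication_year": int(year_match.group(1)) if year_match else None,
    }
```

When a DOI is found this way, it can be resolved like any other DOI link; otherwise the title and year are kept as metadata for the plain web link.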

To show the references that Rogue Scholar has found for a blog post, I changed the layout of the Rogue Scholar website, which now includes pages for individual posts, e.g. this page:

You can navigate to these pages by clicking on a post title in a Rogue Scholar search result. Going forward, these dedicated pages will show more metadata, e.g. funding and relationship information.

The next step is adding the ability to display the references in a variety of citation styles and to export them in formats such as BibTeX or CSL. Both should be straightforward, as these are core functionalities of the commonmeta-py library.
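To illustrate what a minimal export involves, the sketch below renders one stored reference (DOI, title, publication year, optionally a URL) as a BibTeX `@misc` entry. It is hand-written for illustration and does not use commonmeta-py, whose own writers would be the natural choice; the citation key scheme and the entry type are assumptions:

```python
import re


def bibtex_key(ref: dict) -> str:
    """Hypothetical key scheme: first word of the title plus the year."""
    words = re.findall(r"[A-Za-z0-9]+", ref.get("title") or "")
    first = words[0].lower() if words else "ref"
    return first + str(ref.get("publication_year") or "")


def to_bibtex(ref: dict) -> str:
    """Render a minimal BibTeX @misc entry from the stored reference fields."""
    fields = []
    if ref.get("title"):
        # Double braces stop BibTeX styles from changing the title's capitalization.
        fields.append("  title = {{" + ref["title"] + "}}")
    if ref.get("publication_year"):
        fields.append("  year = {" + str(ref["publication_year"]) + "}")
    if ref.get("doi"):
        fields.append("  doi = {" + ref["doi"] + "}")
    if ref.get("url"):
        fields.append("  url = {" + ref["url"] + "}")
    return "@misc{" + bibtex_key(ref) + ",\n" + ",\n".join(fields) + "\n}"
```

A fuller exporter would map reference types to BibTeX entry types such as `@article` and escape special characters like `&` and `%`.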

By showing the references in Rogue Scholar, I hope to motivate blog authors to include more references in their posts, and I will continue to work with them on improving reference parsing, e.g. when references don't include a DOI or URL link. I am a big supporter of the Initiative for Open Citations (I4OC) and helped get it started. In the case of Rogue Scholar, the challenge is not opening up existing but closed reference metadata, but finding scalable ways to collect references from science blogs and register them with Crossref.

References

Fenner, M. (2023). Starting to include references in DOI metadata for blog posts.

Fenner, M. (2024, February 20). Commonmeta-py now supports metadata lists.

Shotton, D. M. (2017). The Initiative for Open Citations.

Copyright © 2024 Martin Fenner. Distributed under the terms of the Creative Commons Attribution 4.0 License.