Starting to register DOIs for all blog posts included in the Rogue Scholar
The Rogue Scholar archive of scholarly blogs has grown to 34 blogs with about 420 blog posts. In order to implement DOI registration for these blog posts, I needed two things:
- Content and metadata, ideally without requiring blogs to implement anything special.
- A way to track the DOIs that have been registered.
Initial work on DOI registration for blog posts focussed on exposing the relevant metadata on the blog landing page, using schema.org and/or HTML meta tags. While this approach worked well for this and similar blogs, it was too complicated and didn't scale well to the large number and diversity of blogs the Rogue Scholar aims to cover.
Therefore I implemented a different workflow, taking advantage of the fact that all blogs come with feeds that include content and metadata. More work was needed because there are different formats for these feeds (multiple flavors of RSS, as well as Atom, and the newer JSON Feed). Luckily, libraries in multiple programming languages exist to simplify the parsing of the various feed formats (I use the JavaScript library feed-extractor).
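To illustrate why the different feed formats need some care, here is a minimal sketch in Python that only tells the formats apart. It is not the Rogue Scholar implementation (which uses feed-extractor), and the function name `detect_feed_format` is made up for this example.

```python
import json
import xml.etree.ElementTree as ET

# Namespace used by the root element of Atom feeds.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def detect_feed_format(text: str) -> str:
    """Return 'json_feed', 'atom', 'rss' or 'unknown' for a feed document.

    Hypothetical helper for illustration; a real feed parser handles many
    more edge cases (encodings, malformed XML, vendor extensions).
    """
    stripped = text.lstrip()

    # JSON Feed is a JSON object whose "version" is a jsonfeed.org URL.
    if stripped.startswith("{"):
        try:
            data = json.loads(stripped)
        except json.JSONDecodeError:
            return "unknown"
        version = data.get("version", "") if isinstance(data, dict) else ""
        return "json_feed" if "jsonfeed.org" in str(version) else "unknown"

    # Everything else is expected to be XML: Atom, RSS 2.0 or RDF-based RSS 1.0.
    try:
        root = ET.fromstring(stripped)
    except ET.ParseError:
        return "unknown"
    if root.tag == ATOM_NS + "feed":
        return "atom"
    if root.tag == "rss" or root.tag.endswith("}RDF"):
        return "rss"
    return "unknown"
```

Once the format is known, the content and metadata of each post can be mapped to a common structure, which is the part a feed parsing library takes care of.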
The main challenge with metadata for blog posts, and with DOI metadata more generally, is author names. They might not be natural names (for example mfenner instead of Martin Fenner), might be names for organizations and not people, the blogging platform might not support multiple authors, and some work is required to include the ORCID author identifier (or ROR institutional identifier). The Atom format supports an author URL, which can hold the ORCID ID (or ROR ID), and WordPress can be enhanced with the popular Co-Authors Plus plugin to support multiple authors.
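As a sketch of how an ORCID ID could be picked up from the author URL of a feed, the following Python snippet accepts a URL only if it looks like an ORCID ID and its check character is valid (ORCID uses the ISO 7064 MOD 11-2 checksum). The function names are hypothetical and this is not the code used by the Rogue Scholar.

```python
import re
from typing import Optional

# ORCID IDs are 16 characters in four groups; the last one may be "X".
ORCID_PATTERN = re.compile(r"^https?://orcid\.org/(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def orcid_checksum_ok(orcid: str) -> bool:
    """Validate the ISO 7064 MOD 11-2 check character of an ORCID ID."""
    digits = orcid.replace("-", "")
    total = 0
    for char in digits[:-1]:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def orcid_from_author_url(url: Optional[str]) -> Optional[str]:
    """Return a normalized https ORCID URL, or None if the URL is not one."""
    if not url:
        return None
    match = ORCID_PATTERN.match(url.strip())
    if match is None:
        return None
    orcid = match.group(1)
    return f"https://orcid.org/{orcid}" if orcid_checksum_ok(orcid) else None
```

An author URL that points somewhere else (a personal homepage, for example) simply yields `None`, so the author is registered without an identifier rather than with a wrong one.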
The other challenge with DOI registration is keeping track of the content that has already been registered, and for this I launched a database, with one record for each post. I also need the database to enable full-text search across all blog posts, something I will implement in the coming weeks.
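The idea of one record per post can be sketched with SQLite from the Python standard library. The table layout, the function names and the choice of SQLite are all assumptions made for this example; the actual Rogue Scholar database may look quite different.

```python
import sqlite3
from typing import List


def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a registry with one row per blog post."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS posts (
            url   TEXT PRIMARY KEY,  -- the blog post URL identifies the record
            title TEXT NOT NULL,
            doi   TEXT UNIQUE        -- NULL until a DOI has been registered
        )
        """
    )
    return conn


def add_post(conn: sqlite3.Connection, url: str, title: str) -> None:
    """Add a post found in a feed; posts already known are left untouched."""
    conn.execute(
        "INSERT OR IGNORE INTO posts (url, title) VALUES (?, ?)", (url, title)
    )
    conn.commit()


def posts_without_doi(conn: sqlite3.Connection) -> List[str]:
    """Return the URLs of all posts that still need a DOI."""
    rows = conn.execute("SELECT url FROM posts WHERE doi IS NULL ORDER BY url")
    return [row[0] for row in rows]


def mark_registered(conn: sqlite3.Connection, url: str, doi: str) -> None:
    """Store the DOI once registration has succeeded."""
    conn.execute("UPDATE posts SET doi = ? WHERE url = ?", (doi, url))
    conn.commit()
```

A scheduled job could then call `posts_without_doi`, register each post, and call `mark_registered`, so that running the job again never registers the same post twice.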
With all the required pieces coming together, I was finally able to start DOI registrations yesterday. You will easily detect blog posts with a DOI on the Rogue Scholar website (there is a DOI icon next to the title, and the underlying link to the blog post is a DOI):
The process of DOI registration for all included blog posts should be concluded by the end of the month. There is more work needed to resolve issues with some author names, and DOI registration can be further automated (I am currently using GitHub Actions and a cronjob).
What also needs more work is getting the DOIs displayed on the blogs (the DOIs resolve to the blog post and not the Rogue Scholar archive). This is probably straightforward when using a static site generator, but requires more work when a database is involved (e.g. WordPress). For Ghost blogs like this one, I found the canonical_url field to be a good place to store the DOI.