The Rogue Scholar science blog archive launched two new features today: GUIDs (globally unique identifiers) and support for OECD Fields of Science and Technology.
Globally unique identifiers (GUIDs)
GUIDs are used to globally identify a blog post and are part of the RSS, Atom, and JSON Feed specifications. They can be the same as the link (or URL), but don't have to be: the Ghost blogging platform, for example, uses an identifier string and Blogger uses a combination of tags. Ideally, they don't change over time. They are internal identifiers generated by the blogging platform, so they are related to, but different from, digital object identifiers (DOIs).
Until this release, Rogue Scholar used the blog post URL to identify a blog post, but I have repeatedly run into issues with duplicate posts, because URLs for the same content can easily differ, e.g. with or without query parameters such as utm-medium, using http or https, or with or without trailing slashes. These duplicates kept appearing despite significant time spent on URL normalization, first in JavaScript and now with the new API in Python.
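To make the problem concrete, here is a minimal sketch of the kind of URL normalization described above. It is not the actual Rogue Scholar code: the function name `normalize_url` and the exact rules (force https, lowercase the host, strip trailing slashes, drop query parameters whose names start with `utm_` or `utm-`) are assumptions for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Hypothetical helper: reduce common variants of a post URL to one form."""
    parts = urlsplit(url)
    # Treat http and https as the same resource by always using https.
    scheme = "https"
    # Hostnames are case-insensitive.
    netloc = parts.netloc.lower()
    # Drop trailing slashes, but keep the root path as "/".
    path = parts.path.rstrip("/") or "/"
    # Remove tracking parameters, keep all other query parameters in order.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(("utm_", "utm-"))
    ]
    # Discard the fragment as well (an assumption of this sketch).
    return urlunsplit((scheme, netloc, path, urlencode(kept), ""))


variants = [
    "http://Example.org/post/",
    "https://example.org/post?utm_medium=rss",
    "https://example.org/post",
]
# All three variants collapse to a single normalized form.
print({normalize_url(u) for u in variants})
```

Every rule here is a guess about what "the same content" means, and each blogging platform adds its own variants, which is why this approach kept letting duplicates through.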
All Rogue Scholar blogs have an RSS, Atom, or JSON Feed and thus a GUID (called ID in Atom and JSON Feed). In contrast to the URL, the GUID is never changed by the Rogue Scholar platform, and ideally it does not change over time (e.g. when switching blogging platforms).
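As a rough illustration of where the GUID lives in each feed format, the sketch below reads it with the Python standard library only. The function names and the sample identifiers are made up for this example, and the code is not taken from the Rogue Scholar API.

```python
import json
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def guids_from_rss(xml_text: str) -> list:
    """Return the <guid> of every <item> in an RSS 2.0 feed (None if absent)."""
    root = ET.fromstring(xml_text)
    return [item.findtext("guid") for item in root.iter("item")]


def guids_from_atom(xml_text: str) -> list:
    """Return the <id> of every <entry> in an Atom feed."""
    root = ET.fromstring(xml_text)
    return [entry.findtext(f"{ATOM_NS}id") for entry in root.iter(f"{ATOM_NS}entry")]


def guids_from_json_feed(json_text: str) -> list:
    """Return the "id" of every item in a JSON Feed."""
    feed = json.loads(json_text)
    return [item["id"] for item in feed.get("items", [])]


# Illustrative sample feeds; the identifiers are invented.
rss = (
    '<rss version="2.0"><channel><item>'
    "<guid>abc123</guid><link>https://example.org/a</link>"
    "</item></channel></rss>"
)
atom = (
    '<feed xmlns="http://www.w3.org/2005/Atom"><entry>'
    "<id>tag:example.org,2023:post-2</id>"
    "</entry></feed>"
)
json_feed = '{"version": "https://jsonfeed.org/version/1.1", "items": [{"id": "64f0c0ffee"}]}'

print(guids_from_rss(rss), guids_from_atom(atom), guids_from_json_feed(json_feed))
```

Note that in the RSS sample the GUID and the link are different values, which is exactly the case where identifying posts by URL alone becomes unreliable.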
OECD Fields of Science and Technology
Since its launch, Rogue Scholar has supported the classification of blogs using the 43 OECD Fields of Science and Technology. The blog maintainer can pick the main subject area of the blog in the Rogue Scholar configuration for the blog, and since last week all 43 subject areas have been translated into the six languages currently supported by Rogue Scholar.
New this week is the addition of the subject classification to individual blog posts. The classification is shown for each post and can be used for filtering search results, e.g. only showing blog posts in chemistry (trigger filtering by clicking on the subject area label).
The category filter can of course be combined with a language filter and/or a query string.
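The way these filters combine can be sketched in a few lines of Python. This is only an illustration of the logic (every filter that is set must match), not the actual search implementation; the `Post` class, its fields, and `filter_posts` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Post:
    # Hypothetical, simplified post record for this sketch.
    title: str
    category: str  # subject area inherited from the blog
    language: str  # language code, e.g. "en" or "de"


def filter_posts(
    posts: List[Post],
    category: Optional[str] = None,
    language: Optional[str] = None,
    query: Optional[str] = None,
) -> List[Post]:
    """Keep posts that match every filter that is set; unset filters are ignored."""
    results = []
    for post in posts:
        if category is not None and post.category != category:
            continue
        if language is not None and post.language != language:
            continue
        # The query is a plain case-insensitive substring match on the title.
        if query is not None and query.lower() not in post.title.lower():
            continue
        results.append(post)
    return results


posts = [
    Post("Open data in chemistry", "Chemical sciences", "en"),
    Post("Offene Daten in der Chemie", "Chemical sciences", "de"),
    Post("Persistent identifiers", "Computer and information sciences", "en"),
]
print(filter_posts(posts, category="Chemical sciences", language="en"))
```

Calling `filter_posts(posts, category="Chemical sciences")` returns both chemistry posts, and adding `language="en"` or a query string narrows the result further.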
In the current release, all blog posts use the category given to the blog. In a few months, I hope to independently classify each blog post, as some blogs cover more than one subject area. This will be done using machine learning, which is faster and typically more accurate than classification by the author or an editor.