Dog food, persistent identifiers, and metadata

Photo by M Burke / Unsplash

I am a big fan of dog food, and I already wrote about this topic seven years ago:

Eating your own dog food is a slang term to describe that an organization should itself use the products and services it provides.

One of the major projects I am working on right now is the Rogue Scholar science blog archive that launched at the beginning of the month. As part of this work – but also because I am very interested in this – I read a lot of science blogs. And today I released an update of the Rogue Scholar that makes this easier.

Persistent identifiers for science blogs

People who know me know that I care about persistent identifiers for scholarly resources. I have worked for seven years for DataCite, a DOI registration agency for datasets, software, and other non-textual resources. I was involved in the launch of ORCID (identifiers for researchers) in 2012 and ROR (identifiers for research organizations) in 2019. So it shouldn't surprise anyone that I am officially announcing the Rogue Scholar identifier for science blogs today. Each blog that has registered with the Rogue Scholar is uniquely identified, e.g.

Persistent identifiers should not have any semantic meaning (e.g. the blog name) in them, as names can change over time. And they should not be linked to a domain name (e.g. upstream.force11.org), as those might also change. The Rogue Scholar identifier uses a 7-digit random string generated by the base32 algorithm and a two-digit checksum (the Front Matter identifier for example was generated with the random number 16127113320). DataCite, ROR, and the repository Zenodo use similarly constructed unique identifiers. Their main advantage over UUIDs is that they are easier to handle because of their compact size – there are still more than three billion unique strings for the Rogue Scholar identifier. Finally, persistent identifiers should be actionable, which means they are expressed as URLs that a human or machine can follow.
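The construction described above can be sketched in a few lines of Python. This is a minimal sketch, not the Rogue Scholar implementation: it assumes Crockford's base32 alphabet and an ISO 7064 mod 97-10 checksum, neither of which is named in this post, although both are consistent with the identifiers h56tk29 and f4wdg32 that appear elsewhere in it. The split into five random characters plus two check digits is inferred from those examples, and all function names are hypothetical.

```python
import secrets

# Assumption: Crockford's base32 alphabet (no i, l, o, u), lowercase.
CROCKFORD = "0123456789abcdefghjkmnpqrstvwxyz"


def encode(number: int, length: int) -> str:
    """Encode a non-negative integer as a fixed-length base32 string."""
    chars = []
    for _ in range(length):
        number, remainder = divmod(number, 32)
        chars.append(CROCKFORD[remainder])
    return "".join(reversed(chars))


def decode(text: str) -> int:
    """Decode a base32 string back into an integer."""
    number = 0
    for char in text:
        number = number * 32 + CROCKFORD.index(char)
    return number


def checksum(number: int) -> int:
    """Assumption: ISO 7064 mod 97-10 check digits (a value from 2 to 98)."""
    return 98 - (number * 100) % 97


def generate_id(length: int = 5) -> str:
    """Random base32 string followed by a zero-padded two-digit checksum."""
    number = secrets.randbelow(32 ** length)
    return f"{encode(number, length)}{checksum(number):02d}"


def is_valid(identifier: str) -> bool:
    """Check that the last two digits match the checksum of the rest."""
    body, check = identifier[:-2], identifier[-2:]
    if not body or not check.isdigit():
        return False
    if any(char not in CROCKFORD for char in body):
        return False
    return checksum(decode(body)) == int(check)


print(is_valid("h56tk29"))  # True
print(is_valid("h56tk28"))  # False
```

The checksum is what lets a service reject a mistyped identifier before looking anything up, which a plain random string or a UUID does not offer.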

Why did I not use International Standard Serial Numbers (ISSNs), well-established identifiers that also work for blogs (the Front Matter blog has ISSN 2749-9952)? While ISSN registration can be easy and cheap, registration can become an issue, especially for new blogs that are just beginning to publish. And ISSNs have only the most basic metadata (e.g. title, country). And why not use digital object identifiers (DOIs)? They have traditionally been used for scholarly outputs such as journal articles, datasets, and blog posts. While you can register DOIs for serials such as journals, conference proceedings, or blogs, there is currently no standard practice to do so.

Metadata for science blogs

Persistent identifiers are not really useful without meaningful metadata. For science blogs, this means at least the following:

  • Blog name
  • Blog short description
  • Blog URL
  • Alternate identifiers, e.g. ISSN and/or DOI
  • Blog editor(s)
  • License for the content, e.g. Creative Commons Attribution (CC-BY)
  • Subject area(s) for the content, e.g. aligned with the OECD Fields of Science and Technology

For the blogs participating in the Rogue Scholar, I am collecting this information and will make it available in the Rogue Scholar search. To not start from scratch, I am using the metadata available from most blogs via RSS or Atom feed. For some information, e.g. license or subject area, I need to ask additional questions to the blog editor.
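As a rough illustration, the minimal metadata listed above could be held in a small record like the one below. This is only a sketch: the class name, field names, and the example blog are hypothetical and do not describe the Rogue Scholar data model.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class BlogMetadata:
    """Hypothetical record holding the minimal metadata listed above."""

    name: str
    description: str
    url: str
    editors: list[str]
    license: str
    subjects: list[str]
    # Optional alternate identifiers, keyed by scheme (e.g. "issn", "doi").
    alternate_identifiers: dict[str, str] = field(default_factory=dict)


def to_json(blog: BlogMetadata) -> str:
    """Serialize the record to JSON, e.g. for a search index."""
    return json.dumps(asdict(blog), indent=2)


# Placeholder values for illustration only, not a real blog.
blog = BlogMetadata(
    name="Example Science Blog",
    description="Short description of the blog.",
    url="https://example.org/blog",
    editors=["Jane Doe"],
    license="https://creativecommons.org/licenses/by/4.0/",
    subjects=["Biological sciences"],
)

print(to_json(blog))
```

Name, description, URL, and editor can usually be taken from the feed, while the license and subject fields are the ones that have to be asked for.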

RSS and Atom both use XML rather than JSON, and JSON is much more pleasant to work with. Therefore – after the initial conversion of RSS or Atom XML – I can use JSON Feed to describe blog metadata, and the format can be extended to the needs of the Rogue Scholar. To fetch the JSON Feed of a blog included in the Rogue Scholar, use the identifier, either by appending .json to the identifier (e.g. https://rogue-scholar.org/h56tk29.json) or by entering the identifier (https://rogue-scholar.org/h56tk29) in your RSS reader. The reader will automatically find the JSON Feed via the link tag in the page header:

<link rel="alternate" title="Jabberwocky Ecology" type="application/feed+json" href="https://rogue-scholar.org/h56tk29.json"/>

The RSS reader (assuming it supports JSON Feed, as most readers do) will subscribe you to the JSON Feed of the blog, which makes reading science blogs simpler. More work is needed to polish the conversion of RSS/Atom feeds to JSON Feed done by the Rogue Scholar, and to streamline subscribing to multiple blogs at once, e.g. using OPML.
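The feed discovery a reader performs can be sketched with Python's standard library. This is a minimal sketch of the general technique, not the code of any particular reader: it scans a page for link tags with rel="alternate" and type="application/feed+json". The class and function names are hypothetical, and the HTML is passed in as a string so the example runs without network access.

```python
from html.parser import HTMLParser


class FeedLinkFinder(HTMLParser):
    """Collect (title, href) pairs for JSON Feed link tags in a page."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> are also routed here.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").split()
        if "alternate" in rel_values and attributes.get("type") == "application/feed+json":
            self.feeds.append((attributes.get("title"), attributes.get("href")))


def find_json_feeds(html: str) -> list:
    """Return all JSON Feed links found in the given HTML."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds


page = (
    "<html><head>"
    '<link rel="alternate" title="Jabberwocky Ecology" '
    'type="application/feed+json" href="https://rogue-scholar.org/h56tk29.json"/>'
    "</head><body></body></html>"
)

for title, href in find_json_feeds(page):
    print(title, href)
```

A reader would then fetch the href it found and parse the response as JSON, which is why entering the bare identifier URL is enough to subscribe.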

JSON Feed can also be used for the metadata and content of blog posts, so again I don't need to use an XML format such as the Journal Article Tag Suite (JATS). For blog posts, I will continue to use DOIs, as they work well, and I am making progress with Rogue Scholar integration (see for example this blog using DOIs already: https://rogue-scholar.org/f4wdg32).

Bringing everything together

How does the above help with finding, reading, sharing, or otherwise reusing science blogs? The work released today should make it easier to find interesting science blogs via the Rogue Scholar and subscribe to them via your RSS reader of choice. Over time we will hopefully see evolving community standards regarding blog persistent identifiers and metadata, following the FAIR Principles, while at the same time pushing hard for Diamond Open Access, keeping the cost and technical complexity affordable.

Copyright © 2023 Martin Fenner. Distributed under the terms of the Creative Commons Attribution 4.0 License.