Building Blocks for a Scholarly Blog Archive

James Brown. Photo by Roger Woolman, CC BY 3.0, via Wikimedia Commons

Another follow-up post, extending three earlier posts (see references), on the Scholarly Blog Archive that Front Matter is building and that I plan to launch in the first half of 2023. I have been thinking about the building blocks that make this blog archive work:

Diamond Open Access

"Diamond open access (OA) is an open access business model in which no fees are charged to either authors or readers." (German Research Foundation)

Using this term sounds strange in the context of scholarly blog posts, but it means that scholarly blog infrastructure should be free to publish on and free to read. One challenge with Open Access for publications, particularly in well-funded disciplines such as medicine and the life sciences, is that there are no incentives to drive down costs, and subscription fees have often simply been converted to article processing charges (APCs). Instead of technological advances making scholarly publishing cheaper over time, the costs for authors and readers (and for the institutions and funders who ultimately pay) keep increasing.

There is of course already a lot of Diamond Open Access, and infrastructures for research data and research software also typically don't charge authors or readers. This causes other problems in terms of sustainable scholarly infrastructure and innovation, but I think it is an essential building block for the science blog archive Front Matter is building. A lot of work is needed in 2023 to come up with a strategy for sustaining the Front Matter science blog archive in the long run. All I can say now is that it will not use advertising.

Creative Commons License

For content that is free to read we need a license that says so. The blog archive needs clear conditions for what it can do with the content, and the same is true for downstream users and services. History tells us that licenses should be clear and simple, so for scholarly blog posts I will aim to use the Creative Commons Attribution 4.0 License (CC BY 4.0) for all content.

Central Blog Archive

As I explained in a post last week, a central blog archive for blog content published in many different places makes the most sense for science blog posts – a model also used by PubMed Central for a free full-text archive of biomedical and life sciences journal articles. The InvenioRDM Open Source software is a good fit for this use case.

Starting a science blog is straightforward. There are plenty of cheap and free options available, from WordPress to GitHub Pages. You might run your blog as part of a larger platform, together with collaborators, or all by yourself.

Digital Object Identifier (DOI) and Metadata

DOIs are frequently used as persistent identifiers for scholarly content and are integrated into the InvenioRDM platform. The blog archive can either archive blog posts that already have DOIs, or it can issue DOIs for existing blogs that do not use them. In the latter case it is important that the DOI resolves to the original content on the hosting blog platform, and redirects to the copy in the blog archive only when the original blog is no longer available.
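The resolution rule for archive-issued DOIs can be sketched in a few lines. This is only an illustration under stated assumptions: the `ArchivedPost` class, the `resolution_target` function, the DOI, and both URLs are hypothetical placeholders, the fallback is assumed to be the archived copy, and how availability of the original is actually checked is left open.

```python
from dataclasses import dataclass


@dataclass
class ArchivedPost:
    doi: str           # placeholder DOI, not a registered identifier
    original_url: str  # where the post lives on its hosting blog platform
    archive_url: str   # copy held by the blog archive


def resolution_target(post: ArchivedPost, original_available: bool) -> str:
    """Return the URL the DOI should point to (hypothetical helper)."""
    # Prefer the original post; fall back to the archived copy only when
    # the original can no longer be reached.
    return post.original_url if original_available else post.archive_url


post = ArchivedPost(
    doi="10.1234/example",
    original_url="https://blog.example.org/posts/hello",
    archive_url="https://archive.example.org/records/1",
)
print(resolution_target(post, original_available=True))
print(resolution_target(post, original_available=False))
```

Keeping the decision in one small function makes the policy easy to see: the archive copy is a safety net, not the default landing page.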

DOIs (e.g. from DataCite or Crossref) have a required set of metadata that makes sense for scholarly blogs. Optional metadata that are desired for the blog archive are license (see above), abstract, subject area (using the 43 OECD Fields of Science and Technology), keywords, language, and persistent identifiers for the blog (ISSN), author (ORCID) and affiliated institution (ROR).
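To make the list of optional metadata more concrete, here is a minimal sketch of a record for a single blog post. The class name, field names, and the placeholder DOI are my own assumptions for illustration; the required fields are a simplified stand-in and do not reproduce the actual DataCite or Crossref schemas.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Optional fields named in the text above (names are illustrative).
OPTIONAL_FIELDS = (
    "license", "abstract", "subject_area", "keywords",
    "language", "blog_issn", "author_orcid", "affiliation_ror",
)


@dataclass
class BlogPostMetadata:
    # Simplified stand-in for the required metadata of a DOI
    doi: str
    title: str
    authors: List[str]
    publication_year: int
    # Optional metadata desired for the blog archive
    license: Optional[str] = None
    abstract: Optional[str] = None
    subject_area: Optional[str] = None   # e.g. an OECD field of science
    keywords: List[str] = field(default_factory=list)
    language: Optional[str] = None
    blog_issn: Optional[str] = None      # identifier for the blog
    author_orcid: Optional[str] = None   # identifier for the author
    affiliation_ror: Optional[str] = None  # identifier for the institution

    def missing_optional(self) -> List[str]:
        """List the optional fields that are still empty."""
        return [name for name in OPTIONAL_FIELDS if not getattr(self, name)]


record = BlogPostMetadata(
    doi="10.1234/example",  # placeholder, not a registered DOI
    title="Building Blocks for a Scholarly Blog Archive",
    authors=["Martin Fenner"],
    publication_year=2022,
    license="CC-BY-4.0",
    keywords=["blogs", "archive"],
    language="en",
)
print(record.missing_optional())
```

A report of missing optional fields like this is one way curation could spot records whose metadata need improving.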

Rich Site Summary (RSS)

RSS is the standard protocol for distributing and consuming blog content. It is actually a group of protocols (Atom and multiple flavors of the RSS format), but they have been around for so long that the popular tools and services support all of them. RSS will be the standard way content is ingested by the blog archive, and probably also the way content in the central blog archive is consumed in turn, e.g. as an automated feed of all new science blog posts in a particular subject area and language.

Because RSS is so widely supported, other ways of registering content – e.g. via web form, API, or webhook – are less critical for the blog archive. Work is needed on the InvenioRDM software to add strong support for RSS feeds, but this would allow the automation of a lot of the work needed to build and maintain the blog archive.
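As a rough illustration of feed-based ingestion and of filtering by subject area and language, the sketch below parses a small RSS 2.0 document with Python's standard library. The sample feed, its URLs, and the function names `parse_rss` and `filter_posts` are made up for this example. It handles only plain RSS 2.0 (not Atom), and it is not how InvenioRDM does or will do this.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 feed used only for this example.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Science Blog</title>
    <link>https://blog.example.org/</link>
    <language>en</language>
    <item>
      <title>First post</title>
      <link>https://blog.example.org/first</link>
      <category>Biology</category>
    </item>
    <item>
      <title>Second post</title>
      <link>https://blog.example.org/second</link>
      <category>Physics</category>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Extract title, URL, categories and language for each feed item."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 feed: no <channel> element")
    # RSS 2.0 declares the language once, at the channel level.
    language = channel.findtext("language", default="")
    posts = []
    for item in channel.findall("item"):
        posts.append({
            "title": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
            "categories": [c.text for c in item.findall("category") if c.text],
            "language": language,
        })
    return posts


def filter_posts(posts, subject, language):
    """Keep posts tagged with the subject and written in the language."""
    return [
        p for p in posts
        if subject in p["categories"] and p["language"] == language
    ]


posts = parse_rss(SAMPLE_FEED)
for p in filter_posts(posts, subject="Physics", language="en"):
    print(p["title"], p["url"])
```

A real ingestion pipeline would also need to handle Atom, namespaced extensions, and malformed feeds, which is where a dedicated feed-parsing library would probably be the better choice.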

Markdown and PDF

Markdown is a markup language popular with many blogging platforms. It is typically used for editing blog posts and other documents in online environments but is not really used for consuming blog content via RSS. Markdown has been extended to support features needed for scholarly documents, e.g. tables and references, but the uptake of this added functionality in science blogs has been slow.

PDF is commonly used for reading scholarly publications. The workflows for submitting manuscripts to journals and preprint archives in PDF format are broken because it is tricky to extract structured documents from PDFs. The blog archive will support PDF as an output format at some point, but this is not a high priority. Blog posts are typically consumed via blog reader or email (if the blog produces a newsletter) rather than as a PDF printed out on paper. There is work needed on the InvenioRDM platform to display full-text content rendered as HTML.

Curation and Community

Science blog posts typically see a lightweight review workflow before publication, and often receive feedback in the form of comments and/or social media mentions. For the Front Matter science blog archive, I want to keep that approach and not build any hurdles for inclusion. Some level of curation is needed, not only to check for quackery and hate speech but also to improve metadata that help with discovery, and to find blogs that should be included. Ideally we can build a community around the science blog archive, taking advantage of the communities (focussing on different languages and subject areas) feature recently added to the InvenioRDM software.


If reading this post, with its talk about blogs, RSS, Markdown, Creative Commons, and related technologies (I didn't, for example, mention Zotero, XML, or WordPress), feels like it is 2006 again – the year James Brown (used for the feature image of this post) died – you are right. This is intentional. These technologies are not as sexy as using artificial intelligence or cryptocurrencies to drive this, but I want the Science Blog archive to become a scholarly resource that is useful, open, and inclusive.


Fenner, M. (2022, September 28). Starting Work on the Front Matter Archive. Front Matter.

Fenner, M. (2022, December 12). Building an archive for scholarly blog posts. Front Matter.

Fenner, M. (2022, December 19). Launching the Front Matter Roadmap. Front Matter.

Fenner, M. (2010, October 6). Beyond the PDF – it is time for a workshop. Front Matter.

Fenner, M. (2013, June 19). Citations in Scholarly Markdown. Front Matter.

Copyright © 2022 Martin Fenner. Distributed under the terms of the Creative Commons Attribution 4.0 License.