Guidelines for Scholarly Blogs

These guidelines are recommendations for authors of scholarly blogs to help with long-term archiving, discoverability, and citation of blog content.
They are modeled on the publication A Data Citation Roadmap for Scholarly Data Repositories, to which many of the same guidelines apply. I was the first author of that publication and co-chair of the corresponding Force11 working group.

These guidelines focus on the required or recommended work for scholarly blog authors. For scholarly blog archives such as the Rogue Scholar, additional guidelines are in development.

Level	#	Guideline
Required	1	The full-text content must be made available via public RSS feed (in RSS, Atom or JSON Feed format).
Required	2	Each blog post in the RSS feed must have a title, author(s), and publication date.
Required	3	Each blog post must have a URL that resolves to a public landing page specific to that blog post.
Required	4	The full-text content must be made available via a Creative Commons Attribution (CC-BY) license.
Required	5	The blog must provide documentation about long-term archiving, discoverability, and citation.
Recommended	6	Each blog post in the RSS feed should have a persistent identifier, description, language, and last updated date.
Recommended	7	The landing page should include metadata required for citation, and ideally also metadata facilitating discovery, in human-readable and machine-readable format.
Recommended	8	The machine-readable metadata should use schema.org markup in JSON-LD format.
Recommended	9	Metadata should be made available via HTML meta tags to facilitate use by reference managers.
Recommended	10	Metadata should be made available for download in BibTeX and/or another standard bibliographic format.

The requirement to provide full-text content via a public RSS feed and under a CC-BY license comes from the need to make archiving and indexing as simple (and cheap) as possible. Dealing with multiple licenses, private feeds, and private content adds an extra level of complexity and does not support Open Science.
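As a rough illustration of guidelines 1 and 2, the following sketch checks that every item in an RSS 2.0 feed has a title, an author, and a publication date. It is a minimal example under stated assumptions, not a complete validator: the function name `missing_fields` and the sample feed are hypothetical, only the RSS 2.0 format is covered (not Atom or JSON Feed), and the author is looked up in either the `author` or the Dublin Core `dc:creator` element.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace, often used for <dc:creator> in RSS 2.0 feeds.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def missing_fields(rss_xml: str) -> list[tuple[str, list[str]]]:
    """Return (item label, missing field names) for each incomplete RSS item."""
    root = ET.fromstring(rss_xml.strip())
    problems = []
    for index, item in enumerate(root.iter("item")):
        title = (item.findtext("title") or "").strip()
        author = (
            item.findtext("author") or item.findtext(f"{DC_NS}creator") or ""
        ).strip()
        pub_date = (item.findtext("pubDate") or "").strip()
        checks = (
            ("title", title),
            ("author", author),
            ("publication date", pub_date),
        )
        missing = [name for name, value in checks if not value]
        if missing:
            problems.append((title or f"item {index}", missing))
    return problems


# Hypothetical sample feed: the second item has no author.
SAMPLE_FEED = """
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <dc:creator>Josiah Carberry</dc:creator>
      <pubDate>Sat, 01 Apr 2023 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <pubDate>Sun, 02 Apr 2023 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
"""

if __name__ == "__main__":
    for label, missing in missing_fields(SAMPLE_FEED):
        print(f"{label}: missing {', '.join(missing)}")
```

A check like this can be run against a blog's own feed before submitting it to an archive. It says nothing about the license or the presence of full-text content, which would need separate checks.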

HTML meta tags and JSON-LD (using schema.org markup) are the two main strategies for embedding metadata in web pages, in support of both reference managers and indexers. Schema.org is simpler to work with for more complex information, e.g. author information with separate given and family names, author identifiers such as ORCID, and affiliations. On the other hand, reference managers and Google Scholar currently use HTML meta tags, and meta tags are sometimes the easier of the two to add to a blog.
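To make the two strategies more concrete, here is a minimal sketch that renders the same post metadata both as schema.org JSON-LD and as HTML meta tags. Everything about the input is an assumption for illustration: the `POST` dictionary, its keys, and the helper names `to_json_ld` and `to_meta_tags` are hypothetical, and the author is the fictitious test person used in ORCID examples. The meta tag names (`citation_title`, `citation_author`, `citation_publication_date`) follow the Highwire Press style that Google Scholar documents; other reference managers may expect additional or different tags.

```python
import json
from html import escape

# Hypothetical post metadata; the keys are chosen for this example only.
POST = {
    "title": "Guidelines for Scholarly Blogs",
    "url": "https://example.org/posts/guidelines",
    "date_published": "2023-04-01",
    "language": "en",
    "authors": [
        {
            "given_name": "Josiah",
            "family_name": "Carberry",
            "orcid": "https://orcid.org/0000-0002-1825-0097",
            "affiliation": "Brown University",
        }
    ],
}


def to_json_ld(post: dict) -> str:
    """Serialize the post as a schema.org BlogPosting in JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": post["title"],
        "url": post["url"],
        "datePublished": post["date_published"],
        "inLanguage": post["language"],
        "author": [
            {
                "@type": "Person",
                "@id": author["orcid"],
                "givenName": author["given_name"],
                "familyName": author["family_name"],
                "affiliation": {
                    "@type": "Organization",
                    "name": author["affiliation"],
                },
            }
            for author in post["authors"]
        ],
    }
    return json.dumps(data, indent=2)


def to_meta_tags(post: dict) -> str:
    """Render Highwire Press-style meta tags for the same post."""
    tags = [("citation_title", post["title"])]
    tags += [
        ("citation_author", f'{a["family_name"]}, {a["given_name"]}')
        for a in post["authors"]
    ]
    # Google Scholar documents dates in YYYY/MM/DD form.
    tags.append(
        ("citation_publication_date", post["date_published"].replace("-", "/"))
    )
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in tags
    )


if __name__ == "__main__":
    print(to_json_ld(POST))
    print(to_meta_tags(POST))
```

The JSON-LD output would be placed on the landing page inside a `<script type="application/ld+json">` element, and the meta tags inside the page's `<head>`. Note how the meta tags flatten the author to a single string, while the JSON-LD keeps given name, family name, ORCID, and affiliation separate.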

Registration of DOIs or other persistent identifiers for blog posts is something that I want to provide via the Rogue Scholar archive, as the effort required is not trivial. The information required (mainly title, author(s), publication date, and URL) is readily available via the RSS feed. Of course, it is recommended that these DOIs be displayed on the blog, and that they resolve to the blog itself rather than to the blog archive at the Rogue Scholar or elsewhere.
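To show how little is needed beyond the feed itself, the sketch below collects title, author(s), publication date, and URL from a feed in JSON Feed format. It is an illustration only and not how the Rogue Scholar registers DOIs: the function names `registration_record` and `records_from_feed`, the shape of the returned record, and the sample feed are all hypothetical. The field names read from the feed (`items`, `title`, `url`, `date_published`, `authors`, and the older singular `author`) are those of the JSON Feed format.

```python
import json


def registration_record(item: dict, feed_authors: list[dict]) -> dict:
    """Collect the fields needed for identifier registration from one item."""
    # JSON Feed 1.1 uses "authors" (a list); version 1.0 used a single "author".
    authors = item.get("authors") or (
        [item["author"]] if item.get("author") else feed_authors
    )
    return {
        "title": item.get("title"),
        "authors": [a["name"] for a in authors if a.get("name")],
        "published": item.get("date_published"),
        "url": item.get("url"),
    }


def records_from_feed(feed_json: str) -> list[dict]:
    """Build one record per item, falling back to feed-level authors."""
    feed = json.loads(feed_json)
    feed_authors = feed.get("authors") or (
        [feed["author"]] if feed.get("author") else []
    )
    return [
        registration_record(item, feed_authors)
        for item in feed.get("items", [])
    ]


# Hypothetical sample feed with a single post.
SAMPLE_JSON_FEED = json.dumps(
    {
        "version": "https://jsonfeed.org/version/1.1",
        "title": "Example Blog",
        "authors": [{"name": "Josiah Carberry"}],
        "items": [
            {
                "id": "https://example.org/posts/guidelines",
                "url": "https://example.org/posts/guidelines",
                "title": "Guidelines for Scholarly Blogs",
                "content_html": "<p>Full text of the post.</p>",
                "date_published": "2023-04-01T09:00:00Z",
            }
        ],
    }
)

if __name__ == "__main__":
    for record in records_from_feed(SAMPLE_JSON_FEED):
        print(record)
```

A record with an empty `authors` list or a missing `published` or `url` value would indicate a post that does not yet meet the required guidelines and could not be registered without manual work.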

Which metadata should be recommended or optional for science blog posts is of course a big topic that needs more discussion. Description, language, and last updated date seem desirable and are readily available. It would be fantastic to include the references used in blog posts in the metadata, but there is currently no easy and standard way of doing this. For better discoverability, it would make sense to provide geo coordinates and/or temporal information, and all blogs would benefit from using a subject classification such as the OECD Fields of Science and Technology, but all of this would require significantly more effort.

These guidelines are a work in progress and are made available as part of the Rogue Scholar Documentation. Feedback is greatly appreciated.