Every Rogue Scholar blog post now available in Markdown, ePub, and PDF formats

The Rogue Scholar science blog archive starts 2024 with an important release: all blog posts (more than 13,000) are now available for download in Markdown, ePub, and PDF formats. This builds on work completed in December to store the full text of every Rogue Scholar blog post in Markdown format in the Rogue Scholar backend. Combined with the metadata in YAML format, these posts can now be downloaded via the Rogue Scholar API, e.g. https://api.rogue-scholar.org/posts/10.53731/5vvnh-tnm55?format=markdown.
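As a minimal sketch of how such a download could be scripted, the Python below builds the URL shown above and fetches the result. The base URL and the `format=markdown` query parameter come from the example; the function names `post_url` and `download_post` are hypothetical, and any other `format` value you pass (for instance for ePub or PDF) is an assumption that should be checked against the Rogue Scholar API documentation.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the example link in this post.
API_BASE = "https://api.rogue-scholar.org/posts"


def post_url(doi: str, fmt: str = "markdown") -> str:
    """Build the download URL for one post, identified by its DOI."""
    # Keep the slash between DOI prefix and suffix, as in the example URL.
    return f"{API_BASE}/{quote(doi, safe='/')}?format={quote(fmt)}"


def download_post(doi: str, fmt: str = "markdown", timeout: float = 30.0) -> bytes:
    """Fetch the post in the requested format and return the raw bytes."""
    with urlopen(post_url(doi, fmt), timeout=timeout) as response:
        return response.read()
```

Calling `download_post("10.53731/5vvnh-tnm55")` should return the Markdown for the post linked above as bytes, which can then be written to a local `.md` file.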

This format is compatible with all static site generators used by Rogue Scholar blogs, including Hugo, Jekyll, and Quarto. It can also be used to import blog posts into database-backed blogging platforms such as WordPress and Ghost. This addresses an important Rogue Scholar use case: making it easier to migrate from one blogging platform to another. More work is needed to update the Rogue Scholar documentation and to enable bulk downloads of content and metadata, i.e. all posts from a blog.
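When migrating, it can help to separate the YAML metadata from the Markdown body before handing each part to the target platform. The sketch below assumes the downloaded file follows the front matter convention used by Hugo, Jekyll, and Quarto, with the YAML block enclosed between two `---` lines at the top of the file. That layout is an assumption about the Rogue Scholar output, and the function name `split_front_matter` and the sample post are hypothetical.

```python
def split_front_matter(text: str) -> tuple[str, str]:
    """Return (yaml_front_matter, markdown_body); front matter is '' if absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            front_matter = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:]).lstrip("\n")
            return front_matter, body
    # No closing delimiter found: treat the whole text as body.
    return "", text


# Hypothetical sample, not an actual Rogue Scholar download.
sample = """---
title: Example post
date: 2024-01-01
---

Body of the post.
"""
front_matter, body = split_front_matter(sample)
print(front_matter)
print(body)
```

The front matter string can then be passed to a YAML parser of your choice, while the body can be imported as is.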

The availability of content and metadata in easily downloadable form enables another important use case, described by Ross Mounce in November:

Resilience: another advantage of openly-licensed content
As I allude to in the title of this post. The key differentiators here in open access systems, that make open access more resilient to downtime overall are a) availability on multiple independent platforms and b) open licencing that makes it legally easier to host content in multiple places.

This nicely complements the integration with the Internet Archive that Rogue Scholar started in November, archiving all posts from blogs participating in Rogue Scholar via the Archive-It service. The Rogue Scholar API makes it easy to programmatically download content and metadata from scholarly blog posts, and all content is available with an open license (CC-BY) that enables unrestricted reuse.

One limitation of the Markdown format is that most users do not prefer it for reading. Markdown is typically used to safely render content on the web, but for reading scholarly content stored on their computers, most users prefer PDF. That is why Rogue Scholar also offers blog posts converted to PDF format, using the Pandoc universal document converter. A key use case is storing important blog posts in a reference manager. Rogue Scholar has always supported storing the metadata, but now you can also easily attach the full text in PDF format and, for example, add annotations.

Reading Rogue Scholar posts in Zotero

Automated PDF generation is not trivial, and the initial implementation still has issues. Some PDFs are not generated properly, and the default layout can be improved. One major challenge is the images attached to blog posts: they are still stored with the respective blogs.

More fundamentally, PDF has two major limitations: it is an output format that can't be converted to other formats, and it doesn't easily adapt to smaller screen sizes. ePub overcomes both limitations, and therefore Rogue Scholar also provides an ePub version of every blog post, again using Pandoc. As with PDF, there is a lot of work to do to improve the look and feel of the ePub output, and to fix bugs. Look for many small improvements in the coming three months, and please provide feedback via comments and issues at the Rogue Scholar API open source repository. The popular open source reference manager Zotero is adding ePub support in its upcoming version 7 (available in Zotero 7 Beta since May 2023), so ePub can also be used to store the full-text content of Rogue Scholar blog posts.

Reading Rogue Scholar posts in Zotero (ePub version)
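If you prefer to run the conversion yourself from a downloaded Markdown file, a local Pandoc installation can produce both formats. The sketch below is not the Rogue Scholar pipeline, whose Pandoc options are not described here. It only wraps the basic `pandoc` command line, and the function names `pandoc_command` and `convert` are hypothetical.

```python
import subprocess


def pandoc_command(source: str, target: str) -> list[str]:
    """Build a Pandoc invocation; Pandoc infers the output format from the target's extension."""
    return ["pandoc", source, "--from", "markdown", "--output", target]


def convert(source: str, target: str) -> None:
    """Run Pandoc and raise CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(source, target), check=True)
```

For example, `convert("post.md", "post.epub")` would write an ePub file next to the source. Producing a PDF the same way additionally requires a PDF engine, such as a LaTeX installation, to be available to Pandoc.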

Blog posts in ePub format have been discussed on this blog before. In February 2011 I released a WordPress plugin that turns blog posts into ePub files. Has the time for ePub finally arrived, or will PDF remain the preferred format for (shorter) scholarly documents? Or should Rogue Scholar support other output formats, e.g. Journal Article Tag Suite (JATS), the standard format for scholarly articles?

References

Fenner, M. (2023). Archiving individual science blog posts. https://doi.org/10.53731/5vvnh-tnm55

Mounce, R. (2023). Resilience: Another advantage of openly-licensed content. https://doi.org/10.59350/psmbr-f6p84

Fenner, M. (2023). Archiving Rogue Scholar blogs with the Internet Archive. https://doi.org/10.53731/g60vh-3ng48

Fenner, M. (2011). ePub WordPress plugin released today. https://doi.org/10.53731/r294649-6f79289-8cw76

Copyright © 2024 Martin Fenner. Distributed under the terms of the Creative Commons Attribution 4.0 License.