Today, the Rogue Scholar science blog archive launched an important new feature: blog post pages now show the full-text content (in addition to the metadata) for all participating blogs. Rogue Scholar has always stored the full-text internally and made it available via the REST API, as the full-text is needed for archiving and full-text search.

The display of full-text content on Rogue Scholar blog post pages gives blog authors immediate feedback on how their blog posts look outside of their blogging platform and how they will be archived. This is especially important for included images and advanced metadata such as references.

Screenshot of https://rogue-scholar.org/records/macsk-y9124

The references that Rogue Scholar detects and registers with the Crossref metadata for the post are shown on the same page, so authors can check right away which references were detected.

Screenshot of https://rogue-scholar.org/records/macsk-y9124

The display of images helps detect broken image links (images are not yet stored with Rogue Scholar) and wrong image sizes. It also shows best practices such as providing figure legends and alt text (both missing in this example):

Screenshot of https://rogue-scholar.org/records/5dgfh-cdh66
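Blog authors who want to check their own posts for these two best practices before archiving could run a small script over the post HTML. The following is a minimal sketch using only the Python standard library; it is not how Rogue Scholar itself processes posts, and the class name `ImageAudit`, the function `audit_images`, and the record fields are hypothetical names chosen for illustration. It assumes that figure legends are marked up as `<figcaption>` inside a `<figure>` element, which not every blogging platform does.

```python
from html.parser import HTMLParser


class ImageAudit(HTMLParser):
    """Record, for every <img>, whether it has alt text and a figure legend."""

    def __init__(self):
        super().__init__()
        self.images = []         # one record per <img>, in document order
        self.open_figures = []   # stack of currently open <figure> elements

    def handle_starttag(self, tag, attrs):
        if tag == "figure":
            self.open_figures.append({"images": [], "has_caption": False})
        elif tag == "figcaption" and self.open_figures:
            self.open_figures[-1]["has_caption"] = True
        elif tag == "img":
            attr = dict(attrs)
            record = {
                "src": attr.get("src"),
                # An empty or missing alt attribute counts as "no alt text".
                "has_alt": bool((attr.get("alt") or "").strip()),
                "has_caption": False,
            }
            self.images.append(record)
            if self.open_figures:
                self.open_figures[-1]["images"].append(record)

    def handle_endtag(self, tag):
        if tag == "figure" and self.open_figures:
            figure = self.open_figures.pop()
            # The caption may come before or after the image, so the
            # records are updated only once the figure is closed.
            for record in figure["images"]:
                record["has_caption"] = figure["has_caption"]


def audit_images(html):
    """Return one record per image found in the given HTML string."""
    parser = ImageAudit()
    parser.feed(html)
    parser.close()
    return parser.images


if __name__ == "__main__":
    sample = (
        '<figure><img src="a.png" alt="A plot">'
        "<figcaption>Figure 1</figcaption></figure>"
        '<p><img src="b.png"></p>'
    )
    for image in audit_images(sample):
        print(image)
```

A check like this only finds missing alt text and legends. Broken image links and wrong image sizes still need a look at the rendered page, which is what the full-text display is for.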

The display of the full-text also gives quick feedback on full-text search results, e.g. for the term Xanadu. Future versions might also show the highlights returned by the OpenSearch search index.

This new feature requires re-indexing of all blog posts, as full-text is now stored in HTML instead of Markdown format (as both RSS feeds and Rogue Scholar use HTML). About 70% of blog posts have already been processed; the remaining posts will be stored as HTML full-text by the end of this week.

With nearly 150 participating blogs, not all formatting edge cases (mostly around custom CSS) could be addressed initially. I hope to resolve the outstanding issues by the end of May. Please reach out via Slack or email if you find an issue with the HTML full-text.

Older content, such as the first post on my personal blog from August 2007, has seen several platform migrations (four in this case), so archiving content independently of any platform-specific formatting is important. I hope that the display of the full-text helps with that goal. One additional positive outcome could be that migrating to a different blogging platform becomes easier.

References

  1. Fenner, M. (2014, March 3). Six Misunderstandings about Scholarly Markdown. Front Matter. https://doi.org/10.53731/r294649-6f79289-8cw0j
  2. Fenner, M. (2020, August 27). DataCite Commons—Exploiting the Power of PIDs and the PID Graph. Front Matter. https://doi.org/10.53731/kx45q-14h82
  3. Fenner, M. (2007, August 3). Open access may become mandatory for NIH-funded research. Front Matter. https://doi.org/10.53731/r294649-6f79289-8cw1q