Archiving Rogue Scholar blogs with the Internet Archive

Image via Commons Knowledge Blog.

Blogs participating in the Rogue Scholar science blog archive are now archived by the Internet Archive. Starting November 1st, Rogue Scholar is participating in the Internet Archive Archive-It service and all archived blogs can be found here. All blog posts, associated HTML pages, and media will be archived automatically every six months, and the first round of archiving is well underway.

Rogue Scholar science blog archived via the Internet Archive Archive-It service

The URLs of the archived blog posts are stored in the Rogue Scholar service and made available via the Rogue Scholar API. If a blog is no longer available via the public internet, Rogue Scholar can update the DOI metadata to point to the archived version. This is the theory, but a lot of hard work is needed to make the different pieces work seamlessly together.
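The fallback described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Rogue Scholar implementation: the function names `is_reachable` and `doi_target` are hypothetical, the liveness check is a plain HEAD request, and the step that actually updates the DOI metadata is left out.

```python
from typing import Callable, Optional
from urllib.request import Request, urlopen


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to `url` completes with a status below 400.

    Assumption: a HEAD request is a good enough liveness signal. Some servers
    reject HEAD, so a real check might need to fall back to GET.
    """
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):
        # OSError covers HTTP errors, connection failures, and timeouts;
        # ValueError covers malformed URLs.
        return False


def doi_target(
    original_url: str,
    archived_url: Optional[str],
    check: Callable[[str], bool] = is_reachable,
) -> str:
    """Pick the URL a DOI should point to (hypothetical helper).

    The original URL wins while it is reachable. If it is not, the archived
    copy is used when one is known; otherwise the original URL is kept.
    """
    if check(original_url):
        return original_url
    return archived_url or original_url
```

Passing the check in as a parameter keeps the decision logic testable without network access, for example `doi_target(url, archived, check=lambda u: False)`.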

One important element is regularly checking all Rogue Scholar URLs (10,558 as of today) and making sure they resolve to the expected blog posts. Archiving with the Internet Archive Archive-It service captures more than one version of a blog post, so the Rogue Scholar service needs to determine which version to treat as the archived version when the blog post is no longer available via the public internet. Another challenge is the links in archived posts, particularly image links. They are included in the standard archiving settings, but more checks are needed to understand exactly what is archived, in particular content not hosted on the science blog's own server. I expect this process to take until the end of the year.
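One possible way to choose among several captures of the same post is to take the most recent capture that recorded a successful response. The sketch below assumes that policy purely for illustration, since the rule Rogue Scholar will use is still being worked out. The `Capture` record and its fields are hypothetical and do not reflect the Archive-It data format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class Capture:
    """One archived snapshot of a blog post (hypothetical record shape)."""

    timestamp: datetime  # when the capture was made
    status: int          # HTTP status recorded for the capture
    archive_url: str     # URL of the archived copy


def choose_archived_version(captures: Sequence[Capture]) -> Optional[Capture]:
    """Return the latest capture with status 200, or None if there is none.

    Assumption: "latest successful capture" is the selection policy. Other
    policies, such as the capture closest to the publication date, are
    equally plausible.
    """
    good = [c for c in captures if c.status == 200]
    if not good:
        return None
    return max(good, key=lambda c: c.timestamp)
```

Filtering on the status first means that a later capture of an error page is never preferred over an earlier capture of the actual post.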

Archiving costs money. Prices for the Internet Archive Archive-It service are very reasonable (800 USD per year for the basic service, archiving up to 256 GB), and Rogue Scholar is not charging extra for the new service (still a one-time fee of one USD per blog post, with up to 50 free blog posts per year). But donations to Rogue Scholar (via the "Buy me a coffee" link in the menu bar) are always appreciated.

New science blogs joining the Rogue Scholar service are of course also appreciated: use the "Sign in" link in the menu bar to register your blog. The latest blog to join Rogue Scholar, Everything is Connected by Ernesto Priego, includes the oldest blog post in Rogue Scholar so far: "Day Barry White died", published in July 2003, more than 20 years ago.

References

Fenner, M. (2023). Starting November, all Rogue Scholar blog posts will be archived by the Internet Archive. https://doi.org/10.53731/hhtx0-wb293

Copyright © 2023 Martin Fenner. Distributed under the terms of the Creative Commons Attribution 4.0 License.