My contribution to Open Access Week 2024

International Open Access Week 2024 runs from October 21 to 27, and this blog post summarizes my contribution.

Open Access Week 2024 will continue the call to put “Community over Commercialization” and prioritize approaches to open scholarship that serve the best interests of the public and the academic community.
From Open Access Week Website

The title of this post was taken from a blog post by my friend and colleague Heinz Pampel on Monday. This week we again have many events related to International Open Access Week, as well as some blog posts. This brings me to Rogue Scholar, the science blog archive I launched in 2023, which is the main focus of my work. Just in time for Open Access Week 2024, I am happy to announce that I have migrated the metadata and full-text content of more than 17,000 blog posts from 100+ participating science blogs to the InvenioRDM open source repository platform and made all content available via full-text search. Rogue Scholar powered by InvenioRDM will launch to production on November 4. The beta at https://beta.rogue-scholar.org has all metadata and content, and most of the functionality of the current Rogue Scholar website.

Rogue Scholar previously provided full-text search, but this is new functionality for the InvenioRDM platform, which powers a growing list of repositories, including Zenodo at CERN, where the software was started. Providing full-text search is difficult for three main reasons, all very relevant for Rogue Scholar and science blog posts:

  • Distributed search for content hosted in many different places is technically very difficult and thus limited to a small number of very big organizations such as Google. One solution is to aggregate distributed full-text content in a central repository, which is for example done by PubMed Central for the biomedical literature. Rogue Scholar provides a central place to search for science blog posts, with content published in many different places and platforms.
  • Access to full-text content is not only technically difficult but also a legal challenge, as a lot of scholarly content is still not available with an open license. That is the main reason that full-text search is available in PubMed Central only for a subset of PubMed (currently 10.3 out of 37 million publications). All content in Rogue Scholar is available via a Creative Commons CC-BY license, or in some cases with a Creative Commons CC0 waiver.
  • Scholarly content can be made available in many formats, some of which are not a good fit for full-text search. This includes research data but also formats such as PDF, which is a big challenge for preprint servers such as bioRxiv, but also for other InvenioRDM instances such as Zenodo. Formats such as HTML and JATS XML also have challenges because the tags break text strings. All full-text content in Rogue Scholar is available in markdown format and has been used by Rogue Scholar full-text search since July 2023.

With full-text search in place, let's try some queries with the InvenioRDM-powered Rogue Scholar Beta. To query the full-text content, use the field content and put the query string Open Access Week in quotes to find only blog posts with the exact string, not all blog posts that contain Open or Access or Week: content:"Open Access Week". This query today returns 139 hits from several blogs participating in Rogue Scholar.
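The same query can also be sent programmatically. The following is a minimal sketch, assuming the beta exposes the standard InvenioRDM records search endpoint at /api/records with the usual q and size parameters; the helper names phrase_query and search_url are made up for this illustration.

```python
from urllib.parse import urlencode

# Assumption: the beta serves the standard InvenioRDM search endpoint here.
BASE_URL = "https://beta.rogue-scholar.org/api/records"


def phrase_query(field: str, phrase: str) -> str:
    """Return a fielded exact-phrase query such as content:"Open Access Week"."""
    # Escape backslashes and double quotes so the phrase stays one quoted unit.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def search_url(query: str, size: int = 10) -> str:
    """Build the search URL; urlencode takes care of percent-encoding."""
    return f"{BASE_URL}?{urlencode({'q': query, 'size': size})}"


if __name__ == "__main__":
    print(search_url(phrase_query("content", "Open Access Week")))
```

Fetching the printed URL with any HTTP client should return JSON search results. The quotes matter: without them the three words would be matched individually rather than as an exact phrase.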

You can do the same query in the current Rogue Scholar. The results are slightly different (126 hits today), as InvenioRDM uses a different technology for full-text search (OpenSearch vs. Typesense), and both can be configured in many different ways. One new functionality that InvenioRDM provides (also supported by Typesense but not implemented in the current Rogue Scholar) is faceted search. You can filter the results by language or subject, and you can sort them not only by best match, but also by publication date or time last updated. You will find several posts from 2009 (the first year of International Open Access Week) by Heinz Pampel, Mike Taylor, Europe PMC, and myself. In 2009 I was working as a medical oncologist and cancer researcher at Hannover Medical School and wrote about my researcher perspective on Open Access.

You can use full-text search in all kinds of ways via the user interface and InvenioRDM API. One typical use case is to search for specific text strings not available in the metadata, e.g. people, places, or unique identifiers. Searching for Christiane Nüsslein-Volhard (who won the Nobel Prize in Medicine in 1995) finds a blog post I wrote in 2008 when I attended the International Conference of Genetics where she spoke.

Another example would be a search for the string Sussex Open Access Week, which brings up a blog post by Martin Eve from September 2010, where he announced that he will be speaking about Open Access at the University of Sussex.

Searching for a specific paper via its DOI mentioned in the full-text (not necessarily the metadata) finds a 2013 blog post from Scott Chamberlain explaining how you can use the Public Library of Science (PLOS) API and the R programming language for programmatic access to PLOS full text.

You can also search for specific strings within the full-text, for example content:"Panel discussion: PIDs in different communities". What is special about the blog post found (the program for the final event of the grant-funded THOR Project, in which I also participated) is that the project website is no longer publicly available (the grant ended seven years ago). The DOI therefore resolves to the copy archived by the Internet Archive, as Rogue Scholar has used their Archive-It service since November 2023.

Together with the full-text search, this week I also refined how you can use persistent identifiers to search for blog posts in the Rogue Scholar Beta. You can now search by blog post DOI (doi:10.59350/rqp9y-vqk63, the oldest Rogue Scholar post, from 2003), ORCID (orcid:0000-0003-1419-2405, my ORCID; you can use the ORCID of any Rogue Scholar author), or ROR (ror:008zgvp64, PLOS, where I worked from 2012 to 2015; 30% of Rogue Scholar blog posts have an author whose affiliation has a ROR ID).
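If you start from a bare identifier or a resolver URL, the matching search string can be derived mechanically. This is a small sketch of that idea, not part of Rogue Scholar itself; the function name pid_query and the simplified patterns are assumptions made for illustration (the ROR pattern in particular is looser than the official one).

```python
import re

# Simplified patterns for the three identifier types mentioned above.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
ORCID_RE = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")
ROR_RE = re.compile(r"^0[a-z0-9]{6}\d{2}$")

# Resolver prefixes that people often paste along with the identifier.
RESOLVER_RE = re.compile(r"^https?://(dx\.)?(doi\.org|orcid\.org|ror\.org)/")


def pid_query(value: str) -> str:
    """Turn a DOI, ORCID or ROR (bare or as URL) into a prefixed search string."""
    bare = RESOLVER_RE.sub("", value.strip())
    if DOI_RE.match(bare):
        return f"doi:{bare}"
    if ORCID_RE.match(bare):
        return f"orcid:{bare}"
    if ROR_RE.match(bare):
        return f"ror:{bare}"
    raise ValueError(f"not a recognized DOI, ORCID or ROR: {value!r}")


if __name__ == "__main__":
    for pid in ("https://doi.org/10.59350/rqp9y-vqk63",
                "0000-0003-1419-2405",
                "https://ror.org/008zgvp64"):
        print(pid_query(pid))
```

The resulting strings can be pasted into the search box of the beta as they are.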

Extracting the full-text content from science blog posts via their RSS feeds is specific to Rogue Scholar and a lot of work, but the functionalities mentioned in this post (full-text search, search via persistent identifier) are implemented via the InvenioRDM configuration (the invenio.cfg file), and can be used and adapted by other InvenioRDM installations. In fact, the streamlined persistent identifier search in Rogue Scholar Beta modifies part of the Zenodo configuration.
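To give an idea of what such a configuration does conceptually, here is a standalone sketch of expanding short search keys into full index field names. It is plain Python, not the actual InvenioRDM or Zenodo configuration, and the field paths in FIELD_ALIASES are assumptions based on the general shape of InvenioRDM records.

```python
import re

# Hypothetical mapping of short search keys to index field paths (assumed).
FIELD_ALIASES = {
    "doi": "pids.doi.identifier",
    "orcid": "metadata.creators.person_or_org.identifiers.identifier",
    "ror": "metadata.creators.affiliations.id",
}

# A field key is a word directly followed by a colon and not part of a
# dotted path (so "metadata.title:" is left alone).
KEY_RE = re.compile(r"(?<![\w.])(\w+):")

# Splits a query into quoted phrases and everything else.
QUOTED_RE = re.compile(r'("[^"]*")')


def _expand(match: re.Match) -> str:
    key = match.group(1)
    return f"{FIELD_ALIASES.get(key, key)}:"


def expand_aliases(query: str) -> str:
    """Replace known short keys with full field paths outside quoted phrases."""
    parts = QUOTED_RE.split(query)
    expanded = [
        part if part.startswith('"') else KEY_RE.sub(_expand, part)
        for part in parts
    ]
    return "".join(expanded)


if __name__ == "__main__":
    print(expand_aliases("doi:10.59350/rqp9y-vqk63"))
    print(expand_aliases('content:"Open Access Week" AND orcid:0000-0003-1419-2405'))
```

Keeping the aliases in one dictionary is what makes this kind of feature easy to share between installations: another repository only needs to adjust the mapping, not the search logic.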

Another functionality that the migration to InvenioRDM brings to Rogue Scholar is communities, which I explained in a blog post two weeks ago. Rogue Scholar uses them to aggregate all posts from a blog, but has also started to explore topic communities. Open Access Week is a great topic for aggregating Rogue Scholar blog posts, and you can find the community via this new shortcut: communities:oaweek (again with configuration taken from Zenodo).

This is an improvement over Open Access Week last year, when I asked Heinz Pampel to write a blog post aggregating interesting blog posts talking about Open Access Week.

I think the work reported in this blog post aligns well with the theme of this year's Open Access Week: Community over Commercialization. Happy Open Access Week!

References

Pampel, H. (2024, October 20). Our Contribution to Open Access Week 2024. Research Group Information Management @ Humboldt-Universität Zu Berlin. https://doi.org/10.59350/9n4et-tkf55

Fenner, M. (2023, July 10). Rogue Scholar full-text search improvements. Front Matter. https://doi.org/10.53731/6r1dx-wdp04

Taylor, M. (2009, October 20). Futalognkosaurus was one big-ass sauropod. Sauropod Vertebra Picture of the Week. https://doi.org/10.59350/b7eda-df395

Europe PMC Team. (2009, October 19). Wellcome Trust calls for greater transparency from journals on open access publishing costs. Europe PMC News Blog.

Fenner, M. (2009, October 18). Open Access Week: A researcher’s perspective. Front Matter. https://doi.org/10.53731/r294649-6f79289-8cw3b

Fenner, M. (2008, July 13). In which I became a conference blogger. Front Matter. https://doi.org/10.53731/r294649-6f79289-8cwat

Eve, M. P. (2010, September 28). Speaking of Open Access... Martin Paul Eve. https://doi.org/10.59348/ex3tn-g7q84

Chamberlain, S. (2013, October 22). OA week—A simple use case for programmatic access to PLOS full text. rOpenSci - Open Tools for Open Science. https://doi.org/10.59350/w4jr7-pmt65

Duine, M. (2017, October 12). THOR Final Event programme is out! Project THOR. https://doi.org/10.59350/hnegw-6rx17

Fenner, M. (2023, October 30). Starting November, all Rogue Scholar blog posts will be archived by the Internet Archive. Front Matter. https://doi.org/10.53731/hhtx0-wb293

Fenner, M. (2024, October 7). Rogue Scholar learns about communities. Front Matter. https://doi.org/10.53731/dv8z6-a6s33

Pampel, H. (2023, October 24). The Open Access Week in the Scholarly Blogosphere. Syldavia Gazette. https://doi.org/10.53731/xs2mj-epe20

Copyright © 2024 Martin Fenner. Distributed under the terms of the Creative Commons Attribution 4.0 License.