Rogue Scholar full-text search improvements

Photo by saeed mhmdi / Unsplash

Two weeks ago I added a first version of full-text search to the Rogue Scholar blog archive. This was a good start, as blogs typically offer only the timeline, tags, and metadata such as titles and authors to help readers find relevant content. Today I launched a new version of full-text search with two improvements:

Sorting by relevance helps with search terms that produce many hits, e.g. RDA (for Research Data Alliance). Fuzzy search helps with typos, synonyms, and incomplete names, e.g. Proprint (for preprint), Open Scholarship (which also finds blog posts about Open Science), or Iain Hry, which finds blog posts by or about Iain Hrynaszkiewicz (who works at the Public Library of Science).
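To illustrate the typo case only, here is a minimal sketch of fuzzy matching based on edit distance. It is not how the Rogue Scholar search is implemented; the function names and the `max_typos` threshold of 2 are assumptions chosen for illustration, and it does not cover synonyms or incomplete names.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, ignoring case."""
    a, b = a.lower(), b.lower()
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query: str, term: str, max_typos: int = 2) -> bool:
    """Hypothetical helper: accept a term within max_typos edits of the query."""
    return edit_distance(query, term) <= max_typos


print(edit_distance("Proprint", "preprint"))
print(fuzzy_match("Proprint", "preprint"))
```

With this sketch, Proprint and preprint differ by a single substituted letter, so the query still matches.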

These and future improvements are of course only meaningful because the Rogue Scholar is a central archive of scholarly blogs, so users don't have to search a long list of different places, and because the Rogue Scholar has archived the full text of these science blogs rather than only metadata or abstracts.

While the initial implementation used the full-text search built into the Postgres database that powers the Rogue Scholar backend, this new version uses Typesense, a dedicated open source search engine. Adding another layer of technology complicates the Rogue Scholar technology stack, but full-text search using the functionality built into Postgres can also be challenging for more complex use cases. Sorting search results by relevance, for example, is possible but more difficult than with a dedicated search engine such as Typesense.
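As a rough sketch of what a relevance-sorted query might look like, the function below builds a set of Typesense search parameters. The parameter names (`q`, `query_by`, `sort_by`, `num_typos`) and the `_text_match` sort field come from the Typesense search API; the field names `title` and `content` and the function itself are assumptions, not the actual Rogue Scholar schema.

```python
def relevance_search_params(query: str, num_typos: int = 2) -> dict:
    """Build Typesense search parameters sorted by text relevance.

    The searched fields are hypothetical; adapt them to the real collection.
    """
    return {
        "q": query,
        "query_by": "title,content",     # assumed field names
        "sort_by": "_text_match:desc",   # most relevant hits first
        "num_typos": num_typos,          # typo tolerance per query token
    }


params = relevance_search_params("RDA")
print(params)
```

With the official Python client, a dictionary like this would typically be passed to the `documents.search` method of a collection.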

Faceted search will become more important as the Rogue Scholar archive continues to grow, for example to allow filtering by language or subject area. Instantsearch is a popular open source library that supports search interfaces built directly into blogs, and that can take advantage of the Rogue Scholar full-text search.
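Filtering by language or subject area could be sketched along the following lines. `facet_by` and `filter_by` are Typesense search parameters; the field names `language` and `category`, the searched fields, and the helper function are hypothetical and only illustrate the idea.

```python
from typing import Optional


def faceted_search_params(query: str, filters: Optional[dict] = None) -> dict:
    """Build Typesense search parameters with facet counts and optional filters.

    All field names are assumptions, not the actual Rogue Scholar schema.
    """
    params = {
        "q": query,
        "query_by": "title,content",      # assumed searchable fields
        "facet_by": "language,category",  # request counts per facet value
    }
    if filters:
        # Combine exact-match filters, e.g. "language:=en && category:=biology"
        params["filter_by"] = " && ".join(
            f"{field}:={value}" for field, value in filters.items()
        )
    return params


print(faceted_search_params("preprint", {"language": "en"}))
```

A search interface would show the returned facet counts as clickable options and add the selected values to `filters` on the next request.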

References

Fenner M. Full-text search added to the Rogue Scholar science blog archive. Published online June 27, 2023. doi:10.53731/80awr-zcc48

Hrynaszkiewicz I, Norton ML, Vickers AJ, Altman DG. Preparing raw clinical data for publication: guidance for journal editors, authors, and peer reviewers. BMJ. 2010;340:c181. doi:10.1136/bmj.c181

Copyright © 2023 Martin Fenner. Distributed under the terms of the Creative Commons Attribution 4.0 License.