Rogue Scholar full-text search improvements
Two weeks ago I added a first version of full-text search to the Rogue Scholar blog archive. This was a good start, as blogs typically offer only a timeline, tags, and metadata such as titles and authors to help readers find relevant content. Today I launched an improved version of full-text search with the following changes:
- Sorting of search results by relevance
- Support for fuzzy search or approximate string matching
- A backend for an updated search interface supporting faceted search and/or InstantSearch
Sorting by relevance helps with search terms that produce many hits, e.g. RDA (for Research Data Alliance). Fuzzy search helps with typos and synonyms, e.g. Proprint (for preprint), Open Scholarship (which also finds blog posts about Open Science), or Iain Hry, which finds blog posts from or about Iain Hrynaszkiewicz (who works at the Public Library of Science).
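To make the idea of approximate string matching concrete, here is a minimal Python sketch that ranks candidate words by Levenshtein edit distance. It only illustrates the general technique and is not how Typesense or the Rogue Scholar implements fuzzy search; the function names and the `max_typos` threshold are assumptions chosen for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query: str, words: list[str], max_typos: int = 2) -> list[str]:
    """Return the words within max_typos edits of the query, closest first.

    Comparison is case-insensitive; max_typos is an illustrative threshold.
    """
    q = query.lower()
    scored = [(edit_distance(q, word.lower()), word) for word in words]
    return [word for distance, word in sorted(scored) if distance <= max_typos]


matches = fuzzy_match("Proprint", ["preprint", "postgres"])
```

In this sketch the query Proprint is a single substitution away from preprint, so it falls within the typo budget, while an unrelated word such as postgres does not.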
These and future improvements are of course only meaningful because the Rogue Scholar is a central archive of scholarly blogs, so users don't have to search a long list of different places, and because the Rogue Scholar has archived the full text of these science blogs rather than only metadata or abstracts.
While the initial implementation used the full-text search built into the Postgres database that powers the Rogue Scholar backend, this new version uses Typesense, a dedicated open source search engine. Adding another layer of technology complicates the Rogue Scholar technology stack, but full-text search using the functionality built into Postgres can be challenging for more complex use cases. Sorting search results by relevance, for example, is possible but more difficult than with a dedicated search engine such as Typesense.
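As a rough sketch of what a relevance-sorted, typo-tolerant query could look like, the snippet below assembles Typesense search parameters in Python. `q`, `query_by`, `sort_by`, `num_typos`, and `per_page` are Typesense search parameters; the field names `title`, `summary`, and `content` and the collection name `posts` are assumptions for illustration, not the actual Rogue Scholar schema.

```python
def build_search_params(query: str, per_page: int = 10) -> dict:
    """Assemble Typesense search parameters for a relevance-sorted query.

    The field names are hypothetical, not the Rogue Scholar schema.
    """
    return {
        "q": query,
        "query_by": "title,summary,content",  # assumed text fields
        "sort_by": "_text_match:desc",        # best text match first
        "num_typos": 2,                       # tolerate up to two typos
        "per_page": per_page,
    }


params = build_search_params("RDA")
# With the typesense Python client and a running server, these parameters
# could be passed on like this (the collection name "posts" is an assumption):
#   client.collections["posts"].documents.search(params)
```

Keeping the parameters in a plain dictionary makes the query easy to inspect and test without a running search server.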
Faceted search will become more important as the Rogue Scholar archive continues to grow, for example to allow filtering by language or subject area. InstantSearch is a popular open source library for building search interfaces directly into blogs, and such interfaces can take advantage of the Rogue Scholar full-text search.
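A faceted query against Typesense might be assembled along the following lines. `facet_by` and `filter_by` are Typesense search parameters, and faceted fields need to be declared as facets in the collection schema; the field names `language` and `category` are assumptions made for this sketch, not the fields the Rogue Scholar actually uses.

```python
from typing import Optional


def build_faceted_params(query: str, language: Optional[str] = None) -> dict:
    """Assemble Typesense search parameters that request facet counts
    and optionally restrict results to one language.

    The field names are hypothetical, not the Rogue Scholar schema.
    """
    params = {
        "q": query,
        "query_by": "title,content",      # assumed text fields
        "facet_by": "language,category",  # assumed facet fields
    }
    if language is not None:
        # ":=" requests an exact match on the field value
        params["filter_by"] = f"language:={language}"
    return params


all_languages = build_faceted_params("open science")
german_only = build_faceted_params("open science", language="de")
```

The first call would return facet counts for every language and subject area, which a search interface can display as filters; the second narrows the results once a reader picks one of them.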
References
Fenner M. Full-text search added to the Rogue Scholar science blog archive. Published online June 27, 2023. doi:10.53731/80awr-zcc48
Hrynaszkiewicz I, Norton ML, Vickers AJ, Altman DG. Preparing raw clinical data for publication: guidance for journal editors, authors, and peer reviewers. BMJ. 2010;340:c181. doi:10.1136/bmj.c181