One Ring to rule them all, One Ring to find them, One Ring to bring them all and in the darkness bind them.
Sixty years ago yesterday, the first volume of the Lord of the Rings trilogy by J.R.R. Tolkien was published. The quote above obviously doesn’t quite apply to scholarly publishing, but one recurring theme that I have heard often in the last few years is the need for a canonical digital document format for scholarly content, one that rules all other formats.
A few years ago almost everyone would have said that XML is that format, with the NLM Archiving and Interchange Tag Suite, which has evolved into JATS, probably the most commonly used Document Type Definition (DTD). XML does many things really well, but it also has important shortcomings, most importantly that it is probably not a good format for authors (and don’t tell me that formats like odt are XML-based). We therefore don’t really expect authors to submit manuscripts in JATS XML, but rather convert documents into this format after a manuscript has been accepted for publication. This conversion step is often time-consuming and labor-intensive.
HTML has become the most interesting candidate for a canonical scholarly document format. The big advantage over XML is that HTML, or at least HTML5, which is most popular today, is an attractive format for online authoring tools (that is why HTML is listed both as input and output format). The downside of this flexibility is that it is much harder to embed structure and metadata into HTML5 compared to XML. There are initiatives such as schema.org and HTMLBook that hope to change that, but we aren’t quite there yet.
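To make the idea of embedding structure and metadata into HTML5 a bit more concrete, here is a minimal sketch that renders an article with schema.org microdata attributes. The function name `article_to_html` and its parameters are my own illustration and not part of any tool mentioned here; the `ScholarlyArticle` and `Person` types and the `headline`, `author`, `name`, `datePublished` and `articleBody` properties come from the schema.org vocabulary.

```python
from html import escape


def article_to_html(title, authors, date_published, paragraphs):
    """Render an article as HTML5 annotated with schema.org microdata.

    Hypothetical helper for illustration only: it covers a handful of
    properties and is far from a complete scholarly document model.
    """
    lines = ['<article itemscope itemtype="https://schema.org/ScholarlyArticle">']
    lines.append(f'  <h1 itemprop="headline">{escape(title)}</h1>')
    for name in authors:
        # Each author is a nested schema.org Person item.
        lines.append(
            '  <span itemprop="author" itemscope itemtype="https://schema.org/Person">'
            f'<span itemprop="name">{escape(name)}</span></span>'
        )
    date = escape(date_published)
    lines.append(f'  <time itemprop="datePublished" datetime="{date}">{date}</time>')
    lines.append('  <div itemprop="articleBody">')
    for text in paragraphs:
        lines.append(f'    <p>{escape(text)}</p>')
    lines.append('  </div>')
    lines.append('</article>')
    return "\n".join(lines)
```

The markup stays ordinary HTML5 that any browser or authoring tool can handle, while the `itemprop` attributes carry the kind of metadata that XML formats express with dedicated elements.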
Or maybe we should learn from Tolkien, give up on the idea of a canonical document format, and rather spend our energy on building tools that make it easier to transition from one format to another. Pandoc is such a tool, but it can’t do all the required conversions, e.g. it can’t yet use docx as input. The downside here is that every file conversion runs the risk of losing important information. But the increase in flexibility hopefully outweighs these shortcomings.
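As a rough sketch of what such a conversion step can look like in practice, the snippet below wraps the Pandoc command line in two small Python functions. The helper names `pandoc_command` and `convert` are hypothetical, as are any file names you pass in; the flags `--from`, `--to`, `--standalone` and `--output` are regular Pandoc options. It assumes Pandoc is installed and on the PATH, and simply does nothing if it isn't.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def pandoc_command(source: Path, target: Path, from_format: str, to_format: str) -> List[str]:
    """Build the argument list for a single Pandoc conversion."""
    return [
        "pandoc",
        "--from", from_format,
        "--to", to_format,
        "--standalone",            # emit a complete document, not a fragment
        "--output", str(target),
        str(source),
    ]


def convert(source: Path, target: Path, from_format: str, to_format: str) -> bool:
    """Run Pandoc if it is installed; return True when a conversion was run."""
    if shutil.which("pandoc") is None:
        return False  # Pandoc not found on PATH, nothing is converted
    # check=True raises CalledProcessError if Pandoc reports a failure.
    subprocess.run(pandoc_command(source, target, from_format, to_format), check=True)
    return True
```

For example, `convert(Path("paper.md"), Path("paper.html"), "markdown", "html5")` would turn a Markdown manuscript into a standalone HTML5 file. Anything the target format cannot express is dropped along the way, which is exactly the information-loss risk described above.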