In the comments on Monday’s blog post about the Markdown for Science workshop, Carl Boettiger had some good arguments against the proposal for how to do citations that we came up with during the workshop. As this is a complex topic, I decided to write this blog post.
Citations of the scholarly literature are an essential part of scholarly texts and therefore have to be supported by scholarly markdown. Both the Pandoc and MultiMarkdown flavors of markdown support citations, using a BibTeX file that contains the references, placeholders for citekeys ([@smith04] for Pandoc and [#smith04] for MultiMarkdown), and, in the case of Pandoc, the Citation Style Language for citation formatting. A very reasonable approach would therefore be to use this functionality, with a preference for Pandoc because of the Citation Style Language support. All reference managers can export to the BibTeX format, and some of them (e.g. Papers) make it very easy to copy and paste citekeys.
Ten days after the workshop I’m not so sure anymore this is the best approach. For four reasons:
[@smith04], and it is too easy to get out of sync with the reference manager. She therefore prefers to put the complete reference information into the text while writing.
Part of the motivation for using scholarly markdown is that we can come up with best practices that make sense for digital content and don’t need to support conventions from an era when articles were still printed on paper. Reference information in the form of volumes and pages, and 1000s of citation styles certainly have outlived their purpose. Citation styles are a particular pain point, as they are nothing more than a visual representation of a citation - we should care much more about the machine-readable metadata, in particular the DOI or other identifier.
The best practice for scholarly markdown could therefore be to treat citations as links, using DOIs or other standard identifiers (PMID, arXiv, etc.) where possible. Because we typically want to list the citations as references at the end of the document, reference-style links should be preferred over inline links. From the markdown syntax documentation:

This is [an example][id] reference-style link.

This is [an example](http://example.com/ "Title") inline link.

[id]: http://example.com/ "Optional Title Here"
It might be tempting to use sequential numbers as ids for the reference-style links, but the order of links can of course change during writing. It may make more sense to think of the id in a reference-style link as a citekey, and people should be free to use that functionality of their reference manager. The citekey then links to the reference list at the bottom of the document, rather than to an entry in a separate BibTeX file.
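Because the ids are plain text, it is easy to check that the citekeys used in the text and the link definitions at the bottom have not drifted apart. The following is a minimal sketch in Python, not part of any existing tool: the function name `check_citekeys` and both regular expressions are my own assumptions, and the patterns only cover the simple one-line definition form shown above.

```python
import re

# Assumed pattern for a definition line such as:
#   [id]: http://example.com/ "Optional Title Here"
DEFINITION = re.compile(
    r'^\s{0,3}\[([^\]]+)\]:\s*(\S+)(?:\s+"(.*)")?\s*$', re.MULTILINE
)
# Assumed pattern for a reference-style use such as: [an example][id]
USAGE = re.compile(r'\[[^\]]+\]\[([^\]]+)\]')


def check_citekeys(text):
    """Return (undefined, unused) citekeys found in a markdown string.

    Link ids in markdown are not case sensitive, so keys are lowercased
    before they are compared.
    """
    defined = {m.group(1).lower() for m in DEFINITION.finditer(text)}
    used = {m.group(1).lower() for m in USAGE.finditer(text)}
    return sorted(used - defined), sorted(defined - used)


sample = (
    "See [this paper][smith04] and [that one][jones10].\n"
    "\n"
    '[smith04]: http://example.com/ "Smith 2004"\n'
    "[doe99]: http://example.org/\n"
)
undefined, unused = check_citekeys(sample)
print(undefined)  # ['jones10']
print(unused)     # ['doe99']
```

A check like this could run before publishing, so that a citekey without a definition does not silently end up as literal brackets in the output.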
All of the above can be done in any text editor. This also includes the text editor that scholars spend most of their time with - their email program. Reference-style citations in an email are very readable, and also actionable since they are links and not text with bibliographic information.
One problem with this approach is of course that all links are inline in the resulting HTML, without a references section at the end of the document. This may be fine, as we can provide citation information in the title attribute, available upon hovering over the link (try hovering over this link, the journal eLife is doing something similar). The markdown could look like this (using the Vancouver citation style):
[@Ioannidis2005]: http://dx.doi.org/10.1371/journal.pmed.0020124 "Ioannidis JPA. Why Most Published Research Findings Are False. PLoS Medicine. Public Library of Science; 2005;2(8):e124. Available from: http://dx.doi.org/10.1371/journal.pmed.0020124"
The title attribute now of course uses a citation style, but this is optional information and can easily be reformatted as we have the DOI.
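To make the hover behaviour concrete, here is a small Python sketch of the HTML that such a link could end up as. A markdown processor does this conversion on its own; the helper `citation_link` is hypothetical and only illustrates where the URL and the optional citation text go.

```python
from html import escape


def citation_link(text, url, title=None):
    """Build an HTML link whose title attribute carries the citation text."""
    attrs = f'href="{escape(url, quote=True)}"'
    if title:
        # The title is optional: without it the link still resolves via the DOI.
        attrs += f' title="{escape(title, quote=True)}"'
    return f"<a {attrs}>{escape(text)}</a>"


print(citation_link(
    "Ioannidis 2005",
    "http://dx.doi.org/10.1371/journal.pmed.0020124",
    "Ioannidis JPA. Why Most Published Research Findings Are False.",
))
```

The citation text lives only in the title attribute, so changing the citation style means regenerating that string from the DOI, not touching the link itself.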
Or we break away from standard markdown and display reference-style links at the end of the document - similar to footnotes, which are also not part of standard markdown. But this is just a display issue that can be solved, and the solution might look different depending on whether the output is HTML, PDF or XML. This document for example contains 14 reference-style citations.
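As a rough idea of what displaying the links at the end could involve, the Python sketch below collects the reference-style definitions and lists them in order of first citation. It is only an illustration under my own assumptions: `references_section` is a hypothetical name, the regular expressions handle just the simple one-line definition form, and the plain numbered list is one of many possible output formats.

```python
import re

# Assumed pattern for definition lines: [id]: url "optional title"
DEFINITION = re.compile(
    r'^\s{0,3}\[([^\]]+)\]:\s*(\S+)(?:\s+"(.*)")?\s*$', re.MULTILINE
)
# Assumed pattern for reference-style uses: [link text][id]
USAGE = re.compile(r'\[[^\]]+\]\[([^\]]+)\]')


def references_section(text):
    """Return a numbered reference list, ordered by first citation in the text."""
    definitions = {
        m.group(1).lower(): (m.group(2), m.group(3))
        for m in DEFINITION.finditer(text)
    }
    ordered = []
    for m in USAGE.finditer(text):
        key = m.group(1).lower()
        # Skip keys without a definition and keys that were already listed.
        if key in definitions and key not in ordered:
            ordered.append(key)
    lines = []
    for number, key in enumerate(ordered, start=1):
        url, title = definitions[key]
        lines.append(f"{number}. {title} {url}" if title else f"{number}. {url}")
    return "\n".join(lines)


sample = (
    "B is cited [first][b], then [a][a], then [b again][B].\n"
    "\n"
    '[a]: http://example.com/a "Paper A"\n'
    "[b]: http://example.com/b\n"
)
print(references_section(sample))
# 1. http://example.com/b
# 2. Paper A http://example.com/a
```

The numbering is computed at render time, so reordering paragraphs while writing never requires renumbering anything by hand.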
There is obviously a need for tools that make adding citations to scholarly markdown easier. This could be accomplished by relatively small changes to existing reference managers (enabling copy/paste of citations in reference-style markdown format), or by tools similar to the knitcitations and kcite mentioned above.
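The copy/paste feature could be as small as the following Python sketch, which formats one reference-style definition from a citekey, a DOI and an optional formatted citation. The function `reference_definition` is hypothetical, and the `[@citekey]` id form and the dx.doi.org resolver simply follow the example earlier in this post.

```python
def reference_definition(citekey, doi, citation=None):
    """Format a markdown reference-style link definition for one citation."""
    url = f"http://dx.doi.org/{doi}"
    line = f"[@{citekey}]: {url}"
    if citation:
        # A double quote would end the title early, so swap it for a single quote.
        line += ' "' + citation.replace('"', "'") + '"'
    return line


print(reference_definition(
    "Ioannidis2005",
    "10.1371/journal.pmed.0020124",
    "Ioannidis JPA. Why Most Published Research Findings Are False.",
))
```

A reference manager would fill in the three arguments from its own database; the citation string is optional because the DOI alone is enough to look up the metadata again.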