Citation Style Language: Interview with Rintze Zelle and Ian Mulvany

Citation styles are one of the greater mysteries for the novice manuscript writer. There are numerous ways that authors, title, journal, etc. can be arranged and formatted (see examples below), and in bibliographies citations can be ordered either alphabetically or by order of appearance in the text.

Laemmli UK. Cleavage of Structural Proteins during the Assembly of the Head of Bacteriophage T4. Nature. 1970;227:680-685.

U. K. Laemmli (1970). ‘Cleavage of Structural Proteins during the Assembly of the Head of Bacteriophage T4’. Nature 227(5259):680-685.

U. K. Laemmli, Nature 227, 680 (1970).

Because of this complexity, it has long been impractical to format citations manually, and formatting of citations and bibliographies is one of the main reasons for using reference management software. I interviewed Rintze Zelle (scientist and open source contributor) and Ian Mulvany (vice president of new product development at Mendeley) to better understand citation styles in general and the open source Citation Style Language (CSL) in particular. CSL co-developers Bruce D’Arcus and Frank Bennett provided important feedback.

1. What is the Citation Style Language?

Rintze Zelle: Scientific literature depends heavily on proper referencing. However, when writing a manuscript, manually editing citations and bibliographies is time consuming and error prone. Citation styles also differ between scientific journals, so authors often have to switch citation styles when they submit their manuscript to a different journal than originally anticipated.

The Citation Style Language (CSL) is an open XML-based language meant to automate the formatting of citations and bibliographies. When provided with the metadata (title, year, authors, etc.) of the cited items (journal articles, books, etc.), and a CSL style, a CSL processor can automatically generate the bibliography and in-text citations. CSL is currently used by Zotero and Mendeley, and both programs offer word processor plug-ins for Microsoft Word and OpenOffice.org. Several other projects are working on or exploring CSL support.
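
To make the idea of "one set of metadata, many styles" concrete, here is a minimal Python sketch. It is not CSL and not how a CSL processor works internally: real CSL styles are XML files interpreted by a processor such as citeproc-js. The field names in `item` and the functions `style_a` and `style_b` are hypothetical, chosen only to reproduce two of the Laemmli examples shown at the top of this post from the same metadata.

```python
# Toy illustration only. Real CSL styles are XML documents, and a CSL
# processor does far more (localization, name disambiguation, sorting).
# The field names and both style functions below are hypothetical.

item = {
    "family": "Laemmli",
    "initials": ["U", "K"],
    "title": ("Cleavage of Structural Proteins during the Assembly "
              "of the Head of Bacteriophage T4"),
    "journal": "Nature",
    "year": 1970,
    "volume": 227,
    "first_page": 680,
    "last_page": 685,
}


def style_a(it):
    """Family name first, initials without periods, full page range."""
    name = f"{it['family']} {''.join(it['initials'])}"
    return (f"{name}. {it['title']}. {it['journal']}. "
            f"{it['year']};{it['volume']}:"
            f"{it['first_page']}-{it['last_page']}.")


def style_b(it):
    """Initials first, no title, first page only, year at the end."""
    initials = " ".join(f"{i}." for i in it["initials"])
    return (f"{initials} {it['family']}, {it['journal']} "
            f"{it['volume']}, {it['first_page']} ({it['year']}).")


# The same metadata rendered by two different "styles".
for render in (style_a, style_b):
    print(render(item))
```

Switching journals then means switching the style, not retyping the reference. In CSL the formatting rules live in a shareable style file rather than in code, which is what lets one processor serve many styles.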

2. Do we really need hundreds of citation styles?

Rintze Zelle: We have asked ourselves this question many times over. Some variability is certainly warranted: numeric, author-date and footnote styles are very different, and each type has its own advantages and disadvantages. However, small variations in citation styles result in a situation where almost every journal or publisher has its own unique style. We think that even with the use of automated tools like CSL, reducing the number of citation styles in use could result in significant cost and time savings in scientific publishing. But this is a problem beyond the scope of CSL, so our goal is simply to support all the variability that currently exists in citation styles.

Ian Mulvany: After working for many years at Springer, and then Nature, I was well aware that most large publishers just push submitted manuscripts out to companies in India where the formatting of the paper happens. The input format and citation formatting really doesn’t matter to most publishers. They just tear the submitted manuscript to pieces and rebuild it in their chosen XML schema.

However, most people using citations are not actually submitting manuscripts for publication, but rather are writing term papers, or theses, or reports. So the weird thing is that citations started off as a required identifier for the literature. Google Scholar and HTTP URIs such as the DOI have almost totally made formatted citations redundant as identifiers, and yet there is still a huge user need to be able to format citations according to a huge variety of styles, and since that need is going to continue for quite a long time, it’s a need that we have to support.

3. What is the difference between CSL and other citation style systems?

Rintze Zelle: Our main “competition” arguably comes from BibTeX and EndNote/Reference Manager. BibTeX is a popular choice for those working with the LaTeX typesetting system, but the user base of LaTeX is relatively small and mostly limited to the sciences. EndNote and Reference Manager are commercial tools offered by Thomson Reuters. While large collections of citation styles are available for each program, the use of these styles is limited to licensed users.

CSL was designed with three main goals:

  1. to create an open system that is independent of the operating system, application or document format,
  2. to cover the full range of citation formatting rules in use, extending from the sciences to fields in the humanities as well as law,
  3. to free end users from the complex task of formatting citations.

Citation styles should be freely available, up to date, and complete. Switching between styles should be easy, and citation output should automatically localize to the desired language.

We think we’ve come quite far toward reaching these goals with our most recent release, CSL 1.0. The CSL 1.0 specification covers a wide range of citation rules, and offers advanced features like automatic localization of date formats, terms and punctuation, support for in-field rich text and extensive support for the rendering and disambiguation of names. The first standalone CSL 1.0 processor (the JavaScript citeproc-js) is currently being integrated into both Zotero and Mendeley, and is receiving attention for deployment on both the client and the server.

4. Can you tell me how CSL was developed?

Rintze Zelle: CSL is the brainchild of Bruce D’Arcus, an associate professor of Geography at Miami University of Ohio. The language was initially implemented for integration into OpenOffice.org, but only became popular in 2006 when Zotero, the first reference manager to use CSL, was released. In these early days major contributions were made to CSL by Zotero developer Simon Kornblith. Subsequently, the Zotero project successfully fostered an active user community, with many users contributing styles to a growing repository of CSL styles.

The year 2008 was a watershed of new developments. Mendeley, the second reference manager to use CSL for its citation formatting, was released. Andrea Rossato released the first standalone CSL processor (citeproc-hs) for use with the Pandoc text processing system. Also in 2008, two Zotero users who enjoyed the program but felt that CSL could be further improved joined Bruce in CSL development: myself, at that time a PhD researcher in biotechnology at Delft University of Technology in the Netherlands, and Frank Bennett, Jr., an associate law professor at Nagoya University in Japan. Together with Andrea, our different academic and geographic backgrounds proved very useful in CSL development. In preparation for major backward-incompatible changes, CSL 0.8 was released in 2009, and in Spring 2010 CSL 1.0 saw the light of day. The 1.0 release was accompanied by a move to a new website at citationstyles.org, and included improved documentation in the form of a full language specification. CSL development has now calmed down a bit as we await the integration of CSL 1.0 by Zotero and Mendeley.

Ian Mulvany: We at Mendeley have been using the Citation Style Language for quite a while now. We think it is an amazing project and we are very strongly committed to working with the CSL community in encouraging uptake. We get a lot of feedback from our users and one area that they constantly run into problems with is the need to be able to format a citation in just such a manner. The CSL project is the best way for us to be able to support the needs of our users with these kinds of requests. Our developers have been pushing patches upstream to the citeproc-js project, particularly Carles Pina.

We have added a cut and paste stylebox on our article pages. If you have a look at a sample paper you will now see a little citeproc-js driven “Cite this document” box that lets you copy and paste formatted citations in several popular citation styles. We have also been supporting the creation of a WYSIWYG citation style editor. The status of the project is that most of the code is complete and we just need to work on getting it integrated into our client, and figuring out the best way to manage the creation of more styles, and how that will work with the CSL community.

One of the things that we have been discussing with Bruce D’Arcus is how to manage the redistribution of new styles, and how to make sure that corrupt styles don’t propagate, and that people get the style that they are looking for. If people want to contribute there is a lot of activity on the mailing list of the CSL project. One thing we hope Mendeley can help with is reporting usage statistics on specific style files, so at least people can find the most popular version of a CSL file for a given style.

5. Where can a user find (more) CSL citation styles? Is it easy to modify a CSL style?

Rintze Zelle: While anyone is free to write and host their own CSL styles, most CSL styles that are in use are available through the Zotero Style Repository (many of these styles are licensed under a Creative Commons license). We have to admit that editing CSL styles currently requires some technical skill and knowledge of XML. This hasn’t kept members of the Zotero user community from creating over a thousand CSL styles, but we do recognize that user friendly editing of styles is a very important feature. We therefore applaud Mendeley’s effort to create an online CSL editor.

6. Should publishers care about CSL?

Ian Mulvany: As I pointed out, the big publishers don’t care about the submission format, but they have not really done a good job of communicating that to their editorial boards. Smaller publishers don’t have the resources to totally reformat submissions, and beyond academic publishing there are a huge number of people who just need to format citations. There is a huge waste of people’s time in reformatting papers for submissions, in fixing styles according to changing requirements from departments, when what should matter is the content. I’d love to get to a point where every publisher accepted the same type of XML input, and our authoring tools all created content conforming to that input format. Citations should be a DOI or other HTTP URI that can be rendered into the appropriate format using CSL and an API.

Martin Fenner: The Open Access publishers BioMed Central and PLoS plan to add a CSL style download link to their author instruction pages. I hope that more publishers follow this example.

7. Do you want to talk about future plans for CSL?

Rintze Zelle: We’re very excited about the work Zotero and Mendeley developers are doing to update their programs to support CSL 1.0. The update path should be relatively smooth for users as styles can be automatically updated to the CSL 1.0 format, although styles will often need to be edited to take full advantage of all the new features. Zotero Everywhere was announced earlier this week and will include a web citation formatting service based on citeproc-js.

There are two things we consider crucial to the further development of CSL: one, we still lack an easy way for users to modify existing styles, although we’re hopeful that Mendeley’s CSL editor will soon fill this gap. Secondly, we feel there is a need for a more full-featured online style repository which allows users to find their style of choice, to add comments, and to propose style changes.

The goal of CSL is to make citation formatting easier at a general level, across all fields and in all languages. This can only be achieved through a collaborative endeavor, and here we think publishers also share some responsibility. By providing high quality item metadata through robust standards (like unAPI or COinS), by freely providing clear, correct and complete style guidelines or opting for a standard citation style (like APA, Chicago or MHRA), and perhaps even by creating and hosting their own CSL styles, publishers can make our work, and that of authors, so much easier.

With broad participation and support, we believe that CSL can benefit all fields of scholarship in much the same way that Oren Patashnik’s BibTeX helped the sciences, by (further) streamlining the publication process, improving access to metadata of materials of all kinds, and by allowing scholars to spend more of their time on the core of their research.