On Tuesday the journal PLOS ONE celebrated its 10th anniversary (see blog post by PLOS ONE Editor-in-Chief Jörg Heber and blog post by PLOS ONE Managing Editor Iratxe Puebla and PLOS Advocacy Director Catriona MacCallum). PLOS ONE (and PLOS) have changed scholarly publishing in many ways, from a DataCite perspective probably most importantly via the data policy updated in February 2014 that states that
PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception.
PLOS ONE was not the first journal with a Joint Data Archiving Policy (Whitlock, 2011), but this policy update moved proper archiving in public repositories of data used in papers into the mainstream, given that PLOS ONE had become the largest journal in the world by number of papers published.
Publishing so many papers, and doing this only in electronic form, means that PLOS ONE doesn't really have journal issues, and that papers are published daily as they become ready, a common pattern with online-only journals. This means that the traditional way of referencing a scholarly article via journal name, volume, issue and page numbers isn't really useful anymore. As a proxy, PLOS ONE uses the publication year as volume, month as issue and an electronic location identifier instead of page numbers, for example PLOS ONE, 9(8), e105948. But of course we don't use this information to uniquely identify and locate a PLOS ONE article; we use the DOI instead, which is https://doi.org/10.1371/journal.pone.0105948 for this example.
This particular DOI is for an interesting article by Norris et al. (2014) that describes how rocks in Death Valley slide with the help of thin layers of ice and wind, a phenomenon known since the 1940s, but in this paper for the first time systematically analyzed using GPS.
Before joining DataCite in 2015 I worked for PLOS for three years, helping with their Article-Level Metrics initiative (Fenner, 2013). The sliding rocks paper has been a fascinating ALM example, as the paper drew a lot of attention beyond traditional citations.
But the sliding rocks paper has of course also been cited, and I counted 12 citations this week. These citations are listed below, together with what is shown in the reference list:
You can see something interesting in these references: only 4 in 12 include the DOI, only one reference includes a URL, but all 12 include the volume and 8 each include the issue and the electronic location identifier. In other words, the references are formatted for a journal article that can be found as a physical copy in a library, sorted and shelved by publication name, volume and issue. Only one in three references uses a DOI and/or URL, even though that is the only way to fetch the online-only article.
Journal name, volume, issue and electronic location identifier also don't work too well in uniquely identifying the journal article, as this information is human-readable, but difficult for a machine to extract from the reference list. Many publishers of course link to referenced articles using the DOI behind the scenes even when not displaying the DOI, but that is a brittle implementation. Reference lists are for example also used by authors in manuscript submissions, and without DOIs displayed in the reference list it becomes much harder for the author to provide machine-readable information about the references used.
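To illustrate why a displayed DOI makes a reference easier for a machine to handle, here is a minimal Python sketch. The regular expression is a simplified assumption about what DOIs look like, not an official grammar, and the second reference string is a hypothetical rendering of the same article in a style that omits the DOI.

```python
import re

# Assumption: a simplified DOI pattern ("10.", a 4-9 digit prefix, "/", then a suffix).
# Real DOIs can be more varied; this is only a sketch.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def extract_doi(reference):
    """Return the first DOI-like string found in a reference, or None."""
    match = DOI_PATTERN.search(reference)
    if match is None:
        return None
    # Drop trailing punctuation that ends the sentence rather than the DOI.
    return match.group(0).rstrip(".,;")


# Reference as shown in this post's reference list (APA style, with DOI).
with_doi = (
    "Norris, R. D., Norris, J. M., Lorenz, R. D., Ray, J., & Jackson, B. (2014). "
    "Sliding Rocks on Racetrack Playa, Death Valley National Park: "
    "First Observation of Rocks in Motion. PLOS ONE, 9(8), e105948. "
    "https://doi.org/10.1371/journal.pone.0105948"
)

# Hypothetical rendering of the same article without a DOI.
without_doi = (
    "Norris RD, Norris JM, Lorenz RD, Ray J, Jackson B. Sliding Rocks on "
    "Racetrack Playa, Death Valley National Park: First Observation of "
    "Rocks in Motion. PLoS ONE. 2014;9(8):e105948."
)

print(extract_doi(with_doi))
print(extract_doi(without_doi))
```

With the DOI present, one pattern match is enough to identify the article. Without it, a program would have to parse journal name, volume, issue and locator out of free text and then look the article up somewhere.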
Reference lists remain one of the mysteries in scholarly publishing. Not only does this small example demonstrate that they still have not been fully adapted to how journal articles are published, read and cited in 2016, but you can also see that the examples use multiple citation styles. There are thousands of them, displaying the same information in so many different ways that generating and consuming references has become a business in itself. Another mystery is the limitation of the number of references in an online-only journal. While it makes sense to set some limit, these numbers sometimes seem arbitrarily low, coming from a time when every extra page printed was costly.
Agreeing on a single common citation style seems unrealistic in the near future. But what we can do is at least use a citation style that is widely used instead of reinventing one for every journal, and use a style that includes the DOI in the reference. PLOS switched from a PLOS-specific style to the Vancouver or NLM style (Patrias, 2015) in 2015. This style is widely used by biomedical journals, and PLOS ONE articles now include DOIs in reference lists, as you can see for example in this 2015 PLOS ONE paper (Tenopir et al., 2015); the sliding rocks paper was published in 2014, before the switch.
Vancouver/NLM allows the inclusion of DOIs in references, but isn't really urging users to do so, and it still recommends a traditional style of referencing:
This blog uses the APA style, a style that is not only widely used and documented, but also includes DOIs in references (as you can see in the reference list of this blog post), and is one of the few citation styles with specific support for data citations (adding [Data set] after the title). And in contrast to the Vancouver style this style uses the Author-Date format for in-text citations, providing more context to the reader of the article.
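As a rough sketch of what "a style that includes the DOI" means in practice, the following Python function assembles a journal article reference in the APA-like form used in this post's reference list. The function name and its parameters are hypothetical, and it only covers the simple journal article case; real APA formatting has many more rules (long author lists, missing issues, data sets and so on).

```python
def format_apa_article(authors, year, title, journal, volume, issue, locator, doi):
    """Format a journal article reference in an APA-like style, ending with the DOI.

    `authors` is a list of already formatted names such as "Norris, R. D.".
    `title` is expected to include its closing punctuation.
    """
    if len(authors) > 1:
        author_text = ", ".join(authors[:-1]) + ", & " + authors[-1]
    else:
        author_text = authors[0]
    # The DOI is expressed as a URL so that it is both an identifier and a link.
    return (
        f"{author_text} ({year}). {title} {journal}, "
        f"{volume}({issue}), {locator}. https://doi.org/{doi}"
    )


reference = format_apa_article(
    authors=["Norris, R. D.", "Norris, J. M.", "Lorenz, R. D.", "Ray, J.", "Jackson, B."],
    year=2014,
    title=(
        "Sliding Rocks on Racetrack Playa, Death Valley National Park: "
        "First Observation of Rocks in Motion."
    ),
    journal="PLOS ONE",
    volume=9,
    issue=8,
    locator="e105948",
    doi="10.1371/journal.pone.0105948",
)
print(reference)
```

The output should match the entry for Norris et al. (2014) in the reference list of this post, with the DOI as the last element of the reference.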
Using journal name, volume, issue and page numbers in a reference poses a particular challenge for DataCite, as the DataCite metadata schema doesn't support them. The main reason for this is DataCite's focus on providing metadata for datasets. Also, the DataCite metadata schema is based on Dublin Core, which also doesn't have these properties (bibliographicCitation can be used instead). More than 1.5 million text documents have been registered with a DataCite DOI, and many of them probably would have had journal name, volume, issue and pages information available. Should we add support for these properties to the DataCite metadata schema, or should we see these properties as no longer essential for citation information in a reference and leave them out of the metadata schema? I would argue that resources that have a digital object identifier don't require volume, issue and pages information to uniquely identify and/or locate the resource.
The other challenge for DataCite is that the current state of reference lists in journal articles makes it harder than needed to include data citations in them. When DOIs are left out of reference lists because the citation style doesn't display them, manuscripts with data citations submitted by authors need special treatment, which limits adoption because of the extra effort required. Publishers routinely rebuild reference lists from scratch by fetching the DOI and associated metadata based on the citation information provided, and these tools are built around citation metadata typically found in journal articles (including volume, issue and page information) and Crossref DOIs.
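When the DOI is available, fetching the associated metadata can be sketched with DOI content negotiation, which works for both Crossref and DataCite DOIs. The Python example below only builds the HTTP request and does not send it; the helper name is hypothetical, and the choice of CSL JSON as the response format is an assumption made for illustration, not a description of what any particular publisher's tools do.

```python
from urllib.request import Request

# Media type for citation metadata in CSL JSON, requested via the Accept header.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_metadata_request(doi):
    """Build (but do not send) a content negotiation request for a DOI."""
    return Request(f"https://doi.org/{doi}", headers={"Accept": CSL_JSON})


request = build_metadata_request("10.1371/journal.pone.0105948")
print(request.full_url)
print(request.get_header("Accept"))

# To actually fetch the metadata (requires network access):
#   import json
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       metadata = json.load(response)
```

The same request works regardless of whether the DOI points to a journal article or a dataset, which is the point: the DOI alone is enough to retrieve the citation metadata, without relying on volume, issue or page information.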
The conclusions from the above are simple:
As a publisher, require a citation style that includes the DOI.
The X-Files is an American TV series about FBI special agents Fox Mulder and Dana Scully who investigate unsolved cases involving paranormal phenomena. And in an episode aired in February 2016 (The X-Files: Mulder & Scully Meet the Were-Monster, via IMDB), Fox Mulder refers to the PLOS ONE Sliding Rocks paper when he says:
Scully, since we've been away, much of the "unexplained" has been explained. The "Death Valley Racetrack"? Turns out it was just ice formations, moving the rocks around as it melted. Yeah, ice.
To refer to this episode you can again use a DOI, which in this case is not for scholarly content but is an EIDR – A universal unique identifier for movie and television assets. And now we have come full circle.
This blog post was originally published on the DataCite Blog.
Asorey, H., Nunez, L. A., & Sarmiento-Cano, C. (2015). Exposicion temprana de nativos digitales en ambientes, metodologias y tecnicas de investigacion en la universidad. arXiv [Physics]. Retrieved from https://arxiv.org/abs/1501.04916
Baumgardner, G. D., & Shaffer, B. S. (2015). Sliding Bones: Movement of Skeletal Material Over Smith Creek Playa in Nevada and Its Taphonomic and Paleontologic Implications. Western North American Naturalist, 75(2), 236–243. https://doi.org/10.3398/064.075.0213
El-Maarry, M. R., Watters, W. A., Yoldi, Z., Pommerol, A., Fischer, D., Eggenberger, U., & Thomas, N. (2015). Field investigation of dried lakes in western United States as an analogue to desiccation fractures on Mars. Journal of Geophysical Research: Planets, 120(12), 2015JE004895. https://doi.org/10.1002/2015JE004895
Fenner, M. (2013). What Can Article-Level Metrics Do for You? PLOS Biology, 11(10), e1001687. https://doi.org/10.1371/journal.pbio.1001687
Grayson, D. K., & Meltzer, D. J. (2015). Revisiting Paleoindian exploitation of extinct North American mammals. Journal of Archaeological Science, 56, 177–193. https://doi.org/10.1016/j.jas.2015.02.009
Hannigan, S., Raphael, J.-A., White, P., Bragg, L., & Cripps Clark, J. (2016). Collaborative reflective experience and practice in education explored through self-study and arts-based research. Creative Approaches to Research, 9(1), 84–110. Retrieved from http://hdl.handle.net/10536/DRO/DU:30088291
Jones, R., & Hooke, R. L. (2015). Racetrack Playa: Rocks moved by wind alone. Aeolian Research, 19, Part A, 1–3. https://doi.org/10.1016/j.aeolia.2015.08.001
Li, M., Zhou, S., & Wang, G. (2016). 3D identification and stability analysis of key surface blocks of rock slope. Transactions of Tianjin University, 22(4), 317–323. https://doi.org/10.1007/s12209-016-2596-z
Lorenz, R. D., Norris, J. M., Jackson, B. K., Norris, R. D., Chadbourne, J. W., & Ray, J. (2014). Trail formation by ice-shoved “sailing stones” observed at Racetrack Playa, Death Valley National Park. Earth Surface Dynamics Discussions, 2(2), 1005–1022. https://doi.org/10.5194/esurfd-2-1005-2014
Norris, R. D., Norris, J. M., Lorenz, R. D., Ray, J., & Jackson, B. (2014). Sliding Rocks on Racetrack Playa, Death Valley National Park: First Observation of Rocks in Motion. PLOS ONE, 9(8), e105948. https://doi.org/10.1371/journal.pone.0105948
Patrias, K. (2015). Citing Medicine (2nd ed.). National Library of Medicine (US). Retrieved from https://www.ncbi.nlm.nih.gov/books/NBK7256/
Rambo, R. P., & Tainer, J. A. (2015). Modeling Macromolecular Motions by X-Ray-Scattering-Constrained Molecular Dynamics. Biophysical Journal, 108(10), 2421–2423. https://doi.org/10.1016/j.bpj.2015.04.023
Sanz-Montero, M. E., Cabestrero, Ó., & Rodríguez-Aranda, J. P. (2015). Sedimentary effects of flood-producing windstorms in playa lakes and their role in the movement of large rocks. Earth Surface Processes and Landforms, 40(7), 864–875. https://doi.org/10.1002/esp.3677
Sanz-Montero, M. E., Cabestrero, Ó., & Rodríguez-Aranda, J. P. (2016). Comments on Racetrack playa: Rocks moved by wind alone. Aeolian Research, 20, 196–197. https://doi.org/10.1016/j.aeolia.2016.01.003
Tenopir, C., Dalton, E. D., Allard, S., Frame, M., Pjesivac, I., Birch, B., … Dorsett, K. (2015). Changes in Data Sharing and Data Reuse Practices and Perceptions among Scientists Worldwide. PLOS ONE, 10(8), e0134826. https://doi.org/10.1371/journal.pone.0134826
Whitlock, M. C. (2011). Data archiving in ecology and evolution: Best practices. Trends in Ecology & Evolution, 26(2), 61–65. https://doi.org/10.1016/j.tree.2010.11.006