Data citation is core to DataCite's mission, and DataCite is involved in several projects that aim to facilitate data citation, including THOR, the Data Citation Implementation Pilot (DCIP), the Research Data Alliance (RDA), and COPDESS. The biggest roadblock to wider adoption of data citation may be insufficient incentives for individual researchers, but another major challenge is that implementing data citation is still too complicated.
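When we talk about data citation, we typically mean two related but different scenarios: an article citing data, and the data associated with an article being published and linked to that article.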
The first scenario is not conceptually different from an article citing another article, where the common practice is to put everything that is cited into the reference list.
The second scenario is probably not only more common but also more complex, requiring workflows that coordinate issuing persistent identifiers for the article and the data and linking them together via metadata. As a community, we are still working on common practices for doing this. Assuming again that incentives are the biggest driver of change, I would argue that researchers, publishers, and funders all have an interest in making this work, but that data repositories have the strongest motivation to improve the current situation. If this is true, then we should give data repositories a bigger role in publishing the data associated with an article.
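Linking article and data via metadata typically means that each record points at the other with a reciprocal relation. The following is a minimal sketch, assuming DataCite-style `relatedIdentifiers` and the inverse relation types `IsSupplementTo` and `IsSupplementedBy` from the DataCite Metadata Schema. The records, the made-up DOIs under the 10.5072 prefix (which DataCite has used for testing), and the helper `is_bidirectional` are all hypothetical.

```python
# Hypothetical records; DOIs are made up for illustration only.
ARTICLE_DOI = "10.5072/example-article"
DATASET_DOI = "10.5072/example-dataset"

# Inverse pairs from the DataCite relationType vocabulary used in this sketch.
INVERSE = {
    "IsSupplementTo": "IsSupplementedBy",
    "IsSupplementedBy": "IsSupplementTo",
}

records = {
    ARTICLE_DOI: {
        "relatedIdentifiers": [
            {"relatedIdentifier": DATASET_DOI, "relatedIdentifierType": "DOI",
             "relationType": "IsSupplementedBy"},
        ]
    },
    DATASET_DOI: {
        "relatedIdentifiers": [
            {"relatedIdentifier": ARTICLE_DOI, "relatedIdentifierType": "DOI",
             "relationType": "IsSupplementTo"},
        ]
    },
}


def is_bidirectional(records, source, target):
    """Return True if source links to target and target links back with the inverse relation."""
    for rel in records[source]["relatedIdentifiers"]:
        if rel["relatedIdentifier"] != target:
            continue
        expected = INVERSE.get(rel["relationType"])
        back = records.get(target, {}).get("relatedIdentifiers", [])
        if any(r["relatedIdentifier"] == source and r["relationType"] == expected
               for r in back):
            return True
    return False


print(is_bidirectional(records, ARTICLE_DOI, DATASET_DOI))  # True
```

If either side omits its half of the link, the check fails. This is exactly the coordination problem that makes the second scenario harder than an ordinary reference list.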
While many publishers host supplementary information for articles, they leave the hosting of more complex research data to external data repositories that specialize in this task. Properly referencing all associated data in the article is currently the job of the publisher, and I propose that we give more of this responsibility to the data repository. The data repository can create a data catalog card (with its own persistent identifier and metadata) that describes all data associated with an article. The data catalog card is a collection of metadata and is different from a data paper. The data described in the catalog card can be hosted in that repository or elsewhere.
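One way to picture a data catalog card is as an ordinary DataCite-style metadata record of type `Collection`, with a `HasPart` relation to each dataset and an `IsSupplementTo` relation to the article, which a publisher could then render as a data availability statement. This is a minimal sketch under those assumptions. The relation types and the `resourceTypeGeneral` value exist in the DataCite Metadata Schema, but using them this way, the field layout (loosely modeled on DataCite REST API JSON attributes), the helper functions `make_catalog_card` and `availability_statement`, and all DOIs (made up under the 10.5072 prefix) are hypothetical.

```python
# Hypothetical sketch: field names loosely follow DataCite REST API JSON attributes;
# all DOIs are made up for illustration only.


def make_catalog_card(card_doi, article_doi, article_title, publisher, year, dataset_dois):
    """Build a metadata record describing all data associated with one article."""
    related = [
        # The card as a whole supplements the article.
        {"relatedIdentifier": article_doi, "relatedIdentifierType": "DOI",
         "relationType": "IsSupplementTo"},
    ]
    # Each dataset, hosted in this repository or elsewhere, is a part of the card.
    for doi in dataset_dois:
        related.append({"relatedIdentifier": doi, "relatedIdentifierType": "DOI",
                        "relationType": "HasPart"})
    return {
        "doi": card_doi,
        "titles": [{"title": f"Data associated with: {article_title}"}],
        "publisher": publisher,
        "publicationYear": year,
        "types": {"resourceTypeGeneral": "Collection"},
        "relatedIdentifiers": related,
    }


def availability_statement(card):
    """Render a catalog card as the text of a data availability statement."""
    datasets = [r["relatedIdentifier"] for r in card["relatedIdentifiers"]
                if r["relationType"] == "HasPart"]
    card_url = f"https://doi.org/{card['doi']}"
    if not datasets:
        return f"The data catalog card {card_url} does not list any datasets."
    links = ", ".join(f"https://doi.org/{doi}" for doi in datasets)
    return (f"All data associated with this article are described in the data catalog "
            f"card {card_url}, which lists the following datasets: {links}.")


card = make_catalog_card(
    card_doi="10.5072/card-1",
    article_doi="10.5072/article-1",
    article_title="An example article",
    publisher="Example Repository",
    year=2016,
    dataset_dois=["10.5072/dataset-1", "10.5072/dataset-2"],
)
print(availability_statement(card))
```

Using `HasPart` keeps the card a pure collection of metadata. Each dataset keeps its own DOI and its own hosting, whether in the same repository or elsewhere, and the publisher only needs to link to the single card DOI.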
The publisher then links to this data catalog card via the article metadata and can display the catalog card formatted as a data availability statement. The publisher could (and should) still link to individual datasets where appropriate, but the proposed solution helps solve several important issues:
Several general-purpose data repositories already provide most or all of this functionality; the ones I am most familiar with are Dryad, BioStudies (McEntyre, Sarkans, & Brazma, 2015), and Figshare (Hyndman, 2016). Data catalog cards probably work best for repositories that are flexible in the kinds of data they accept and that already have integrations with publishers. Not every data repository needs to support this functionality. Data catalog cards are also an opportunity for differentiation, e.g. by providing data curation or help with data review.
My thinking about this topic was triggered by a conversation with Tim Clark in the context of the DCIP project. The guest post by Dan S. Katz (Katz, 2016) and the discussion around it was another important motivation, and a DataCite blog post from last August (Fenner, 2015) contains some of the ideas expressed here. Obviously this topic is of great interest to DataCite, as we hope that data catalog cards use DataCite DOIs, and that we can help both with making article/data publishing workflows easier, and with discovering data associated with an article.
This blog post was originally published on the DataCite Blog.
Fenner, M. (2015). Reference lists and tables of content. DataCite Blog. Retrieved from https://blog.datacite.org/reference-lists-and-tables-of-content
Hyndman, A. (2016). Unveiling figshare ‘collections’: a new way to group content. Figshare Blog. Retrieved from https://figshare.com/blog/Unveiling_figshare_Collections_a_new_way_to_group_content/202
Katz, D. S. (2016). To better understand research communication, we need a groid (group object identifier). DataCite Blog. Retrieved from https://blog.datacite.org/to-better-understand-research-communication-we-need-a-groid-group-object-identifier
McEntyre, J., Sarkans, U., & Brazma, A. (2015). The BioStudies database. Molecular Systems Biology, 11(12), 847. https://doi.org/10.15252/msb.20156658