One of the first tasks for DataCite in the European Commission-funded THOR project, which started in June, was to contribute to a comparison of the ORCID and DataCite metadata standards. Together with ORCID, CERN, the British Library and Dryad we looked at how contributors, organizations and artefacts - and the relations between them - are described in the respective metadata schemata, and how they are implemented in two example data repositories, Archaeology Data Service and Dryad Digital Repository.
The focus of our work was on identifying major gaps. Our report was finished and made publicly available last week (Fenner et al., 2015). The key findings are summarized below:
While a single input field for contributor names is common, separate fields for given and family names are required for proper formatting of citations. As long as citations to scholarly content rely on properly formatted text rather than persistent identifiers, services holding bibliographic information have to support these separate fields. Further work is needed to help with the transition to separate input fields for given and famliy names, and to handle contributors that are organizations or groups of people.
The currently existing vocabularies for contributor type (DataCite) and contributor role (ORCID) provide a high-level description, but fall short when trying to describe the author/creator contribution in more detail. Project CRediT is a multi-stakeholder initiative that has developed a common vocabulary with 14 different contributor roles, and this vocabulary can be used to provide this detail, e.g. who provided resources such as reagents or samples, who did the statistical analysis, or who contributed to the methodology of a study.
CRediT is complementary to existing contributor role vocabularies such as those by ORCID and DataCite. For contributor roles it is particularly important that the same vocabulary is used across stakeholders, so that the roles described in the data center can be forwarded first to DataCite, then to ORCID, and then also to other places such as institutional repositories.
Capturing relations between scholarly works such as datasets in a standardized way is important, as these relations are used for citations and thus the basis for many indicators of scholarly impact. Currently used vocabularies for relation types between scholarly works, e.g. by CrossRef and DataCite, only partly overlap. In addition we see differences in community practices, e.g. some scholars but not others reserve the term citation for links between two scholarly articles. The term data citation is sometimes used for all links from scholarly works to datasets, but other times reserved for formal citations appearing in reference lists.
Both ORCID and DataCite not only provide persistent identifiers for people and data, but they also collect metadata around these persistent identifiers, in particular links to other identifiers. The use of persistent identifiers for organizations lags behind the use of persistent identifiers for research outputs and people, and more work is needed.
Research projects are collaborative activities among contributors that may change over time. Projects have a start and end date and are often funded by a grant. The existing persistent identifier (PID) infrastructure does support artefacts, contributors and organisations, but there is no first-class PID support for projects. This creates a major gap that becomes obvious when we try to describe the relationships between funders, contributors and research outputs.
Both the ORCID and DataCite metadata support funding information, but only as direct links to contributors or research outputs, respectively. This not only makes it difficult to exchange funding information between DataCite and ORCID, but also fails to adequately model the sometimes complex relationships, e.g. when multiple funders and grants were involved in supporting a research output. We therefore not only need persistent identifiers for projects, but also infrastructure for collecting and aggregating links to contributors and artefacts.
We identified significant differences between the ORCID and DataCite metadata schema, and these differences hinder the flow of information between the two services. Several different approaches to overcome these differences are conceivable:
The first approach is the linked open data approach, and was designed specifically for scenarios like this. One limitation is that it requires persistent identifiers for all relevant attributes (e.g. for every creator/contributor in the DataCite metadata). One major objective for THOR is therefore to increase the use of persistent identifiers, both by THOR partners, and by the community at large.
A common metadata schema between ORCID and DataCite is neither feasible nor necessarily needed. In addition, we have to also consider interoperability with other metadata standards (e.g. CASRAI, OpenAIRE, COAR), and with other artifacts, such as those having CrossRef DOIs. What is more realistic is harmonization across a limited set essential metadata.
The third approach to improve interoperability uses a common API format that includes all the metadata that need to be exchanged, but doesn’t require the metadata schema itself to change. This approach was taken by DataCite and CrossRef a few years ago to provide metadata for DOIs in a consistent way despite significant differences in the CrossRef and DataCite metadata schema. Using HTTP content negotiation, metadata are provided in a variety of formats.
This blog post was originally published on the DataCite Blog.
Fenner, M., Demeranville, T., Kotarski, R., Vision, T., Rueda, L., Dasler, R., … THOR Consortium. (2015). D2.1: Artefact, contributor, and organisation relationship data schema. Zenodo. https://doi.org/10.5281/ZENODO.30799
Mysteries in Reference Lists
On Tuesday the journal PLOS ONE celebrated its 10th anniversary (see blog post by PLOS ONE Editor-in-Chief Jörg Heber and blog post by PLOS ONE Managing Editor Iratxe Puebla and PLOS Advocacy Director Catriona MacCallum). PLOS ONE (and PLOS) ...
Explaining the DataCite/ORCID Auto-update
This Monday ORCID, CrossRef and DataCite announced (ORCID post, CrossRef post, DataCite post) the new auto-update service that automatically pushes metadata to ORCID when an ORCID identifier is found in newly registered DOI names.This is the first joint announcement by the three organizations, ...
Data catalog cards: simplifying article/data linking
Data citation is core to DataCite's mission and DataCite is involved in several projects that try to facilitate data citation, including THOR, Data Citation Implementation Pilot (DCIP), Research Data Alliance (RDA), and COPDESS. ...