All DataCite DOIs have associated metadata, described in the DataCite Metadata Schema Documentation (DataCite Metadata Working Group (2017)), validated and stored as XML in the DataCite Metadata Store (MDS). These metadata are then made available via DataCite APIs and services. XML is not always the best format for these services, so we also provide the metadata in other formats, most notably JSON. The problem with our approach so far has been that this JSON was not properly defined, creating overhead and ambiguity both for our internal services and for our users. To change this situation, and to make it easier to work with metadata for DataCite DOIs, today we are announcing DataCite JSON.
DataCite JSON represents all metadata elements and attributes available in DataCite XML, and can be converted to and from DataCite XML via several DataCite services (MDS API, EZ API, DOI Fabrica, Content Negotiation). Internally, these services all use the bolognese metadata conversion library (Fenner (2017)), which also provides a command-line utility. Both our new Elasticsearch Search index and the updated JSON REST API (more on those in another blog post) use DataCite JSON. The bolognese metadata conversion library uses DataCite JSON as the intermediary format, for example when converting BibTeX to schema.org JSON-LD or JATS XML.
There are minor differences between DataCite JSON and DataCite XML, mainly to make working with the metadata easier. These include an identifiers object that combines the identifier and alternateIdentifier properties, and a types object that stores not only resourceTypeGeneral and resourceType information, but also the type information from RIS, BibTeX, Citeproc and schema.org, to avoid losing type information when converting between these formats. There is also a new container property that stores information about the repository or journal where the content is located. We can provide this information in DataCite XML via the relatedIdentifier (with relationType IsPartOf) and description (with descriptionType SeriesInformation) elements, but the process is cumbersome. DataCite JSON also includes information not available in DataCite XML, including the URL registered for the DOI and the date the DOI was registered.
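To illustrate why a single container property is more convenient than the two XML-side elements, here is a minimal Python sketch that combines an IsPartOf related identifier and a SeriesInformation description into one dictionary. It is not how bolognese does it: the function name derive_container and the output keys (identifier, identifierType, title) are hypothetical, and the input dictionaries are assumed to simply mirror the XML element and attribute names.

```python
from typing import Optional


def derive_container(related_identifiers: list, descriptions: list) -> Optional[dict]:
    """Merge the two XML-side hints into one container-like dict.

    The output keys are illustrative assumptions, not the official
    DataCite JSON field names.
    """
    # First related identifier that says "this resource is part of X".
    part_of = next(
        (r for r in related_identifiers if r.get("relationType") == "IsPartOf"),
        None,
    )
    # First description flagged as series information.
    series = next(
        (d for d in descriptions if d.get("descriptionType") == "SeriesInformation"),
        None,
    )
    if part_of is None and series is None:
        return None

    container = {}
    if part_of is not None:
        container["identifier"] = part_of.get("relatedIdentifier")
        container["identifierType"] = part_of.get("relatedIdentifierType")
    if series is not None:
        container["title"] = series.get("description")
    return container


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    related = [
        {
            "relatedIdentifier": "1234-5678",
            "relatedIdentifierType": "ISSN",
            "relationType": "IsPartOf",
        }
    ]
    described = [
        {"description": "Example Journal, 1(2)", "descriptionType": "SeriesInformation"}
    ]
    print(derive_container(related, described))
```

A consumer of DataCite XML would have to repeat this kind of lookup in two places every time, which is the overhead a dedicated container property avoids.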
To see DataCite JSON in action, look up the metadata of your favorite DOI in our JSON REST API, e.g. https://api.datacite.org/dois/10.5438/0014, or, if you are a DataCite member or client, in DOI Fabrica. Alternatively, install bolognese (via gem install bolognese) and fetch metadata via the command bolognese 10.5438/0014 -t datacite_json. Documentation of DataCite JSON is unfortunately still sparse. In early 2019 we will provide better documentation via our support site, including updated documentation of the JSON REST API and a JSON Schema to validate the metadata, aligned with our XSD Schema for DataCite XML.
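For readers who prefer a script to a browser, the REST API lookup can be sketched in Python using only the standard library. This is a rough sketch under stated assumptions: the helper names are hypothetical, and the response is assumed to wrap the metadata in a JSON:API-style envelope under data.attributes, which should be checked against the API documentation once it is available.

```python
import json
import urllib.parse
import urllib.request

# Base URL taken from the example link in this post.
API_BASE = "https://api.datacite.org/dois/"


def doi_url(doi: str) -> str:
    """Build the REST API URL for a DOI, keeping the slash after the prefix."""
    return API_BASE + urllib.parse.quote(doi.strip(), safe="/")


def extract_attributes(payload: dict) -> dict:
    """Pull the metadata out of the response body.

    Assumption: the metadata sits under payload["data"]["attributes"].
    Returns an empty dict if that envelope is missing.
    """
    return payload.get("data", {}).get("attributes", {})


def fetch_doi_metadata(doi: str, timeout: float = 10.0) -> dict:
    """Fetch and unwrap the metadata for one DOI (performs a network request)."""
    with urllib.request.urlopen(doi_url(doi), timeout=timeout) as response:
        return extract_attributes(json.load(response))


if __name__ == "__main__":
    metadata = fetch_doi_metadata("10.5438/0014")
    # Print the top-level keys rather than assuming any particular field exists.
    print(sorted(metadata))
```

Keeping the URL building and the unwrapping separate from the network call makes both easy to check without hitting the API.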
We hope that DataCite JSON makes it easier to work with DataCite metadata, helping to improve metadata quality and re-use. We encourage users to adapt their tools to take advantage of DataCite JSON, and to consider it even for content without a DataCite DOI, whenever scholarly resources need to be described with standard metadata in JSON. Watch out for more information about DataCite JSON in 2019, or reach out to us with questions or feedback via mailto:firstname.lastname@example.org.
This blog post was originally published on the DataCite Blog.
DataCite Metadata Working Group. (2017). DataCite metadata schema for the publication and citation of research data v4.1. DataCite. https://doi.org/10.5438/0014
Fenner, M. (2017). Bolognese: A Ruby library for conversion of DOI metadata. DataCite. https://doi.org/10.5438/n138-z3mk