DataCite Commons - Exploiting the Power of PIDs and the PID Graph

Today DataCite is proud to announce the launch of DataCite Commons, available at https://commons.datacite.org. DataCite Commons is a discovery service that enables simple searches while giving users a comprehensive overview of connections between entities in the research landscape. This means that DataCite members registering DOIs with us will have easier access to information about the use of their DOIs and can discover and track connections between their DOIs and other entities. DataCite Commons was developed as part of the EC-funded project FREYA and will form the basis of new DataCite services.

Content

DataCite Commons has a lot of content to search. One of the most important features is the ability to search for all DOIs, no matter whether they are registered with DataCite, Crossref, or one of the other scholarly DOI registration agencies. Users want to search for content or look up metadata for a particular DOI, and not worry about where to look. DataCite initially focused on registering DOIs for datasets (approaching 8 million DOIs so far), but our members to date have also registered almost 6 million DOIs for text publications. At the same time, Crossref members have registered almost 2 million DOIs for datasets in addition to the DOIs for journal articles, book chapters, and other text publications. Other content types, e.g. dissertations or preprints, can be found at both DataCite and Crossref. In addition, 6 more DOI registration agencies register DOIs for scholarly content. Including the more than 110 million Crossref DOIs in DataCite Commons is a huge undertaking. We currently have 10 million Crossref DOIs in DataCite Commons, together with 20 million DOIs from DataCite, and the import of many more DOIs is ongoing.

Connections

DataCite Commons not only has a lot more content to search for but also exposes the connections between DOIs in the form of citations, versions, and collections. DataCite Commons also shows the connections between content with DOIs and people, research organizations, and funders – what we together call the PID Graph of scholarly resources identified via persistent identifiers (PIDs) and connected in standard ways. We integrate with both the ORCID and ROR (Research Organization Registry) APIs to enable a search for people (10 million) and organizations (100,000) and to show the associated content. For funding, we take advantage of the inclusion of Crossref Funder IDs in ROR metadata. We combine these connections, showing a funder, research organization, or researcher not only their content but also the citations, views, and downloads if available, together with aggregate statistics such as numbers by year or content type.
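
To make the idea of a PID Graph more concrete, here is a minimal in-memory sketch in Python. It is not how DataCite Commons is implemented. The class names, the relation names ("hasWork", "citedBy"), and all identifiers are hypothetical and chosen only for illustration. It shows how typed connections between PIDs can be combined into aggregate statistics such as works per year or total citations for a researcher.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    pid: str                    # a DOI, ORCID iD, or ROR ID
    kind: str                   # "work", "person", or "organization"
    year: Optional[int] = None  # publication year, only set for works


class PidGraph:
    """Toy in-memory graph: nodes are PIDs, edges are typed connections."""

    def __init__(self):
        self.nodes = {}
        self.edges = defaultdict(set)  # (source PID, relation) -> target PIDs

    def add(self, node):
        self.nodes[node.pid] = node

    def connect(self, source, relation, target):
        self.edges[(source, relation)].add(target)

    def targets(self, source, relation):
        return sorted(self.edges.get((source, relation), set()))

    def works_by_year(self, pid):
        """Count the works connected to a person or organization per year."""
        return Counter(self.nodes[w].year for w in self.targets(pid, "hasWork"))

    def citation_count(self, pid):
        """Total citations received by all works connected to a PID."""
        return sum(len(self.targets(w, "citedBy")) for w in self.targets(pid, "hasWork"))


# Hypothetical identifiers, for illustration only.
graph = PidGraph()
person = "https://orcid.org/0000-0000-0000-0000"
graph.add(Node(person, "person"))
works = [("10.1234/dataset-a", 2019), ("10.1234/dataset-b", 2020), ("10.1234/article-c", 2020)]
for doi, year in works:
    graph.add(Node(doi, "work", year))
    graph.connect(person, "hasWork", doi)
graph.connect("10.1234/dataset-a", "citedBy", "10.1234/article-c")
graph.connect("10.1234/dataset-b", "citedBy", "10.1234/article-c")

summary = graph.works_by_year(person)     # works per publication year
citations = graph.citation_count(person)  # citations across all works
```

Keeping the relation type as part of the edge key means that new kinds of connections (versions, collections, funding) can be added without changing the structure.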

For a single work, e.g. the dataset registered with DOI https://doi.org/10.5061/dryad.234, we show views, downloads and citations if available.

Metadata

By mapping all Crossref metadata to corresponding metadata in DataCite, we can support much more granular search queries compared to just mapping basic metadata. With this release, we are also launching a new set of filters for content search. We added license type, fields of science, primary language, and DOI registration agency to the existing filters for publication year and work type. As described in a July blog post (Fenner, 2020a), we are using existing controlled vocabularies for these filters (license type: SPDX, fields of science: OECD, and language: ISO 639-1), and are re-indexing all our metadata (almost completed) to align with these standard vocabularies where possible. We encourage our members to use these standard vocabularies when registering content. This should help users find content that has a license that allows unrestricted re-use, and that is in the research field and language they are interested in. Using these widely used vocabularies should also help with interoperability with other services.
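
As a rough illustration of what aligning metadata with controlled vocabularies can involve, the Python sketch below normalizes free-form license and language values to SPDX identifiers and ISO 639-1 codes. The lookup tables are tiny hypothetical samples and the function names are made up for this example. This is not the mapping DataCite uses, and values that are not recognized are returned as None rather than guessed.

```python
# Small sample lookup tables (assumptions for illustration, far from complete).
LICENSE_TO_SPDX = {
    "https://creativecommons.org/licenses/by/4.0": "CC-BY-4.0",
    "https://creativecommons.org/publicdomain/zero/1.0": "CC0-1.0",
    "cc-by-4.0": "CC-BY-4.0",
    "cc0-1.0": "CC0-1.0",
}

LANGUAGE_TO_ISO_639_1 = {
    "en": "en", "eng": "en", "english": "en",
    "de": "de", "deu": "de", "ger": "de", "german": "de",
    "fr": "fr", "fra": "fr", "fre": "fr", "french": "fr",
}


def normalize_license(value):
    """Map a license URL or name to an SPDX identifier, or None if unknown."""
    key = value.strip().lower().rstrip("/")
    key = key.replace("http://", "https://")  # treat both URL schemes alike
    return LICENSE_TO_SPDX.get(key)


def normalize_language(value):
    """Map a language name or code to an ISO 639-1 code, or None if unknown."""
    key = value.strip().lower().split("-")[0]  # "en-US" -> "en"
    return LANGUAGE_TO_ISO_639_1.get(key)
```

For example, normalize_license("http://creativecommons.org/licenses/by/4.0/") and normalize_license("CC-BY-4.0") both resolve to the same SPDX identifier, which is what makes a single license filter possible.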

Technology

To make DataCite Commons possible, we built a technology platform that can properly handle the metadata from multiple sources and the rich connections between them. The underlying technology is GraphQL. Our GraphQL API launched in May 2020 (Fenner, 2020b) and uses the graphql-ruby library that also powers the GitHub GraphQL API. The DataCite Commons web frontend is built with React, a popular JavaScript library, together with Apollo Client to interact with this GraphQL API. Everything we have built is based on open source software, and both the API and the web frontend are made available under a permissive open source license. As always, we welcome contributions to our source code and are more than happy to help others work with GraphQL.
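
For readers who have not used GraphQL before, the following Python sketch shows what a request for the views, downloads, and citations of a single work might look like. The endpoint URL and the field names (work, titles, citationCount, viewCount, downloadCount) are assumptions that should be checked against the published schema, and the helper functions are hypothetical. The example only uses the Python standard library.

```python
import json
import urllib.request

# Assumed public endpoint; check the DataCite documentation before relying on it.
GRAPHQL_URL = "https://api.datacite.org/graphql"

# Field names below are assumptions and should be verified against the schema.
WORK_QUERY = """
query ($id: ID!) {
  work(id: $id) {
    id
    titles { title }
    citationCount
    viewCount
    downloadCount
  }
}
"""


def build_work_request(doi):
    """Build (but do not send) a POST request asking for one work's counts."""
    payload = json.dumps({"query": WORK_QUERY, "variables": {"id": doi}})
    return urllib.request.Request(
        GRAPHQL_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_work(doi, timeout=10):
    """Send the request and return the decoded "work" object, or None."""
    with urllib.request.urlopen(build_work_request(doi), timeout=timeout) as response:
        body = json.load(response)
    return (body.get("data") or {}).get("work")
```

Calling fetch_work("10.5061/dryad.234") would send the query over the network. Because the client names exactly the fields it needs, a single request can return a work together with its counts instead of requiring several REST calls.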

Project FREYA

The work on DataCite Commons is part of the FREYA project that is helping build the European Open Science Cloud (EOSC), funded by the European Commission. DataCite Commons fulfills the specific project goals of delivering one Common DOI Search for DOIs from all DOI registration agencies, and of providing an easy-to-use interface for the PID Graph powered by GraphQL. FREYA will end in November 2020, and we will use the remaining three months to improve the service based on the input we have collected so far and the feedback we will receive with this release. We will focus on supporting researchers and other end-users, building on the schema.org metadata export (Cousijn, Cruse, & Fenner, 2018), citation formatting, and ORCID claiming (Fenner, 2015) available in DataCite Search. What we released today is the first public version of the service. We will continue adding many more Crossref DOIs and more connections, and work on improving the performance as the system scales. Beyond FREYA, DataCite Commons will be maintained and further developed by DataCite in coordination with other PID providers and the broader open science community. Watch out for a FREYA webinar on DataCite Commons in September.

Next steps

While DataCite Commons is open to everyone to help with the discovery of scholarly resources and their connections that are part of the PID Graph, it provides particular value to DataCite members. The service makes their metadata and content available to a wide audience, helps them discover and report connections such as citations and affiliation and funding information, and provides an open platform for further integrations with other services going forward. We will work closely with DataCite members to further align the new service with their needs and the needs of the communities they serve. We will be actively seeking input as we continue to build on DataCite Commons, and there will be a dedicated Open Hours session in the coming months. If you have feedback at this point, please reach out to DataCite Support or post a message in the DataCite channel of the PID Forum.

Acknowledgments

This blog post was originally published on the DataCite Blog. This work was funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 777523.

References

Cousijn H, Cruse P, Fenner M. Taking discoverability to the next level: datasets with DataCite DOIs can now be found through Google Dataset Search. doi:10.5438/5AEP-2N86

Fenner M. Announcing the DataCite Profiles Service. doi:10.5438/15X1-BJ6R

Fenner M. Making the most out of available Metadata. doi:10.5438/1DGK-1M22

Fenner M. Powering the PID Graph: announcing the DataCite GraphQL API. Published online May 6, 2020. doi:10.5438/YFCK-MV39

Copyright © 2020 Martin Fenner. Distributed under the terms of the Creative Commons Attribution 4.0 License.