After six years as DataCite Technical Director, I am both sad and excited to announce that I will be leaving DataCite, beginning a new adventure as an independent developer for the InvenioRDM project on August 1st. My focus will remain on research data management, but from a different angle.
A lot has changed since 2015 at DataCite in general, and in the DataCite technical architecture in particular. Rather than describe the work on DataCite infrastructure over the past six years in detail, I want to provide a snapshot of where the DataCite infrastructure stands with its core services, and what considerations the team is taking into account going forward.
DataCite members can register DOIs and metadata for content using the Metadata Store (MDS). The MDS API hasn’t changed much for users since 2012, although the technology powering the API has been replaced more than once, and the metadata schema is constantly evolving. Starting in 2017, we added a JSON REST API and a web frontend (Fabrica). Going forward we hope to see more adoption of the JSON REST API, e.g. by finalizing and promoting the JSON schema. We may also want to explore other ways to register content, namely by embedding metadata in landing pages using schema.org in combination with sitemap files.
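To make the idea of embedding schema.org metadata in a landing page more concrete, here is a minimal sketch in Python. It builds a schema.org `Dataset` description as JSON-LD and wraps it in the `<script>` element a landing page would carry. The helper names (`dataset_jsonld`, `script_tag`), the DOI `10.1234/example`, and the choice of properties are assumptions for illustration only; this is not a description of how DataCite would harvest or register such metadata.

```python
import json


def dataset_jsonld(doi, title, creators, year):
    """Build a schema.org Dataset description as a JSON-LD dictionary.

    The set of properties is a small illustrative selection,
    not a required or complete metadata profile.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "@id": f"https://doi.org/{doi}",
        "name": title,
        "author": [{"@type": "Person", "name": name} for name in creators],
        "datePublished": str(year),
    }


def script_tag(doc):
    """Serialize the JSON-LD document for embedding in an HTML page."""
    body = json.dumps(doc, indent=2)
    # Escape "</" so a string value can never close the script element early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical example record; the DOI and names are placeholders.
record = dataset_jsonld("10.1234/example", "Example dataset", ["Jane Doe"], 2021)
print(script_tag(record))
```

A sitemap file would then list the landing page URLs, so that a crawler can find the pages and read the embedded metadata from each one.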
In October 2020 DataCite launched DataCite Commons as a new discovery platform, followed by an announcement to retire DataCite Search by the end of 2021. DataCite Commons enables the discovery of connections between content, people, and organizations. It uses the existing DataCite backend infrastructure, with relational databases and Elasticsearch, in combination with a new GraphQL API. Going forward we will see whether this approach scales appropriately, or whether a different technology is needed to power DataCite Commons. This would include exploring graph database technologies such as Neo4j, with or without GraphQL. Further, we will track the adoption of GraphQL and of our REST API architecture.
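As a rough illustration of why GraphQL suits questions about connections between people and content, the sketch below builds the JSON body that a GraphQL client conventionally sends over HTTP: a `query` string plus a `variables` object. The field names in the query (`person`, `name`, `works`, `totalCount`) and the ORCID iD are placeholders chosen for this example; they are assumptions and should not be read as the actual DataCite Commons schema.

```python
import json

# Illustrative query: fetch a person and a count of connected works in one request.
# The field names are hypothetical and not taken from the DataCite GraphQL schema.
QUERY = """
query ($id: ID!) {
  person(id: $id) {
    name
    works { totalCount }
  }
}
"""


def graphql_payload(query, variables):
    """Return the JSON request body for a GraphQL query with variables."""
    return json.dumps({"query": query, "variables": variables})


# Placeholder ORCID iD, used only to show how variables are passed.
payload = graphql_payload(QUERY, {"id": "https://orcid.org/0000-0000-0000-0000"})
print(payload)
```

The point of the sketch is that one request can describe both an entity and its relationships, which a REST API would typically answer with several calls.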
DataCite as an infrastructure provider has always focussed on backend APIs and related services. As part of this work, DataCite has migrated to a Docker container-based cloud architecture. There is still work ahead, from migrating to Kubernetes to service meshes and better monitoring and handling of service loads.
DataCite is in a good position to handle our technology projects during this transition. I have been working closely with the DataCite team to transition responsibilities and will continue to be involved in community initiatives. Matt will publish a blog post next week that will cover the future team structure.
This blog post was originally published on the DataCite Blog.