The open source research data management platform InvenioRDM today announced its first Long-Term Support (LTS) release, which is ready for use in production services. I am joining the effort as a participating partner via Front Matter, the organization I started this week.
InvenioRDM was first announced in April 2019:
Our vision in the next five-years, is to make InvenioRDM a world-leading extensible research data management platform used by research institutions all around the world and with businesses providing services, support and customizations on top of InvenioRDM.
The first concrete set of goals was defined as:
- A stable InvenioRDM platform - A research data management platform based on Zenodo and the Invenio v3 Framework.
- A community of public and private institutions to sustain InvenioRDM.
- Minimum two existing repositories migrated to InvenioRDM, with Zenodo being one of them.
Today's release brings InvenioRDM much closer to achieving these goals. The next major milestone for InvenioRDM is to migrate Zenodo to run on top of InvenioRDM.
In the coming two months I will try to get up to speed with the InvenioRDM project and start working with the CERN team and the other participating partners. I also have the specific task of making sure InvenioRDM fully supports the data citation roadmap for scholarly data repositories. This roadmap is the work of the Force11 DCIP project, which Merce Crosas and I co-led, and is described in a 2019 Scientific Data paper (Fenner et al. 2019):
Guidelines for Repositories (1-5 required, 6-9 recommended, 10-11 optional)
1. All datasets intended for citation must have a globally unique persistent identifier that can be expressed as an unambiguous URL.
2. Persistent identifiers for datasets must support multiple levels of granularity, where appropriate.
3. The persistent identifier expressed as a URL must resolve to a landing page specific for that dataset, and that landing page must contain metadata describing the dataset.
4. The persistent identifier must be embedded in the landing page in machine-readable format.
5. The repository must provide documentation and support for data citation.
6. The landing page should include metadata required for citation, and ideally also metadata facilitating discovery, in human-readable and machine-readable format.
7. The machine-readable metadata should use schema.org markup in JSON-LD format.
8. Metadata should be made available via HTML meta tags to facilitate use by reference managers.
9. Metadata should be made available for download in BibTeX and/or another standard bibliographic format.
10. Content negotiation for schema.org/JSON-LD and other content types may be supported so that the persistent identifier expressed as URL resolves directly to machine-readable metadata.
11. HTTP link headers may be supported to advertise content negotiation options.
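To make the two optional guidelines on content negotiation and link headers more concrete, here is a minimal sketch in Python. It is not InvenioRDM code: the function names, the list of supported media types and the use of `rel="alternate"` are assumptions chosen for illustration, and the `Accept` parsing is deliberately simplified (it honors `q` values but not the full precedence rules of the HTTP specification).

```python
# Hypothetical media types a repository might offer for one dataset record.
# The first entry doubles as the default representation.
SUPPORTED_TYPES = [
    "text/html",
    "application/ld+json",
    "application/x-bibtex",
]


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    choices = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        choices.append((media_type, q))
    # sorted() is stable, so equal q values keep the order the client sent.
    return sorted(choices, key=lambda choice: choice[1], reverse=True)


def negotiate(accept_header, supported=SUPPORTED_TYPES):
    """Pick the media type to serve, or None if nothing is acceptable."""
    for media_type, q in parse_accept(accept_header or "*/*"):
        if q <= 0:
            continue
        if media_type == "*/*":
            return supported[0]
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # e.g. "text/*" becomes "text/"
            for candidate in supported:
                if candidate.startswith(prefix):
                    return candidate
            continue
        if media_type in supported:
            return media_type
    return None


def link_header(landing_page_url, supported=SUPPORTED_TYPES):
    """Build a Link header value advertising the non-HTML representations."""
    return ", ".join(
        f'<{landing_page_url}>; rel="alternate"; type="{media_type}"'
        for media_type in supported
        if media_type != "text/html"
    )


if __name__ == "__main__":
    print(negotiate("application/ld+json, text/html;q=0.8"))
    print(link_header("https://example.org/records/1"))
```

A client that sends `Accept: application/ld+json` would get the JSON-LD representation, while a browser would still receive the HTML landing page. The URL `https://example.org/records/1` is a placeholder.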
Several of these recommendations are of course already addressed by InvenioRDM, but there is more work needed in the details, e.g. how metadata are exposed in dataset landing pages using schema.org. And these recommendations have evolved, e.g. as described in the output of the Research Data Alliance (RDA) Research Metadata Schemas Working Group published in June 2021 (Wu et al. 2021).
Please reach out to me in the comments or via email if you have any questions or suggestions regarding this upcoming work, or more generally my new involvement in InvenioRDM.
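As an illustration of what exposing metadata in a landing page can look like, the following Python sketch builds a minimal schema.org `Dataset` description and renders it as embedded JSON-LD together with a few HTML meta tags. It is a sketch under stated assumptions, not how InvenioRDM does it: the function names, the selection of properties, the meta tag names (`DC.identifier`, `citation_title`, `citation_author`) and the example DOI are all chosen here for illustration.

```python
import json
from html import escape


def dataset_jsonld(doi, title, creators, published, publisher):
    """Build a minimal schema.org Dataset description as a plain dict."""
    doi_url = f"https://doi.org/{doi}"
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "@id": doi_url,
        # The persistent identifier, expressed as a URL, in machine-readable form.
        "identifier": doi_url,
        "name": title,
        "creator": [{"@type": "Person", "name": name} for name in creators],
        "datePublished": published,
        "publisher": {"@type": "Organization", "name": publisher},
    }


def landing_page_head(record):
    """Render <head> fragments: embedded JSON-LD plus citation meta tags."""
    # Escape "<" so that metadata values cannot close the script element early.
    payload = json.dumps(record, indent=2).replace("<", "\\u003c")
    lines = [
        '<script type="application/ld+json">',
        payload,
        "</script>",
        f'<meta name="DC.identifier" content="{escape(record["identifier"])}">',
        f'<meta name="citation_title" content="{escape(record["name"])}">',
    ]
    for creator in record["creator"]:
        lines.append(
            f'<meta name="citation_author" content="{escape(creator["name"])}">'
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical record; the DOI prefix 10.1234 is a placeholder.
    record = dataset_jsonld(
        doi="10.1234/example",
        title="Example dataset",
        creators=["Jane Doe"],
        published="2021-08-05",
        publisher="Example Repository",
    )
    print(landing_page_head(record))
```

The embedded JSON-LD serves search engines and other machine consumers, while the meta tags are the kind of markup reference managers typically look for.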
Fenner, M., Crosas, M., Grethe, J. S., Kennedy, D., Hermjakob, H., Rocca-Serra, P., Durand, G., Berjon, R., Karcher, S., Martone, M., & Clark, T. (2019). A data citation roadmap for scholarly data repositories. Scientific Data, 6(1). https://doi.org/10.1038/S41597-019-0031-8
Wu, M., Juty, N., RDA Research Metadata Schemas WG, Collins, J., Duerr, R., Ridsdale, C., Shepherd, A., Verhey, C., & Castro, L. J. (2021). Guidelines for publishing structured metadata on the Web. https://doi.org/10.15497/RDA00066