This week I start as the new DataCite Technical Director. While I get up to speed with existing DataCite services and infrastructure, and we start to launch new services (e.g. this blog), this is also a good time to communicate the overall approach I am taking. I like to call it Data-Driven Development, or DDD as we all love acronyms.
Data-Driven Development and related terms are in use in several contexts, in particular economics and programming. The term sounds similar to test-driven development and behavior-driven development, two related software development processes. Business intelligence and data science are of course closely related. My definition is as follows:
We develop and maintain our services based on data.
This shouldn't come as a surprise, as DataCite's mission is "Helping you to find, access and reuse data." And my last job at the Open Access publisher PLOS was all about collecting and presenting data about the reuse of scholarly articles (citations, downloads, social media mentions, etc.). But here I mean data in a much broader sense.
While the overall strategic direction is determined by the Board together with the DataCite working groups and members, we can collect data that help with decisions in product development, for example
Compared with the next two sections, tools for data-driven product development are less commonplace (unless I missed them, in which case please provide feedback).
The data generated during software development are increasingly made available through automated tools. We can
Any web-based service can and should be monitored for
We don't want to stop at collecting all these data; we also need a strategy for providing them to the DataCite Board, DataCite working groups, DataCite members and data centers, DataCite staff, and everyone else who cares about these data. The default should be open; exceptions are mostly data that would raise privacy or security concerns, e.g. IP addresses in usage stats. Most of the services mentioned in this post are open for everyone to look at.
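As a rough illustration of "open by default, with privacy exceptions", here is a minimal sketch of stripping sensitive fields from a usage record before it is shared. The field names (`ip_address`, `doi`, `downloads`), the example DOI, and the `publishable` helper are hypothetical and do not describe an existing DataCite service.

```python
# Hypothetical set of fields that should never leave the internal systems.
PRIVATE_FIELDS = frozenset({"ip_address"})


def publishable(record, private_fields=PRIVATE_FIELDS):
    """Return a copy of a usage record without the privacy-sensitive fields."""
    return {key: value for key, value in record.items() if key not in private_fields}


# Example usage record with made-up values.
raw = {"doi": "10.1234/example", "ip_address": "192.0.2.1", "downloads": 3}
public = publishable(raw)
```

The original record is left untouched, so the full data stay available internally while only the filtered copy is published.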
Good data-driven development should not only collect lots of data and make them available, but also aggregate the information in meaningful ways. Service monitoring is a good example: staff need to understand exactly what is going on, but the typical DataCite user only cares about whether all services are running as expected. A status dashboard would be a good solution here.
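To make the aggregation idea concrete, here is a minimal sketch of collapsing detailed monitoring results into the single status a dashboard might show. The `CheckResult` type, the service names, and the status labels are assumptions for illustration, not an existing DataCite tool.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    service: str        # hypothetical service name, e.g. "search"
    ok: bool            # did this health check pass?
    response_ms: float  # detail that staff care about, but most users do not


def summarize(results):
    """Collapse detailed check results into one overall status."""
    per_service = {}
    for result in results:
        # A service counts as up only if every one of its checks passed.
        per_service[result.service] = per_service.get(result.service, True) and result.ok

    failing = sorted(name for name, ok in per_service.items() if not ok)
    if not failing:
        overall = "operational"
    elif len(failing) < len(per_service):
        overall = "partial outage"
    else:
        overall = "major outage"
    return {"overall": overall, "failing": failing}


# Made-up check results for two hypothetical services.
checks = [
    CheckResult("search", True, 120.0),
    CheckResult("api", False, 900.0),
    CheckResult("api", True, 100.0),
]
status = summarize(checks)
```

Staff can still inspect the individual `CheckResult` records, while the dashboard only needs the summary.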
The data we are generating also need to be put into the broader context. We need
Of course I am aware that this is an ambitious agenda, in particular since DataCite is a small non-profit that has limited staff and financial resources. But I don't think that data-driven development should be left to for-profit organizations and/or to organizations of a certain size. There are several things DataCite can do:
This blog post was originally published on the DataCite Blog.
Unknown. (1931). Hannover, blick auf hannover. ETH-Bibliothek Zürich, Bildarchiv. https://doi.org/10.3932/ETHZ-A-000159123