Data-Driven Development

This week I start as the new DataCite Technical Director. While I get up to speed with existing DataCite services and infrastructure, and we start to launch new services (e.g. this blog), this is a good time to communicate the overall approach I am taking. I like to call it Data-Driven Development, or DDD, as we all love acronyms.

Definition

Data-Driven Development and related terms are in use in several contexts, in particular economics and programming. The term sounds similar to test-driven development and behavior-driven development, two related software development processes. Business intelligence and data science are of course closely related. My definition is as follows:

We develop and maintain our services based on data.

This shouldn't come as a surprise, as DataCite's mission is "Helping you to find, access and reuse data". And my last job at the Open Access publisher PLOS was all about collecting and presenting data about the reuse of scholarly articles (citations, downloads, social media mentions, etc.). But here I mean data in a much broader sense.

Product Development

While the overall strategic direction is determined by the Board together with the DataCite working groups and members, we can collect data that help with decisions in product development, for example

Compared with the next two sections, tools for data-driven product development are less commonplace (unless I missed them, in which case please provide feedback).

Software Development

The data generated during software development are increasingly made available through automated tools. We can

Service Monitoring

Any web-based service can and should be monitored for

Communication

We don't want to stop at collecting all these data; we also need a strategy for providing them to the DataCite Board, DataCite working groups, DataCite members and data centers, DataCite staff, and everyone else who cares about these data. The default should be open. Exceptions are mostly data that would raise privacy or security concerns, e.g. IP addresses in usage stats. Most of the services mentioned in this post are open for everyone to look at.
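As an illustration of the privacy exception, here is a minimal sketch of how IP addresses could be coarsened before usage stats are shared openly. The function name `anonymize_ip` and the prefix lengths (/24 for IPv4, /48 for IPv6) are assumptions made for this example, not a description of what DataCite does, and truncation alone may not be enough to meet every privacy requirement.

```python
import ipaddress


def anonymize_ip(ip: str) -> str:
    """Zero out the host part of an IP address (hypothetical helper).

    Assumed prefix lengths: /24 for IPv4 and /48 for IPv6.
    """
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    # strict=False lets us build the enclosing network from a host address
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    # Addresses taken from the documentation ranges reserved for examples
    print(anonymize_ip("192.0.2.123"))
    print(anonymize_ip("2001:db8:1234:5678::1"))
```

Applied to every record before publication, a step like this would keep the usage stats useful for coarse analysis without exposing individual addresses.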

Synthesis

Good data-driven development should not only collect lots of data and make them available, but also aggregate the information in meaningful ways. Service monitoring is a good example: staff need to understand exactly what is going on, but the typical DataCite user only cares about whether all services are running as expected. A status dashboard would be a good solution here.
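The kind of aggregation a status dashboard performs can be sketched in a few lines. This is only an illustration under stated assumptions: the `Status` levels, the `Check` record and the service names in the usage example are hypothetical, and a real dashboard would read its checks from a monitoring service rather than from a hard-coded list.

```python
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    """Hypothetical status levels, ordered from best to worst."""
    OPERATIONAL = 0
    DEGRADED = 1
    DOWN = 2


@dataclass
class Check:
    """One detailed monitoring result, as staff would see it."""
    service: str
    name: str
    status: Status


def summarize(checks):
    """Collapse detailed checks into one worst-case status per service."""
    summary = {}
    for check in checks:
        current = summary.get(check.service, Status.OPERATIONAL)
        summary[check.service] = max(current, check.status)
    return summary


def overall(summary):
    """Return the single worst status across all services."""
    return max(summary.values(), default=Status.OPERATIONAL)


if __name__ == "__main__":
    # Made-up checks for two made-up services
    checks = [
        Check("search", "response time", Status.DEGRADED),
        Check("search", "uptime", Status.OPERATIONAL),
        Check("api", "uptime", Status.OPERATIONAL),
    ]
    summary = summarize(checks)
    for service, status in summary.items():
        print(f"{service}: {status.name}")
    print(f"overall: {overall(summary).name}")
```

Staff would still work with the individual checks, while the per-service summary and the single overall value are what a public status page would show.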

The data we are generating also need to be put into the broader context. We need

  • the DataCite Board to use them for strategic planning
  • to provide these data to the DataCite working groups to feed into their work (e.g. stats on what metadata are submitted by data centers for the Metadata Working Group)
  • the DataCite staff to integrate them into their work (e.g. the Communications Director using the website usage stats)
  • these data to adapt the software development roadmap and service infrastructure

Implementation

Of course I am aware that this is an ambitious agenda, in particular since DataCite is a small non-profit that has limited staff and financial resources. But I don't think that data-driven development should be left to for-profit organizations and/or to organizations of a certain size. There are several things DataCite can do:

  • implement DDD practices over time, starting with one service and one aspect
  • use service providers wherever it makes sense (there is a future where you yourself are running fewer servers). This means anything that is not core to the DataCite mission and where the service provider is better and/or cheaper than what we could do internally. This evaluation can of course change over time
  • collaborate with other scholarly non-profits on infrastructure, including DataCite members and data centers, and other persistent identifier providers such as CrossRef and ORCID

This blog post was originally published on the DataCite Blog.

References

Unknown. Hannover, Blick auf Hannover. ETH-Bibliothek Zürich, Bildarchiv; 1931. doi:10.3932/ETHZ-A-000159123

Copyright © 2015 Martin Fenner. Distributed under the terms of the Creative Commons Attribution 4.0 License.