Data visualization is all about telling stories with data, something that matters not only for scholarly content but is also increasingly common in journalism. This is a big and complex topic, but I hope the following will get you started.
Work on visualizing scientific data should start with a good understanding of the best practices and pitfalls of data visualization in general, as well as the specific aspects of visualizing scientific data. The following resources have helped me get started (please suggest more in the comments):
In brief, the grammar of graphics tells us that a statistical graphic is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars). The plot may also contain statistical transformations of the data and is drawn on a specific coordinate system. Faceting can be used to generate the same plot for different subsets of the dataset. It is the combination of these independent components that makes up a graphic.
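To make the separation of components concrete, here is a minimal Python sketch (standard library only) that models a plot as data plus layers (each a geom, an aesthetic mapping and a statistical transformation), a coordinate system and an optional facet. The classes `Plot` and `Layer` and the helpers `identity` and `mean_by` are hypothetical names for illustration, not the API of ggplot2 or any other library, and the sketch stops at abstract marks rather than drawing anything.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Optional

Row = dict  # one observation: column name -> value


def identity(rows: list[Row]) -> list[Row]:
    """Default statistical transformation: leave the data unchanged."""
    return rows


def mean_by(x: str, y: str) -> Callable[[list[Row]], list[Row]]:
    """Statistical transformation: average of y for each value of x."""
    def stat(rows: list[Row]) -> list[Row]:
        groups = defaultdict(list)
        for row in rows:
            groups[row[x]].append(row[y])
        return [{x: key, y: sum(vals) / len(vals)} for key, vals in groups.items()]
    return stat


@dataclass
class Layer:
    geom: str                 # geometric object, e.g. "point" or "bar"
    aes: dict[str, str]       # aesthetic attribute -> data column
    stat: Callable[[list[Row]], list[Row]] = identity


@dataclass
class Plot:
    data: list[Row]
    layers: list[Layer]
    coord: str = "cartesian"          # coordinate system the marks are drawn in
    facet: Optional[str] = None       # column that splits the data into panels

    def build(self) -> dict[str, list[dict]]:
        """Resolve the components into abstract marks per panel (no rendering)."""
        panels: dict[str, list[Row]] = defaultdict(list)
        for row in self.data:
            panels[row[self.facet] if self.facet else "all"].append(row)
        result = {}
        for key, rows in panels.items():
            marks = []
            for layer in self.layers:
                for row in layer.stat(rows):
                    mapped = {aes: row[col] for aes, col in layer.aes.items()}
                    marks.append({"geom": layer.geom, "coord": self.coord, **mapped})
            result[key] = marks
        return result


# Hypothetical dose-response data, faceted by sex.
data = [
    {"dose": 1, "response": 2.0, "sex": "f"},
    {"dose": 1, "response": 4.0, "sex": "f"},
    {"dose": 2, "response": 5.0, "sex": "m"},
]
plot = Plot(
    data,
    layers=[
        Layer("point", {"x": "dose", "y": "response", "colour": "sex"}),
        Layer("bar", {"x": "dose", "y": "response"}, stat=mean_by("dose", "response")),
    ],
    facet="sex",
)
panels = plot.build()
```

Swapping the facet column, the coordinate system or a layer's statistical transformation changes one field and leaves the rest of the plot specification untouched, which is the point of treating the components as independent.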
There are many great tools available; pick one and learn it well. Some options include:
I do most visualizations in either R or d3.js. Both are open source tools with large communities and a rich set of libraries, examples and documentation, and both take a systematic approach to data visualization (see the grammar of graphics above).
Unless your interest is more in information design (see Information is Beautiful for some great examples), data visualization is tightly coupled with data analysis. You need to know at least the basics of data analysis to produce proper visualizations, for example how to handle wrongly formatted data (such as text in a number column), missing values and outliers. In my experience the most time-consuming step is data transformation, i.e. bringing the data into the format you need for analysis and visualization.
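As a small illustration of these basics, the following Python sketch turns a column of raw strings into numbers, treats text, blanks and NaN as missing values, and flags outliers. Tukey's fences (1.5 times the interquartile range) are my choice here, one common rule among several; the function names and the sample column are hypothetical.

```python
import math
import statistics


def clean_numeric(raw_values):
    """Coerce raw strings to floats; text, blanks and NaN become None (missing)."""
    cleaned = []
    for value in raw_values:
        try:
            number = float(value)
        except (TypeError, ValueError):
            number = None                      # e.g. "n/a" typed into a number column
        if number is not None and math.isnan(number):
            number = None                      # treat an explicit "NaN" as missing too
        cleaned.append(number)
    return cleaned


def tukey_outliers(numbers, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(numbers, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in numbers if x < low or x > high]


raw = ["4.2", "3.9", "n/a", "", "4.1", "4.0", "38", "3.8"]  # hypothetical spreadsheet column
values = clean_numeric(raw)
present = [v for v in values if v is not None]
missing = values.count(None)
outliers = tukey_outliers(present)
print(f"{missing} missing, outliers: {outliers}")  # 2 missing, outliers: [38.0]
```

The sketch only flags outliers; whether to drop, correct or keep them is an analysis decision that depends on the experiment.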
R, Python and the relatively new Julia are popular open source languages for data analysis, and many packages for these languages help with common data analysis problems. An additional advantage of using a proper programming language over a set of tools cobbled together is that it is easy to recreate a visualization automatically with a new set of data, which is convenient when you need to analyze and visualize an ongoing experiment that repeatedly produces new data.
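A minimal sketch of such a scripted pipeline, assuming the experiment exports a non-empty CSV file with a label column and a column of non-negative values: the function below turns the CSV text into an SVG bar chart, so rerunning the script on each new export regenerates the figure. `csv_to_svg_bars`, the column names and the layout constants are hypothetical; in practice you would more likely use a plotting library in R, Python or Julia than write SVG by hand.

```python
import csv
import io
from xml.sax.saxutils import escape

LABEL_WIDTH = 100  # pixels reserved for the labels on the left


def csv_to_svg_bars(csv_text, label_col, value_col, width=400, bar_height=20):
    """Render a horizontal bar chart (non-negative values) as an SVG string."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = [float(row[value_col]) for row in rows]
    scale = (width - LABEL_WIDTH) / (max(values) or 1)  # avoid dividing by zero
    height = bar_height * len(rows)
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}" '
        f'viewBox="0 0 {width} {height}">'
    ]
    for i, (row, value) in enumerate(zip(rows, values)):
        y = i * bar_height
        parts.append(f'  <text x="0" y="{y + bar_height * 0.75}">{escape(row[label_col])}</text>')
        parts.append(
            f'  <rect x="{LABEL_WIDTH}" y="{y}" width="{value * scale:.1f}" '
            f'height="{bar_height - 2}" fill="steelblue"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# The same function regenerates the figure whenever the experiment exports new data.
week1 = csv_to_svg_bars("sample,count\nA,3\nB,6\n", "sample", "count")
week2 = csv_to_svg_bars("sample,count\nA,3\nB,6\nC,2\n", "sample", "count")
```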
Scientific data are still too often visualized using bitmap graphic formats such as png. These formats are not appropriate for charts and only make sense for images. They don’t scale to the screen resolution, and it is hard, if not impossible, to reuse or even modify them. Use vector graphic formats such as
svg is my preferred format because in contrast to
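Because svg is plain XML text, a chart can be restyled or resized after the fact with any XML tool. The Python sketch below uses the standard library `xml.etree.ElementTree` to recolor the bars of a small, hypothetical two-bar chart and enlarge it; the `viewBox` attribute keeps the geometry sharp at any size, something a png cannot offer.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # write the SVG namespace back as the default, not "ns0:"

# Hypothetical chart, e.g. as exported by a plotting tool.
svg_text = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="40" viewBox="0 0 200 40">
  <rect x="0" y="0" width="120" height="18" fill="steelblue"/>
  <rect x="0" y="20" width="180" height="18" fill="steelblue"/>
</svg>"""

root = ET.fromstring(svg_text)
for rect in root.iter(f"{{{SVG_NS}}}rect"):
    rect.set("fill", "darkorange")  # restyle every bar without redrawing the chart
root.set("width", "800")            # display size changes; viewBox keeps the drawing coordinates
root.set("height", "160")
edited = ET.tostring(root, encoding="unicode")
```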
At the end of the day data visualization is all about telling a story with data. Unfortunately, the current state of scientific visualization is very different. In my opinion, most graphs and figures used in publications don’t provide the data underlying the visualization (Datawrapper is a great example of how this can be done), focus too much on detail rather than the overall message, don’t take advantage of the different chart types available, and are sometimes even misleading. And I’m not even talking about the fact that figures in scholarly papers are almost never interactive. I rarely get excited by a figure when reading a paper; when I do, it is usually because the underlying data are so compelling that even the simplest visualization conveys the right message.
We should become more creative in visualizing data in scholarly documents, and one important step towards that goal is publishers accepting more reasonable file formats in manuscript submissions, instead of just eps (PLOS), or