Literate programming is a methodology that combines a programming language with a documentation language, thereby making programs more robust, more portable, more easily maintained, and arguably more fun to write than programs that are written only in a high-level language. The main idea is to treat a program as a piece of literature, addressed to human beings rather than to a computer. The program is also viewed as a hypertext document, rather like the World Wide Web.
Literate Programming by Donald Knuth (1983) is a seminal book that introduces the concept of literate programming. Using technology available in 2014 we can make a small but important change to the last sentence:
The program is also viewed as a hypertext document on the World Wide Web.
This blog post is an example for such a document. The page is written in markdown (markdown file available here), and all embedded code was executed when this page was generated, i.e. when the markdown was converted to HTML and the blog post was published. To demonstrate this I have embedded code in three different languages below - the output is the second code block.
In R you have
cat('Hello, R world!\n')
Hello, R world!
print "Hello, Python world!"
Hello, Python world!
puts 'Hello, Ruby world!'
Hello, Ruby world!
You can also embed code within text blocks (inline), so that
3.48 * 723 becomes 2516.04. Another important option is to generate figures using the embedded code, e.g. the following figure taken from a recent publication.
# code for figure 1: density plots for citation counts for PLOS Biology # articles published in 2010 # load May 20, 2013 ALM report alm <- read.csv("data/alm_report_plos_biology_2013-05-20.csv", stringsAsFactors = FALSE) # only look at research articles alm <- subset(alm, alm$article_type == "Research Article") # only look at papers published in 2010 alm$publication_date <- as.Date(alm$publication_date) alm <- subset(alm, alm$publication_date > "2010-01-01" & alm$publication_date <= "2010-12-31") # labels colnames <- dimnames(alm)[] plos.color <- "#1ebd21" plos.source <- "scopus" plos.xlab <- "Scopus Citations" plos.ylab <- "Probability" quantile <- quantile(alm[, plos.source], c(0.1, 0.5, 0.9), na.rm = TRUE) # plot the chart opar <- par(mai = c(0.5, 0.75, 0.5, 0.5), omi = c(0.25, 0.1, 0.25, 0.1), mgp = c(3, 0.5, 0.5), fg = "black", cex.main = 2, cex.lab = 1.5, col = plos.color, col.main = plos.color, col.lab = plos.color, xaxs = "i", yaxs = "i") d <- density(alm[, plos.source], from = 0, to = 100) d$x <- append(d$x, 0) d$y <- append(d$y, 0) plot(d, type = "n", main = NA, xlab = NA, ylab = NA, xlim = c(0, 100), frame.plot = FALSE) polygon(d, col = plos.color, border = NA) mtext(plos.xlab, side = 1, col = plos.color, cex = 1.25, outer = TRUE, adj = 1, at = 1) mtext(plos.ylab, side = 2, col = plos.color, cex = 1.25, outer = TRUE, adj = 0, at = 1, las = 1) par(opar)
All this functionality is provided by knitr, a package for the R statistical programming language. knitr has been around for a while, but integration into the Jekyll blogging platform is still fragile. Earlier this week at the rOpenSci hackathon (more on this later) a group of us worked hard to improve this integration. We are still not completely done, but the source code is available here. Most importantly, all the conversion happens on the server, and we are only using freely available tools. I have now enabled this functionality for this blog, so expect more code embedded examples in the future.
Fenner, M. (2013). What can article-level metrics do for you? PLoS Biol, 11(10), e1001687. doi:10.1371/journal.pbio.1001687
Knuth, D. E., Stanford University, & Computer Science Department. (1983). Literate programming. Stanford, CA: Dept. of Computer Science, Stanford University.
Using Schema.org for DOI Registration
Three weeks ago we started assigning DOIs to every post on this blog (Fenner, 2016c). The process we implemented uses a new command line utility and integrates well with our the publishing workflow, with (almost) ...
What Can Article-Level Metrics Do for You?
Article-level metrics (ALMs) provide a wide range of metrics about the uptake of an individual journal article by the scientific community after publication. They include citations, usage statistics, discussions in online comments and social media, ...
ORCID or how to build a unique identifier for scientists in 10 easy steps
ORCID stands for Open Researcher and Contributor ID and was announced in early December. This blog post tries to summarize some of the problems that have to be solved to develop a unique identifier for scientists.1. ...