In many ways CSV is for data what Markdown is for text documents: a very simple format that is both human- and machine-readable and that, despite a number of shortcomings, is widely used. Given the popularity of Markdown for writing blog posts, using CSV to publish blog posts with tabular data should be an obvious thing to do, and we have just published our first blog post using CSV data. The blog post shows Table 3 from the DataCite Metadata Schema (DataCite Metadata Working Group, 2014), which describes the mandatory properties.
The DataCite blog uses the Jekyll static site generator, and all blog posts are written in Markdown format. All posts have their metadata in YAML format at the beginning of the file (separated by --- from the main text):

    ---
    layout: post
    title: Publishing tabular data as blog post
    author: mfenner
    tags:
    - csv
    - metadata
    - blog
    ---
Markdown is a nice format for writing text, but it doesn't work so well for tabular data: the current Markdown table implementations are difficult for humans to read and edit for all but the simplest tables. CSV is a much better fit for tabular data, and can be written with a general text editor, a spreadsheet program, or another specialized tool.
To add the metadata required for every Jekyll blog post we are again adding a YAML header. The resulting file format is CSVY, which we have talked about before (Fenner, 2016b). Jekyll can be extended to understand many file formats beyond Markdown. As a CSVY converter didn't exist yet, we have written one and released it as the jekyll-csvy Ruby gem (Fenner, 2016a), so that CSVY support can easily be added to any Jekyll-powered blog.
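To make the file layout concrete, here is a minimal Python sketch of how a CSVY document could be split into its YAML header and its CSV rows. It is not the jekyll-csvy implementation (which is a Ruby gem); the function name `split_csvy` and the sample rows are made up for illustration, and the header is returned as raw text rather than parsed, so that only the standard library is needed.

```python
import csv
import io


def split_csvy(text):
    """Split a CSVY document into (raw YAML header, list of CSV rows).

    Hypothetical helper for illustration only. The header is returned as
    unparsed text; a YAML library would be needed to interpret it.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        # No front matter: treat the whole document as plain CSV.
        return "", list(csv.reader(io.StringIO(text)))

    # Look for the closing '---' delimiter after the opening one.
    end = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if end is None:
        raise ValueError("YAML header is not closed by a '---' line")

    header = "\n".join(lines[1:end])
    body = "\n".join(lines[end + 1:])
    return header, list(csv.reader(io.StringIO(body)))


# Illustrative sample: the rows are invented, not taken from the schema table.
sample = """---
layout: post
title: Publishing tabular data as blog post
---
ID,Property,Obligation
1,Identifier,M
2,Creator,M
"""

header, rows = split_csvy(sample)
print(header)
print(rows)
```

Everything after the second --- line is ordinary CSV, which is why generic CSV tooling can still handle the body once the header has been removed.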
In HTML, tabular data are typically displayed as HTML tables, and this is what we are doing with the CSVY converter. This works well for tables that are not too wide, and the converter supports inline Markdown formatting (bold, italic, links, etc.) in table cells. Block formatting (e.g. lists) is on our list of future improvements, and we will polish the converter based on user feedback. We are of course also interested in embedding CSV tables within Markdown documents, as this is a common use case.
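The basic CSV-to-table step can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the converter's actual code: the function name `csv_to_html_table` is hypothetical, the first row is assumed to be the header, and cell text is HTML-escaped instead of being run through a Markdown renderer, so the inline formatting described above is not covered.

```python
import csv
import html
import io


def csv_to_html_table(csv_text):
    """Render CSV text as an HTML table (hypothetical helper).

    Assumes the first row holds the column headers. Cell contents are
    escaped, not rendered as Markdown.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return "<table></table>"

    head, body = rows[0], rows[1:]
    out = ["<table>", "  <thead>"]
    out.append(
        "    <tr>" + "".join(f"<th>{html.escape(c)}</th>" for c in head) + "</tr>"
    )
    out += ["  </thead>", "  <tbody>"]
    for row in body:
        out.append(
            "    <tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>"
        )
    out += ["  </tbody>", "</table>"]
    return "\n".join(out)


# A quoted cell containing a comma and angle brackets survives intact.
table_html = csv_to_html_table('Property,Note\nTitle,"A name, or <title>"\n')
print(table_html)
```

Using a real CSV parser rather than splitting on commas matters here, because quoted cells may themselves contain commas.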
One important feature of using CSVY for blog posts is that the CSV remains available and can be ingested and processed by tools that can read CSVY, e.g. the rio package for R (Becker et al., 2016).
This blog post was originally published on the DataCite Blog.
Becker, J., Chan, C.-h., Chan, G. C., Leeper, T. J., Gandrud, C., MacDonald, A., & Zahn, I. (2016). Rio: A swiss-army knife for data I/O. CRAN. Retrieved from https://cran.r-project.org/web/packages/rio/index.html
DataCite Metadata Working Group. (2014). DataCite metadata schema for the publication and citation of research data v3.1. DataCite. Retrieved from https://doi.org/10.5438/0010
Fenner, M. (2016a). jekyll-csvy: Jekyll converter for CSVY files. GitHub. Retrieved from https://github.com/datacite/jekyll-csvy
Fenner, M. (2016b). Thinking about CSV. DataCite Blog. Retrieved from https://blog.datacite.org/thinking-about-csv