CSV is in many ways for data what Markdown is for text documents: a very simple format that is both human- and machine-readable and that, despite a number of shortcomings, is widely used. Given the popularity of Markdown for writing blog posts, using CSV to publish blog posts with tabular data seems an obvious next step, and we have just published our first blog post built from CSV data. That post shows Table 3 from the DataCite Metadata Schema (DataCite Metadata Working Group, 2014), which describes the mandatory properties.
The DataCite blog uses the Jekyll static site generator, and all blog posts are written in Markdown. Each post begins with its metadata in YAML format, enclosed between two lines of three dashes (---) that separate it from the main text. For example:
```yaml
---
layout: post
title: Publishing tabular data as blog post
author: mfenner
tags:
- csv
- metadata
- blog
---
```
Markdown is a nice format for writing text, but it doesn't work so well for tabular data: for all but the simplest tables, current Markdown table implementations are hard for humans to edit and read. CSV is a much better fit for tabular data, and it can be written either with a general-purpose text editor or with a spreadsheet program or another specialized tool.
To add the metadata that every Jekyll blog post requires, we again add a YAML header; the resulting file format is CSVY, which we have written about before (Fenner, 2016b). Jekyll can be extended to understand many file formats beyond Markdown. Because no CSVY converter existed yet, we wrote one and released it as the Ruby gem jekyll-csvy (Fenner, 2016a), so that CSVY support can easily be added to any Jekyll-powered blog.
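A CSVY file is simply a YAML header, enclosed between two --- lines, followed by ordinary CSV. The Python sketch below builds such a file from a dictionary of front matter and a list of rows. It is not the jekyll-csvy implementation (which is a Ruby gem); the function name `to_csvy` is hypothetical, and to avoid a third-party YAML dependency it only handles string and list-of-string values, writing them as JSON-style double-quoted strings, which YAML also accepts as scalars.

```python
import csv
import io
import json


def to_csvy(front_matter, rows):
    """Build a CSVY document: a YAML header between '---' lines, then CSV.

    Hypothetical helper. Only str values and lists of str are supported;
    each value is written as a JSON double-quoted string, which is also a
    valid YAML scalar, so no YAML library is needed.
    """
    lines = ["---"]
    for key, value in front_matter.items():
        if isinstance(value, list):
            # Block sequence: one "- item" line per list element.
            lines.append(f"{key}:")
            lines.extend(f"- {json.dumps(item)}" for item in value)
        else:
            lines.append(f"{key}: {json.dumps(value)}")
    lines.append("---")

    # The table itself is plain CSV, written with the standard csv module.
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return "\n".join(lines) + "\n" + buf.getvalue()
```

For example, `to_csvy({"layout": "post", "tags": ["csv"]}, [["Property", "Occurrence"], ["Identifier", "1"]])` yields a header with `layout` and `tags`, followed by the two CSV lines unchanged.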
In HTML, tabular data are typically displayed as HTML tables, and this is what the CSVY converter produces. This works well for tables that are not too wide, and the converter supports inline Markdown formatting (bold, italic, links, etc.) in table cells. Block formatting (e.g. lists) is on our list of future improvements, and we will polish the converter based on user feedback. We are of course also interested in embedding CSV tables within Markdown documents, as this is a common use case.
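The core of such a converter is turning CSV rows into an HTML table. The Python sketch below treats the first row as the header row, HTML-escapes every cell, and then applies a deliberately tiny subset of inline Markdown (`**bold**`, `*italic*` and `[text](url)` links) using regular expressions. The names `inline_markdown` and `rows_to_html_table` are illustrative assumptions, not part of jekyll-csvy; a production converter would more likely hand each cell to a full Markdown parser.

```python
import html
import re


def inline_markdown(cell):
    """Escape a cell and apply a hypothetical minimal inline-Markdown subset."""
    s = html.escape(cell)  # escape &, <, >, and quotes before adding tags
    s = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", s)  # **bold**
    s = re.sub(r"\*(.+?)\*", r"<em>\1</em>", s)  # *italic* (after bold)
    s = re.sub(r"\[([^\]]+)\]\(([^)\s]+)\)", r'<a href="\2">\1</a>', s)  # links
    return s


def rows_to_html_table(rows):
    """Render CSV rows as an HTML table; the first row becomes <thead>."""
    if not rows:
        return "<table></table>"
    header, *body = rows
    out = ["<table>", "<thead>", "<tr>"]
    out += [f"<th>{inline_markdown(c)}</th>" for c in header]
    out += ["</tr>", "</thead>", "<tbody>"]
    for row in body:
        cells = "".join(f"<td>{inline_markdown(c)}</td>" for c in row)
        out.append(f"<tr>{cells}</tr>")
    out += ["</tbody>", "</table>"]
    return "\n".join(out)
```

Escaping before applying the Markdown rules means that literal `<` or `&` in a cell can never inject markup; only the tags produced by the substitutions reach the page.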
One important feature of using CSVY for blog posts is that the CSV remains available and can be ingested and processed by any tool that reads CSVY, e.g. the R package rio (Becker et al., 2016).
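Because the CSV part is left untouched, reading a CSVY post back takes only a few lines. This Python sketch (the function name `read_csvy` is hypothetical) splits off the YAML header and returns the table as one dictionary per row using the standard `csv.DictReader`. It leaves the header as raw text, which could be passed to a YAML parser such as PyYAML's `yaml.safe_load` if that package is installed.

```python
import csv
import io


def read_csvy(text):
    """Return (yaml_header_text, records) for a CSVY string.

    Hypothetical helper: the header is returned unparsed; records are
    dicts keyed by the CSV column names from the first CSV line.
    """
    lines = text.splitlines()
    header, start = "", 0
    if lines and lines[0].strip() == "---":
        # Find the closing '---' line of the YAML header.
        end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
        if end is None:
            raise ValueError("YAML header is not closed by a '---' line")
        header = "\n".join(lines[1:end])
        start = end + 1
    body = "\n".join(lines[start:])
    # DictReader uses the first CSV row as field names and skips blank lines.
    return header, list(csv.DictReader(io.StringIO(body)))
```

A file without a YAML header is read as plain CSV, so the same function also works for ordinary CSV files.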
This blog post was originally published on the DataCite Blog.
Becker, J., Chan, C.-h., Chan, G. C., Leeper, T. J., Gandrud, C., MacDonald, A., & Zahn, I. (2016). rio: A Swiss-army knife for data I/O. CRAN. Retrieved from https://cran.r-project.org/web/packages/rio/index.html
DataCite Metadata Working Group. (2014). DataCite metadata schema for the publication and citation of research data v3.1. DataCite. Retrieved from https://doi.org/10.5438/0010
Fenner, M. (2016a). jekyll-csvy: Jekyll converter for CSVY files. GitHub. Retrieved from https://github.com/datacite/jekyll-csvy
Fenner, M. (2016b). Thinking about CSV. DataCite Blog. Retrieved from https://blog.datacite.org/thinking-about-csv