With the December 18 issue, Nature started to support XMP markup in article PDFs (reported last week on the Nascent blog by Tony Hammond) [1]. XMP stands for Extensible Metadata Platform and is a technology for embedding metadata in files, including PDFs [2]. XMP was created by Adobe (with XMP support in PDF files since 2001), but it is an open standard backed by others, including Creative Commons [3]. The Digital Object Identifier (DOI) is the most important piece of information in the metadata, because the DOI provides a link to the journal publisher's website, where more metadata can be retrieved. XMP support in scientific PDFs is unfortunately still very uncommon and probably hasn't changed much since Pierre Lindenbaum checked last year [4].
Adding metadata to PDFs seems to be a no-brainer. We have done the same with music (ID3 tags in MP3 files) and photos (IPTC and EXIF) for years, and it has been a tremendous help in organizing the files stored on our computers. Unfortunately, few tools can extract the DOI or other metadata from the XMP in article PDFs. But I expect more desktop software to support XMP once XMP support in scientific articles is more widespread. We will then be able to add a journal PDF to our reference manager of choice and have the relevant metadata (including authors, title, journal and issue) filled in automatically, and many other creative uses will become possible. Until then we need tools like Papers or Mendeley that can extract metadata from PDF files without this XMP information.
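To give a rough idea of what extracting the DOI from XMP could look like, here is a minimal Python sketch. It rests on several assumptions that are not taken from the Nature PDFs themselves: that the XMP packet is stored uncompressed in the file (common, but not guaranteed), that it is wrapped in the usual `<?xpacket begin=...?>` and `<?xpacket end=...?>` markers, and that the DOI appears somewhere in the packet as plain text. The function names, the sample bytes, and the DOI `10.1234/example.5678` are made up for illustration.

```python
import re
from typing import Optional

# XMP packets are wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>.
# Assumption: the packet is stored uncompressed, so a plain byte scan finds it.
XMP_PACKET = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)

# Loose DOI pattern: "10.", a registrant code, a slash, then a suffix.
# Assumption: good enough for a sketch, not a full DOI syntax check.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")


def extract_xmp_packet(pdf_bytes: bytes) -> Optional[str]:
    """Return the first XMP packet body found in the raw bytes, or None."""
    match = XMP_PACKET.search(pdf_bytes)
    if match is None:
        return None
    return match.group(1).decode("utf-8", errors="replace")


def find_doi(xmp: str) -> Optional[str]:
    """Return the first DOI-like string in the XMP text, or None."""
    match = DOI_PATTERN.search(xmp)
    return match.group(0) if match else None


# Hypothetical stand-in for the bytes of an article PDF with embedded XMP.
SAMPLE_PDF = (
    b"%PDF-1.4\n1 0 obj\n<< /Type /Metadata /Subtype /XML >>\nstream\n"
    b'<?xpacket begin="\xef\xbb\xbf" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    b"<dc:identifier>doi:10.1234/example.5678</dc:identifier>"
    b"</x:xmpmeta>\n"
    b'<?xpacket end="w"?>\nendstream\nendobj\n'
)

if __name__ == "__main__":
    packet = extract_xmp_packet(SAMPLE_PDF)
    print(find_doi(packet) if packet else "no XMP packet found")
```

For a real file you would read the bytes with `open(path, "rb").read()` and pass them to `extract_xmp_packet`. A more robust tool would parse the RDF/XML properly and look up specific properties rather than pattern-matching, and would also need to handle PDFs whose metadata stream is compressed.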