Markdown metadata format

Posted 2019-04-03 10:26

Question:

Is there a standard or convention for embedding metadata in a Markdown formatted post, such as the publication date or post author for conditional rendering by the renderer?

It looks like this YAML metadata format might be it.

There are all kinds of strategies, e.g. an accompanying file mypost.meta.edn, but I'm hoping to keep it all in one file.

Answer 1:

There are two common formats that look very similar but differ in some very specific ways, and a third that is very different.

YAML Front Matter

The Jekyll static site generator popularized YAML front matter, which is delimited by YAML section markers. Yes, the dashes are actually part of the YAML syntax, and the metadata is defined using any valid YAML syntax. Here is an example from the Jekyll docs:

---
layout: post
title: Blogging Like a Hacker
---

Note that YAML front matter is not parsed by the Markdown parser, but is removed prior to parsing by Jekyll (or whatever tool you're using) and could actually be used to request a different parser than the default Markdown parser for that page (I don't recall if Jekyll does that, but I have seen some tools which do).
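
To make the "removed prior to parsing" step concrete, here is a minimal Python sketch of how a tool might split the front matter from the body before handing the body to a Markdown parser. The function name split_front_matter and the regex are my own illustration, not Jekyll's actual implementation, and the sketch returns the metadata as raw text rather than parsing it.

```python
import re

# A leading block that opens with '---' and closes with '---' or '...'.
# Assumption: the block must start on the very first line of the file.
FRONT_MATTER = re.compile(
    r"\A---[ \t]*\r?\n(.*?)^(?:---|\.\.\.)[ \t]*\r?$\n?",
    re.DOTALL | re.MULTILINE,
)


def split_front_matter(text):
    """Return (raw_metadata, body); raw_metadata is None if there is no front matter."""
    match = FRONT_MATTER.match(text)
    if match is None:
        return None, text
    return match.group(1), text[match.end():]
```

The raw metadata string would then go to a real YAML parser (for example yaml.safe_load from the third-party PyYAML package, assuming it is installed), and only the body goes to the Markdown parser.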

MultiMarkdown Metadata

The older and simpler MultiMarkdown Metadata is actually incorporated into a few Markdown parsers. While it has more recently been updated to optionally support YAML delimiters, traditionally the metadata ends and the Markdown document begins at the first blank line (if the first line is blank, there is no metadata). And while the syntax looks very similar to YAML, only key-value pairs are supported, with no implied types. Here is an example from the MultiMarkdown docs:

Title:    A Sample MultiMarkdown Document  
Author:   Fletcher T. Penney  
Date:     February 9, 2011  
Comment:  This is a comment intended to demonstrate  
          metadata that spans multiple lines, yet  
          is treated as a single value.  
CSS:      http://example.com/standard.css

The MultiMarkdown parser includes a bunch of additional options which are unique to that parser, but the key-value metadata is used across multiple parsers. Unfortunately, I have never seen any two which behaved exactly the same. Without the Markdown rules defining such a format, everyone has implemented their own slightly different interpretation, resulting in a lot of variety.

The one thing that is more common is the support for YAML delimiters and basic key-value definitions.
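
As a rough illustration of the traditional rules (metadata runs until the first blank line, values are plain strings, indented lines continue the previous value), here is a minimal Python sketch. It is one more "slightly different interpretation", not the MultiMarkdown reference behaviour; the function name parse_mmd_metadata and the key normalization (lower-cased, whitespace removed) are assumptions of this sketch.

```python
def parse_mmd_metadata(text):
    """Parse 'Key: value' lines up to the first blank line; return (metadata, body)."""
    lines = text.splitlines()
    metadata = {}
    last_key = None
    consumed = 0
    for line in lines:
        if not line.strip():
            consumed += 1  # drop the blank separator line from the body
            break
        if line[0].isspace() and last_key is not None:
            # Indented line: continuation of the previous value.
            metadata[last_key] += " " + line.strip()
        elif ":" in line:
            key, value = line.split(":", 1)
            # Assumption: keys are lower-cased and stripped of whitespace.
            last_key = "".join(key.split()).lower()
            metadata[last_key] = value.strip()
        else:
            # Not a metadata line, so treat the whole document as body.
            return {}, text
        consumed += 1
    if not metadata:
        return {}, text
    return metadata, "\n".join(lines[consumed:])
```

Note that every value stays a string (no implied types), and that a document whose first line merely happens to contain a colon will be misread as metadata, which is exactly the kind of ambiguity that makes the implementations diverge.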

Pandoc Title Block

For completeness, there is also the Pandoc Title Block. It has a very different syntax and is not easily confused with the other two. To my knowledge, it is only supported by Pandoc (if enabled), and it only supports three types of data: title, author, and date. Here is an example from the Pandoc documentation:

% title
% author(s) (separated by semicolons)
% date

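Note that Pandoc Title Blocks are one of two styles supported by Pandoc. Pandoc also supports YAML Metadata as described above. Both are extensions: in current Pandoc versions they are on by default for Pandoc's own Markdown flavor, but not for its markdown_strict variant.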
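
For anyone who needs to read a title block outside of Pandoc, here is a simplified Python sketch. The function name parse_pandoc_title_block is hypothetical, and the sketch does not try to match Pandoc exactly: it folds indented continuation lines into the preceding field and splits authors on semicolons only.

```python
def parse_pandoc_title_block(text):
    """Return (metadata, body); metadata is None if the text has no leading '%' lines."""
    lines = text.splitlines()
    fields = []
    index = 0
    # At most three fields, in the fixed order title, author(s), date.
    while index < len(lines) and len(fields) < 3 and lines[index].startswith("%"):
        value = lines[index][1:].strip()
        index += 1
        # Assumption: an indented, non-blank line continues the current field.
        while (index < len(lines) and lines[index][:1].isspace()
               and lines[index].strip()):
            value = (value + " " + lines[index].strip()).strip()
            index += 1
        fields.append(value)
    if not fields:
        return None, text
    fields += [""] * (3 - len(fields))
    title, authors, date = fields
    metadata = {
        "title": title,
        "authors": [name.strip() for name in authors.split(";") if name.strip()],
        "date": date,
    }
    return metadata, "\n".join(lines[index:])
```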



Answer 2:

Most Markdown renderers seem to support this YAML format for metadata at the top of the file:

---
layout: post
published-on: 1 January 2000
title: Blogging Like a Boss
---

Content goes here.


Answer 3:

This is not a standard approach, but it works with Markdown Extra.

I wanted something that worked in the parser, but also didn't leave any clutter when I browse the files on Bitbucket where I store the files.

So I use Abbreviations from the Markdown Extra syntax.

*[blog-date]: 2018-04-27
*[blog-tags]: foo,bar

Then I parse them with a regexp:

 ^\*\[blog-date\]:\s*(.+)\s*$

As long as I don't write the exact keywords in the text, they leave no trace, so pick a prefix obscure enough that it never shows up in the text itself.
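
A minimal Python sketch of that extraction, generalized to any blog-prefixed key. The function name extract_abbr_metadata is made up for illustration; the pattern is the regexp above with the key turned into a capture group and multi-line mode switched on so that ^ and $ match per line.

```python
import re

# One abbreviation definition per line, e.g. "*[blog-date]: 2018-04-27".
ABBR_META = re.compile(r"^\*\[blog-([\w-]+)\]:[ \t]*(.*?)[ \t\r]*$", re.MULTILINE)


def extract_abbr_metadata(text):
    """Collect every '*[blog-<key>]: <value>' line into a dict of strings."""
    return {key: value for key, value in ABBR_META.findall(text)}
```

Values come back as plain strings, so something like the tags line still needs a split(",") on the caller's side.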



Answer 4:

A workaround that uses standard syntax and is compatible with all other viewers.

I was also looking for a way to add application-specific metadata to Markdown files while making sure that existing viewers, such as VS Code and GitHub pages, ignore the added metadata. Using extended Markdown syntax is not a good idea either, because I want my files to render correctly in different viewers.

So here is my solution: at the beginning of the Markdown file, use the following syntax to add metadata:


  [_metadata_:author]:- "daveying"
  [_metadata_:tags]:- "markdown metadata"

This is the standard syntax for link references, so they will not be rendered, while your application can still extract the data.

The - after the : is just a placeholder for the URL. I don't use the URL as the value because URLs cannot contain spaces and I have scenarios that require array values, so the value goes in the quoted title instead.
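
On the application side, pulling these definitions out could look like the following Python sketch. The function name extract_reference_metadata and the exact regex are assumptions for illustration; it expects one definition per line with - as the placeholder URL and the value in double quotes.

```python
import re

# Matches e.g.:  [_metadata_:author]:- "daveying"
# Up to three leading spaces are tolerated, as for ordinary link references.
REF_META = re.compile(
    r'^ {0,3}\[_metadata_:([^\]]+)\]:-[ \t]+"(.*)"[ \t]*$',
    re.MULTILINE,
)


def extract_reference_metadata(text):
    """Return the metadata carried in '[_metadata_:<key>]:- "<value>"' lines."""
    return {key: value for key, value in REF_META.findall(text)}
```

An array value such as the tags can then be split on whatever separator the application chooses, for instance whitespace.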



Answer 5:

I haven't seen this mentioned elsewhere here or in various blogs discussing the subject, but in a project for my personal website, I've decided to use a simple JSON object at the top of each Markdown file to store metadata. It's a little more cumbersome to type than some of the more textual formats above, but it's super easy to parse. Basically, I just use a regex such as ^\s*({.*?})\s*(.*)$ (with the s option on, so that . also matches newlines) to capture the JSON and the Markdown content, then parse the JSON with the language's standard method. This makes it pretty easy to support arbitrary metadata fields.
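
In Python, for example, that could look roughly like this. The function name split_json_header is hypothetical, and re.DOTALL plays the role of the s option.

```python
import json
import re

# Same pattern as above, with the braces escaped for clarity.
JSON_HEADER = re.compile(r"^\s*(\{.*?\})\s*(.*)$", re.DOTALL)


def split_json_header(text):
    """Return (metadata, markdown_body); metadata is {} if no JSON header is found."""
    match = JSON_HEADER.match(text)
    if match is None:
        return {}, text
    return json.loads(match.group(1)), match.group(2)
```

One caveat with the non-greedy match: it stops at the first closing brace, so a header containing a nested object gets cut short and json.loads raises an error. If nested objects are needed, json.JSONDecoder().raw_decode, which parses one JSON value from the start of a string and reports where it ended, may be a more robust alternative to the regex.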