What to do about pulldown and commonmark?

To help everyone understand the type of breakage that will occur, I will share some more or less representative examples from crates.io crates. I found all of these by generating the docs for the clap crate with both stable and nightly rustdoc, reducing the HTML to bare trees of tags, and diffing those trees (ignoring tag content).
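For anyone who wants to try a comparison like this, here is a minimal sketch of the "tree of tags" idea in Python. It is not the script used for the results in this post; the names `TagSkeleton`, `skeleton`, and `skeleton_diff` are hypothetical, and it assumes the two renderings are available as HTML strings.

```python
import difflib
from html.parser import HTMLParser


class TagSkeleton(HTMLParser):
    """Collect an indented outline of tag names, discarding text and attributes."""

    # Void elements never get a closing tag, so they must not change the depth.
    VOID = {"br", "hr", "img", "input", "link", "meta", "wbr"}

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        if tag not in self.VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in self.VOID:
            self.depth = max(0, self.depth - 1)


def skeleton(html):
    """Return the tag outline of an HTML string as a list of lines."""
    parser = TagSkeleton()
    parser.feed(html)
    parser.close()
    return parser.lines


def skeleton_diff(old_html, new_html):
    """Return only the added/removed outline lines between two renderings."""
    diff = difflib.unified_diff(
        skeleton(old_html), skeleton(new_html), lineterm="", n=0
    )
    return [
        line
        for line in diff
        if line[0] in "+-" and not line.startswith(("+++", "---"))
    ]
```

Feeding it the stable and nightly renderings of the same paragraph reports a removed `a` line whenever a link silently turned back into plain text, which is the kind of difference shown in the examples below.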

Here is an example of breakage in a crates.io crate from current stable to current nightly:

Crate: ansi-term, 0.9.0, last release 27 Aug 2016

Breakage: link reference and definition, lines 401-402

Stable rustdoc (hoedown) output:

<p>It might make more sense to look at a <a href="https://upload.wikimedia.org/wikipedia/en/1/15/Xterm_256color_chart.svg">colour chart</a>.</p>

Nightly rustdoc (pulldown-cmark) output:

<p>It might make more sense to look at a [colour chart][cc].
[cc]: https://upload.wikimedia.org/wikipedia/en/1/15/Xterm_256color_chart.svg</p>

The crate was last published over half a year ago and crates.io reports it is downloaded about 2,000 times per day right now. Someone is probably generating documentation that includes this crate’s documentation.

The markup is perfectly fine and accepted in hoedown (and Markdown.pl, since hoedown is pretty faithful to that) as a link reference and definition. However, CommonMark requires a blank line between the end of a paragraph and a link reference definition, so pulldown-cmark correctly leaves it as a paragraph.
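Crate authors who want to find this pattern in their own doc comments could use something like the rough heuristic below. It is only a sketch under simplifying assumptions: the function name `glued_link_definitions` is hypothetical, the regular expression is a loose approximation of a link reference definition, and every non-blank, non-definition line is treated as paragraph text (so headings, code fences, and the like can produce false positives).

```python
import re

# Loose approximation of a link reference definition: up to three spaces of
# indentation, a bracketed label, a colon, then the start of a destination.
LINK_DEF = re.compile(r"^ {0,3}\[[^\]]+\]:\s*\S")


def glued_link_definitions(markdown):
    """Return 1-based line numbers of definitions that directly follow paragraph text."""
    hits = []
    in_paragraph = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        if not line.strip():
            # A blank line ends any open paragraph.
            in_paragraph = False
        elif LINK_DEF.match(line):
            # Inside a paragraph, CommonMark keeps this line as plain text.
            if in_paragraph:
                hits.append(number)
        else:
            in_paragraph = True
    return hits
```

Any line number it reports is a candidate for inserting a blank line before the definition, which keeps the markup valid for both hoedown and pulldown-cmark.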

Here is another instance of the same issue, this time from the clap crate.

Breakage: three link reference definitions, lines 57-61

Stable rustdoc (hoedown) output:

<p>Various settings that apply to arguments and may be set, unset, and checked via getter/setter
methods <a href="./struct.Arg.html#method.set"><code>Arg::set</code></a>, <a href="./struct.Arg.html#method.unset"><code>Arg::unset</code></a>, and <a href="./struct.Arg.html#method.is_set"><code>Arg::is_set</code></a></p>

Nightly rustdoc (pulldown-cmark) output:

<p>Various settings that apply to arguments and may be set, unset, and checked via getter/setter
methods [<code>Arg::set</code>], [<code>Arg::unset</code>], and [<code>Arg::is_set</code>]
[<code>Arg::set</code>]: ./struct.Arg.html#method.set
[<code>Arg::unset</code>]: ./struct.Arg.html#method.unset
[<code>Arg::is_set</code>]: ./struct.Arg.html#method.is_set</p>

In this case, hoedown's output doesn’t even seem to agree with Markdown.pl. The pulldown-cmark output does agree with CommonMark, but of course that’s not the output we’d want to see.

There are a number of other differences just from the clap crate and the other docs that come along when its docs are generated. I won’t go into detail on every one here, but some other examples I found multiple occurrences of in this rather small collection of docs are:

  • “automatic links” in pulldown-cmark need <> around them, while hoedown didn’t require that
  • lists in pulldown-cmark can occur immediately after paragraphs, but hoedown won’t treat those as a list unless there was a blank line before (usually this one would be “fixed” by switching to pulldown-cmark)
  • emphasis marks around a word need a space after the closing one in CommonMark ("**NOTE:**Words words" stays as-is in pulldown-cmark, but becomes "<strong>NOTE:</strong>Words words" in hoedown and Markdown.pl)

EDIT: The reasoning behind the final bullet point above regarding emphasis marks was incorrectly worded. If the text was **NOTE**Words words, it would be fine, but since it was **NOTE:**Words words (note the : between NOTE and the **), it didn’t work as expected. CommonMark does not count the second ** as a right-flanking delimiter run because it is preceded by a punctuation character and is not followed by whitespace or another punctuation character, so it cannot close the emphasis.
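The flanking rule from the EDIT can be checked mechanically. The sketch below is a simplified reading of the CommonMark definitions of left-flanking and right-flanking delimiter runs, not code taken from pulldown-cmark; the helper names are hypothetical, and "punctuation" is approximated by the Unicode P and S general categories.

```python
import unicodedata


def is_punct(ch):
    # Approximation: treat Unicode categories P* and S* as punctuation.
    return ch != "" and unicodedata.category(ch)[0] in "PS"


def is_space(ch):
    # The start and end of the input count as whitespace for flanking purposes.
    return ch == "" or ch.isspace()


def flanking(text, start, length):
    """Return (left_flanking, right_flanking) for the run text[start:start + length]."""
    before = text[start - 1] if start > 0 else ""
    end = start + length
    after = text[end] if end < len(text) else ""
    left = not is_space(after) and (
        not is_punct(after) or is_space(before) or is_punct(before)
    )
    right = not is_space(before) and (
        not is_punct(before) or is_space(after) or is_punct(after)
    )
    return left, right


# The second "**" in each string is the would-be closing delimiter.
with_colon = flanking("**NOTE:**Words words", 7, 2)
without_colon = flanking("**NOTE**Words words", 6, 2)
```

Here `with_colon` is `(True, False)`: the run after the colon is not right-flanking, so it cannot close the emphasis. `without_colon` is `(True, True)`, which is why **NOTE**Words words renders as expected.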
