Proposal: Migrate the syntax of rustdoc markdown footnotes to be compatible with the syntax used in GitHub

Summary

Migrate the syntax of rustdoc markdown footnotes to be compatible with the syntax used in GitHub.

I've opened a pull request against pulldown-cmark #654 to implement GitHub-compatible footnote syntax, which should probably just be deployed to docs.rs if it's merged. If #544 is merged instead, it will probably be usable in docs.rs as well, but deploying it to rustdoc may be a bit more complicated.

Motivation

This change should reduce confusion with syntax not working the way people have come to expect. There are two major reasons to want rustdoc to be consistent with GitHub here:

  • While doc comments themselves usually aren't rendered by GitHub, other tools like docs.rs and crates.io share README.md files with it. This means if those tools parse the file differently than GitHub does, it's almost always a mistake. It's also reasonably common for mdBook chapters to be read in GitHub, so mistakes happen when it diverges in its parsing (for example, where an RFC relies on GitHub's auto-linking behavior by accident and needs a fixup PR). It would be very silly for docs.rs README files to follow different markdown syntax from rustdoc.
  • The current behavior of pulldown-cmark is often implicitly considered a bug by end-users. See issues on pulldown-cmark's issue tracker: #20 #530 #623

Rustdoc's Markdown footnote syntax was designed in a weekend in 2015 to make pulldown-cmark suitable for use with existing documentation. In particular, GitHub didn't add support for footnotes until 2021, and other markdown parsers of the time didn't seem to have converged on a common behavior for corner cases like this (showdown-footnotes does treat indentation as special, but unlike GitHub it doesn't necessarily require four spaces of indentation).

Guide-level explanation

Edition Guide: Footnote syntax in rustdoc

Summary

  • Footnote syntax is now compatible with GitHub-Flavored Markdown:
    • If a footnote definition (like [^this]: contents) is followed by text indented four spaces or one tab, that text will be treated as part of the footnote instead of being an indented code block.
    • Footnote definitions no longer need to be separated by blank lines.
    • If a footnote definition is immediately followed by a list, block quote, or table, it needs to be indented by four spaces to be considered part of the footnote.
    • If a footnote reference has no corresponding footnote definition, it is rendered literally instead of creating a broken link.
  • When rustdoc runs under Edition 2021, it will warn about any Markdown syntax that will be parsed differently in Edition 2024.

Details

If a footnote definition is followed by text indented four spaces or one tab, that text will be treated as part of the footnote instead of being an indented code block. To preserve the current behavior with code that is interpreted the same way under both editions, separate the code block from the footnote using an un-indented HTML comment:

[^1]: footnote definition text

<!-- -->

    // indented code block
    fn main() {
        println!("hello world!");
    }
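
Inside a Rust source file, the same pattern might look like the sketch below. The function `greeting` and its wording are made up for illustration; only the layout of the doc comment (footnote definition, un-indented HTML comment, then the indented block) is taken from the example above.

```rust
/// Returns the greeting printed by the example below.[^1]
///
/// [^1]: footnote definition text
///
/// <!-- -->
///
///     // indented code block
///     fn main() {
///         println!("hello world!");
///     }
pub fn greeting() -> &'static str {
    "hello world!"
}
```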

Footnote definitions no longer need to be separated by blank lines. To preserve the current behavior, if you intentionally want to write a footnote reference followed by a colon at the start of a line, use a backslash escape:

[^1]: footnote definition text
[^1]\: this is a reference, rather than a definition
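
As a rough illustration of the distinction, the sketch below checks whether a line begins a footnote definition. The helper name `footnote_definition_label` is hypothetical, and the rules are simplifying assumptions (at most three leading spaces, a non-empty label without whitespace, then `]:`); pulldown-cmark's actual parser handles more cases than this.

```rust
/// Returns the label if `line` looks like the start of a footnote
/// definition such as `[^1]: text`. Simplified; not pulldown-cmark's logic.
fn footnote_definition_label(line: &str) -> Option<&str> {
    // Assumption: four or more leading spaces mean "not a definition".
    let trimmed = line.trim_start_matches(' ');
    if line.len() - trimmed.len() > 3 {
        return None;
    }
    let rest = trimmed.strip_prefix("[^")?;
    let end = rest.find(']')?;
    let label = &rest[..end];
    if label.is_empty() || label.chars().any(char::is_whitespace) {
        return None;
    }
    // The closing bracket must be followed directly by a colon, so the
    // backslash-escaped form `[^1]\:` is not reported as a definition.
    rest[end + 1..].starts_with(':').then_some(label)
}
```

With this helper, the first line of the example above yields `Some("1")`, while the escaped second line yields `None`.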

If a footnote definition is immediately followed by a list, block quote, or table, it needs to be indented by four spaces to be considered part of the footnote. To preserve the current behavior with code that is interpreted the same way in either edition, you'll need to fall back to HTML syntax, since there's no easy way to write code that the new syntax will accept as part of the footnote without the old syntax considering it a code block.

When migrating to the new Edition, a table within a footnote can be written like this (under the old Edition, the table is treated as source code):

[^1]:

    | column1 | column2 |
    |---------|---------|
    | row1a   | row1b   |
    | row2a   | row2b   |

Reference-level explanation

The detailed syntax for GitHub-compatible footnotes in pulldown-cmark is documented here, in specs/gfm_footnotes.txt, and will be copied here when that pull request is merged and the design declared final.

Drawbacks

This is a compatibility break in the way rustdoc parses markdown. These changes are particularly painful, because markdown accepts all text as valid. Rustdoc will warn on a few egregiously bad cases, but it still produces docs (they're warnings, not errors, unless someone uses #![deny(rustdoc::all)]), which means people can deploy broken docs without realizing it.

While this particular change is expected to actually affect very few people (how many even know about footnotes?), it's still a change in behavior guarded only with warnings.

Rationale and alternatives

The biggest problem with doing nothing is that we continue to live with rustdoc having a footnote syntax that looks like, but is subtly different than, the popular footnote syntax used on GitHub and GitLab, documented on the authoritative-looking Markdown Guide, and implemented in tools like VSCode and Pandoc. The best syntax is the one that everybody else uses, except when there's a compelling reason to be different, which there really isn't here. The only reason rustdoc isn't compatible with everybody else's markdown footnotes is because it was a (relatively) early adopter, and the rollout was rushed.

Some other possibilities include:

  • Just pushing out the change without an Edition. When upgrading to a new version of pulldown-cmark that changes its parser to match the current version of the CommonMark spec, there's no need to wait for an Edition, because any changes are extremely minor and rustdoc is documented to implement CommonMark.
  • Add #[doc(gfm)] and #[doc(not_gfm)] attributes to explicitly request the markdown syntax variant, and change the default over the Edition. This means having to mention different markdown flavors in the rustdoc book's list of doc attributes, when most rustdoc users should never need to know about any of this. It's also not the way syntax changes for Rust itself are done (flexibility for flexibility's sake is bad design).
  • Instead of using an Edition, rustdoc could do what it did when switching to CommonMark: go through a warning period, then eventually remove the old syntax entirely. The second footnote syntax probably isn't as big of a burden as Hoedown was, so maybe it's not justified breaking compatibility like that.

Prior art

Unresolved questions

  • Is this worth doing at all? I expect most people won't even notice it, since footnotes aren't that popular, and the change isn't that big (the most important result is that footnote definitions don't need to be separated by blank lines any more, which will probably fix more docs than it breaks).
  • Whether to bother with an Edition and the "parser divergence warning", or just make the change everywhere with no attempt to mitigate, depends on how many docs will actually be affected. This should be verified using crater, probably with a Draft PR that contains the code implementing the warning but is patched to return a hard error instead.
  • Will the old syntax ever be removed? Third-party tooling that consumes rustdoc JSON files will need to parse the markdown inside, and probably won't be able to parse the syntax unless it uses pulldown-cmark specifically, so keeping the old syntax around is a liability for more than just rustdoc devs. Again, a crater run will probably be the deciding factor.
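
To give a feel for what a "parser divergence" check looks for, here is a minimal sketch. The real warning would presumably parse each doc comment both ways and compare the results; this is only a line-based heuristic covering two of the cases from the guide (indented text after a definition, and definitions not separated by blank lines). It ignores lists, block quotes, tables, and undefined references. All names (`Divergence`, `is_definition`, `find_divergences`) are hypothetical and not part of rustdoc or pulldown-cmark.

```rust
/// Kinds of input that the old and new footnote rules would treat differently.
#[derive(Debug, PartialEq)]
enum Divergence {
    /// Indented text after a definition: a code block before, footnote content after.
    IndentedAfterDefinition,
    /// A definition directly below footnote text, with no blank line in between.
    AdjacentDefinition,
}

/// Hypothetical helper: true if `line` looks like `[^label]: ...`
/// with at most three leading spaces (a simplifying assumption).
fn is_definition(line: &str) -> bool {
    let trimmed = line.trim_start_matches(' ');
    let indent = line.len() - trimmed.len();
    match trimmed.strip_prefix("[^") {
        Some(rest) if indent <= 3 => match rest.find(']') {
            Some(end) => end > 0 && rest[end + 1..].starts_with(':'),
            None => false,
        },
        _ => false,
    }
}

/// Returns (1-based line number, kind) for each suspicious line.
fn find_divergences(markdown: &str) -> Vec<(usize, Divergence)> {
    let mut found = Vec::new();
    // True while the most recent non-blank block was a footnote definition.
    let mut after_definition = false;
    // True if the previous line was non-blank text belonging to a definition.
    let mut prev_in_definition = false;
    for (index, line) in markdown.lines().enumerate() {
        let indented = line.starts_with("    ") || line.starts_with('\t');
        if line.trim().is_empty() {
            // A blank line ends the paragraph but not necessarily the footnote.
            prev_in_definition = false;
        } else if is_definition(line) {
            if prev_in_definition {
                found.push((index + 1, Divergence::AdjacentDefinition));
            }
            after_definition = true;
            prev_in_definition = true;
        } else if indented && after_definition {
            found.push((index + 1, Divergence::IndentedAfterDefinition));
            prev_in_definition = true;
        } else {
            // Unindented text: treated as a continuation only if it directly
            // follows footnote text; otherwise it ends the footnote.
            after_definition = prev_in_definition;
        }
    }
    found
}
```

For the crater run, a check along these lines would report a hard error whenever the returned list is non-empty; an un-indented `<!-- -->` separator or a backslash-escaped `[^1]\:` produces no report.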

Future possibilities

What happens when we want to add support for new Markdown extensions?

This has been a problem before. Technically, docs could have been broken by the addition of new features such as intra-doc links, and could be broken if things like $-math are added. Tactics to mitigate this include deploying the new syntax over an Edition (like how async became a keyword in 2018), but it might instead be a good idea to allow markdown extensions to be toggled with an attribute:

#![doc(enable_math)]

This is probably a bad idea for the GFM footnote syntax, since people who just want to write doc comments probably shouldn't be expected to deal with weird legacy syntax nonsense, but it seems reasonable to allow people to express whether they need math syntax or not.

There's also #![doc = include_str!("../README.md")].

This is the first I'm hearing of rustdoc supporting it :grin:

I use them, but very rarely in a way this would mess with, I think. I wonder how hard it would be to survey, à la crater.

I dislike committing to follow GitHub specifically... but recognize there's no "extension standard" either. Having a GFM mode (among others) is fine. Considering each breaking change independently is probably also fine.

The deciding factor to bother with an Edition and the "parser divergence warning", or just make the change everywhere with no attempt to mitigate, is whether there are very many docs that will be affected at all. This should be verified using crater, right?

How do you plan to verify this with crater? It can only report whether compilation succeeded or failed, not whether the HTML output changed. To get an accurate crater report you'd need rustdoc to parse it both ways and give a hard error if they differ.

GFM is the most widely used CommonMark extension, and it has a formal spec that implementers can use. In fact, it is so common that many people don't know that tables are part of GFM, but not CommonMark, because I've yet to see a markdown parser that doesn't support it. I think following GFM for footnotes makes by far the most sense.

Sounds good to me. I'm very curious to see the impact on the existing ecosystem as well. Making it an error and then running crater like jyn514 suggested seems like a good way to check it.

That's how it would be done, yeah. It's the same way the warning would be implemented, but it would be a Draft PR to make it a hard error, used only to do this compatibility check.

I would encourage the route of just making the change (possibly with a temporary future warning) instead of using the edition mechanism. Running crater should quickly tell you how risky that is. With an edition, rustdoc has to support both forms forever, which I think would mean more debt and complexity.

I agree both with just making this change and with guarding against future updates to GFM by calling this mode gfm_footnotes if a feature flag is used. But also, don't bother with the feature flag.
