What to do about pulldown and commonmark?

Hello all,

As you may be aware, rustdoc has had some recent – and long awaited – improvements lately. @imperio has pushed through a change to use the pulldown rendering engine in place of hoedown. pulldown offers a number of bug fixes and moves us over to being “standards” (CommonMark) compliant – and, for bonus points, it’s written in Rust.

However, there is a wrinkle. Pulldown differs from hoedown in some particulars:

  • There are various rendering differences you have to be aware of.
    • For example, hoedown supports 2^5 for superscripts, whereas pulldown requires 2<sup>5</sup>
    • More details below
  • Pulldown identifies codeblocks that hoedown ignored.
    • This (initially) caused a fair amount of breakage where various bits of code were now being tested that were not before
    • We’ve since landed a “transition” PR that issues warnings about such tests instead
      • After some suitable period, these tests will execute by default.

There was a long discussion in the #rust-docs IRC channel this morning about how to phase this change in. In particular, is it sufficient to issue warnings for new tests, but allow the changes in rendering to silently change?

General policy: the three phases

Generally speaking when we have to make a change that changes behavior, we try to phase it in. The current “best practices” is to go through three general phases.

  1. “Opt-in”: keep old behavior, issue a targeted warning for those cases where behavior will change.
    • Example: issue a future compatibility lint that is “warn by default”
  2. “Opt-out”: switch to new behavior by default, but allow users to “opt out” and get back the new behavior.
    • Example: change future compatibility lint to “deny by default” (user can still set it back to warn)
  3. “Error”: switch to new behavior.
    • Example: change future compatibility lint to a hard error.

As of PR 41290, we are following this scheme for tests, at least – but not for rendering, where we basically skipped ahead to stage 3.

How we could phase in rendering

If we wanted to go through the phases for rendering, it would presumably mean that we render things with both hoedown and pulldown and issue warnings where they seem to be generating different output. I think that, in an ideal world, we would have started out by landing exactly this, rather than just changing over to pulldown by default. However, this is not where we’re at.

@GuillaumeGomez feels that writing such a PR is definitely feasible – but there was concern that this would result in a lot of warnings. Given that the beta branch is going to occur soon, there may not be enough time to calibrate the warnings so that people are only getting warnings about important rendering changes, and not minor stuff. Our experience so far has been that most rendering results are unchanged. Therefore, we don’t want to be issuing tons of warnings over nothing.

@Snorge suggested that they already did some amount of “rendering diffs” preparing for the change, so perhaps they can comment more on how difficult that is to tune properly.

Known rendering differences

So far three differences in rendering have been found:

  • superscripts must be written out using HTML (2^5 becomes 2<sup>5</sup>)
  • footnote definitions need an extra line before the definition ([^foo]: ... becomes \n[^foo]: ...)
  • code blocks can occur in lists (see e.g. this diff, which demonstrates a comment that had to be changed)

UPDATE: More differences have come up in the comments:

TL;DR the question at hand

Beta is branching next week or so. The currently nightly implements this behavior:

  • use hoedown for tests, warn about diffs with pulldown.
  • use pulldown for rendering, which may differ in some particulars.

Is this satisfactory, or should we change something? The following options have been proposed:

  • Nothing. The status quo is ok – tests continue to work, and we will phase in pulldown over time. Rendering is updated, but the effects are minor, and people will correct them over time.
  • Revert the change completely and land it with phasing. We could back out the change and use hoedown again. Then we could land pulldown again, but this time in a more phased fashion, so that we begin with warnings about rendering differences. By backing out the change, we gain time to tinker with the warnings and make sure they are not too overwhelming.
  • Try to land rendering warnings now. However, there isn’t much time left before beta, so there is some concern that this will result in a lot of unnecessary warnings being issued and an overall degraded user experience.

Thoughts?

2 Likes

I haven't seen it in docs yet, but there's a good chance someone has run into this one.

With this input:

/// Rust is number
/// 1. Yay!
///
/// It is the best
/// language since
/// 1999. Yay!

Hoedown produces the following output:

<p>Rust is number
1. Yay!</p>

<p>It is the best
language since
1999. Yay!</p>

Pulldown-cmark produces the following output:

<p>Rust is number</p>
<ol>
<li>Yay!</li>
</ol>
<p>It is the best
language since</p>
<ol start="1999">
<li>Yay!</li>
</ol>

However, the current version of the CommonMark spec (0.27) would produce:

<p>Rust is number</p>
<ol>
<li>Yay!</li>
</ol>
<p>It is the best
language since
1999. Yay!</p>

Ref: CommonMark 0.27 Example 265, 266 and explanation before

2 Likes

I’m for keeping this situation but aren’t opposed to try landing rendering warnings. I’m just a bit afraid about the fact that I’ll have enough time to implement something efficient enough.

Following from my previous post, there are likely a significant number of edge cases that will render differently from hoedown to pulldown-cmark.

Since hoedown was written to a non-CommonMark specification, it only passes 337 of the 622 CommonMark 0.27 tests (run with jgm/CommonMark's spec_tests.py). Meanwhile, pulldown-cmark passes 568 of 622 tests. Some of the differences in those test failures may be trivial, but many are likely enough to change the rendering.

Most of the docs currently in the wild have been specifically written to work with hoedown, so just hoping they will look the same when the same text is parsed as if it was written for a different specification seems tough.

I think that if we opt not to go with the status quo, my preference would be to back out the changes and land them again, this time with the diffs in place (i.e., let's take our time, no need to rush).

2 Likes

My other point with this is actually that will have a huge comparison to display some warnings, but I'm not sure this is really worth it, considering how small they are. However, I could try to target some specific changes to make it even more "clever". Well, we'll see, depending of the outcome of this conversation.

This sounds to me like you are suggesting that we should revert the change (or try to add warnings). Am I interpreting you correctly?

Is cargobomb extensible in the way that it could get a regression count for this? What we’d need to do is cargo doc all the crates and then diff the HTML they produce.

Rust’s own CI is showing a TON of those new warnings – e.g. in the log starting here. A lot of these are in the books or other external docs, but there are plenty from the standard library too. Clearly we want to run those if they’re real tests, but I still hesitate to see this forced on everyone.

1 Like

They were being run as tests before this change, and passed.

Then I’m confused about the warning – is it getting false positives? I thought the idea was that these code blocks were not tested under hoedown, but we expect they will be tested under pulldown.

The warning says "hey, pulldown-cmark will consider this a test." It does not say if that test would pass or fail. Before we added in the warning, it would have unconditionally ran as a test, and so, there was a gap of time post-initial-pulldown-landing but pre-warning-added where they were actually run.

Does that make sense?

Ok, sure. I really meant before pulldown came on the scene at all, did we have stuff that should have been tests, but weren’t executed?

Apparently; I haven’t had the chance to dig into what exactly these warnings are for yet. I hope to do so soon.

I am not a user/author/maintainer/stakeholder in any way so I was intentionally limiting my comment to information (and a bit of perhaps unwarranted speculation) rather than opinion.


I don't know much about Python but I do know they produce a great deal of documentation in their project. I looked into the history of Python's documentation tools for a bit to see if they had done anything like this in the past. This is only an investigation into what another project has done, not to be seen as advocating for a particular format or anything like that.

In 2002 they moved from a mostly-plain-text format to reStructuredText.

The reStructuredText specification has been maintained over the past 15 years by people closely tied to the Python project. Even though it is not a Python-only specification, they do everything they can to not introduce breaking changes when the spec is updated. If they add new features, they make sure they are done in a way that will not break existing documents.

Since its original release in 2002, the reStructuredText specification has been updated a number of times, but in the last 10 years almost all of the changes have been minor nits and clarifications.

Additionally, PEP documents written in the pre-reStructuredText format are still properly parsed by the pep2html tool today. It scans the header of the document to see whether it was written in a plain text format or reStructuredText. No breakage, even for 15 year old documents.

That's about all I learned in my investigations.


Since you've asked, I'll at least expand on this with a bit of my own opinion. Just remember that I'm not a stakeholder so anything I say doesn't carry much weight. Consider it rambling. I'll leave it as messy as it comes out on the first attempt:

It is easy to avoid breaking changes when you are in control of the specification.

The CommonMark spec has had fewer revisions in the last year than the year before that, but the revisions have still been significant enough to break existing documents. For example, a list in version 0.26 was immediately ended if two blank lines were encountered. In version 0.27, the items of a list can be separated by any number of blank lines. So suddenly your four consecutive lists with spaces between are one big list if you update your parser to the next version.

If you want to use CommonMark, you will need to attach a version of the specification to every document or face this same battle each and every time the specification is updated. You'll need to have a parser for each version as well.

Additionally, the pulldown-cmark parser currently only passes 568 of 622 CommonMark 0.27 tests. Fixing documents today to render properly in pulldown-cmark will be useless if the library is updated to properly match the current CommonMark specification. Everything will have to be updated again.

CommonMark lacks support for some features rustdoc seems to expect, such as footnotes. What footnote specification should be used? There isn't really one. Pulldown-cmark has a hacked-on footnote extension, but it is not part of any real specification and was more or less just made up on the spot. What are you going to do when someone else adds a different footnote specification which is either adopted by CommonMark or pulldown-cmark?

It is relatively easy to add a new feature that was previously not present compared to breaking and changing an existing feature. If you decide to add a new interlinking feature or something, it can probably be done on top of the existing system without breaking old documents, because it can use a new syntax previously not present. However, hoedown has many features in common with CommonMark that use slightly different rules.

I feel like any document currently written for hoedown should be parsed by hoedown until it has been updated and marked as a CommonMark (with version?) document. Even though the syntax looks very similar, changing from "hoedown" to "pulldown-cmark" should be considered just as significant a change as going from plain text to HTML.

8 Likes

I believe that the ultimate impact of this change is not well understood. That the potential breakage poses some risk to Rust’s reputation. I have until this point though not objected strenuously, hoping that this time it will turn out ok. So I think it would be unfair of me to do so now.

It may be that the mitigations in place are sufficient. That whatever rendering differences remain will be few, and we will receive few complaints. I don’t know. It seems likely to me that some people will see differences in their rendering, some people will file bugs and be angry, and then it will blow over, perhaps becoming a minor black mark on Rust’s history that people occasionally dredge up to point out that the Rust team doesn’t always make decisions consistent with their stated stability goals. It will almost certainly not be a major catastrophe.

If it were my call I would revert it and go back to hoedown, come up with a plan to better understand the impact, and better inform users about impending breakage. That though is much more work than just letting it go, and I am not the one to do that work.

6 Likes

This would be my preference, to revert everything for the beta, then get more time to work on making both the rendering and the test changes opt-in and adding warnings for the rendering changes.

Even if we suspect that changes to rendering will be small and not affect many people, since we can't be sure, we should follow the process we have. If we don't follow the usual process and it turns out we're wrong, the damage to the reputation we're trying to build about stability and what people can expect from upgrades to Rust will already have been done by the time we find out.

In general, I think backing changes out of nightly so that they don't move on to beta should be a normal occurrence! The whole benefit of having the trains is that when we try something and find out it's not quite working the way we thought, we have the opportunity to fix it without affecting users other than those who have opted in to nightly's lack of stability guarantees. The benefit of having the six week release cycle is that we don't have to rush to get something half-baked into a release, we can pull it out and get the time to do it right because the next train is not that far behind. It's not a rejection of a change or of the work that people have put in, it's consideration for our users and for the long term success of the project as a whole.

For a bit of context of where I'm coming from, I was not involved with the original plan on how to land this change. I do very much want to see us on pulldown-cmark over hoedown, but I don't want to take the chance of breaking things in order to do so. I do think we have potential issues around upgrades to pulldown-cmark going forward that @snorge has brought up that we should think about how to address before landing pulldown-cmark, and I don't think we should discuss the possible options before getting the time to have those discussions by reverting the change for beta.

12 Likes

Yes to this! I've been thinking about this a lot lately. It is frustrating to have something you have worked on get backed out, but I've found that once it happens I usually experience a vast sense of relief, since the pressure is off, and I can fix things at my leisure. (And, as you say, it's nothing personal.)

2 Likes

So to be clear, I do believe we did follow the process. I promised @aturon i'd explain more, might as well do it now :smile:

rfcs/text/1122-language-semver.md at master · rust-lang/rfcs · GitHub and rfcs/text/1105-api-evolution.md at master · rust-lang/rfcs · GitHub are the two RFCs relevant to Rust following SemVer. While Rustdoc isn't the language, the API evolution RFC doesn't really have anything applicable to this situation. As such, I was going by the language one.

We never made a guarantee about what form of Markdown we were using. The RFC has a section for underspecified semantics:

There are a number of areas where the precise language semantics are currently somewhat underspecified. Over time, we expect to be fully defining the semantics of all of these areas. This may cause some existing code -- and in particular existing unsafe code -- to break or become invalid. Changes of this nature should be treated as soundness changes, meaning that we should attempt to mitigate the impact and ease the transition wherever possible.

Emphasis mine. So, soundness changes... here

When compiler or type-system bugs are encountered in the language itself (as opposed to in a library), clearly they ought to be fixed. However, it is important to fix them in such a way as to minimize the impact on the ecosystem.

Agree 100%.

The first step then is to evaluate the impact of the fix on the crates found in the crates.io website (using e.g. the crater tool). If impact is found to be "small" (which this RFC does not attempt to precisely define), then the fix can simply be landed. As today, the commit message of any breaking change should include the term [breaking-change] along with a description of how to resolve the problem, which helps those people who are affected to migrate their code. A description of the problem should also appear in the relevant subteam report.

Subteam reports don't exist any more, but, when looking at doing this, we said "Crater can't evaluate this change. So what do we do?" The plan was, do the work, and then see what happens with the standard documentation distributed with Rust. This isn't always representative, but if we found major issues, we'd know about them. So, we did the work, and then tried building libstd's docs. The vast, vast majority of the diff was changing '/" in HTML attributes, in other words, something that a better normalizer would find. This was also the only change to get the tests to pass, we had this hardcoded in the rustdoc tests.

We knew there would be other small differences, but only in rendering, which is very different than causing people's builds to break. Also, this is what nightly is for; we can always back stuff out later if it turns out there was massive breakage. But we would need to rely on people looking at things, since again, crater can't do anything at all here. So we landed it, and advertised it heavily, to little reception. Nobody replied on Reddit or Twitter, there were a few comments about timelines in the users' thread.

A few days later, some people ran into the test failure issue. This was a very obscure bug, because both versions render things correctly, it's all about what's considered a test or not. Furthermore, this breakage would be limited to the crates themselves, and not their user's. We got a total of five reports about this in total, all with trivial fixes.

Now that we had information that we could actually detect changes at all, we can take a different strategy. Everyone is on board with not breaking people's code, so we wanted to get in a cargobomb run, which could actually give us an idea of the impact here. The problem with doing so is that basically only @brson can get cargobomb going; I tried to get it working but could not. And now is a super busy time for him.

So, eventually, he got one going, and was able to give reports for 1/3rd of the total. At this point, there were ~30 total failures, not all of them rustdoc related. That's still non-zero, and possibly more than we were comfortable with, and after two weeks of arguing about this, we looked to the RFC again:

In cases where the impact seems larger, any effort to ease the transition is sure to be welcome. The following are suggestions for possible steps we could take (not all of which will be applicable to all scenarios):

  • Identify important crates (such as those with many dependants) and work with the crate author to correct the code as quickly as possible, ideally before the fix even lands.
  • Work hard to ensure that the error message identifies the problem clearly and suggests the appropriate solution.
  • If we develop a rustfix tool, in some cases we may be able to extend that tool to perform the fix automatically.
  • Provide an annotation that allows for a scoped "opt out" of the newer rules, as described below. While the change is still breaking, this at least makes it easy for crates to update and get back to compiling status quickly.
  • Begin with a deprecation or other warning before issuing a hard error. In extreme cases, it might be nice to begin by issuing a deprecation warning for the unsound behavior, and only make the behavior a hard error after the deprecation has had time to circulate. This gives people more time to update their crates. However, this option may frequently not be available, because the source of a compilation error is often hard to pin down with precision.

Seems good; we had already been doing the first one, sending in PRs or explaining what was broken, and all of the previous breakage would be fixed. The extra partial Cargobomb people would take longer though, so cases 3 and 4 come into play: in this case, a flag that would display a warning, but not fail people's builds. This was quickly implemented, and aimed at removing some time later after we could help fix the stuff that had come up.

At this point was when this was deemed "not enough" and things came to a head.

So yes, I still think this process was followed overall, and is still being followed, just like any other change. Even without reverting everything, nobody's stuff will break. The warning will let people see this change even if they aren't closely following Rust development. Fixing any issues will cause stuff to work properly both before and after, so transitioning is easier and doesn't need to be coordinated with Rust versions. It's all business as usual as far as I'm concerned.

.... I'll just leave it at that for now.

1 Like

If we define "stuff" to include "rendering the same way things render now", I don't think we can say this for certain, which is the primary reason I'm advocating for a cycle of opt-in on stable.