Then I’m confused about the warning – is it getting false positives? I thought the idea was that these code blocks were not tested under hoedown, but we expect they will be tested under pulldown.
The warning says “hey, pulldown-cmark will consider this a test.” It does not say if that test would pass or fail. Before we added in the warning, it would have unconditionally ran as a test, and so, there was a gap of time post-initial-pulldown-landing but pre-warning-added where they were actually run.
Does that make sense?
Ok, sure. I really meant before pulldown came on the scene at all, did we have stuff that should have been tests, but weren’t executed?
Apparently; I haven’t had the chance to dig into what exactly these warnings are for yet. I hope to do so soon.
I am not a user/author/maintainer/stakeholder in any way so I was intentionally limiting my comment to information (and a bit of perhaps unwarranted speculation) rather than opinion.
I don’t know much about Python but I do know they produce a great deal of documentation in their project. I looked into the history of Python’s documentation tools for a bit to see if they had done anything like this in the past. This is only an investigation into what another project has done, not to be seen as advocating for a particular format or anything like that.
In 2002 they moved from a mostly-plain-text format to reStructuredText.
The reStructuredText specification has been maintained over the past 15 years by people closely tied to the Python project. Even though it is not a Python-only specification, they do everything they can to not introduce breaking changes when the spec is updated. If they add new features, they make sure they are done in a way that will not break existing documents.
Since its original release in 2002, the reStructuredText specification has been updated a number of times, but in the last 10 years almost all of the changes have been minor nits and clarifications.
Additionally, PEP documents written in the pre-reStructuredText format are still properly parsed by the pep2html tool today. It scans the header of the document to see whether it was written in a plain text format or reStructuredText. No breakage, even for 15 year old documents.
That’s about all I learned in my investigations.
Since you’ve asked, I’ll at least expand on this with a bit of my own opinion. Just remember that I’m not a stakeholder so anything I say doesn’t carry much weight. Consider it rambling. I’ll leave it as messy as it comes out on the first attempt:
It is easy to avoid breaking changes when you are in control of the specification.
The CommonMark spec has had fewer revisions in the last year than the year before that, but the revisions have still been significant enough to break existing documents. For example, a list in version 0.26 was immediately ended if two blank lines were encountered. In version 0.27, the items of a list can be separated by any number of blank lines. So suddenly your four consecutive lists with spaces between are one big list if you update your parser to the next version.
If you want to use CommonMark, you will need to attach a version of the specification to every document or face this same battle each and every time the specification is updated. You’ll need to have a parser for each version as well.
Additionally, the pulldown-cmark parser currently only passes 568 of 622 CommonMark 0.27 tests. Fixing documents today to render properly in pulldown-cmark will be useless if the library is updated to properly match the current CommonMark specification. Everything will have to be updated again.
CommonMark lacks support for some features rustdoc seems to expect, such as footnotes. What footnote specification should be used? There isn’t really one. Pulldown-cmark has a hacked-on footnote extension, but it is not part of any real specification and was more or less just made up on the spot. What are you going to do when someone else adds a different footnote specification which is either adopted by CommonMark or pulldown-cmark?
It is relatively easy to add a new feature that was previously not present compared to breaking and changing an existing feature. If you decide to add a new interlinking feature or something, it can probably be done on top of the existing system without breaking old documents, because it can use a new syntax previously not present. However, hoedown has many features in common with CommonMark that use slightly different rules.
I feel like any document currently written for hoedown should be parsed by hoedown until it has been updated and marked as a CommonMark (with version?) document. Even though the syntax looks very similar, changing from “hoedown” to “pulldown-cmark” should be considered just as significant a change as going from plain text to HTML.
I believe that the ultimate impact of this change is not well understood. That the potential breakage poses some risk to Rust’s reputation. I have until this point though not objected strenuously, hoping that this time it will turn out ok. So I think it would be unfair of me to do so now.
It may be that the mitigations in place are sufficient. That whatever rendering differences remain will be few, and we will receive few complaints. I don’t know. It seems likely to me that some people will see differences in their rendering, some people will file bugs and be angry, and then it will blow over, perhaps becoming a minor black mark on Rust’s history that people occasionally dredge up to point out that the Rust team doesn’t always make decisions consistent with their stated stability goals. It will almost certainly not be a major catastrophe.
If it were my call I would revert it and go back to hoedown, come up with a plan to better understand the impact, and better inform users about impending breakage. That though is much more work than just letting it go, and I am not the one to do that work.
This would be my preference, to revert everything for the beta, then get more time to work on making both the rendering and the test changes opt-in and adding warnings for the rendering changes.
Even if we suspect that changes to rendering will be small and not affect many people, since we can’t be sure, we should follow the process we have. If we don’t follow the usual process and it turns out we’re wrong, the damage to the reputation we’re trying to build about stability and what people can expect from upgrades to Rust will already have been done by the time we find out.
In general, I think backing changes out of nightly so that they don’t move on to beta should be a normal occurrence! The whole benefit of having the trains is that when we try something and find out it’s not quite working the way we thought, we have the opportunity to fix it without affecting users other than those who have opted in to nightly’s lack of stability guarantees. The benefit of having the six week release cycle is that we don’t have to rush to get something half-baked into a release, we can pull it out and get the time to do it right because the next train is not that far behind. It’s not a rejection of a change or of the work that people have put in, it’s consideration for our users and for the long term success of the project as a whole.
For a bit of context of where I’m coming from, I was not involved with the original plan on how to land this change. I do very much want to see us on pulldown-cmark over hoedown, but I don’t want to take the chance of breaking things in order to do so. I do think we have potential issues around upgrades to pulldown-cmark going forward that @snorge has brought up that we should think about how to address before landing pulldown-cmark, and I don’t think we should discuss the possible options before getting the time to have those discussions by reverting the change for beta.
Yes to this! I’ve been thinking about this a lot lately. It is frustrating to have something you have worked on get backed out, but I’ve found that once it happens I usually experience a vast sense of relief, since the pressure is off, and I can fix things at my leisure. (And, as you say, it’s nothing personal.)
So to be clear, I do believe we did follow the process. I promised @aturon i’d explain more, might as well do it now
https://github.com/rust-lang/rfcs/blob/master/text/1122-language-semver.md and https://github.com/rust-lang/rfcs/blob/master/text/1105-api-evolution.md are the two RFCs relevant to Rust following SemVer. While Rustdoc isn’t the language, the API evolution RFC doesn’t really have anything applicable to this situation. As such, I was going by the language one.
We never made a guarantee about what form of Markdown we were using. The RFC has a section for underspecified semantics:
There are a number of areas where the precise language semantics are currently somewhat underspecified. Over time, we expect to be fully defining the semantics of all of these areas. This may cause some existing code – and in particular existing unsafe code – to break or become invalid. Changes of this nature should be treated as soundness changes, meaning that we should attempt to mitigate the impact and ease the transition wherever possible.
Emphasis mine. So, soundness changes… here
When compiler or type-system bugs are encountered in the language itself (as opposed to in a library), clearly they ought to be fixed. However, it is important to fix them in such a way as to minimize the impact on the ecosystem.
The first step then is to evaluate the impact of the fix on the crates found in the crates.io website (using e.g. the crater tool). If impact is found to be “small” (which this RFC does not attempt to precisely define), then the fix can simply be landed. As today, the commit message of any breaking change should include the term [breaking-change] along with a description of how to resolve the problem, which helps those people who are affected to migrate their code. A description of the problem should also appear in the relevant subteam report.
Subteam reports don’t exist any more, but, when looking at doing this, we said “Crater can’t evaluate this change. So what do we do?” The plan was, do the work, and then see what happens with the standard documentation distributed with Rust. This isn’t always representative, but if we found major issues, we’d know about them. So, we did the work, and then tried building
libstd's docs. The vast, vast majority of the diff was changing
" in HTML attributes, in other words, something that a better normalizer would find. This was also the only change to get the tests to pass, we had this hardcoded in the rustdoc tests.
We knew there would be other small differences, but only in rendering, which is very different than causing people’s builds to break. Also, this is what nightly is for; we can always back stuff out later if it turns out there was massive breakage. But we would need to rely on people looking at things, since again, crater can’t do anything at all here. So we landed it, and advertised it heavily, to little reception. Nobody replied on Reddit or Twitter, there were a few comments about timelines in the users’ thread.
A few days later, some people ran into the test failure issue. This was a very obscure bug, because both versions render things correctly, it’s all about what’s considered a test or not. Furthermore, this breakage would be limited to the crates themselves, and not their user’s. We got a total of five reports about this in total, all with trivial fixes.
Now that we had information that we could actually detect changes at all, we can take a different strategy. Everyone is on board with not breaking people’s code, so we wanted to get in a cargobomb run, which could actually give us an idea of the impact here. The problem with doing so is that basically only @brson can get cargobomb going; I tried to get it working but could not. And now is a super busy time for him.
So, eventually, he got one going, and was able to give reports for 1/3rd of the total. At this point, there were ~30 total failures, not all of them rustdoc related. That’s still non-zero, and possibly more than we were comfortable with, and after two weeks of arguing about this, we looked to the RFC again:
In cases where the impact seems larger, any effort to ease the transition is sure to be welcome. The following are suggestions for possible steps we could take (not all of which will be applicable to all scenarios):
- Identify important crates (such as those with many dependants) and work with the crate author to correct the code as quickly as possible, ideally before the fix even lands.
- Work hard to ensure that the error message identifies the problem clearly and suggests the appropriate solution.
- If we develop a rustfix tool, in some cases we may be able to extend that tool to perform the fix automatically.
- Provide an annotation that allows for a scoped “opt out” of the newer rules, as described below. While the change is still breaking, this at least makes it easy for crates to update and get back to compiling status quickly.
- Begin with a deprecation or other warning before issuing a hard error. In extreme cases, it might be nice to begin by issuing a deprecation warning for the unsound behavior, and only make the behavior a hard error after the deprecation has had time to circulate. This gives people more time to update their crates. However, this option may frequently not be available, because the source of a compilation error is often hard to pin down with precision.
Seems good; we had already been doing the first one, sending in PRs or explaining what was broken, and all of the previous breakage would be fixed. The extra partial Cargobomb people would take longer though, so cases 3 and 4 come into play: in this case, a flag that would display a warning, but not fail people’s builds. This was quickly implemented, and aimed at removing some time later after we could help fix the stuff that had come up.
At this point was when this was deemed “not enough” and things came to a head.
So yes, I still think this process was followed overall, and is still being followed, just like any other change. Even without reverting everything, nobody’s stuff will break. The warning will let people see this change even if they aren’t closely following Rust development. Fixing any issues will cause stuff to work properly both before and after, so transitioning is easier and doesn’t need to be coordinated with Rust versions. It’s all business as usual as far as I’m concerned.
… I’ll just leave it at that for now.
If we define “stuff” to include “rendering the same way things render now”, I don’t think we can say this for certain, which is the primary reason I’m advocating for a cycle of opt-in on stable.
I think part of the problem is that we don’t have such a clear policy around changes to things like rustdoc output, which do not affect your ability to compile code or the behavior of the resulting code. In contrast, we’ve provided explicit guidance for API evolution and language evolution, which for practical reasons do permit certain forms of limited breakage. On the other hand, the policy for our tools (cargo, rustup, rustdoc etc) is far less clear. We should fix that!
From my perspective, the change to warn when new tests would be run (but not run them) takes care of the primary stability concern, which is code suddenly failing to compile upon upgrade. But I do understand the point that silent changes to rendering behavior could damage our stability stability story.
There’s a principles vs pragmatics issue here as well: the time we spend trying to more thoroughly examine and warn about the (probably minor) changes here is potentially time we’re not spending improving rustdoc itself. (That could be a false dichotomy; with OSS, it’s always hard to know.) From IRC discussion yesterday, it sounded like @Snorge already had done a lot of work on a tool that could effectively diff the output. Is that right? Do we think we’re not so far of from being able to gather some more data here?
A final point: I know that many have been waiting a long time to land this change, but in the big scheme of things, there’s not so much urgency to push it out in this particular release cycle. The conservative option is always to delay. But I think, if we head that direction, we need to have a very clear story about what steps need to be taken before we do land the change, and e.g. what we will do if we are unable to gather useful data, and all stakeholders need to be on board.
A couple concerns here:
How confident are we that we can get enough people to opt-in and discover/fix problems? Maybe making it prominent in the release announcement is enough. But people may need some kind of diff tool to actually find changes.
Usually with “permissible breakage” we try to set things up so that you can “straddle” versions. That is, ideally you could change your docs so that they render correctly in both hoedown and pulldown-cmark. Is that possible? If not, how do we imagine the opt-in working?
I only see these concerns as relevant right this moment if we decide we’re not reverting for beta… we can always choose to revert for beta and then go forward with exactly what’s been done now in the next cycle.
Is there a chance we can come to a decision on what to do for beta first?
Oh, definitely, and if we don’t reach a resolution in the near future that’s what we should do. But I’m operating under the assumption that we have at least a couple more days to try to reach consensus before landing a revert.
To add on to what @aturon said, we constantly change rustdoc’s output. We fix bugs, tweak CSS, add new things.
Even without textual changes, there’s extra links, the removal of libc docs entirely, some other style things.
It is not possible to have stable rendering without completely abandoning any and all improvements.
What we could do, and one of the things I’d like to do, is make rustdoc’s rendering stuff pluggable, such that if you wanted to ensure things always stayed the same, you could use the exact same renderer to generate your docs, never getting any of those fixes or improvements. Effectively, that API would be what’s considered stable.
Yes, this 10000%. There are real, actual bugs that we keep in rustdoc because we cannot fix them without these changes. And it’s already been years of this delay, doing it Yet Again is extremely demoralizing, especially given that we mostly have theoretical concerns here.
If the community had gotten up in arms when we first announced this change, I’d have felt very differently. But only a few people are upset, and it’s mostly around concerns that others will be concerned in the future, which I can appreciate but also are extremely hard to hear after years and years of delay after delay and obstruction after obstruction.
If the people who actually do the work are empowered to actually have control over their work, then waiting for that day to come gets easier. It still stings, but it’s easier. I do believe that we’ll get there, and am kinda sad that it took this situation to actually make it finally happen, maybe.
In order to help everyone understand the type of breakage that will occur, I will share some more or less representative examples from crates.io crates. I found these all by compiling the
clap crate, turning the docs into just trees of tags, and diffing for differences in them (ignoring tag content).
Here is an example of breakage in a crates.io crate from current stable to current nightly:
Crate: ansi-term, 0.9.0, last release 27 Aug 2016
Stable rustdoc (hoedown) output:
<p>It might make more sense to look at a <a href="https://upload.wikimedia.org/wikipedia/en/1/15/Xterm_256color_chart.svg">colour chart</a>.</p>
Nightly rustdoc (pulldown-cmark) output:
<p>It might make more sense to look at a [colour chart][cc]. [cc]: https://upload.wikimedia.org/wikipedia/en/1/15/Xterm_256color_chart.svg</p>
The crate was last published over half a year ago and crates.io reports it is downloaded about 2,000 times per day right now. Someone is probably generating documentation that includes this crate’s documentation.
The markup is perfectly fine and accepted in hoedown (and Markdown.pl, since hoedown is pretty faithful to that) as a link reference and definition. However, CommonMark requires a blank line between the end of a paragraph and a link reference definition, so pulldown-cmark correctly leaves it as a paragraph.
Here is another instance of the same issue, this time from the clap crate.
Stable rustdoc (hoedown) output:
<p>Various settings that apply to arguments and may be set, unset, and checked via getter/setter methods <a href="./struct.Arg.html#method.set"><code>Arg::set</code></a>, <a href="./struct.Arg.html#method.unset"><code>Arg::unset</code></a>, and <a href="./struct.Arg.html#method.is_set"><code>Arg::is_set</code></a></p>
Nightly rustdoc (pulldown-cmark) output:
<p>Various settings that apply to arguments and may be set, unset, and checked via getter/setter methods [<code>Arg::set</code>], [<code>Arg::unset</code>], and [<code>Arg::is_set</code>] [<code>Arg::set</code>]: ./struct.Arg.html#method.set [<code>Arg::unset</code>]: ./struct.Arg.html#method.unset [<code>Arg::is_set</code>]: ./struct.Arg.html#method.is_set</p>
In this case, the hoedown one doesn’t even seem to agree with Markdown.pl. The pulldown-cmark one does agree with CommonMark, but of course that’s not the output we’d want to see.
There are a number of other differences just from the
clap crate and the other docs that come along when its docs are generated. I won’t go into detail on every one here, but some other examples I found multiple occurrences of in this rather small collection of docs are:
- “automatic links” in pulldown-cmark need
<>around them, while hoedown didn’t require that
- lists in pulldown-cmark can occur immediately after paragraphs, but hoedown won’t treat those as a list unless there was a blank line before (usually this one would be “fixed” by switching to pulldown-cmark)
- emphasis marks around a word need a space after the closing one in commonmark ("
**NOTE:**Words words" stays as-is in pulldown-cmark, but is “**NOTE:**Words words” in hoedown and Markdown.pl)
EDIT: The reasoning behind the final bullet point above regarding emphasis marks was incorrectly worded. If the text was
**NOTE**Words words, it would be fine, but since it was
**NOTE:**Words words (note the : between NOTE and the
**), it didn’t work as expected. CommonMark does not count the second
** as a right-flanking delimiter run because it is preceded by a punctuation character and not followed by whitespace.
Thanks so much @Snorge for doing this investigation. Your findings suggest that the problems are more widespread and significant than what had previously been laid out (e.g. in the original post). That’s definitely enough to raise concerns about Rust’s stability if we ship it to stable, IMO. @steveklabnik, @GuillaumeGomez, thoughts?
How far along is your diffing tool, in terms of being something we could use in the compiler to warn about anticipated changes?
Sorry, I really don’t even know enough about Rust yet to make a tool to do this in an automated fashion. I’m still working my way through reading the book.
All I’ve done to find those results is:
cargo docon the master branch of the
clapcrate with rust nightly-2017-03-28 and nightly-2017-04-08. (The day before #40338 landed and the day after #41112. Not sure if there would be better choices to use here.)
- Made slight modifications to this example script from html5ever to only output one tag per line, omitting all content, attributes, etc. (So we’re ignoring the actual text between tags, just looking at tags to get a diff with virtually no “noise”.)
- Ran both sets of docs through that script to get files of just tags
- Diffed the resulting directories
It could certainly be the case that this crate was an outlier. Most crates seemed to have almost no documentation at all, so I just went with the first one I found in random clicking around that actually appeared to have something to check on. There were almost 7000 HTML files in the directory but only 10 files had differences. Some of those 10 had multiple differences and maybe most of the 7000 files were autogenerated or something, I don’t know enough about the directories to know.
I suppose if there was a crate that came along with a lot of other crates’ docs I could try the same thing again, but I’m guessing someone else would be able to do something like that on a wide array of crates faster than I could figure it out.
I wanted to run the same thing on the standard library docs but I couldn’t figure out how to run two versions of
x.py doc on one set of files.
I think it is better to let
pulldown-cmark provide a hoedown-compatible mode, which
generates output like hoedown and report those incompatible issues to the driver (rustdoc), based on the diff. This should be easier to generate the warnings inside the crate instead of inspecting these from outside in the compiler.
The issue isn’t in output generation but parsing.
pulldown-cmark separates these two tasks, but it is not at all capable of a “hoedown-compatible” parsing mode.
In its current state, it isn’t even easily modifiable to fix the 50 failing CommonMark spec tests, because it made certain assumptions that later CommonMark versions have clashed with. Raph seems to have more or less given up on the current approach and it’s likely the fixes won’t come until the inside is redone in a significant manner.
It would be nice to have pluggable support for other parsing modes (especially to allow multiple versions of CommonMark, especially since it is still pre-1.0), but doesn’t seem to be anywhere near that point.
Edit: I see you updated your post. The output/rendering side of pulldown-cmark seems (again, I don’t know enough about Rust to say this with confidence) to be very flexible and easy to use in neat ways. Modifying the input/parsing side on the other hand is not in any way smooth and simple. It is fast, but on the messy side. Raph has shared a prototype branch with a possible new approach that actually builds a tree in parsing, but it was incomplete and still wasn’t built in a “pluggable” manner where one format could be swapped for another easily. This is probably part of the reason that the package is still at version 0.x.