What to do about pulldown and commonmark?

Perhaps you missed my point? It’s that I would support a single, large breaking change. Right now, there are no mitigations in place to prevent the current proposal from turning into a series of small breaking changes over time. I’d prefer to see that addressed before this gets merged, even if addressing it means deciding that we’re ok with a series of small breaking changes over time.

I suppose I'm less concerned with how this first transition happens and more concerned with the fact that changing from hoedown (which I assume is stable and doesn't break itself regularly) to pulldown-cmark and therefore commonmark (which seems to break itself regularly) will result in this happening frequently, if not every release.

You've got this backwards. Hoedown does not follow a spec, but instead follows a very vague document. rustdoc never seems to break because we never update hoedown. Do you have evidence that CommonMark breaks regularly? Throwing around accusations without sources isn't very constructive.

Yes, this seems important right now before making any other decisions. The most general ways to move forward would be:

  1. Stick with what we have (hoedown) forever
  2. Keep what we have (hoedown), add additional input format(s), supporting multiple indefinitely
  3. Completely remove what we have (hoedown) in favor of a new format

I don't know how we got here, but the first option must be off the table, otherwise we wouldn't be in this position at all.

The second option has some great benefits like not worrying about breaking existing docs every time a new format comes along. The old ones are still parsed by the parser they were written for, and new ones can be updated to whatever is current. Technically, this option is probably much more difficult and I really don't see anyone in this thread that seems to be pushing for it other than me.

The third option seems like the focus of this thread, but we've discovered there are some issues with it. Some current docs will not come out right because hoedown is not quite the same as CommonMark.

It is possible to create docs that will render the same in both hoedown and CommonMark (which might require HTML tags in some cases). This could lead to an option where docs are parsed by both parsers in order to give users warnings that their docs might stop working in the future. That wouldn't help abandoned docs, since someone would still need to do the updating. Basically, a transition is possible, but there could be some pain.
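To make the "parse with both parsers and warn" idea concrete, here is a minimal sketch in Rust. It is not how rustdoc does it: `compare_renderers`, `Comparison`, and `normalize` are hypothetical names, and the two renderers are passed in as plain closures so the sketch does not assume any particular hoedown or pulldown-cmark API.

```rust
/// Outcome of rendering one doc comment with both the old and the new renderer.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub enum Comparison {
    /// Both renderers produced equivalent HTML; no warning needed.
    Same,
    /// The outputs differ; a tool could warn and show both versions.
    Differs { old_html: String, new_html: String },
}

/// Renders `markdown` with both renderers and reports whether the output differs.
/// `old` and `new` stand in for the hoedown-based and CommonMark-based renderers.
pub fn compare_renderers<F, G>(markdown: &str, old: F, new: G) -> Comparison
where
    F: Fn(&str) -> String,
    G: Fn(&str) -> String,
{
    let old_html = old(markdown);
    let new_html = new(markdown);
    if normalize(&old_html) == normalize(&new_html) {
        Comparison::Same
    } else {
        Comparison::Differs { old_html, new_html }
    }
}

/// Collapses runs of whitespace so purely cosmetic differences
/// (trailing newlines, indentation) do not trigger a warning.
fn normalize(html: &str) -> String {
    html.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

A caller would run this over every doc comment and emit a warning for each `Differs` result. Comparing whitespace-normalized HTML is a design assumption; a real tool might need a smarter comparison to avoid noisy warnings.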

The key to making option 3 feasible is that the new format has to be stable in the long term. If the new format undergoes breaking changes at any time in the future, we are suddenly right back where we are right now. The new format could still be updated over time, as long as the updates don't break existing docs.

Edit: I jumped from the very general option #3 directly to "move from hoedown to CommonMark" -- I should have at least mentioned that it would be possible to switch to some format other than CommonMark...it seems like that decision has already been made in the past.


Following up from my previous post, I'll inspect some of the history of pulldown-cmark and CommonMark.

The pulldown-cmark project was started when CommonMark was at version 0.18, about 2 years ago. It was getting fairly regular attention for a while, as CommonMark was releasing frequent updates. At the release of CommonMark 0.20 on 8 June 2015, pulldown-cmark passed 553/553 CommonMark tests (yay!).

Since that time, pulldown-cmark has still seen some updates, but CommonMark has made a number of changes, enough that pulldown-cmark started to fall behind. I'm sure it could catch back up, but this actually gives us a bit of a glimpse into how many changes CommonMark has gone through in the last ~2 years. Now pulldown-cmark passes only 568/622 tests. So there has certainly been some change over that time span, but did it break anything people would actually write?

There have been a lot of changes to CommonMark since version 0.20. Many of them would not break existing documents. In fact, most of the updates to CommonMark since version 0.20 would "fix" documents that didn't come out as intended in 0.20. However, since pulldown-cmark isn't yet updated to the 0.27 spec, not all of those fixes would be available to docs writers until pulldown-cmark catches up.

There are a couple spots where direct advice given in the 0.20 spec has been modified, though. These seem like places where someone could have found their document stopped rendering as intended. Whether or not they would come up in Rust documentation is a good question, but it at least demonstrates that CommonMark isn't quite stable yet:

Before: "two blank lines can be used to separate consecutive lists..."
After: "to separate consecutive lists of the same type....you can insert a blank HTML comment"
Explanation: two blank lines no longer end a list; extra blank lines may now occur inside a nested block or between list items.

Before: whitespace is allowed between [link text] and [link label]
After: no whitespace is allowed between [link text] and [link label]
Explanation: apparently the original Markdown explicitly allowed this space, but it leads to inadvertent capture. The change probably fixed more documents than it broke, but it likely broke at least some.
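The second change is the kind of thing a warning pass could look for mechanically. Below is a rough sketch of such a check, assuming a purely textual heuristic: it flags any line where a `]` is followed by spaces or tabs and then a `[`. The function names are made up for this example, and the check knows nothing about code spans or code blocks, so it will produce false positives.

```rust
/// Returns the 1-based line numbers on which a `]` is followed by spaces or
/// tabs and then a `[`, e.g. `[link text] [label]`.
/// Heuristic only: it does not skip code spans or fenced code blocks.
pub fn lines_with_spaced_reference_links(markdown: &str) -> Vec<usize> {
    let mut hits = Vec::new();
    for (index, line) in markdown.lines().enumerate() {
        if has_spaced_brackets(line) {
            hits.push(index + 1);
        }
    }
    hits
}

/// True if `line` contains `]`, then one or more spaces/tabs, then `[`.
fn has_spaced_brackets(line: &str) -> bool {
    let mut chars = line.chars().peekable();
    while let Some(c) = chars.next() {
        if c != ']' {
            continue;
        }
        // Consume any spaces or tabs directly after the closing bracket.
        let mut saw_space = false;
        while let Some(&next) = chars.peek() {
            if next == ' ' || next == '\t' {
                saw_space = true;
                chars.next();
            } else {
                break;
            }
        }
        if saw_space && chars.peek() == Some(&'[') {
            return true;
        }
    }
    false
}
```

Run over a doc comment, this returns the lines worth a second look. Whether each hit really was intended as a reference link still needs a human or a real parser to decide.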

The good news is these are pretty minor! I didn't notice any other changes that seemed likely to break intentionally-formatted documents. The rest of the changes seem to just help do the "right" thing in cases where someone who wanted the "wrong" thing would not have written it that way anyway.

Edit: Also, one rather unfortunate thing that happened was CommonMark changed the names of some block types between version 0.20 and now. These are exported publicly by pulldown-cmark, so users of pulldown-cmark right now need to use a Tag::Rule for something CommonMark calls a "thematic break" (formerly horizontal rule) and a Tag::Header for something CommonMark now calls a "heading".
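To illustrate the naming mismatch, here is a small sketch. The `Tag` enum below is a local stand-in written for this example, not the crate's actual type: it has only the two variants named above, and the `u8` heading level is an assumption. `current_spec_term` is likewise a hypothetical helper.

```rust
/// Local stand-in for the two publicly exported tag names mentioned above.
/// Not the real pulldown-cmark type; the payload of `Header` is assumed.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tag {
    /// What CommonMark 0.20 called a "horizontal rule".
    Rule,
    /// What CommonMark 0.20 called a "header"; the field is the level.
    Header(u8),
}

/// Maps the older tag name to the term the current spec uses for the same block.
pub fn current_spec_term(tag: Tag) -> &'static str {
    match tag {
        Tag::Rule => "thematic break",
        Tag::Header(_) => "heading",
    }
}
```

Someone reading the current spec has to do this translation in their head whenever they touch code that matches on these tags.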


So, possible ways we could end up with a series of small breaking changes over time:

  1. Bugs or spec version lag in pulldown-cmark implementation are corrected
  2. CommonMark makes breaking changes between now and 1.0
  3. "Extensions" currently in use (footnotes, for one; are there others?) are standardized in a way that isn't compatible with the current implementation

Let's try to figure out how likely each of these would be.

Pulldown-cmark updates

It's really not that bad that pulldown-cmark isn't right up to the current CommonMark spec. Keeping it constantly up to date is a lot of work when the changes are minor. If it hit 1.0, then there would be more motivation to get it back to passing all the tests.

From what I can tell after the investigation in my previous post, even though pulldown-cmark is not quite up to the current CommonMark 0.27 spec at this time, updating it is unlikely to cause breakage on the same level as going from hoedown to pulldown-cmark could. The fixes just make documents easier to write. Any document updates people make during the conversion are unlikely to break when pulldown-cmark is updated to match CommonMark's latest spec. Passing all the spec tests should help limit bugs, and if a later fix only corrects a bug in the parser, the resulting change is probably acceptable: anyone whose docs relied on the bug rather than the spec can be told to write to the spec instead.

So far, so good. What about the gap to CommonMark 1.0?

CommonMark updates

Optimism! Yay!

[quote="steveklabnik, post:35, topic:5115"] CommonMark is pretty close to having a 1.0 spec release; that is, in my understanding there are 8 outstanding issues, and

I wonder if the quote from CommonMark's website is a bit on the optimistic side? Based on this issue, it seems like they previously promised "early 2016" and now we're a year past that.

Can we tell how close they are by looking at commit activity? Commits have slowed way down in the last year or so. That's good, right? However, there are still 30 issues open at the jgm/CommonMark GitHub repository. I'd expect to see more of them being closed or marked somehow if they are closing in on a 1.0. There are a number of interesting threads at talk.commonmark.org as well, but it looks like some of them have stalled out with no decisions made.

Since CommonMark still isn't at a 1.0 level, it could decide to introduce a breaking change before reaching that point.

I think a good step to take here would be to send a stakeholder from this project to get an up-to-date report on the status of CommonMark and the issues keeping it from reaching 1.0. This probably means taking an active role in helping them reach 1.0 as well. If it turns out that some of the undecided items would cause breakage, this would at least help get them decided and into pulldown-cmark before we make the switch from hoedown. If the report is that the only questions left would not break existing documents, we can more confidently move forward.

Extension changes

Ick. This part scares me more.

Good side: If CommonMark gets real extensions or support for arbitrary extensions, it will almost certainly not break existing documents written in CommonMark.

Bad side: There are two extensions already set into pulldown-cmark that were also present in hoedown. Any "official" extension created later will likely interfere with these.

So far, pulldown-cmark includes support for two "extensions" to CommonMark: tables and footnotes. Users can craft these with HTML, but if a markdown version is available, they'll probably use it. They've certainly used them in existing Rust docs. They are turned on by options in pulldown-cmark, and both options were enabled when used to parse Rust docs.

CommonMark doesn't have a built-in way to do extensions. They might get one, but it doesn't seem to be on the road to 1.0. So if we start using a custom extension that later conflicts with one that is added to CommonMark, we could be in trouble. Old tables don't work or we need to support two kinds of tables. Old footnotes turn into something else or aren't recognized. Things like that. Maybe the standardized version of footnotes uses a totally different syntax and we have to keep the old one around forever while also supporting the new one ("Our docs are in CommonMark 1.x with the official footnotes extension, but also you can use an old version of footnotes too, because.").

The GitHub Flavored Markdown Spec includes a Tables extension. I don't know if it is compatible with the one in pulldown-cmark, but they could probably be made to match. Then we can say "CommonMark 1.x with GFM Tables Extension". At least there's a real spec for it that a lot of people will probably use. Switching to it could also probably break existing tables.

What about footnotes? They aren't included in the GFM spec. The footnotes accepted by pulldown-cmark are very similar to those that hoedown uses. There are at least two (one, two) threads on talk.commonmark.org discussing them, but not much is resolved. There is at least one thread at the google/pulldown-cmark GitHub repository that discusses potential changes to the footnote rules used by pulldown-cmark. In that thread, @critiqjo put some effort into a CommonMark-style spec for footnote definitions that could potentially have some legs if it was brought out of that thread to a wider audience. These independent threads seem to share some similarities. Nothing's final yet, but none of those threads agree 100% with the current footnote implementation in pulldown-cmark or the footnotes that are present in Rust documentation already. Most of the differences have to do with nested blocks, a concept that CommonMark wishes to give some uniformity, but hoedown doesn't seem to care about much.


Only what was stated in the thread before me.

pulldown-cmark's tables extension was modeled after GitHub's table syntax, but it was written before the GFM spec was released. Once Raph's big refactor is done, and p-c is up to the current CommonMark version, I would be happy to bring pulldown-cmark into full alignment with GitHub Flavored Markdown, not just by bringing the table syntax into alignment, but also by adding support for task lists, strikethroughs, and autolinks.


One way to do it is to ignore CommonMark and adhere to GFM as closely as possible - after all, GitHub is pretty popular, so they aren’t going to make any big breaking changes. Plus, people are likely familiar with GFM anyway, so we could just say that from now on, we are GFM-compliant.

As mentioned above, GFM is going to be CommonMark. So there's not a lot of difference there. See GitHub Flavored Markdown Spec for the spec (commonmark + their extensions) and A formal spec for GitHub Flavored Markdown - The GitHub Blog for more explanation.


I must admit I’m a bit surprised at the decision, since I pretty much put the migration in the “soundness-fix” category (catering to a well-defined spec instead of “UB”), especially since it does not break real code, but rather an output that was already quite tweaked.

On a side note, if we’re talking breakage, did the ship sail on markdown vs ReStructuredText? The second one is well-defined, did not incur any breakage in the last 15 years or so, and is supported widely, being even adopted by the linux kernel recently. There was a thread about it somewhere.


Basically, yeah, a long time ago.


Switch to comrak?


I think reverting for now was the right decision, but we should move to Pulldown sooner rather than later. We must be careful to message this loudly and clearly. Generally we have found that the only way to be sure we advertised a breaking change is by issuing warnings. Therefore, I think we should do our best to have a warning cycle here - at least where they can easily be detected (superscripts for example). It doesn’t sound like an opt-in is practical, but hopefully a warning cycle is?

Note that it doesn’t matter whether this is a soundness fix or technically not a breaking change because of a lack of spec. People’s code is doing different things, therefore it counts as a breaking change and we have to take the necessary precautions.


We’ll start discussions about this transition very soon.

Status update: Pulldown rendering is behind a flag (--enable-commonmark), warnings are implemented but still need a little work, and the work to change the default renderer is being tracked at https://github.com/rust-lang/rust/issues/44229
