Rustdoc: reStructuredText vs Markdown

I’m by no means the most educated or motivated on this one, but all the people who are better qualified to write this have better things to do, so here we go!

On The Potential Inadequacy of Markdown

Rustdoc currently uses “vanilla” markdown with some common extensions for all of our documentation. This is done by piping through a C library called Hoedown. As more proposals pop up to tweak or extend our documentation capabilities, this becomes an increasingly questionable position.

Proposed Modifications

  • Performance Docs: Performance is a distinct issue from raw usage or behaviour. For this reason I believe it would be desirable to separate it out as something that can be programmatically identified and manipulated. Possible use cases include “perf only” filters and tables of operations for e.g. collection traits.

  • Safety/Failure Docs: Similarly to the performance issue, safety and failure are important issues that would be convenient to extract/manipulate.

  • Inter-API Linking: Being able to specify a part of some Rust API and have it automagically linked, without knowing where it will appear in the generated page hierarchy.

  • Context-Sensitive Header Levels: Markdown considers a # to be an <h1>, not simply “the highest-level heading reasonable where this will be embedded”, which is what we really want.

  • Mathjax Support: It would be great to be able to properly render simple and complex snippets of math using Mathjax, but the underlying server-side text processor needs to cooperate.

These issues can be individually hacked around. #'s can be migrated to ##, or we can (try to) string-replace <h1>'s with <h2>'s. Safety/Failure/Perf docs can be designated by “magic” #safety, #performance, etc. headers. Magic URLs can be post-processed to find relevant docs. Mathjax can probably be backdoored in with some exceptionally brittle regexes. But these are hacks, and each of them carries a debt.
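To make the first of those hacks concrete, here is a rough sketch of what "migrate #'s to ##" could look like as a preprocessing pass. The function name `demote_headings` is made up for illustration, and the sketch assumes ATX headings with a space after the hashes and triple-backtick code fences; it is not how rustdoc or Hoedown actually handles headings.

```rust
/// Demote ATX-style Markdown headings by `shift` levels, capping at level 6.
/// Hypothetical helper: assumes a space follows the hashes and that code
/// fences are lines starting with three backticks.
fn demote_headings(doc: &str, shift: usize) -> String {
    let mut in_fence = false;
    let mut out = Vec::new();
    for line in doc.lines() {
        // Toggle fence state so that `# comment` lines in code are left alone.
        if line.trim_start().starts_with("```") {
            in_fence = !in_fence;
            out.push(line.to_string());
            continue;
        }
        // '#' is a single byte, so the char count is also a valid byte index.
        let hashes = line.chars().take_while(|&c| c == '#').count();
        let is_heading = !in_fence
            && (1..=6).contains(&hashes)
            && line[hashes..].starts_with(' ');
        if is_heading {
            let level = (hashes + shift).min(6);
            out.push(format!("{}{}", "#".repeat(level), &line[hashes..]));
        } else {
            out.push(line.to_string());
        }
    }
    out.join("\n")
}
```

Even this small version already needs to know about code fences, which is a fair illustration of why these workarounds tend to be brittle.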

Hoedown itself probably can’t be configured to provide these features, as it is exceptionally unconfigurable, as far as I can tell. Special strings/characters are all hardcoded. Features are just toggled by hardcoded magic bitflags. I don’t think we want to take on supporting our own fork of it to provide these features. Hoedown being written in C also presents a potential security vulnerability (compared to the rest of Rustdoc, at least).

All of this piles up to the point where you start saying “maybe markdown isn’t right for the job”. Now of course, markdown isn’t without merits. It’s the de facto standard for internet communication. These posts, Github posts, Reddit posts, and various others are all written in markdown. If you’re contributing to Rust, you probably know at least some markdown, and can therefore pick up and modify the docs easily. Markdown also excels at intuitiveness, in my opinion. However, markdown lacks a good standard. It’s more like English than a formal language, in that various semi-compatible dialects exist.

However, most critically of all, thousands of words of Rust docs are already written in markdown.

Enter reStructuredText

reStructuredText is the markup language that Python and Swift use for their documentation. It is well-defined, extensible, and, critically, designed for exactly our use case. Chris Morgan originally proposed this migration and gave some arguments for it (as far as I know) in his talk here (at 33 mins, although Steve’s talk before it is also worth watching).

I don’t have any familiarity with this language myself, so maybe those of you who do can chime in. If you skim the quick guide, though, you can see that the common use case of just writing prose is pretty straightforward. It’s definitely more complex than markdown, however.

To use reStructuredText, we can try to hook into Sphinx, the tool built for Python’s docs, or write our own native parser/handler. I haven’t done enough research into Sphinx to determine the relative merits of these positions.

I can appreciate that Markdown has flaws, and I totally agree with the benefits of using something more structured.

… but I really, really, really hate writing ReST. :sweat_smile: :frowning:.

I’m not sure how to reconcile these things. It’s hard enough to get people to write documentation already; the more complex we make it, the fewer people will do it.

I think the best path is to improve rustdoc incrementally. I’m pretty sure the core team made the decision to stick with Markdown for the foreseeable future.

There’s a lot of stuff I’d like to see done with rustdoc that just isn’t particularly feasible with Markdown, not without hacks piled on top of hacks piled on top of hacks. reST seems like it would enable this stuff pretty easily. I’m thinking right now of basic stuff like hyperlinking function/type references, but there’s also things like denoting information on failure, providing structured markup about arguments, etc.

As near as I can tell, the best argument against reST is “it doesn’t have fenced code blocks”. Since we’re editing rustdoc in a text editor instead of on the web, I’m not sure that’s really a big deal, but I also wonder if we couldn’t extend reST to support fenced code blocks; I’m not particularly familiar with reST but after skimming the documentation, it doesn’t seem like it would conflict with anything.

The only other real oddity is the fact that inline code has to use two back ticks, e.g.

``code goes here``

(note that I can’t even figure out how to print that inline in markdown, because markdown is not properly specified and has no good escaping syntax; at least with reST I can tell you definitively that you cannot embed two backticks inside an inline literal).

I don’t know how much of an issue that is, though. My belief is that most of our inline code spans in rustdoc are references to functions/types, and I’d like to see us be able to use substitution references, e.g. |String::as_slice()|_, to produce hyperlinks instead.


Ultimately, I feel like we should prototype out a reST-based rustdoc, convert a few modules over to it, and see how it looks.

Ah, forgot to link Chris Morgan’s text notes and slides on the matter: http://chrismorgan.info/blog/rust-docs-vision-presentation.html

For a long time, we were using pandoc which supported many, many extensions and it wasn’t much of an issue. We switched over to hoedown since it was a much smaller dependency (“what do you mean I need to install HASKELL to generate documentation for Rust!?”).

I don’t particularly want to depend on Python (docutils is basically the only implementation of reST) either. Anything in pure-Rust would be great to actually ship!

I’d love to see a pure-rust reimplementation of reST.

Also, another thing to consider is that we could define a way to actually support both, as a transition plan (eventually dropping markdown if we decide to move everything over to reST). We could do that just by defining some sort of marker to put on the first line of the docstring to indicate what format to use.

/// {reST}
/// This function does stuff.
///
/// .. failure::
///   Fails when the moon is at its zenith.

(failure syntax TBD, it could also be a field list :failure: Fails when the moon is at its zenith. or if we don’t want to actually define extensions, even just a container directive with the failure class, which should work with “stock” reST and let us handle it in CSS).
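For what it's worth, the detection side of that transition plan would be tiny. A minimal sketch, assuming the marker is literally `{reST}` alone on the first line of the doc comment (the marker, the `DocFormat` enum, and `detect_format` are all hypothetical names, not anything rustdoc has):

```rust
/// Which markup a doc comment is written in (hypothetical type).
#[derive(Debug, PartialEq)]
enum DocFormat {
    Markdown,
    ReStructuredText,
}

/// Split a doc comment body into (format, text to hand to the renderer).
/// Assumes the `{reST}` marker from the example above; anything else is
/// treated as Markdown and passed through untouched.
fn detect_format(doc: &str) -> (DocFormat, &str) {
    let (first, rest) = match doc.split_once('\n') {
        Some((first, rest)) => (first, rest),
        None => (doc, ""),
    };
    if first.trim() == "{reST}" {
        // Drop the marker line so it never reaches the reST parser.
        (DocFormat::ReStructuredText, rest)
    } else {
        (DocFormat::Markdown, doc)
    }
}
```

Defaulting to Markdown when the marker is absent means every existing doc comment keeps working unchanged during the transition.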

FWIW rust-lang/rust#16374 lays the groundwork for adding “frontmatter” support to rustdoc, which appears to basically be metadata. This is done using a custom parser with a custom grammar. This is a great example of the sort of hacks on top of Markdown that I believe reStructuredText provides a much better solution for.

The critical feature that is missing in the current Markdown flavor used by rustdoc is extensibility, both in terms of syntax and implementation.

In terms of syntax: There are only so many punctuation characters in ASCII, so we don’t want to invent a new sigil-based syntax for every new feature that we might want to add (cross-references, math, …). Instead, we should only add one inline-level syntax and one block-level syntax that each include the name of the extension and some extension-specific parameters. reStructuredText has interpreted text roles (inline) and directives (block). This is not unlike Rust’s move from the @T, ~T, ~[T], ~str language types to the Gc<T>, Box<T>, Vec<T>, String library types.

In terms of implementation: Instead of having a handful of roles/directives/whatever hard-coded in Hoedown, rustdoc should have a plugin/extension system to allow users to write their own. See extensions shipped with Sphinx or elsewhere. (Sphinx extensions can do more than add new roles and directives, but that’d be a good start.)
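As a rough illustration of the kind of extension point I mean, here is a sketch of a registry of named inline roles. Everything in it (`Role`, `Registry`, `MathRole`) is invented for the example; rustdoc has no such API, and a real version would also need block-level directives and HTML escaping.

```rust
use std::collections::HashMap;

/// A hypothetical inline extension: turns the text of a role into HTML.
trait Role {
    fn render(&self, content: &str) -> String;
}

/// Example plugin: wraps its content for a client-side math renderer.
/// Note: no HTML escaping is done here, which a real plugin would need.
struct MathRole;

impl Role for MathRole {
    fn render(&self, content: &str) -> String {
        format!("<span class=\"math\">{}</span>", content)
    }
}

/// Maps role names to implementations, so the syntax stays fixed while
/// the set of roles grows.
#[derive(Default)]
struct Registry {
    roles: HashMap<String, Box<dyn Role>>,
}

impl Registry {
    fn register(&mut self, name: &str, role: Box<dyn Role>) {
        self.roles.insert(name.to_string(), role);
    }

    /// Returns None for unknown role names so the caller can fall back
    /// to emitting the text verbatim or reporting an error.
    fn render(&self, name: &str, content: &str) -> Option<String> {
        self.roles.get(name).map(|role| role.render(content))
    }
}
```

With this shape, adding math support is a matter of calling `register("math", Box::new(MathRole))` rather than changing the parser.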

I don’t care all that much if it’s in Markdown or reStructuredText (although Sphinx has all these very nice features already implemented), but I really believe that this extensibility is essential.

Note that although its built-in HTML renderer’s configuration is very limited, Hoedown becomes much more flexible if we write a custom callback-based renderer. These callbacks could be the basis for rustdoc plugins written in Rust.

I filed https://github.com/hoedown/hoedown/issues/99 for adding “roles” and “directives” to Hoedown. We’d have to implement our own Hoedown renderer I think but that’s OK, the built-in one is 736 lines of C.

Wouldn’t it be possible for a project to include its own rustdoc plugins if needed? For a lot of projects any sane default is good enough, and other projects might use different parsers: some provided in rustdoc itself, and some implemented for this purpose.

Yes, extensible syntax would enable having a plugin system in rustdoc to enable that too.

It turns out we don’t have to re-implement the entire HTML renderer: we can take the built-in one and only override some of the callbacks. Rustdoc already does this.

BTW, I’m currently in the process of writing a Markdown parser in Rust. It is not really usable right now (so I haven’t announced it yet on Reddit), but it should be in the near future.

Right now it is a rather simple proof-of-concept, but I intend to make it support popular Markdown extensions, hopefully passing this Markdown test suite. I’m open to suggestions on extensions required for documentation.

Are you doing this as a learning exercise, or do you intend it to be production-ready? If the latter, how would yours differ from Hoedown?

Well, it certainly started as a learning exercise, but I do intend to support and optimize it. Of course, I can’t provide any guarantees, after all, I’m doing it in my free time, but I’ll do my best.

I’m using hoedown as a sort of reference, but md.rs has a different design because it is a pull parser (and hoedown is push-based). It is also able to parse streams of data instead of buffers. Hopefully it will also be more extensible, but the first milestone is to support basic Markdown and some vital extensions like code fences.
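For anyone unfamiliar with the push/pull distinction, here is a toy sketch of the two shapes. It is not md.rs's or hoedown's actual API; the `Event` type and the one-heading "grammar" are made up purely to show who drives the loop.

```rust
/// Toy event type: just enough to show the two driving styles.
#[derive(Debug, PartialEq)]
enum Event<'a> {
    Heading(&'a str),
    Text(&'a str),
}

/// Classify one line. A real parser would track far more state.
fn classify(line: &str) -> Event<'_> {
    match line.strip_prefix("# ") {
        Some(title) => Event::Heading(title),
        None => Event::Text(line),
    }
}

/// Push style: the parser owns the loop and calls back into the consumer.
fn parse_push<'a>(input: &'a str, mut on_event: impl FnMut(Event<'a>)) {
    for line in input.lines() {
        on_event(classify(line));
    }
}

/// Pull style: the consumer owns the loop and asks for the next event,
/// so it can stop early or interleave parsing with other work.
fn parse_pull<'a>(input: &'a str) -> impl Iterator<Item = Event<'a>> + 'a {
    input.lines().map(classify)
}
```

With the pull version a caller can write `parse_pull(src).next()` and never touch the rest of the input, which is the property that makes streaming possible.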

Hey, I started working on this yesterday also (inspired by this thread, and working toward something production-ready). Maybe we could collaborate a bit, instead of both working on markdown processors separately.

I have basically the same goals as you (pass the testsuite, be standards-compliant) except I would rather not make it work in a streaming fashion, since I find extensions like footnotes valuable and streaming markdown would make that not work.

Right now the big design thing I’m working on is getting the data structures right. I think if you can manage to build the right data structures for something like markdown, the rest of it will fall into place naturally. Not that I’ve found those yet.

But yeah, PM me or we can start a new thread or something if you’re interested in joining efforts.

I’ve used Sphinx a moderate amount, without actually ever writing my own directives. But I think I can comment a bit on ReST.

  • It’s hard to write. ReST is very picky and structured, unlike Markdown, and will break things in weird ways if you don’t use three spaces in exactly the right place.
  • Directives can behave strangely, and they still often leave me unsure why my link isn’t working.
  • Title syntax is more annoying than Markdown’s, and pickier: titles are marked by underlining only, and the underline has to be at least as long as the title text above it.

On the other hand:

  • Directives are awesome when they work correctly. It offers a way to properly refer to separate types of links and get the right kind of text substitution/linkification for each of them.
  • It basically has namespaces for documents. Or something. I can do :py:class:`MyCoolClass` and I know that it’s the :class: role in the Python domain. If we made a Rust domain we could have :rs:trait:`Index` and :rs:mod:`std` for links. I think that’s a really great way to be able to automatically refer to code, mostly since you don’t have to write those links manually.
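To show how little machinery the domain syntax needs on the parsing side, here is a minimal sketch that splits a single reference into its parts. The `rs` domain is the hypothetical one suggested above, and `RoleRef` and `parse_role` are names made up for the example; this is not how Sphinx implements it.

```rust
/// The parts of a reference like :rs:trait:`Index` (hypothetical type).
#[derive(Debug, PartialEq)]
struct RoleRef<'a> {
    domain: Option<&'a str>,
    role: &'a str,
    target: &'a str,
}

/// Parse one complete reference such as :rs:trait:`Index` or :class:`Foo`.
/// Returns None for anything that does not match that exact shape.
fn parse_role(s: &str) -> Option<RoleRef<'_>> {
    let s = s.strip_prefix(':')?;
    // Everything before the first backtick is the name part, ending in ':'.
    let (names, rest) = s.split_once('`')?;
    let names = names.strip_suffix(':')?;
    let target = rest.strip_suffix('`')?;
    if target.is_empty() || target.contains('`') {
        return None;
    }
    // An optional domain comes first, separated from the role by ':'.
    let (domain, role) = match names.split_once(':') {
        Some((domain, role)) => (Some(domain), role),
        None => (None, names),
    };
    if role.is_empty() || domain == Some("") {
        return None;
    }
    Some(RoleRef { domain, role, target })
}
```

Resolving `target` to a URL is the hard part, of course, and that needs rustdoc's knowledge of the crate rather than anything in the markup.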

Basically agree with @steveklabnik in that I also have an inexplicable hatred for actually writing ReST. For some reason every time it doesn’t work properly I just get infuriated and wish I could just write Markdown everywhere, but be able to use Sphinx directives.

I have yet to look into it in depth, so in the meantime I’ll just leave this here: http://blog.codinghorror.com/standard-flavored-markdown/

I was just about to post this :stuck_out_tongue:

The actual proposed standard is here: http://standardmarkdown.com/

Although it’s not finalized, I don’t see anything about extending the language. However, they’ve based the language around constructing a tree of “block structures”, and then parsing the contents of the blocks independently as “inline” elements. This suggests that a parser for their spec should be fairly easily extendable; just introduce new blocks and inlines.
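Here is a very rough sketch of that two-phase shape, just to show where the extension points would sit. It is not the algorithm from the spec: the only block is a blank-line-separated paragraph, the only inline is a backtick code span, and the `Block` and `Inline` types are invented for the example.

```rust
#[derive(Debug, PartialEq)]
enum Inline {
    Text(String),
    Code(String),
}

#[derive(Debug, PartialEq)]
enum Block {
    Paragraph(Vec<Inline>),
}

/// Phase 2: parse the raw contents of one block into inline elements.
/// Splitting on backticks alternates text, code, text, and so on; an
/// unbalanced backtick is not handled specially in this sketch.
fn parse_inlines(raw: &str) -> Vec<Inline> {
    raw.split('`')
        .enumerate()
        .filter(|(_, piece)| !piece.is_empty())
        .map(|(i, piece)| {
            if i % 2 == 0 {
                Inline::Text(piece.to_string())
            } else {
                Inline::Code(piece.to_string())
            }
        })
        .collect()
}

/// Phase 1: split the document into blocks, then hand each block's
/// contents to the inline parser independently.
fn parse_blocks(doc: &str) -> Vec<Block> {
    doc.split("\n\n")
        .map(str::trim)
        .filter(|para| !para.is_empty())
        .map(|para| Block::Paragraph(parse_inlines(para)))
        .collect()
}
```

A new block type would be another `Block` variant recognised in phase 1, and a new inline another `Inline` variant recognised in phase 2, without either phase needing to know about the other's additions.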