Adding LaTeX support to rustdoc

Coming here from "Add LaTeX support to Rustdoc" (Issue #34261, rust-lang/rust on GitHub).

Anyone writing mathematical code who wants to write mathematical documentation has probably either wanted LaTeX in rustdoc or employed some kind of hack to make it work.

It sure would be nice if there were a first-class option!

15 Likes

Would it not make more sense to use Typst syntax? It is a modern written-in-Rust replacement for LaTeX that seems to be taking off and getting a lot of hype currently. (I haven't tried math in it yet, but I did try it out for presentations as a beamer replacement and it is a breath of fresh air.)

22 Likes

I'd love to see support for formulas in rustdoc.

Backwards compatibility isn't quite as critical a concern as with code (since documentation failing to render is not as serious a problem as code failing to compile or changing semantics), but there's still some degree of concern about compatibility surface area, and new versions of rustdoc are expected to correctly compile documentation for old crates. Markdown is reasonably well specified. LaTeX has a huge (and extensible) surface area, in terms of directives.

3 Likes

I think the actual implementation should just forward the "LaTeX" to some browser side renderer like MathJax. This offloads a lot of implementation questions and complexity to just choosing a renderer.

3 Likes

For my own blog I found https://katex.org/ to be much more performant than MathJax so I would recommend that instead. Much smaller download and much faster rendering after page load.

9 Likes

I think there are a lot of people employing various hacks to render KaTeX in rustdoc. Not sure if it's the most popular option, but it's the one I've at least anecdotally encountered the most in practice.

1 Like

See some recent-ish Zulip discussion on this topic: #t-rustdoc > "Adding LaTeX/MathML support to rustdoc".

Note that if LaTeX syntax is preferred, the "state of the art" tends to be Temml, which was forked from KaTeX but drops the heavy non-MathML fallback (MathML has been supported by all major browsers for a couple of years at this point; see the "MathML" support table on Can I use).

8 Likes

I have some concerns about doing this client-side – from my point of view it would be better if rustdoc did the rendering of math syntax into HTML/MathML itself, rather than just sending the source code and using JavaScript to convert it in the browser.

This is partly because I often view documentation on browsers where JavaScript is restricted or absent, and partly out of performance concerns (why waste the electricity to do conversion every time, and the bandwidth to send the conversion routine, when you could just do it once at compile time?)
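
To make "do it once at compile time" concrete, here is a minimal sketch of generation-time conversion for a tiny, made-up subset of math (single-letter identifiers, integers, and single-character operators). The function name `to_mathml` and the subset it accepts are assumptions for illustration; this is not how rustdoc or any existing library works.

```rust
/// Toy converter: turns a flat expression such as "x + 12 < y" into MathML.
/// Hypothetical helper for illustration only; not part of rustdoc.
fn to_mathml(src: &str) -> String {
    let mut out = String::from("<math>");
    let mut chars = src.chars().peekable();
    while let Some(c) = chars.next() {
        if c.is_whitespace() {
            continue;
        } else if c.is_ascii_digit() {
            // Group consecutive digits into a single <mn> element.
            let mut number = String::from(c);
            while let Some(&d) = chars.peek() {
                if d.is_ascii_digit() {
                    number.push(d);
                    chars.next();
                } else {
                    break;
                }
            }
            out.push_str(&format!("<mn>{number}</mn>"));
        } else if c.is_alphabetic() {
            // Each letter is treated as a separate identifier.
            out.push_str(&format!("<mi>{c}</mi>"));
        } else {
            // Everything else is emitted as an operator, escaped for HTML.
            let escaped = match c {
                '<' => "&lt;".to_string(),
                '>' => "&gt;".to_string(),
                '&' => "&amp;".to_string(),
                other => other.to_string(),
            };
            out.push_str(&format!("<mo>{escaped}</mo>"));
        }
    }
    out.push_str("</math>");
    out
}

fn main() {
    // The output is static markup; the reader's browser needs no JavaScript.
    println!("{}", to_mathml("x + 12 < y"));
}
```

A real implementation would need a proper parser for whichever syntax is chosen; the point here is only that the result is plain markup that can be written into the generated page.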

To put it in perspective, Temml advertises that it's only 170 KB in size. I would expect it to be reasonable to fit the documentation for the entire standard library in less space than that. (Admittedly, Rust doesn't do well in this respect at the moment: as of 1.85, the documentation of core is 202 MB and that of std is 109 MB (I don't know why core is larger). But I expect most of this size to be easily removable, and I suspect the large size is simply a consequence of a lack of optimization. Adding an inherent dependency on a JS library may make documentation size optimization impossible in the future.)

6 Likes

Doing it during generation would also make it easier to support more modern syntax such as that used by Typst, rather than the gnarly syntax LaTeX uses. They are similar, but from reading the docs the Typst syntax is definitely cleaned up, such as plain parentheses working rather than needing \left( and \right) to get the proper height of parentheses.

4 Likes

To put it more into perspective, the HTML file for the core::arch::x86_64 module documentation alone is 3.6MB in size. That contains a list of 13K definitions, most of which have a name that's more than 10 characters long, so just storing the names of these functions would be more than 170KB in size.
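
The arithmetic behind that estimate can be checked with a short sketch. The helper names are hypothetical, one byte per character is assumed, and 170 KB is taken as 170,000 bytes; the 13K and 10-character figures come from the paragraph above.

```rust
/// Total bytes needed to store `count` names of `len` bytes each.
fn names_bytes(count: u64, len: u64) -> u64 {
    count * len
}

/// Smallest whole-number average name length at which `count` names
/// add up to at least `target` bytes (ceiling division).
fn break_even_len(target: u64, count: u64) -> u64 {
    (target + count - 1) / count
}

fn main() {
    let definitions = 13_000; // "13K definitions"
    let temml_bytes = 170_000; // assumption: 1 KB = 1000 bytes

    println!(
        "at exactly 10 bytes per name: {} bytes",
        names_bytes(definitions, 10)
    );
    println!(
        "average length needed to reach {} bytes: {}",
        temml_bytes,
        break_even_len(temml_bytes, definitions)
    );
}
```

At exactly 10 bytes per name the total is 130,000 bytes, so the "more than 170KB" figure holds when the average name is about 14 bytes or longer.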

As for another example, the source file iterator.rs for the Iterator trait is 137KB and contains mostly documentation, all of which must also be present in the final HTML output (along with markup tags to properly display it).

1 Like

Hmm, maybe we're both out by some orders of magnitude here. I compressed the two files you mentioned with zstd -19 (the highest general-purpose level – there are higher levels but they're harder to decompress) and they came to 169 KiB and 43.3 KiB respectively, which is probably a good approximation for how much data the file actually needs to store. (The names of x86-64 intrinsics are very repetitive, allowing the size to be smaller than is needed to store their names. A lot of the other content is even more repetitive, e.g. the "available on x86 or x86-64 only" boxes have their code written out each time rather than just being represented as a single CSS class.)

This means that the documentation takes up more space than I would have suggested, but nonetheless there is likely a lot of scope for optimisation. It's also worth noting that the standard library is likely to be much bigger than the documentation files for individual crates – although there are some crates that have huge traits like Iterator, there are also plenty of crates which really don't need to convey that much information in their documentation, and those crates may not want to deal with a dependency much larger than the documentation actually requires.

When deciding whether to render to static HTML ahead of time or use KaTeX/MathJax to render in the browser, it's important to consider how many bytes the math takes when rendered to HTML versus when stored as LaTeX source. The raw source is much, much smaller. For users on slow connections, it may be preferable to send LaTeX source and render in the browser (using a cached version of KaTeX, which is free) than to send the math as HTML.
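
As a rough illustration of the size gap, the sketch below compares a single fraction in LaTeX source with a hand-written MathML equivalent. The MathML string is my own minimal markup, not the output of KaTeX, MathJax, or any other renderer, whose output may be considerably larger.

```rust
/// LaTeX source for a simple fraction.
const LATEX: &str = r"\frac{a}{b}";
/// Hand-written MathML for the same fraction (assumed minimal form).
const MATHML: &str = "<math><mfrac><mi>a</mi><mi>b</mi></mfrac></math>";

fn main() {
    println!("LaTeX source: {} bytes", LATEX.len());
    println!("MathML markup: {} bytes", MATHML.len());
}
```

That is 11 bytes of source against 48 bytes of markup, before any compression is applied to the page.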

It's definitely smaller by a large factor, but I expect most documentation pages that contain math to contain only a small amount of it – so the difference in bytes is likely to be relatively small.

Cached documents are also a lot less free nowadays than they used to be: many modern browsers redownload files fairly often in order to avoid revealing to the site hosting them that they had them cached (because otherwise, detecting what's in cache could be used as a way to track users, potentially even across websites).

2 Likes

If I'm not mistaken, cached resources are no longer shared between domains for that very reason.

1 Like

KaTeX, and probably most reasonable alternatives, can be used server side to generate MathML just as well as on the client. Fundamentally this isn't a choice that needs to be made right now, and I would put my preference on whatever winds up as the simple/easy/straightforward yet correct path. At a guess that's client-side KaTeX; admittedly that's the only one I've used before, but likely whoever winds up implementing all this will have some notes.

There's a divide here between typesetters like MathJax and KaTeX, which produce proportionally larger output, and converters like Temml or TeXZilla where the output is (much) more compact but relies more on browsers' built-in math formatting rules.

The "MathML" support table on Can I use suggests that letting the browser handle it should just work.

1 Like

MathML Core is widely supported, to be precise, but yes I agree that it should just work.

Something that's a bit tricky is that LaTeX and MathML are languages for pretty different domains, so they don't really map neatly into each other. Unfortunately, writing MathML by hand is pretty terrible.

Personally, I'd like to see something like Markdown-for-math. AsciiMath is unfortunately quite old (in web terms), so I'm unsure if that maps well onto the nearly-a-decade-younger MathML Core.

(I see Typst math as having the same mapping problems as LaTeX, since it too is primarily a typesetting system, but please correct me if I'm wrong.)

I would also love Typst support, but as far as I know there still isn't a good standalone library that only handles Typst math (see "MathML output for HTML" by 01mf02, Pull Request #5721, typst/typst on GitHub). I have to imagine that won't always be the case, though. @traviscross was looking into this at some point as well.

I am not on the rustdoc team, but from my perspective math typesetting is something that anyone interested could start writing an RFC for. It has come up in discussion a number of times in various places with reasonably high levels of interest, but there has not yet been a single concrete proposal that can be voted on. It needs to spell out such things as:

  1. What syntax do we want and how will it be parsed?
  2. As part of syntax, how do we differentiate rendered math vs. e.g. displaying LaTeX as code? (If anybody is interested in continuing discussion at "RFC: Add support for display blocks & spans (for Math and diagrams)" by tgross35, Pull Request #745, commonmark/commonmark-spec on GitHub, please do; I haven't had the time.)
  3. Is there a default language or does it need to be selected? (markdown syntax, code attribute, etc)
  4. How will it be rendered? (I assume shipping Temml.min.js is the easiest option here, even if the ideal case would be a Rust dependency that can render AOT)
  5. Ability to extend to other math languages in the future

2 Likes

Another possibility is to provide a way to hook something for rendering, rather than rustdoc handling it directly. E.g. the following source:

```render,latex
\pi
```
```render,typst
pi
```
```render,mermaid,special-attribute
flowchart
    A[RFC] -->|FCP| B(Accepted)
```

could be defined to call a function cratename_custom_render(attributes: string, inline: bool, className: string) with "latex", "typst", or "mermaid,special-attribute" as the attribute. The user can then provide a .js file that does whatever they want with items matching that class. Alternatively, this could be hooked up to something at build time via Cargo.toml config, somehow.
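
For the build-time variant, the first step would be recognizing such blocks from the fence info string. Below is a minimal sketch of that step, assuming the `render,` prefix from the example above; the function name `render_attributes` is hypothetical, and nothing like it exists in rustdoc today.

```rust
/// Returns the attribute string that follows a leading "render," tag,
/// or None when the fence is an ordinary code block.
/// Hypothetical helper; the "render" tag is the convention proposed above.
fn render_attributes(info: &str) -> Option<&str> {
    let rest = info.trim().strip_prefix("render,")?;
    if rest.is_empty() {
        // "render," with nothing after it names no renderer.
        None
    } else {
        Some(rest)
    }
}

fn main() {
    for info in ["render,latex", "render,mermaid,special-attribute", "rust"] {
        match render_attributes(info) {
            Some(attrs) => println!("{info:?} -> custom render with {attrs:?}"),
            None => println!("{info:?} -> ordinary code block"),
        }
    }
}
```

The attribute string is passed through unsplit, matching the "mermaid,special-attribute" example, so whatever hook receives it can decide how to interpret the extra attributes.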

2 Likes