- Feature Name: check_unprefixed_html_id and check_unprefixed_html_class
Summary
Have rustdoc produce a warning when inline HTML has a class= or id= attribute that doesn't match the pattern CRATENAME_name and isn't on a list of exceptions that go through FCP to stabilize as an actual API.
Motivation
In an HTML page, id and class both form global namespaces [1]. Rustdoc already uses this namespace, which means that doc authors may inadvertently write code that conflicts with newer versions of rustdoc, which has sometimes introduced new classes and IDs when adding or refactoring features.
This can also cause problems with various forms of inlining. For example, impl blocks will inline the documentation of the trait, including the HTML, which could result in ID conflicts between two different crate authors, even if neither of them conflicts with rustdoc itself. Inlining was also one of the original motivating examples for intra-doc links, because rustdoc needs to compute different relative URLs when inlining into different pages.
Guide-level explanation
HTML section on how to write documentation
Inline HTML
As a standard Commonmark parser with no special restrictions, rustdoc allows you to write HTML whenever the regular markup isn't sufficient, such as advanced table layouts:
<table>
<thead>
<tr>
<th>Name</th>
<th>Description</th>
<th>Category</th>
</tr>
</thead>
<tbody>
<tr>
<td>traits</td>
<td>static and dynamic dispatch</td>
<td>generics</td>
</tr>
<tr>
<td colspan="3">enums</td>
</tr>
<tr>
<td rowspan="2">doc comments</td>
<td colspan="2">reference documentation</td>
</tr>
<tr>
<td>markdown comments</td>
<td>attributes with custom syntax</td>
</tr>
</tbody>
</table>
This will render the same way Markdown tables do, even though Markdown tables don't support colspan or rowspan:
This works fine in rustdoc, GitHub, and mdBook, but Discourse doesn't allow rowspan or colspan for some reason.
Name | Description | Category |
---|---|---|
traits | static and dynamic dispatch | generics |
enums | ||
doc comments | reference documentation | |
markdown comments | attributes with custom syntax |
HTML is not sanitized when included in the documentation, though improperly-nested tags will produce a build-time warning. It can be turned into an error by adding #![deny(rustdoc::invalid_html_tags)] to your crate.
warning: unclosed HTML tag `h2`
--> $DIR/invalid-html-tags.rs:19:7
|
LL | /// <h2>
| ^^^^
warning: unclosed quoted HTML attribute on tag `p`
--> $DIR/invalid-html-self-closing-tag.rs:19:14
|
LL | /// <p style="x/></p>
| ^
Emphasis added for this proposal.
Additionally, IDs and classes should be prefixed with your crate's name, followed by an underscore. Since rustdoc sometimes includes excerpts of the documentation of your dependencies in your crate's documentation, this ensures you don't conflict with them. Rustdoc warns about unprefixed names; the warnings can be turned into errors by adding #![deny(rustdoc::unprefixed_html_class)] and #![deny(rustdoc::unprefixed_html_id)] (these lints are currently a nightly-only feature).
warning: unprefixed HTML class `entry`
--> lib.rs:1:17
|
1 | /// <div class="entry">Hello there! This DIV will warn!</div>
| ^ add prefix: `mycrate_`
warning: unprefixed HTML ID `entry`
--> lib.rs:2:14
|
2 | /// <div id="entry">Hello there! This DIV will warn!</div>
| ^ add prefix: `mycrate_`
warning: 2 warnings emitted
We also recommend the following additional restrictions:
- Start all doc comments with a one-sentence summary that doesn't use inline HTML. This summary will be used in contexts where arbitrary HTML cannot, such as tooltips.
- Though JavaScript is allowed, many viewers won't run it. Ensure your docs are readable without JavaScript.
- Do not embed CSS or JavaScript in doc comments to customize rustdoc's UI. If you want to publish documentation with a customized UI, invoke rustdoc with the --html-in-header command-line parameter to generate it with your custom stylesheet or script, then publish the result as pre-built HTML.
Reference-level explanation
New rustdoc-specific lints
unprefixed_html_class
This lint is warn by default. It detects inline HTML classes that do not start with the crate name followed by an underscore (_), and that haven't been stabilized as CSS classes available to doc authors.
For example, if your crate is named mycrate, then the first DIV will produce a warning, but the second DIV will not:
#![warn(rustdoc::unprefixed_html_class)]
/// <div class="entry">Hello there! This DIV will warn!</div>
///
/// <div class="mycrate_entry">This code is okay.</div>
warning: unprefixed HTML class `entry`
--> lib.rs:1:17
|
1 | /// <div class="entry">Hello there! This DIV will warn!</div>
| ^^^^^
warning: 1 warning emitted
The following classes are considered stable features that doc authors can use, and are exempt from this warning:
Name | Recommended use | Example |
---|---|---|
stab | Mark up information about stability, platform support, and deprecation. | <div class="stab portability">Linux only</div> |
portability | Use with stab for portability notes. | |
deprecated | Use with stab for deprecation notes. | <div class="stab deprecated">Use <code>foobar</code> instead.</div> |
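The rule above can be sketched as a small function. This is only an illustration of the proposed check, not rustdoc's implementation: the names STABLE_CLASSES and unprefixed_classes are hypothetical, and the sketch assumes the class= value has already been extracted from the HTML.

```rust
/// Classes the proposal lists as stable and exempt from the lint.
const STABLE_CLASSES: &[&str] = &["stab", "portability", "deprecated"];

/// Returns the classes in a `class="..."` attribute value that would
/// trigger the proposed `unprefixed_html_class` lint.
/// (Hypothetical helper; assumes the attribute value is already extracted.)
fn unprefixed_classes<'a>(crate_name: &str, class_attr: &'a str) -> Vec<&'a str> {
    // A class is accepted if it starts with `CRATENAME_`.
    let prefix = format!("{crate_name}_");
    class_attr
        // The class attribute is a whitespace-separated list of names.
        .split_ascii_whitespace()
        .filter(|class| !class.starts_with(&prefix) && !STABLE_CLASSES.contains(class))
        .collect()
}
```

With a crate named mycrate, the value "stab mycrate_note warning" would report only warning, since stab is on the exemption list and mycrate_note carries the prefix.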
unprefixed_html_id
This lint is warn by default. It detects inline HTML IDs that do not start with the crate name followed by an underscore (_).
For example, if your crate is named mycrate, then the first DIV will produce a warning, but the second DIV will not:
#![warn(rustdoc::unprefixed_html_id)]
/// <div id="entry">Hello there! This DIV will warn!</div>
///
/// <div id="mycrate_entry">This code is okay.</div>
warning: unprefixed HTML ID `entry`
--> lib.rs:1:14
|
1 | /// <div id="entry">Hello there! This DIV will warn!</div>
| ^^^^^
warning: 1 warning emitted
This lint only checks places where IDs are defined, such as the id= attribute and the deprecated name= attribute on <a> tags. Attributes where IDs are used, such as <a href=> and aria-labelledby, are not checked.
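As a rough sketch of that distinction, the check could first decide whether an attribute defines an ID and only then look at the prefix. The function names defines_id and should_warn are hypothetical, and the list of defining attributes is limited to the two examples given above; the real lint might cover more.

```rust
/// Returns true when `attr` on `tag` defines an ID rather than referencing one.
/// Only the two cases named in the proposal are covered (an assumption).
fn defines_id(tag: &str, attr: &str) -> bool {
    attr.eq_ignore_ascii_case("id")
        || (tag.eq_ignore_ascii_case("a") && attr.eq_ignore_ascii_case("name"))
}

/// Returns true when the attribute should trigger the proposed
/// `unprefixed_html_id` lint for the given crate name.
fn should_warn(crate_name: &str, tag: &str, attr: &str, value: &str) -> bool {
    let prefix = format!("{crate_name}_");
    defines_id(tag, attr) && !value.starts_with(&prefix)
}
```

Under these assumptions, id="entry" on a DIV warns, while href="#entry" and aria-labelledby="entry" are ignored because they only reference an ID.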
Drawbacks
This is going to require extending rustdoc's HTML parser in html_tags.rs (in the rust-lang/rust repository). That ad-hoc parser is pretty complex, and given some of the features it already has (like sniffing for Rust paths) and its relationship to the Markdown specification (which has slightly stricter restrictions on what counts as HTML compared to HTML5 itself), an off-the-shelf parser won't be a complete solution in any case.
The bigger question is how much rustdoc should actually support inline HTML at all. The purpose of the invalid_html_tags lint was partially to help people out who never intended to use inline HTML. This lint is strictly intended for people who wrote something that looks like an ID or CLASS attribute, which means they were almost certainly doing it on purpose.
Rationale and alternatives
Why not rewrite the HTML id= attribute?
Rustdoc avoids generating conflicting IDs for headers and other sections by keeping track of all IDs on a page and adding suffixes. Further stability is achieved with a name-mangling-like scheme of adding extra information to the generated ID to avoid producing conflicts at all.
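The suffixing approach can be illustrated with a minimal sketch. This is not rustdoc's actual code: the IdMap type and derive method here are stand-ins written for this example, and the exact suffix format (a hyphen followed by a counter) is an assumption.

```rust
use std::collections::HashSet;

/// Tracks every ID handed out on one page (illustrative stand-in).
#[derive(Default)]
struct IdMap {
    used: HashSet<String>,
}

impl IdMap {
    /// Returns `candidate` if it is still free on this page; otherwise
    /// appends `-1`, `-2`, ... until an unused ID is found.
    fn derive(&mut self, candidate: &str) -> String {
        let mut id = candidate.to_string();
        let mut n = 0;
        // `HashSet::insert` returns false when the value was already present.
        while !self.used.insert(id.clone()) {
            n += 1;
            id = format!("{candidate}-{n}");
        }
        id
    }
}
```

The weakness discussed in this section is visible here: the ID a given header ends up with depends on what else is on the page, so anything that refers to it would have to be rewritten as well.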
While it wouldn't actually be that hard to rewrite the id= attribute itself, making it work would also require rustdoc to rewrite all the places where IDs are referenced. I don't actually know all the places they can be used in an HTML document, but I know there are a lot:
- CSS and JS
- SVG <use>
- aria-labelledby= and <label for=>
- <button form=>
- Anchor references, like https://doc.rust-lang.org/stable/std/index.html#keywords
Since anchor references can be outside the page, we would also want to try to keep the generated IDs as stable as possible, probably by adding more information rather than just adding numbers to the end. This guarantees that all the places where the ID is referenced need to be rewritten, though.
Also, this approach only works for IDs, not classes.
Why not have rustdoc prefix all its classes and IDs, and let the doc author do whatever they want?
This only solves the problem for conflicts between doc authors and rustdoc. It does not help with conflicts between two different doc authors getting inlined into a single page.
The other, bigger problem is that it breaks most existing anchor links. For example, https://doc.rust-lang.org/stable/std/index.html#keywords works today, and it's a rustdoc-generated ID. If all rustdoc IDs were prefixed, this link would break.
The third downside is that rustdoc classes and IDs are very common, so adding a prefix to all of them would bloat the page heavily. Most doc authors don't use inline HTML, and would not appreciate large amounts of overhead for a feature they don't use.
Why not Shadow DOM?
There are two main reasons why Shadow DOM is not a realistic solution here:
- It requires JavaScript. While rustdoc does use JavaScript, it should be readable without it. Conflicting classes and IDs can result in unreadable docs.
- It doesn't help with scrolling to a fragment.
Why CRATENAME_ and not CRATENAME-?
Rustdoc-builtin IDs and classes don't include underscores. They always use hyphens for separators, so if user-defined ones always contain an underscore, they can't conflict.
Rustdoc generates two other kinds of IDs:
- Header slugs currently can contain underscores. The algorithm would need to be tweaked to prevent that, to ensure they can't conflict.
- Item IDs, such as https://doc.rust-lang.org/nightly/std/vec/struct.Vec.html#method.new, always start with an item kind name (which doesn't contain an underscore), followed by a period (.). Since crate names can't contain periods, this makes it impossible to conflict.
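One way to read this argument is as a simple test on the part of an ID before its first period. The sketch below is my interpretation, not part of the proposal: the function name looks_user_defined is hypothetical, and it assumes header slugs have already been changed to never contain underscores.

```rust
/// Returns true if `id` has the shape of a user-defined ID under this
/// proposal: an underscore appears before any period.
/// Rustdoc builtins contain no underscore at all, and item IDs put a
/// period after an underscore-free item kind name, so neither matches.
/// (Assumes header slugs no longer contain underscores.)
fn looks_user_defined(id: &str) -> bool {
    // `split` always yields at least one piece, so `next()` is `Some`.
    let before_period = id.split('.').next().unwrap_or(id);
    before_period.contains('_')
}
```

For example, mycrate_entry matches, while main-content, method.new, and method.new_unchecked do not, because in the last case the underscore only appears after the period.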
Prior art
This is mostly formalizing a convention that's already being used to avoid conflicts. It's why hljs classes always start with hljs-, and fontawesome classes start with fa-.
The design choice that ensures user-defined IDs and classes in raw HTML always match the regex /^[[:ident:]]_/, while rustdoc never generates IDs that match this pattern, is a bit weird and subtle, but it's similar to the rule that Custom HTML Elements always contain a hyphen, while builtins never do.
Unresolved questions
This won't help with people including things like Mermaid JS and KaTeX. If people are going to use them, we'll need to just make sure we don't conflict with the classes generated by those JavaScript add-ons. Luckily, both of them are designed to be included into existing pages with minimal conflicts, but what sort of recommendation should we provide to anyone trying to make reusable JS files for rustdoc?
Rustdoc has poor support for two different crates in the same dependency tree with the same name. This can happen, though, for several reasons:
- Two different major versions of the same crate might be transitive dependencies. This could easily be fixed by requiring the major version number to be part of the prefix, but in practice that would make bumping the major version of a crate annoying if the author made heavy usage of custom classes or IDs.
- A Cargo package can rename its crate using lib.name (forked crates do this a lot). If a crate transitively depends on both the fork and the original, it can result in a conflict. Forcing someone to rename all the classes and IDs when forking a crate would also be annoying (and rustdoc currently doesn't know the Cargo package name anyway).
- Two different crates within different cratespaces can have the same name. Whatever solution the Rust language itself arrives at for this will probably inform rustdoc's solution.
- A workspace can also have a binary and a library with the same name. That one probably isn't too difficult to cope with. Two crates in the same workspace might as well be one crate, since the problem can be fixed as long as it can be detected, without having to worry about forking, [patch]ing, or otherwise removing dependencies that break the docs. Unfortunately, while it would be relatively easy to detect ID conflicts by scanning the document and warning on any duplicates, classes are supposed to allow duplicates, so there's no easy way to detect this.
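The duplicate-ID scan mentioned in the last item could look something like the following. It is only a sketch: duplicate_ids is a hypothetical helper, and it assumes the IDs defined on a page have already been collected in document order.

```rust
use std::collections::HashSet;

/// Returns each ID that is defined more than once on a page, reported once,
/// in the order the first repeat was seen.
/// (Hypothetical helper; assumes `ids` lists every ID definition on the page.)
fn duplicate_ids<'a>(ids: &[&'a str]) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    let mut duplicates = Vec::new();
    for &id in ids {
        // `insert` returns false if the ID was already seen.
        if !seen.insert(id) && !duplicates.contains(&id) {
            duplicates.push(id);
        }
    }
    duplicates
}
```

No equivalent check exists for classes, since repeating a class name on a page is normal use rather than a conflict.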
Currently, running cargo doc on a crate that transitively depends on two crates with the same name will cause whichever one gets documented last to "win" and clobber the other. Coming up with a solution to that problem will inform the solution to the ID/class prefix problem as well.
Future possibilities
Currently, rustdoc checks for imbalanced HTML, unclosed attribute quotes, and, if this lint is written and merged, IDs and classes that are likely to create conflicts.
Should rustdoc check for HTML ID conflicts (presumably to warn about them)?
Should rustdoc warn on deprecated HTML, like <keygen> (which doesn't work in Firefox or Chrome) and <plaintext> (which is basically just going to break the page)?
Rustdoc has poor support for a lot of things people might want to do in HTML, like images and free-text pages that aren't intended to document a particular item (think overviews and tutorials). What goes between the quotes in <img src="">?
[1] See "alternatives" for why Shadow DOM isn't a solution.