Note: this is about i18n support for the rustc JSON output, not i18n support in the language itself, nor in the terminal UI of rustc. For programs written in Rust, i18n should always be done at the library level, as it’s ultimately a domain-specific concern.
Summary
Make rustc’s diagnostics fully structured, and thus i18n-aware, so that meaningful i18n and l10n work can start across the ecosystem.
Motivation
The obvious: outreach, a much smoother learning curve for non-native speakers of English, and so on.
Present situation
We decided long ago not to handle i18n and l10n at the language level (maybe since the i18n format got removed from format!?), and rustc nowadays doesn’t support either of them. A machine-readable output is present in rustc, namely the --error-format=json flag, but it’s not directly usable as an aid to l10n in its current form, even though machine-readable output formats are usually strong indicators of i18n readiness. Namely:
- many errors and (all?) warnings don’t have a unique diagnostic code assigned, and
- the parameters are rendered into the template message strings instead of being separate,
so that tool writers are bound to have a difficult time processing the messages if they want to tinker with them in any non-trivial way, especially i18n and l10n. This is unfortunate, and it hurts adoption from those developers who may have difficulties reading English beyond keywords used in programs. (Hint: the percentage is not negligible in East Asia for example.) Also, as almost any structure is better than plain strings, tool writers across the ecosystem would all benefit as well from the improved diagnostics. The command-line UI is not touched at all, so the impact of the change should be minimal.
(Solving the international outreach problem requires an overall design, but let’s leave that for another RFC. I believe the core team is in a better position to put forward such an RFC than me, an individual volunteer.)
Design
Firstly, diagnostic messages generated by calls like span_err() are unstructured and need to be moved entirely to structured counterparts, with diagnostic codes assigned. A policy of only allowing structured diagnostic messages should be adopted; the unstructured diagnostic calls should preferably be removed, so that unstructured messages become impossible to construct.
Then, extend the structured diagnostic message macros to preserve the parameter mapping.
Finally, extend the JSON output with the addition of a “template parameters” mapping field. For compatibility with already existing tools, the current pre-rendered message field is not dropped.
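The three steps above could look roughly like the following. This is only a minimal sketch using the standard library: the Diagnostic type, the structured_diag! macro, and the "template" and "params" JSON field names are all hypothetical and not part of rustc. The JSON shape is also simplified compared to what --error-format=json emits today, and the E0425 message text is used purely as an illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical structured diagnostic; none of these names exist in rustc.
struct Diagnostic {
    code: &'static str,
    template: &'static str,
    params: BTreeMap<&'static str, String>,
}

/// Hypothetical macro that keeps the name -> value mapping
/// instead of formatting the values into the message right away.
macro_rules! structured_diag {
    ($code:expr, $template:expr $(, $name:ident = $value:expr)* $(,)?) => {{
        let mut params = BTreeMap::new();
        $( params.insert(stringify!($name), $value.to_string()); )*
        Diagnostic { code: $code, template: $template, params }
    }};
}

/// Minimal JSON string escaping (quotes, backslashes, newlines only).
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            _ => out.push(c),
        }
    }
    out.push('"');
    out
}

impl Diagnostic {
    /// Pre-rendered message, kept for existing tools.
    /// Naive substitution: assumes values do not contain `{name}` themselves.
    fn render(&self) -> String {
        let mut out = self.template.to_string();
        for (name, value) in &self.params {
            out = out.replace(&format!("{{{}}}", name), value);
        }
        out
    }

    /// JSON with the rendered message plus the assumed extra fields.
    fn to_json(&self) -> String {
        let params: Vec<String> = self
            .params
            .iter()
            .map(|(k, v)| format!("{}:{}", json_string(k), json_string(v)))
            .collect();
        format!(
            "{{\"code\":{},\"message\":{},\"template\":{},\"params\":{{{}}}}}",
            json_string(self.code),
            json_string(&self.render()),
            json_string(self.template),
            params.join(","),
        )
    }
}

fn example() -> Diagnostic {
    structured_diag!("E0425", "cannot find value `{name}` in this scope", name = "foo")
}

fn main() {
    println!("{}", example().to_json());
}
```

A tool that wants to localize the message can then look up its own template by code and substitute the params, while tools that only read the message field keep working unchanged.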
Drawbacks
More work is needed when creating new diagnostics.
Alternatives
Do nothing
Just keep everything unchanged. People who find compiler messages hard to understand just have to learn some English themselves instead of retreating into their comfort zone.
Parse the messages back into template and parameters
A great number of message templates have backticks around code snippet parameters, and backticks are not used in Rust. So it’s trivial to parse the parameters out of the rendered string.
However, not all messages are formatted this way, and special cases are undesirable.
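To make the trade-off concrete, here is a minimal sketch of such a parser. The function name split_backticks and the numbered `{0}` placeholder style are assumptions made for illustration, and the sample messages are only examples of the two shapes discussed.

```rust
/// Split a rendered message into a template and its backtick-quoted parameters.
/// Hypothetical helper: assumes backticks are balanced and never nested.
fn split_backticks(rendered: &str) -> (String, Vec<String>) {
    let mut template = String::new();
    let mut params = Vec::new();
    // Pieces at odd positions sit between a pair of backticks.
    for (i, piece) in rendered.split('`').enumerate() {
        if i % 2 == 0 {
            template.push_str(piece);
        } else {
            template.push_str(&format!("`{{{}}}`", params.len()));
            params.push(piece.to_string());
        }
    }
    (template, params)
}

fn main() {
    // Works when the parameter is wrapped in backticks.
    let (template, params) = split_backticks("cannot find value `foo` in this scope");
    println!("{} {:?}", template, params);

    // Finds nothing when parameters are inlined without backticks:
    // the two counts below stay buried in the template text.
    let (template, params) =
        split_backticks("this function takes 2 arguments but 1 was supplied");
    println!("{} {:?}", template, params);
}
```

The second call shows the problem: the counts are parameters too, but nothing marks them as such, so each message of that kind would need its own special case.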
What about dropping the pre-rendered messages from output?
This is certainly doable, and will make the output very concise and elegant. Let’s transform everything into codes! However, it has a potentially serious side effect: the compiler is no longer the single source of truth for the human-oriented rendered messages, but only for the individual combinations of diagnostic codes and parameters.
To save tools from rolling their own message renderers and to keep the messages consistent across tools, a separate crate would have to be created to provide the reference message rendering, preferably officially maintained. Ideally, as the diagnostics are assembled during rustc compilation, such a crate could be auto-generated along with rustc itself. But that is a nontrivial addition to rustbuild, so we’d rather not go this way for now.