Updated according to feedback and new ideas:
Summary
This pre-RFC describes improvements to the rustdoc tool that would let it generate documentation in multiple languages, warn about outdated translations, and help translators keep their work up to date.
Motivation
Having documentation in your native language is essential if you don’t speak English, and still pleasant even if you do. But a common problem with translated documentation is that it can become outdated without anyone noticing if the translator does not review it at every release.
A large part of the documentation of Rust projects (including the standard library) lives in the source code as documentation comments and is generated by the rustdoc tool. This tool could be improved to handle translations and to warn both users and translators when part of the original documentation has been modified after it was translated.
The main objectives of the suggested rustdoc improvement are:
- Introduce a standard format for localizing documentation comments that feels natural to Rust developers (since translators will mostly be Rust users too).
- The default workflow must make the translation effort fully transparent to developers:
  - Localization has no impact on the source code.
  - The localization effort takes place in a separate directory that developers don't have to care about.
  - No additional step or special tooling is required to generate the localized documentation.
- When the documentation is generated, warn on the command line about outdated or missing translations.
- The translated documentation must contain a warning and a reference to the current original text on items with an outdated translation.
- A translator must be able to easily spot all the outdated translations.
Guide-level explanation
Translate crate documentation for a new language
If you want to provide a translation for a new language, run the command `cargo doc --l10n-generate LANG`, where `LANG` is the code of the language. A localization directory `localization/LANG` for the specified language is automatically generated at the root of the crate directory. It mirrors the structure of the source directory, but ".rs" files are replaced with ".loc" files.
For example, the crate directory of a library localized in French and Spanish, with a single module named “my_mod”, should look like this:
```text
+-src
| +-my_mod
| | +-mod.rs
| +-lib.rs
+-localization
  +-es_ES
  | +-src
  |   +-my_mod
  |   | +-mod.loc
  |   +-lib.loc
  +-fr_FR
    +-src
      +-my_mod
      | +-mod.loc
      +-lib.loc
```
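The mirroring rule can be sketched as a small path transformation. This is a minimal illustration that assumes paths relative to the crate root. `loc_path_for` is a hypothetical helper, not an existing rustdoc or cargo function.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: maps a source file such as `src/my_mod/mod.rs`
/// to its counterpart `localization/<lang>/src/my_mod/mod.loc`.
/// Both paths are assumed to be relative to the crate root.
fn loc_path_for(src_file: &Path, lang: &str) -> PathBuf {
    let mut loc = PathBuf::from("localization");
    loc.push(lang);
    // Keep the whole relative source path, including the `src/` prefix.
    loc.push(src_file);
    // Replace the `.rs` extension with `.loc`.
    loc.set_extension("loc");
    loc
}

fn main() {
    let p = loc_path_for(Path::new("src/my_mod/mod.rs"), "fr_FR");
    println!("{}", p.display());
}
```

On Unix-like systems this prints `localization/fr_FR/src/my_mod/mod.loc`.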
The .loc files contain the declarations of the items documented in the matching .rs file. On every item, the content of the documentation has been moved into a `#[doc_original = r"..."]` attribute followed by an empty `#[doc_translation = r""]` attribute.
For example, given this lib.rs file:
```rust
/// The main struct of the library
struct MainStruct {
    /// The only field of MainStruct
    field: u32,
}

impl MainStruct {
    /// Do something interesting
    fn do_something(&mut self) {
        self.field += 1;
    }

    fn undocumented_fn(&self) {
        println!("Hello World !");
    }
}
```
The generated “lib.loc” would look like this:
```rust
#![translator = ""]

#[doc_original = r"The main struct of the library"]
#[doc_translation = r""]
struct MainStruct {
    #[doc_original = r"The only field of MainStruct"]
    #[doc_translation = r""]
    field: u32,
}

impl MainStruct {
    #[doc_original = r"Do something interesting"]
    #[doc_translation = r""]
    fn do_something(&mut self) {}
}
```
Complete the `#[doc_translation]` attributes with the translation for your language, and the `#![translator]` attribute with your name and address (if you want to).
Generating translated documentation
When you run the `cargo doc` command, the documentation for the available locales is generated alongside the original one. To generate the documentation for only one language, use the `--language LANG` parameter.
Incomplete or outdated translations are handled as follows:
- If some items documented in the source do not have a matching item in the localization files, you get a warning on the command line, and the original text is used for these items in the generated documentation.
- If some items in the localization files have a `#[doc_original]` attribute that no longer matches the documentation in the source, you get a warning on the command line. In the generated documentation, the description of these items starts with a warning header containing a link to the documentation in the original language.
Fix an outdated translation
If the source code has changed and you want to update the translation to match, run the command `cargo doc --l10n-generate LANG`, where `LANG` is the code of the language. The content of the localization files for the specified language is automatically updated to match the new source, and you are warned on the command line about the parts of the localization files that need to be updated:
- The declarations of new items are inserted into the ".loc" files with a `#[doc_original = r"..."]` attribute containing the original documentation for the item, and an empty `#[doc_translation = r""]` attribute.
- For items whose documentation comment was modified since the last translation, a `#[doc_outdated]` attribute contains the documentation comment as it was at the time of the previous translation, while `#[doc_original]` contains the new original documentation comment. The `#[doc_translation]` attribute is left unchanged.
For example, if the previous lib.rs file is modified to:
```rust
/// The main struct of the library
struct MainStruct {
    /// The first field of MainStruct
    field: u32,
    /// An additional field
    additional_field: u32,
}

impl MainStruct {
    /// Do something interesting
    fn do_something(&mut self) {
        self.field += 1;
    }

    /// Do something else interesting
    // `&mut self` is required because the method mutates a field.
    fn do_something_else(&mut self) {
        self.additional_field += 1;
    }
}
```
The generated “lib.loc” would look like this right after the automatic generation:
```rust
#![translator = "John Doe <john.doe@domain.net>"]

#[doc_original = r"The main struct of the library"]
#[doc_translation = r"La struct principale de la bibliothèque"]
struct MainStruct {
    #[doc_original = r"The first field of MainStruct"]
    #[doc_outdated = r"The only field of MainStruct"]
    #[doc_translation = r"Le seul champ de MainStruct"]
    field: u32,
    #[doc_original = r"An additional field"]
    #[doc_translation = r""]
    additional_field: u32,
}

impl MainStruct {
    #[doc_original = r"Do something interesting"]
    #[doc_translation = r"Fait quelque chose d'intéressant"]
    fn do_something(&mut self) {}
    #[doc_original = r"Do something else interesting"]
    #[doc_translation = r""]
    fn do_something_else(&mut self) {}
}
```
Fill in the empty `#[doc_translation]` attributes with the translation. Update the `#[doc_translation]` of items that have a `#[doc_outdated]` attribute, then remove the `#[doc_outdated]` attribute.
Detailed design
Localization directory
Everything related to localization lives in a directory passed to rustdoc via the `--l10n-path DIR` parameter. By default, the `cargo doc` command passes the `localization` directory at the root of the crate directory, if it exists. This directory contains a sub-directory for each language. Each of these sub-directories mirrors the source directory, with ".loc" files instead of ".rs" files.
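Discovering the available languages amounts to listing the sub-directories of the localization directory. Below is a minimal sketch. `available_languages` is a hypothetical helper. Treating a missing directory as "no translations" is an assumption based on the "if it exists" default.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: lists the language sub-directories (e.g. `es_ES`,
/// `fr_FR`) of a localization directory, sorted by name. A missing
/// directory yields an empty list instead of an error.
fn available_languages(l10n_dir: &Path) -> io::Result<Vec<String>> {
    if !l10n_dir.is_dir() {
        return Ok(Vec::new());
    }
    let mut langs = Vec::new();
    for entry in fs::read_dir(l10n_dir)? {
        let entry = entry?;
        // Plain files (e.g. a README) are not languages.
        if entry.file_type()?.is_dir() {
            let name = entry.file_name();
            // Non-UTF-8 directory names are skipped in this sketch.
            if let Some(name) = name.to_str() {
                langs.push(name.to_string());
            }
        }
    }
    langs.sort();
    Ok(langs)
}

fn main() -> io::Result<()> {
    let langs = available_languages(Path::new("localization"))?;
    println!("{:?}", langs);
    Ok(())
}
```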
Localization files
Syntax
The content of the “.loc” files is the same as that of the matching “.rs” file, except that:
- Only documented item declarations are present.
- The body of the items is ignored and should be empty, unless it contains documented items.
- Items carry no documentation comments, but these attributes instead:
  - `#[doc_translation]` contains the translation of the item documentation.
  - `#[doc_original]` contains an exact copy of the item documentation from the source. It is generated automatically.
  - `#[doc_outdated]` contains the item documentation from the source at the time of the translation. It is created automatically if the item documentation in the source has been modified (and left unchanged if it is already present).
- The crate can have a `#![translator]` attribute listing information about the translators.
Automatic generation
It would be far too tedious to create all the localization files by hand, with every documented item and every `#[doc_original]` attribute matching the source documentation comment exactly. Fortunately, rustdoc will be able to generate all these files and help keep them up to date.
When you pass the `--l10n-generate LANG` parameter to rustdoc, it generates (or updates) the localization files for the specified language:
- If the language sub-directory does not exist yet in the localization directory, it is created.
- For each file containing documented items in the source code, a “.loc” file is created if it does not exist already.
- For each item documented in the source code, rustdoc checks the matching item in the module localization file:
  - If the item does not exist in the localization file:
    - Display a warning on the command line: `<file>:<line> The item <item> needs a translation`
    - The item is created in the localization file with a `#[doc_original]` attribute containing a copy of the item documentation from the source, and an empty `#[doc_translation]` attribute.
  - If the item exists and its `#[doc_original]` differs from the documentation in the source:
    - Display a warning on the command line: `<file>:<line> Translation for <item> needs to be updated`
    - The `#[doc_original]` attribute is updated to contain the new value from the source.
    - If the `#[doc_outdated]` attribute does not exist yet, it is created with the previous value of `#[doc_original]`.
    - The `#[doc_translation]` attribute is unchanged.
  - If the item exists and its `#[doc_original]` contains the same text as the documentation in the source:
    - If `#[doc_translation]` is empty, display a warning on the command line: `<file>:<line> The item <item> needs a translation`
    - If there is a `#[doc_outdated]` attribute, display a warning on the command line: `<file>:<line> Translation for <item> needs to be updated`
    - Otherwise, do nothing.
- The generated `#[doc_original]` and `#[doc_outdated]` attributes use raw string literals with the minimum required number of `#`. The `#[doc_translation]` attribute has the same number of `#` in its raw string delimiters as the `#[doc_original]`.
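Choosing the number of `#` is mechanical. A raw string with n hashes ends at the first `"` followed by n `#`. So n must be one more than the longest run of `#` that follows any `"` in the text, and zero if the text contains no `"`. Below is a minimal sketch with hypothetical helper names. The attribute spacing follows the examples in this pre-RFC.

```rust
/// Smallest number of `#` needed to wrap `text` in a raw string literal.
fn min_hashes(text: &str) -> usize {
    let bytes = text.as_bytes();
    let mut needed = 0usize;
    for (i, &b) in bytes.iter().enumerate() {
        if b == b'"' {
            // Count the `#` immediately following this quote.
            let run = bytes[i + 1..].iter().take_while(|&&c| c == b'#').count();
            needed = needed.max(run + 1);
        }
    }
    needed
}

/// Wraps `text` as `r<hashes>"text"<hashes>`.
fn raw_string_literal(text: &str, hashes: usize) -> String {
    let h = "#".repeat(hashes);
    format!("r{h}\"{text}\"{h}")
}

/// Hypothetical generator output for a new item: the empty translation
/// reuses the hash count of the original, as the proposal requires.
fn attributes(original: &str) -> String {
    let n = min_hashes(original);
    format!(
        "#[doc_original = {}]\n#[doc_translation = {}]",
        raw_string_literal(original, n),
        raw_string_literal("", n)
    )
}

fn main() {
    // Prints:
    // #[doc_original = r#"Returns "ok" on success"#]
    // #[doc_translation = r#""#]
    println!("{}", attributes("Returns \"ok\" on success"));
}
```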
The translator fills in the empty `#[doc_translation]` attributes and updates the ones that have a `#[doc_outdated]`. Once the translation is up to date, they delete the `#[doc_outdated]` attributes. To make sure they have not forgotten to translate an item or to remove a `#[doc_outdated]`, they can run the generator again and fix items until no warning remains.
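The per-item rules of `--l10n-generate` can be summarized as a pure function over one item. This sketch rests on several assumptions. `LocEntry`, `Warning` and `update_entry` are hypothetical names, and the entry is assumed to be already parsed from the ".loc" file. When an unchanged item both needs a translation and has a `#[doc_outdated]`, this version reports only the first warning.

```rust
/// Hypothetical model of one item in a ".loc" file.
#[derive(Debug, Clone, PartialEq, Default)]
struct LocEntry {
    original: String,         // #[doc_original]
    translation: String,      // #[doc_translation]
    outdated: Option<String>, // #[doc_outdated]
}

#[derive(Debug, PartialEq)]
enum Warning {
    NeedsTranslation, // "The item <item> needs a translation"
    NeedsUpdate,      // "Translation for <item> needs to be updated"
}

/// Applies the generation rules to one documented item. `existing` is the
/// matching entry from the ".loc" file, if any.
fn update_entry(source_doc: &str, existing: Option<LocEntry>) -> (LocEntry, Option<Warning>) {
    match existing {
        // New item: copy the source doc and leave the translation empty.
        None => (
            LocEntry { original: source_doc.to_string(), ..Default::default() },
            Some(Warning::NeedsTranslation),
        ),
        // The source doc changed since the last generation.
        Some(mut entry) if entry.original != source_doc => {
            let previous = std::mem::replace(&mut entry.original, source_doc.to_string());
            // An existing #[doc_outdated] is kept: it records the text at translation time.
            if entry.outdated.is_none() {
                entry.outdated = Some(previous);
            }
            (entry, Some(Warning::NeedsUpdate))
        }
        // The source doc is unchanged: only report what is still pending.
        Some(entry) => {
            let warning = if entry.translation.is_empty() {
                Some(Warning::NeedsTranslation)
            } else if entry.outdated.is_some() {
                Some(Warning::NeedsUpdate)
            } else {
                None
            };
            (entry, warning)
        }
    }
}

fn main() {
    let (entry, warning) = update_entry("An additional field", None);
    println!("{entry:?} {warning:?}");
}
```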
Localized documentation generation
When a localization directory is specified, rustdoc generates by default the documentation for the main language and for all available languages. The `--language LANG` parameter restricts generation to the specified language.
For every localized documentation to generate, rustdoc loads the source code and the localization files. For every documented item in the source, it compares the documentation with the `#[doc_original]` attribute in the localization file:
- If they match and there is no `#[doc_outdated]` attribute, the value of `#[doc_translation]` is used for the translated documentation.
- If they don’t match, or if there is a `#[doc_outdated]` attribute:
  - The translated documentation of the item contains an alert with a link to the main-language documentation for the item.
  - The value of `#[doc_translation]` is used in the translated documentation (after the alert).
- If the item does not exist in the localization file, or its `#[doc_translation]` is empty:
  - The documentation comment from the source is used.
If a translation has outdated or missing items, there will be a warning: `The translation for <LANG> seems outdated.`, followed by `You should contact <translator>` when the `#![translator]` attribute is filled in.
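The selection rules for localized output, together with the crate-level notice, can be sketched as follows. The types and functions are hypothetical. How the alert, the link and the notice are actually rendered in HTML is left out.

```rust
/// Hypothetical model of one parsed ".loc" item.
struct LocEntry<'a> {
    original: &'a str,         // #[doc_original]
    translation: &'a str,      // #[doc_translation]
    outdated: Option<&'a str>, // #[doc_outdated]
}

#[derive(Debug, PartialEq)]
enum Rendered<'a> {
    /// Up-to-date translation.
    Translated(&'a str),
    /// Translation shown after an alert linking to the original-language page.
    TranslatedWithAlert(&'a str),
    /// No usable translation: fall back to the source doc comment.
    Original(&'a str),
}

fn render<'a>(source_doc: &'a str, entry: Option<&LocEntry<'a>>) -> Rendered<'a> {
    match entry {
        None => Rendered::Original(source_doc),
        Some(e) if e.translation.is_empty() => Rendered::Original(source_doc),
        Some(e) if e.original != source_doc || e.outdated.is_some() => {
            Rendered::TranslatedWithAlert(e.translation)
        }
        Some(e) => Rendered::Translated(e.translation),
    }
}

/// Notice for a translation with outdated or missing items; the contact
/// sentence is added only when a translator is filled in.
fn outdated_notice(lang: &str, translator: Option<&str>) -> String {
    let mut notice = format!("The translation for {lang} seems outdated.");
    if let Some(t) = translator.filter(|t| !t.is_empty()) {
        notice.push_str(&format!(" You should contact {t}."));
    }
    notice
}

fn main() {
    let entry = LocEntry {
        original: "Do something interesting",
        translation: "Fait quelque chose d'intéressant",
        outdated: None,
    };
    println!("{:?}", render("Do something interesting", Some(&entry)));
    println!("{}", outdated_notice("fr_FR", Some("John Doe <john.doe@domain.net>")));
}
```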
Drawbacks
- Adds a lot of complexity to rustdoc.
- The `#[doc_original]` attribute makes the localization files verbose.
- The attribute syntax is not as idiomatic as documentation comments.
Alternatives
Use a doc comment syntax
Even though doc comments are converted to `#[doc]` attributes internally, documentation in source files is usually written as comments, so a syntax based on doc comments may feel more natural. The attributes would have to be replaced with some kind of tag. For instance:
```rust
///[l10n]: # (original)
/// Original documentation
///[l10n]: # (translation)
/// Translated documentation
fn do_something(&mut self) { }
```
Use a hash
For `#[doc_original]` and `#[doc_outdated]`, we could store a hash instead of the full text. The translation would then be the only full text, so the doc-comment syntax would not need a tag to mark it. For instance:
```rust
///[l10n]: #original (8a5858a)
/// Translated documentation
fn do_something(&mut self) { }
```
It would make localization files less verbose, but the translator would lose the ability to read the original text directly in the localization file.
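If this alternative were adopted, the hash would be stored in files, so it would need to be stable across platforms and compiler versions. The proposal does not name an algorithm. This sketch assumes 64-bit FNV-1a, truncated to 7 hex digits to match the shape of the `8a5858a` example. `std::collections::hash_map::DefaultHasher` is deliberately avoided because its algorithm is not guaranteed to stay the same across Rust releases.

```rust
/// 64-bit FNV-1a (an assumed choice): a simple hash that yields the same
/// value on every platform and compiler version.
fn fnv1a_64(text: &str) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET_BASIS;
    for byte in text.as_bytes() {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Short tag in the style of `8a5858a`: the first 7 of 16 hex digits.
fn short_hash(text: &str) -> String {
    format!("{:016x}", fnv1a_64(text))[..7].to_string()
}

fn main() {
    println!("///[l10n]: #original ({})", short_hash("Do something interesting"));
}
```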
Use a diff in #[doc_outdated]
When the original documentation is modified, `#[doc_outdated]` could contain a diff between the previous original and the current one instead of the full text. This may make changes easier to spot in long comments, but it would add even more complexity to rustdoc.
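A line-based diff for `#[doc_outdated]` could be built on a longest-common-subsequence table. Below is a minimal sketch with hypothetical names. It is not a proposal for the exact output format, and a real implementation might prefer an existing diff library.

```rust
/// One line of a line-based diff.
#[derive(Debug, PartialEq)]
enum DiffLine<'a> {
    Same(&'a str),
    Removed(&'a str),
    Added(&'a str),
}

/// Line diff via a longest-common-subsequence table (O(n * m), which is
/// acceptable for doc comments).
fn diff_lines<'a>(old: &'a str, new: &'a str) -> Vec<DiffLine<'a>> {
    let a: Vec<&str> = old.lines().collect();
    let b: Vec<&str> = new.lines().collect();
    // lcs[i][j] = length of the LCS of a[i..] and b[j..].
    let mut lcs = vec![vec![0usize; b.len() + 1]; a.len() + 1];
    for i in (0..a.len()).rev() {
        for j in (0..b.len()).rev() {
            lcs[i][j] = if a[i] == b[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }
    // Walk the table, emitting removals before additions on ties.
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        if a[i] == b[j] {
            out.push(DiffLine::Same(a[i]));
            i += 1;
            j += 1;
        } else if lcs[i + 1][j] >= lcs[i][j + 1] {
            out.push(DiffLine::Removed(a[i]));
            i += 1;
        } else {
            out.push(DiffLine::Added(b[j]));
            j += 1;
        }
    }
    out.extend(a[i..].iter().map(|&l| DiffLine::Removed(l)));
    out.extend(b[j..].iter().map(|&l| DiffLine::Added(l)));
    out
}

fn main() {
    let old = "Do something interesting\nReturns nothing";
    let new = "Do something interesting\nReturns a count";
    for line in diff_lines(old, new) {
        match line {
            DiffLine::Same(l) => println!("  {l}"),
            DiffLine::Removed(l) => println!("- {l}"),
            DiffLine::Added(l) => println!("+ {l}"),
        }
    }
}
```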
Extension of localization files
At first I chose the `.loc` extension for localization files, but since they are syntactically valid Rust files, maybe they should use the `.rs` or `.loc.rs` extension so that text editors handle them as Rust files.
Rely on existing translation tools
There are existing formats for localization files, such as gettext or Fluent. Rustdoc could generate gettext or Fluent files instead of the proposed format.
But one of the main strengths of these formats is handling dynamic text (plurals, gender, ...). Since doc comments are static text, using Fluent or gettext would not be very useful. Moreover, most documentation translators will be Rust developers who are not used to translation tools. They will probably feel more comfortable with a format that looks like a source file. This format would probably be easier for rustdoc to parse too.
Unresolved questions
Markdown files
Since Markdown files are a whole document rather than a collection of items, they would require a different mechanism. They might be handled paragraph by paragraph.
Macros
Macros can generate items with documentation, but it would probably be too complex to generate ".loc" files with macros.
Generated ".loc" files should be based on ".rs" files with expanded macros. If translators also want to use macros in the ".loc" file, they would have to write them manually.