Pre-RFC: Documentation Internationalisation and Modularisation


#1

Rust documentation can currently be written in only one language. While there are ports of the book, to my knowledge there is no effort to localise the std library. There should be a DRY way to keep the same code and the same examples while swapping out the text, so that the documentation tool can generate multiple versions of the same module in different languages. This would allow localisations to be maintained in the original project’s repo, rather than relegating localisation efforts to a fork.


#2

I’ve been mulling over ways rustdoc could be improved for a while, and I’ve had some thoughts. Three in particular strike me as applicable:

  1. Allow “inner attributes” on doc comments, which apply to the doc comment itself.

  2. Add support for a #[doc(include(path="file.md"))] attribute for pulling doc material from an external file.

  3. Add #[doc(locale="fr-FR")] attribute for flagging specific bits of documentation as only applying to a given locale.

So, at that point, you could do something like:

/**
#![locale="en-AU"]
This function is awesome!
*/
/**
#![locale="ja-JP"]
この関数は素晴らしいです!
*/
fn awesome() { println!("awesome!"); }

Of course, you probably don’t want source files to be reams and reams of documentation with the code buried in between, hence:

/**
#![locale="en-AU"]
This function is awesome!
*/
#[doc(include(path="awesome.md", section="awesome.ja-JP", locale="ja-JP"))]
fn awesome() { println!("awesome!"); }

With awesome.md:

# awesome.ja-JP
この関数は素晴らしいです!

This way, the documentation for the primary maintenance language can be in the source, with everything else in a separate file. Keeping the two in sync might be assisted by giving rustdoc the ability to spit out a list of things documented and a digest of their “primary language” documentation. That way, translations can include the hash of the documentation they were translated from; a tool can then warn you when translations need to be re-checked.
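
To make the digest idea concrete, here is a minimal sketch of how such a staleness check could work. Everything in it is an assumption for illustration: `doc_digest`, `TranslationRecord` and `stale_locales` are hypothetical names, not part of rustdoc, and FNV-1a is used only because it is tiny and deterministic (a real tool would more likely use a cryptographic hash).

```rust
/// FNV-1a 64-bit hash of the documentation text, ignoring leading and
/// trailing whitespace. Chosen for this sketch only because it is short
/// and gives the same value on every platform.
fn doc_digest(doc: &str) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for byte in doc.trim().bytes() {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Hypothetical record kept next to each translation: the locale and the
/// digest of the primary-language text it was translated from.
struct TranslationRecord {
    locale: String,
    source_digest: u64,
}

/// Returns the locales whose translations were made from a different
/// version of the primary documentation and should be re-checked.
fn stale_locales<'a>(primary_doc: &str, records: &'a [TranslationRecord]) -> Vec<&'a str> {
    let current = doc_digest(primary_doc);
    records
        .iter()
        .filter(|r| r.source_digest != current)
        .map(|r| r.locale.as_str())
        .collect()
}
```

A tool built along these lines would call `stale_locales` once per documented item and print a warning for every locale it returns.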

Also, I’m not saying Markdown is the best (or even appropriate) structure for this, and it would be even nicer if you could just import a whole .md file which contained appropriate locale annotations… but this is Markdown, so we might have to make do.


#3

IMO there are a few requirements documentation localization must meet to succeed:

  • Fully transparent for developers
  • Easy to keep updated:
    • Ability to know if a change happened in the original
    • Ability to get a diff of changes that happened in the original
    • Ability to contact translators when a change occurred
  • Warn the reader when it is outdated

To reach all these, I suggest this structure:

  • All documentation localization is in a separate directory.

  • The content of the “l11n” directory mimics the src directory, except that the “xxx.rs” files are replaced by translation files: “xxx.fr_FR”, “xxx.es_MX”, …

  • The content of a translation file would be something like that (syntax to bikeshed):

    #maintainers {
        Peter Parker spidey@marvel.com
        Clark Kent superman@dc.com
    }
    fn item_to_document(param1 : i32){
        #original {
            Exact copy of the original documentation of the “item_to_document”.
            Will be used to detect if there was a change in the original and make a diff.
        }
        #translation {
            Translated documentation of “item_to_document”
        }
    }

So when the Rustdoc tool has a localization directory available, it will generate translated documentation from the available files. But if some items are missing in the translation file, or are outdated (difference between the source and the “#original” block):

    • The translated documentation will have a warning on the item and will provide a link to the untranslated version.
    • Rustdoc will display a warning and suggest contacting the maintainers (crates.io could send an automatic email).
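
The check described above could be sketched roughly as follows. This is only an illustration under assumptions: `Entry`, `Status`, `status` and `render` are hypothetical names, the in-memory `Entry` stands in for one parsed item of a translation file, and the comparison ignores surrounding whitespace, which the proposal does not specify.

```rust
/// State of one item's translation.
#[derive(Debug, PartialEq)]
enum Status {
    UpToDate,
    Outdated,
    Missing,
}

/// Hypothetical in-memory form of one item in a translation file:
/// the `#original` block and the `#translation` block.
struct Entry {
    original: String,
    translation: String,
}

/// Splits documentation into lines with surrounding whitespace removed,
/// so that pure indentation changes do not count as a change.
fn normalise(doc: &str) -> Vec<&str> {
    doc.trim().lines().map(str::trim).collect()
}

/// Compares the documentation in the source with the stored `#original`.
fn status(source_doc: &str, entry: Option<&Entry>) -> Status {
    match entry {
        None => Status::Missing,
        Some(e) if normalise(&e.original) == normalise(source_doc) => Status::UpToDate,
        Some(_) => Status::Outdated,
    }
}

/// Text to show the reader: the translation, prefixed with a warning
/// when it is outdated, or the untranslated original when it is missing.
fn render(source_doc: &str, entry: Option<&Entry>) -> String {
    match (status(source_doc, entry), entry) {
        (Status::UpToDate, Some(e)) => e.translation.clone(),
        (Status::Outdated, Some(e)) => {
            format!("[warning: outdated translation, see the original]\n{}", e.translation)
        }
        _ => format!("[warning: not translated yet]\n{}", source_doc),
    }
}
```

Storing a full copy of the original, rather than only a hash, is what makes it possible to show the translator a diff later on.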

I really believe that it is useless to provide translations if we can’t ensure they are up to date. It’s the main reason I always use English documentation even if a French one is available.

We should be able to add a new translation with no modifications to source files. I really don’t want to add any new syntax for inner docs; I think it is quite complex enough.


#4

To extend your idea above a little bit further, I think the external doc attribute should be unique per type and should contain the internal structure externally:

/**
This function is awesome!
*/
#[doc(include(path="awesome.md", section="awesome"))]
#[example(path="ex1.rs")]
fn awesome() { println!("awesome!"); }

and the awesome.md file (or maybe another file type?) contains the list of translations. I also don’t see the need for the internal attribute. A simple fallback mechanism should be enough: just fall back to the inline documentation if there are no matching docs in the external source.
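
A minimal sketch of that fallback, assuming the external file has already been parsed into a map from locale to text. The function name `resolve_doc` and the map layout are hypothetical, and the intermediate step that tries the bare language code (“fr” for “fr-FR”) is an extra assumption, not something proposed above.

```rust
use std::collections::HashMap;

/// Picks the documentation text to use for `locale`:
/// 1. an exact locale match in the external docs (e.g. "fr-FR"),
/// 2. a language-only match (e.g. "fr"), an assumption of this sketch,
/// 3. otherwise the inline documentation from the source file.
fn resolve_doc<'a>(
    inline_doc: &'a str,
    external: &'a HashMap<String, String>,
    locale: &str,
) -> &'a str {
    if let Some(doc) = external.get(locale) {
        return doc.as_str();
    }
    if let Some(lang) = locale.split('-').next() {
        if let Some(doc) = external.get(lang) {
            return doc.as_str();
        }
    }
    inline_doc
}
```

With this shape the source file needs no locale annotations at all: the inline comment is simply whatever is left when nothing else matches.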


#5

Because I want it for other things than just translations. I want to be able to mark comment blocks as being for “internal” documentation only. I want to be able to describe subsections, or conditionally include/exclude bits of documentation.

Anyway, it wasn’t a coherent proposal, just a few thoughts I’d had.


#6

To extend my proposal, rustdoc might provide further help to make translators’ work easier, with a command like: rustdoc --l11nDir="localization" --generateLanguage="fr-FR"

It would generate all missing translation files, with all the items to document and their “#original” blocks.

For existing files with outdated items, it would report (on stdout) the items to update and insert a “#new_original” block, and possibly a “#diff_original” block, into the translation files. So when the translator has finished updating, they just have to rename the “#new_original” block.
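
As an illustration of the generation step, the skeleton for a missing translation file could be produced along these lines. This is a sketch only: `DocItem` and `translation_stub` are hypothetical names, and the output layout simply follows the block syntax from my earlier post, which is still open to bikeshedding.

```rust
/// An item found in the source, with its original documentation.
struct DocItem {
    signature: String,
    doc: String,
}

/// Renders the skeleton of a translation file: one block per item, with
/// `#original` filled in from the source and `#translation` left empty
/// for the translator.
fn translation_stub(items: &[DocItem]) -> String {
    let mut out = String::new();
    for item in items {
        out.push_str(&format!("{} {{\n", item.signature));
        out.push_str("    #original {\n");
        for line in item.doc.lines() {
            out.push_str(&format!("        {}\n", line.trim()));
        }
        out.push_str("    }\n");
        out.push_str("    #translation {\n    }\n");
        out.push_str("}\n");
    }
    out
}
```

The same routine could emit a “#new_original” block instead of “#original” when the file already exists and the item is outdated.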


#7

I was thinking that it might be more intuitive to replicate the source structure, as a way to structure the docs.

So we have a src that would look like this:

src/
    vec/
        mod.rs
    main.rs

Then we’d replicate that structure for the docs, so it would look like the following, with the .metadata files holding things like maintainers, last-updated dates, or whatever else is needed.

src/
    vec/
        mod.rs
    main.rs
docs/
    en/
        vec/
            mod.md
        main.md
        .metadata
    fr/
        vec/
            mod.md
        main.md
        .metadata
    de/
        vec/
            mod.md
        main.md
        .metadata

Then in a .rs file

#[docs="vec_description"]
fn vec() {
   // ...
}

And in the .md files:

// en/vec/mod.md
# Vec Description
This is a vector.

// fr/vec/mod.md
# Vec Description
Ceci est un vecteur.

// de/vec/mod.md
# Vec Description
Dies ist ein Vektor.
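
Because the docs tree mirrors the source tree, finding the right Markdown file is a pure path computation. A minimal sketch, where `doc_path` is a hypothetical helper and the `src`/`docs` roots are the ones from the layout above:

```rust
use std::path::{Path, PathBuf};

/// Maps a source file to the Markdown file holding its docs for `locale`,
/// e.g. `src/vec/mod.rs` with locale `fr` becomes `docs/fr/vec/mod.md`.
/// Returns `None` if `source` does not live under `src_root`.
fn doc_path(src_root: &Path, docs_root: &Path, source: &Path, locale: &str) -> Option<PathBuf> {
    // Path of the source file relative to the source root, e.g. `vec/mod.rs`.
    let relative = source.strip_prefix(src_root).ok()?;
    // Same relative path under `docs/<locale>/`, with `.rs` swapped for `.md`.
    Some(docs_root.join(locale).join(relative).with_extension("md"))
}
```

A nice side effect of this convention is that adding a language means adding a directory, with no change to any .rs file.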

#8

I would be strongly against any internationalisation system that:

  • makes the English files harder to read, and
  • silently allows international documentation to fall out of sync.

Those are my “constraints”. I guess a pre-RFC discussion should focus first on achieving consensus on a set of constraints and goals instead of the implementation details.

A way to preserve the readability of the English source files could be to just include, per source file, a path to a folder with the internationalisation. For example, given file.rs, one can tell rustdoc “somehow” (intentional hand-waving here) that the internationalisation files live in the file/ path, and it can then follow a convention and look for file/file_fr_FR.rs. These files could then contain the function signatures and types documented in a different language, and cite the examples of the original English documentation somehow to avoid code repetition.

A way to keep the documentation from falling out of sync would be to require that each pull request that alters the English documentation in some way must also alter the documentation in every other language. This probably wouldn’t be doable by the PR author alone, so we need maintainers for that. This is basically the only solution I can think of that I would be 100% comfortable with. A fallback would be for rustdoc to somehow use commit information to determine whether this happened and, if it didn’t, warn readers of the documentation in other languages that more up-to-date documentation exists in English. This allows the documentation to fall out of sync, but not silently, so the international documentation can be curated just before a release.

Anyhow, the proposed solutions that involve adding locale attributes to basically all functions in the English documentation, for probably all languages of the world, don’t satisfy my first constraint, and I would be strongly against any form of that.