Translating the Rust Standard Library Documentation

Hi everyone, today I want to talk about localizing the Rust standard library documentation. Translating the standard library docs was first proposed in 2019, in this thread: https://internals.rust-lang.org/t/translating-the-stdlib-docs/10384.

Localizing the Rust standard library documentation has the following benefits:

  • It makes Rust more popular and easier to use
  • It enriches the Rust ecosystem
  • It makes it easier for beginners to get started, so that people can begin Rust development with only a small amount of background knowledge

At present, the Chinese version of the Rust standard library documentation is in a preview stage. The project is at https://github.com/wtklbm/rust-library-i18n. In this repository the documentation was machine-translated and then given a light manual proofreading.

All of this is done through JSON files.

After downloading the translated files, users can see the localized documentation directly in their IDE.

Besides translating the source code, the project can also generate static HTML documentation; see the picture for the result:

The translation first goes through a program called Cmtor, which extracts the documentation from the source code. The extracted documentation is then filtered and processed further, its sentences are rearranged, and the content is machine-translated and saved to a JSON file.

Once the translations are saved to the JSON file, we can proofread them by hand to find and fix errors. After the documentation has been corrected, it can be written back into the source files. Using machine translation reduces the amount of manual translation work.
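Cmtor's source does not appear to be published, so the following is not its implementation. It is a minimal sketch of what the extraction step might look like, assuming doc comments are written as `///` lines directly above the item they document. The names `DocBlock` and `extract_docs` are hypothetical. The sketch ignores `//!` and `/** */` comments, and it would treat an attribute such as `#[inline]` as the item line.

```rust
/// A doc comment block paired with the source line that follows it (hypothetical type).
#[derive(Debug, PartialEq)]
struct DocBlock {
    item: String,
    doc: String,
}

/// Collects consecutive `///` lines and attaches them to the next non-blank line.
/// Simplifying assumption: only `///` comments are recognized.
fn extract_docs(source: &str) -> Vec<DocBlock> {
    let mut blocks = Vec::new();
    let mut pending: Vec<&str> = Vec::new();
    for line in source.lines() {
        let trimmed = line.trim_start();
        if let Some(rest) = trimmed.strip_prefix("///") {
            // Drop the single conventional space after `///`.
            pending.push(rest.strip_prefix(' ').unwrap_or(rest));
        } else if !pending.is_empty() && !trimmed.is_empty() {
            // The first non-blank, non-doc line is treated as the documented item.
            blocks.push(DocBlock {
                item: trimmed.to_string(),
                doc: pending.join("\n"),
            });
            pending.clear();
        }
    }
    blocks
}

fn main() {
    let src = "/// Returns the length.\n/// In bytes.\npub fn len(&self) -> usize { 0 }\n";
    for b in extract_docs(src) {
        println!("{} => {:?}", b.item, b.doc);
    }
}
```

Each extracted `doc` string is what would then be sent to machine translation and stored alongside its item.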

Extracting and building the documentation works fine. The main problem is proofreading, which needs more people to take part.

Does the Rust team have any plans here? Would you be willing to support creating and publishing localized documentation officially, and to organize more people to proofread and maintain the documentation?

@Manishearth


Thank you! I don't have time to do this work myself, so I'm very happy to see this.

So I think the biggest problem here is that this is keyed on the existing English text in the docs, which means that any edit whatsoever to the English docs will bring things out of sync. The plan as proposed in the original thread is still the best path forward in my opinion: we need to give rustdoc a way to dump a JSON file mapping item paths to their Markdown docs. We can then get these JSON files translated (ideally through Pontoon, but we can start off with manual translations) in a separate repo, and use them in released documentation by allowing rustdoc to be passed a directory containing translations.

I'm happy to provide direction on how to do this, but I don't have the time to do it myself.

Really glad to see some movement along this direction, though!
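A minimal sketch of the path-keyed lookup described above, assuming the dumped JSON has already been loaded into in-memory maps from item path to Markdown. No such rustdoc option exists today; the function `docs_for` and the placeholder translation strings are hypothetical. A missing translation falls back to the English docs.

```rust
use std::collections::HashMap;

/// Returns the translated docs for `path` if present, otherwise the English docs.
/// Both maps are keyed by item path (e.g. "std::vec::Vec::push"), not by English text.
fn docs_for<'a>(
    path: &str,
    english: &'a HashMap<String, String>,
    translations: &'a HashMap<String, String>,
) -> Option<&'a str> {
    translations
        .get(path)
        .or_else(|| english.get(path))
        .map(String::as_str)
}

fn main() {
    let mut english = HashMap::new();
    english.insert(
        "std::vec::Vec::push".to_string(),
        "Appends an element to the back of a collection.".to_string(),
    );
    english.insert(
        "std::vec::Vec::pop".to_string(),
        "Removes the last element from a vector and returns it.".to_string(),
    );

    // Hypothetical translation directory contents: only `push` is translated so far.
    let mut translations = HashMap::new();
    translations.insert(
        "std::vec::Vec::push".to_string(),
        "<translated docs for Vec::push>".to_string(),
    );

    for path in ["std::vec::Vec::push", "std::vec::Vec::pop"] {
        println!("{path}: {:?}", docs_for(path, &english, &translations));
    }
}
```

Because the key is the item path, editing the English text never changes which translation is found, which is also why stale translations need a separate check.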


Currently all translations in https://github.com/wtklbm/rust-library-i18n are based on JSON. All the JSON files are kept in the repository, and Cmtor handles the rest. Cmtor automatically detects which parts of the documentation have changed, so every localized release is fully synchronized with the English documentation. We only need to make sure the content of the JSON files is correct. Cmtor can also generate localized documentation in multiple languages. This approach is entirely feasible and very efficient.

Yes, I understand. Firstly, I can't find the Cmtor program anywhere, or any documentation on it. But as a wider point, the internationalization community has been moving away from keying on the strings themselves towards having more structured keys, and this is why the plan for rustc has always been to use the item paths as keys.

Your efforts are indeed using JSON, but they're not using it the way we need: they should be mapping item paths to documentation, not "english documentation to translated documentation". Again, the original thread has a plan laid out in the first post.

Ok, I understand

That's a very good effort. The JSON structure contains all the necessary data, but not yet in the format required for maintaining the translations over the long term. In any case, we can hopefully use it to jump-start such translation efforts. While it's true that, over time, matching exact documentation strings becomes outdated, if we were to start the i18n effort now, this JSON could provide very good initial matches for most of the documentation.

I would vastly prefer that a format like Fluent be used here instead of JSON, for consistency with Pontoon and the rest of the i18n infrastructure that we have.

From my perspective, there has been a desire to provide such things (including a translation of the book), but there have never been enough people to staff and execute them.


I think the trade-off between matching strings of English and matching path names is about what happens when the English docs change: matching exact strings will result in the changed sentences being in English (easy to automatically detect), whereas matching path names results in the change being silently ignored (harder to automatically detect).
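The trade-off can be illustrated with a small sketch. Both lookup functions are hypothetical, and the "before" and "after" English strings are invented for the example. After the English text changes, the golden lookup misses and falls back to the new English (visible to readers and easy to detect), while the path-keyed lookup still returns the old translation (silently stale).

```rust
use std::collections::HashMap;

/// "Golden" lookup: the key is the English text itself; a miss falls back to English.
fn golden_lookup<'a>(english: &'a str, table: &'a HashMap<String, String>) -> &'a str {
    table.get(english).map(String::as_str).unwrap_or(english)
}

/// Path-keyed lookup: the key is the item path; the English text is only a fallback.
fn path_lookup<'a>(path: &str, english: &'a str, table: &'a HashMap<String, String>) -> &'a str {
    table.get(path).map(String::as_str).unwrap_or(english)
}

fn main() {
    // Hypothetical edit to the English docs.
    let old_en = "Returns the number of elements.";
    let new_en = "Returns the number of elements in the vector.";

    let mut golden = HashMap::new();
    golden.insert(old_en.to_string(), "<translation of old text>".to_string());

    let mut by_path = HashMap::new();
    by_path.insert("std::vec::Vec::len".to_string(), "<translation of old text>".to_string());

    // Golden: the key no longer matches, so the new English is shown.
    println!("golden: {}", golden_lookup(new_en, &golden));
    // Path-keyed: the key still matches, so the outdated translation is shown.
    println!("path:   {}", path_lookup("std::vec::Vec::len", new_en, &by_path));
}
```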

So in the case of docs Fluent doesn't bring much to the table because there's no complex substitution going on. I'm actually not really in favor of using Fluent for stuff other than diagnostics.

Pontoon supports some JSON formats and if we want we can add support for our own. The important thing is that it needs to be a key value format, not a "golden" (keyed by existing text) format. Once we have that, it can be pretty trivially molded into a form Pontoon will like.


Can't we have the translation machinery talk to version control to detect when the English has changed?

A long time ago, I made a pre-RFC about localization by improving rustdoc. I did not investigate further since it did not seem to get much traction, but I believe it would be the best way to get a good translation workflow for all crates, where the translation can be part of the main project without constraining the developer.


In my proposal, while translations are matched by item name, the translation file also keeps a copy of the original text, so you can simply compare that copy with the current doc comment to detect outdated translations.

Relying on an additional tool that is not part of the Rust toolchain does not seem like a good idea.
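A minimal sketch of the staleness check described in that proposal, assuming each translation entry stores the item path, a copy of the English text it was translated from, and the translation. The `Entry` type and `outdated` function are hypothetical, not the pre-RFC's actual format. An item that no longer exists in the current docs is also reported as outdated.

```rust
use std::collections::HashMap;

/// One translation entry: keyed by item path, with a copy of the source English text.
struct Entry {
    path: String,
    source: String,      // English doc text at the time of translation
    translation: String, // translated doc text
}

/// Returns the paths whose stored source text no longer matches the current docs.
fn outdated<'a>(entries: &'a [Entry], current: &HashMap<String, String>) -> Vec<&'a str> {
    entries
        .iter()
        .filter(|e| current.get(&e.path).map_or(true, |doc| doc != &e.source))
        .map(|e| e.path.as_str())
        .collect()
}

fn main() {
    let entries = vec![
        Entry {
            path: "std::vec::Vec::push".to_string(),
            source: "Appends an element to the back of a collection.".to_string(),
            translation: "<translated push docs>".to_string(),
        },
        Entry {
            path: "std::vec::Vec::len".to_string(),
            source: "Returns the number of elements.".to_string(),
            translation: "<translated len docs>".to_string(),
        },
    ];

    // Current doc comments: `len` has been edited since it was translated.
    let mut current = HashMap::new();
    current.insert(
        "std::vec::Vec::push".to_string(),
        "Appends an element to the back of a collection.".to_string(),
    );
    current.insert(
        "std::vec::Vec::len".to_string(),
        "Returns the number of elements in the vector.".to_string(),
    );

    let stale = outdated(&entries, &current);
    for e in &entries {
        let mark = if stale.contains(&e.path.as_str()) { "OUTDATED" } else { "ok" };
        println!("{:<8} {} => {}", mark, e.path, e.translation);
    }
}
```

This keeps the stable item-path key while still making English edits detectable, which combines the advantages of both approaches discussed earlier in the thread.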
