Translating the compiler

It varies from language to language. I’ve never really looked at the Russian one.

Also, I didn’t say that there are vibrant Rust communities out there, I said there are vibrant programming communities. Rust is relatively new, and we’ve only recently started seeing Rust books/etc being published in other languages. It’s unreasonable to expect Rust to have such a community now. That said, there are super active Mandarin-speaking Rust chat channels out there (the QQ group I know of has more than a thousand members!)

Yes, we aim to be inclusive towards who people are, not towards some idealized concept of “who people should be”. Your comment is not the same as simply encouraging people to learn English.

(This metadiscussion also doesn’t belong here since it’s off topic)


I concede that the Chinese community may be large (I wouldn’t know how to measure, and I’ll trust you on this one), and it might make sense to continue with this effort based solely on that; but the StackOverflow links you’ve provided are not convincing, even if you take into account the overall number of questions rather than just the Rust-specific ones.

I don’t mean to say that there aren’t forums and question sites in languages other than English; but they tend to be orders of magnitude smaller than the English ones. Even for Russian, which does have a sizable programming community (and has the largest StackOverflow question base of the ones you listed), I estimate that I’m at least 100 times more likely to find an answer to my question on the internet if I look in the English sections rather than the Russian ones.

Edit: And compiler messages are the things that you are the most likely to google.

Edit 2: What I’m saying here is that translating compiler messages seems the least likely to help. I think blocking the rollout of the new Rust website on translating it into the same number of languages as the old site would have made a lot of sense; translating compiler messages, however, is very low on the priority list that I would make if the aim is making a language more accessible to people whose native language is not English. First comes the Book, second the website, third the standard library docs, then some other things that I can’t think of right now, and only then compiler messages.

That doesn’t make them not vibrant. And Stack Overflow is not an indicator of this; I only point those sites out because you said they didn’t exist. E.g. the Japanese community seems to prefer Qiita over ja.stackoverflow.com.

I have a plan for the stdlib docs too. A slight hitch is that rustdoc doesn’t have many people working on it right now. I’m playing around with the error index first to see if this kind of thing will work.

I can totally post the plan for rustdoc, it just has fewer unresolved questions so I didn’t think it would be useful yet. There is a bunch more design work that needs to be done for the compiler, so I’m getting this discussion out first. (Furthermore, I’ve already had discussions about this with rustdoc maintainers in the past)

My two cents:

I agree with @Manishearth that this is a good thing to be thinking about. I think it would be great to offer translations of Rust compiler error messages. Obviously nobody would be forced to use them and I think that – in practice – there would still be plenty of incentive to learn English (though, speaking personally, I’d rather that we English speakers had a few more incentives to learn other languages). I am definitely interested in finding any way we can to make Rust more accessible to folks.

Regardless, though, it seems like what is blocking us from a technical perspective is basically refactoring to use “diagnostics structs”, something we want to do regardless, as @Manishearth noted here:

so it seems like we should focus on that.


Yep! If we do it in an ergonomic way via a macro/custom derive, translation support is super easy to build into the system and the conversion can be done in a largely automated way, aside from some complicated diagnostics.

I think I’ll edit the original post to mention this.


The above was my conclusion as well when I looked at the difficulty of such a conversion in 2018. Aside from a few complicated diagnostics, the conversion of most diagnostics can be automated in a number of ways. With Fluent and Pontoon that becomes even easier, and if rustc is refactored to use diagnostic structs, as @nikomatsakis mentioned, most of the infrastructure update becomes very straightforward.

That leaves the task of translating the 5k or so distinct error messages to the local-language communities, so the communities that want local-language diagnostics will presumably be the first to have them.


Yeah, and it’s worth mentioning that the website’s 500 strings got translated in a week for at least one language. And there’s a lot of prose there, not one-liners. This work is quite parallelizable and can be done pretty quickly, especially with Pontoon, which is purpose-built for this (and also provides a review process).


Opened a separate thread about the challenges present in handling the stdlib docs: Translating the stdlib docs

Users usually expect their system to behave consistently, and that means honoring LANG and LC_* (not just LC_ALL; the logic is somewhat complex) by default (on Linux; on Windows there is a separate API). I agree there should be a rustc/cargo option to override it, and I think you should have a choice whether to download the translations at all in rustup, but if you have them and change the system locale, the change should be reflected, as it is in any other application with locale support.
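For what it’s worth, here is a minimal sketch of the lookup order I have in mind on Linux, assuming the usual POSIX precedence for the message category (LC_ALL, then LC_MESSAGES, then LANG). The function names, the `override_lang` parameter (standing in for a hypothetical rustc/cargo option) and the injected `getenv` closure are all made up for illustration; it ignores gettext’s LANGUAGE variable and the Windows API entirely.

```rust
/// Pick the locale for diagnostics. An explicit override wins; otherwise the
/// first non-empty variable among LC_ALL, LC_MESSAGES, LANG is used.
/// `getenv` is injected so the logic can be exercised without touching the
/// real process environment.
pub fn resolve_message_locale<F>(override_lang: Option<&str>, getenv: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    if let Some(lang) = override_lang {
        return normalize(lang);
    }
    for var in ["LC_ALL", "LC_MESSAGES", "LANG"] {
        if let Some(value) = getenv(var) {
            // An empty value is treated the same as an unset variable.
            if !value.is_empty() {
                return normalize(&value);
            }
        }
    }
    None
}

/// Strip the codeset and modifier ("de_DE.UTF-8@euro" -> "de_DE").
/// The "C" and "POSIX" locales mean "no translation", so they map to `None`.
fn normalize(raw: &str) -> Option<String> {
    let end = raw
        .find(|c: char| c == '.' || c == '@')
        .unwrap_or(raw.len());
    match &raw[..end] {
        "" | "C" | "POSIX" => None,
        name => Some(name.to_string()),
    }
}
```

In a real tool the closure would presumably be `|k| std::env::var(k).ok()`; `None` means "fall back to the built-in English messages".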

I really don’t like that ftl files are key-to-string rather than English-to-translation, because it makes keeping track of which translations need to be updated difficult, but in this case it could be taken advantage of. If the error codes doubled as the Fluent IDs, then they would have to uniquely identify the message, so the conversion would force fixing this.


This is another bit where English-to-translation shines, because all you need is a tool to split the document into translation units and then pick the translations from the catalog if they exist. Debian has po4a for this purpose and the Translate Toolkit has the same functionality (when I tried them, for HTML the latter worked better). I don’t think either has markdown support, but adding a new parser is not that hard. The harder part is integrating it with Fluent, because it requires IDs. Maybe the tool could simply generate them.
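To make the idea concrete, here is a rough sketch of that split-and-look-up step. It assumes a translation unit is simply a blank-line-separated paragraph and that the catalog is a plain English-to-translation map; `translate_document` is a made-up name, and real tools like po4a split far more carefully than this.

```rust
use std::collections::HashMap;

/// Split `source` into paragraph-sized translation units and replace each
/// unit that has an entry in the English-to-translation `catalog`.
/// Units without a translation are passed through unchanged.
pub fn translate_document(source: &str, catalog: &HashMap<String, String>) -> String {
    source
        .split("\n\n")
        .map(|unit| {
            // Look up the trimmed English text; fall back to the original unit.
            catalog
                .get(unit.trim())
                .map(String::as_str)
                .unwrap_or(unit)
        })
        .collect::<Vec<_>>()
        .join("\n\n")
}
```

The nice property is that a partially translated catalog still produces a complete document, just with some paragraphs left in English.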

I also can’t imagine writing the documentation by writing an ID in the comment and the text somewhere else (or writing the text somewhere, using the full symbol name as the ID), so the English catalog would have to be generated forever. And to be honest, I think it would be more maintainable if it was the case for the compiler as well.

I have always used PO (which is English-to-translation) or XLIFF (which can be used either way), and I haven’t yet looked much at Fluent. Does it have some support for extracting strings?

All the translation formats are basically equivalent, and they are the easy part anyway. The hard part is initially marking what needs to be translated and/or splitting it up into translation units, and then setting up the process and tooling for maintaining the translations, all without making things harder for the programmers working on the code.

Marking the translatable strings and making sure they have unique IDs can be done efficiently together, and can be independent of which library ends up being called for the actual translations.


Nah, we wouldn’t use error codes as ids here, we’d have a new id scheme, likely derived from the name of the diagnostic struct. There are plenty of messages without error codes. This would still be pretty automatic – the diagnostic struct conversion is something we’ve wanted to do anyway, and it’s really easy to piggyback robust id-to-translation support on top of that.
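As a rough illustration of deriving an id from the struct name (not a settled scheme, just one plausible mapping), a derive could turn the CamelCase name into a kebab-case identifier, which fits Fluent’s identifier syntax. The function name and the example struct names are hypothetical.

```rust
/// Derive a Fluent-style message id from a diagnostic struct name,
/// e.g. "MismatchedTypes" becomes "mismatched-types".
/// Each ASCII uppercase letter starts a new word, so acronyms are not
/// treated specially in this sketch.
pub fn message_id_from_struct_name(name: &str) -> String {
    let mut id = String::with_capacity(name.len() + 4);
    for (i, ch) in name.chars().enumerate() {
        if ch.is_ascii_uppercase() {
            // Separate words with a hyphen, but never lead with one.
            if i != 0 {
                id.push('-');
            }
            id.push(ch.to_ascii_lowercase());
        } else {
            id.push(ch);
        }
    }
    id
}
```

Since struct names have to be unique within a module anyway, this gives unique ids more or less for free, including for messages that have no error code.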

You’re responding to outdated comments about rustdoc here, the plan is not to use Fluent there, but to keep using an id-based format with autogenerated ids. See Translating the stdlib docs. This is still somewhat like English-to-translation, but it uses id-to-translation under the hood to be tracked better.

The error index is already generated from a very straightforward format, extracting the strings isn’t hard. I already wrote a hacky tool that does this without any integration into the rust build system.


None of the plans involve forcing programmers to “add ids and then add the text in another file”, they all work via extracting an English .ftl file from source at build time, using the same proc macro.
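A very rough sketch of what that extraction could look like, with the trait impls written by hand where a derive would generate them. Everything here (`DiagnosticMessage`, the two structs, `english_catalog`) is hypothetical and only meant to show that the English text stays next to the code while the `.ftl` file is produced from it.

```rust
/// Hypothetical trait that a derive macro could implement for every
/// diagnostic struct: the message id plus the English source text.
pub trait DiagnosticMessage {
    const ID: &'static str;
    const ENGLISH: &'static str;
}

pub struct UnusedVariable;
pub struct MismatchedTypes;

// Written by hand here; in the real thing these would be generated.
impl DiagnosticMessage for UnusedVariable {
    const ID: &'static str = "unused-variable";
    const ENGLISH: &'static str = "unused variable: `{ $name }`";
}

impl DiagnosticMessage for MismatchedTypes {
    const ID: &'static str = "mismatched-types";
    const ENGLISH: &'static str = "mismatched types";
}

/// Render one `id = text` line in Fluent syntax.
pub fn ftl_entry<D: DiagnosticMessage>() -> String {
    format!("{} = {}\n", D::ID, D::ENGLISH)
}

/// Build the English catalog from every known diagnostic. A build step
/// would write this string out as the reference `.ftl` file.
pub fn english_catalog() -> String {
    let mut out = String::new();
    out.push_str(&ftl_entry::<UnusedVariable>());
    out.push_str(&ftl_entry::<MismatchedTypes>());
    out
}
```

Programmers only ever touch the struct and its English text; translators only ever see the generated file.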

Please make automatic translations at least opt-out. My experience as a non-native speaker is that I don’t even try to find localized communities about X, simply because a) all the really invested people who have enough knowledge speak (at least enough) English, and b) the technical term is always an English one because there’s no native German word for it. We even call it “denglish” due to the amount of German-English mixing.

I’ve set my whole OS to EN-US because I’m annoyed by the bad translations, the hard-to-look-up error codes, and the lack of people who are able or willing to answer in German; it’s just a waste of time for me. And I think it fragments the community even further. (This already didn’t work out with a native users.rust-lang forum…)
