Translating the compiler

We discussed this a bunch in the wg-diagnostics Zulip.

So there’s a larger roadmap for improving diagnostics, and a part of that is using diagnostic structs.

The Fluent work can actually piggyback on the diagnostic struct work. If we use a custom derive or other attribute for the diagnostic structs, the vast majority of them can be autoconverted to Fluent strings, and we can make it possible to seamlessly and automatically upgrade the custom derive to use Fluent ids. We’d still need some kind of text! macro for diagnostics with complex custom logic.

This means the next steps for this are to help out with that work (once the team has figured out the precise design). Once that’s trundling along, we can start integrating Fluent (we can also wait for it to be completed, either works).
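To make the derive idea concrete, here is a minimal sketch under loudly stated assumptions: `MessageSource` is a hypothetical trait standing in for whatever a custom derive would generate, `lint-unused-variable` is an invented message id, and `render` is a toy stand-in for Fluent that only substitutes `{ $arg }` placeholders. None of this is rustc's actual API; real code would use the Fluent crates. The point is only that a diagnostic struct already carries an id plus named arguments, which is exactly what a Fluent message needs.

```rust
use std::collections::HashMap;

/// Hypothetical trait a custom derive could implement for each diagnostic struct.
/// Not rustc's actual API.
trait MessageSource {
    /// Stable, Fluent-style message id.
    fn message_id(&self) -> &'static str;
    /// Named arguments to interpolate into the message.
    fn args(&self) -> Vec<(&'static str, String)>;
}

/// Example diagnostic struct; the impl below is what a derive might generate.
struct UnusedVariable {
    name: String,
}

impl MessageSource for UnusedVariable {
    fn message_id(&self) -> &'static str {
        "lint-unused-variable" // invented id, for illustration only
    }
    fn args(&self) -> Vec<(&'static str, String)> {
        vec![("name", self.name.clone())]
    }
}

/// Toy renderer: look up the pattern for the id in one language's table and
/// replace each `{ $arg }` placeholder. Real Fluent supports far more than this.
fn render(bundle: &HashMap<&str, &str>, diag: &dyn MessageSource) -> Option<String> {
    let mut text = bundle.get(diag.message_id())?.to_string();
    for (key, value) in diag.args() {
        text = text.replace(&format!("{{ ${} }}", key), &value);
    }
    Some(text)
}

fn main() {
    // One table per language; swapping tables swaps the output language.
    let mut en = HashMap::new();
    en.insert("lint-unused-variable", "unused variable: `{ $name }`");
    let diag = UnusedVariable { name: "x".to_string() };
    println!("{}", render(&en, &diag).unwrap());
}
```

Adding a language then means adding another table keyed by the same ids; the diagnostic structs themselves don't change.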

(This is kind of off topic for this thread, but The Book does have a number of community-maintained translations. For anyone reading this who is interested in a translation or helping with translations of The Book, please see this note and issues labeled Translation. Please file issues on the book repo if you would like to discuss translations of the book, to avoid taking this thread further off topic.)

I think the error index may be the best next step:

  • it’s helpful information for newcomers when they hit an error and are looking for help,
  • it may help improve our search results in other languages,
  • its scale is relatively small,
  • it should be architecturally similar to website translation, as it’s just a page as well,
  • it should be less controversial than translating the compiler and other binaries, and
  • it still helps build up terminology and the like for translating the compiler and other tools in the future, if we do want to at some point.

Yeah. This would also be better done without integrating Fluent: we won't need anything Fluent-like for the error index, so we don't need to escape anything. It may be easier to just support translations as JSON (format here).

It's likely we'll pick the same solution for rustdoc.

If someone’s interested in doing the error index, the steps would be:

  • Write a tool that parses error_index.rs files and converts them into the JSON format here – it’s fine if the descriptions are blank for now. It may be worth tweaking the error index generator to have a JSON output format.
  • Create a repo with an en folder containing all the JSON files.
  • Ask me to add it to Pontoon so people can start translating it.
  • Extract the error index generator and add a Travis deploy job to the repo so that it publishes localized error indices.
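The first two steps (parse the explanations, write one JSON file per code into an `en` folder) could look roughly like this. It is a sketch resting on several assumptions: that the source files hold entries of the form `E0001: r##"…"##,` (adjust the parser to whatever the real files contain), that each JSON file maps the error code to an object with `message` and `description` fields (the linked format isn't reproduced here), and that "descriptions" in the step above means the per-string notes for translators, which this sketch leaves empty. `parse_entries`, `json_escape`, and `to_json_files` are invented names, and only the standard library is used.

```rust
/// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Other control characters must be \u-escaped in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Extract (code, explanation) pairs from text assumed to contain entries like
///     E0001: r##"
///     explanation text
///     "##,
fn parse_entries(src: &str) -> Vec<(String, String)> {
    const OPEN: &str = ": r##\"";
    const CLOSE: &str = "\"##";
    let mut entries = Vec::new();
    let mut rest = src;
    while let Some(start) = rest.find(OPEN) {
        // The error code is the last whitespace-separated token before OPEN.
        let code = rest[..start].split_whitespace().last().unwrap_or("").to_string();
        let body_start = start + OPEN.len();
        let len = match rest[body_start..].find(CLOSE) {
            Some(len) => len,
            None => break, // unterminated entry: stop rather than guess
        };
        let body = rest[body_start..body_start + len].trim().to_string();
        entries.push((code, body));
        rest = &rest[body_start + len + CLOSE.len()..];
    }
    entries
}

/// Build one (path, contents) pair per code, e.g. "en/E0001.json".
/// The file layout and the message/description shape are assumptions.
fn to_json_files(entries: &[(String, String)]) -> Vec<(String, String)> {
    entries
        .iter()
        .map(|(code, text)| {
            let contents = format!(
                "{{\n  \"{}\": {{\n    \"message\": \"{}\",\n    \"description\": \"\"\n  }}\n}}\n",
                json_escape(code),
                json_escape(text)
            );
            (format!("en/{}.json", code), contents)
        })
        .collect()
}

fn main() {
    // Hypothetical input in the assumed layout.
    let src = r###"
E0001: r##"
Placeholder explanation for E0001.
"##,
E0002: r##"
Placeholder explanation with a "quoted" word.
"##,
"###;
    for (path, json) in to_json_files(&parse_entries(src)) {
        println!("--- {}", path);
        print!("{}", json);
    }
}
```

Writing the pairs to disk (or committing them to the translation repo) is left out; a real tool would also want to fail loudly on malformed entries instead of silently stopping.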

Proper integration, where rustc --explain E1234 can also be localized, would require language pack support in rustup/etc. For now, for the error index, I think we should focus on getting the out-of-tree stuff working first.
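As a rough illustration of what "language pack support" would have to decide, here is a sketch of locale fallback for an explanation lookup: try the requested locale, then progressively shorter tags, then English. Everything here is hypothetical (`LanguagePacks`, `fallback_chain`, `explain` are invented names; nothing reflects an existing rustup or rustc interface), and the fallback is a simplified version of the truncation-based lookup described in RFC 4647.

```rust
use std::collections::HashMap;

/// Hypothetical store of installed explanation texts keyed by (locale, error code).
/// This does not reflect any existing rustup or rustc interface.
struct LanguagePacks {
    texts: HashMap<(String, String), String>,
}

impl LanguagePacks {
    fn new() -> Self {
        LanguagePacks { texts: HashMap::new() }
    }

    fn add(&mut self, locale: &str, code: &str, text: &str) {
        self.texts
            .insert((locale.to_string(), code.to_string()), text.to_string());
    }

    /// Locales to try, most specific first, always ending in "en":
    /// "pt-BR" -> ["pt-BR", "pt", "en"].
    fn fallback_chain(locale: &str) -> Vec<String> {
        let mut chain = Vec::new();
        let mut current = locale.to_string();
        loop {
            chain.push(current.clone());
            // Drop the last "-subtag" until none are left.
            match current.rfind('-') {
                Some(i) => current.truncate(i),
                None => break,
            }
        }
        if !chain.iter().any(|l| l == "en") {
            chain.push("en".to_string());
        }
        chain
    }

    /// Explanation for `code` in the best available locale, if any is installed.
    fn explain(&self, locale: &str, code: &str) -> Option<&str> {
        Self::fallback_chain(locale)
            .into_iter()
            .find_map(|loc| self.texts.get(&(loc, code.to_string())).map(String::as_str))
    }
}

fn main() {
    let mut packs = LanguagePacks::new();
    packs.add("en", "E0001", "English explanation of E0001.");
    packs.add("pt", "E0001", "Portuguese explanation of E0001.");
    for locale in ["pt-BR", "ja", "en-US"] {
        println!("{}: {:?}", locale, packs.explain(locale, "E0001"));
    }
}
```

Falling back to English rather than failing keeps `--explain` usable when a language pack is incomplete, which is likely for a while after new error codes are added.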

Personal experience: as a non-native speaker of English, I find translated error messages worse and harder to use. When I’m writing code, I’m thinking in English. Having to perform a context switch and either translate the code (variable, function, type names) from English to the language of the compiler output or vice versa takes effort, and it doesn’t really help; it merely adds another opportunity for confusion. I’ve seen non-English error messages from GCC a couple of times (Hungarian, my native language, and French, a language I used to be proficient in), and each time I noticed the non-English compiler output, it only confused me.

Everyone participating in this discussion can be assumed to be comfortable with English. For these people, the advantages of using English in the toolchain have already been mentioned. The people who will actually benefit from toolchain translations are not going to write it here — in English — so any attempt to gauge the usefulness of that feature with anecdotes is pretty moot.

Sure, everyone here is from a self-selected group, but what’s the alternative?

English is my second language, and I remember learning programming and English at the same time. I can also relay experience of my friends learning to program.

Thinking about those who are not here and not going by our own experiences and anecdotes.

(This reply might seem glib, but it's a seriously useful exercise)

It would be nice if we could do something less speculative, though. Is there a known way to poll non-English speakers (e.g. at Rust meetups in other countries)?

It’s a piece of feedback we’ve gotten repeatedly. People want translated docs, tools, and websites. Not everyone, but definitely some people.

It’s not at all speculative.

In my experience this varies from language to language. For example native speakers of most Indic or Romance languages typically seem to prefer resources in English. On the other hand, Chinese, (Brazilian) Portuguese, Korean, and maybe Russian/Japanese communities have a sizeable chunk of programmers who prefer to learn in their native language (and may not even speak English!). Chinese in particular has a really mature ecosystem for learning most popular programming languages without needing to know English at all. I’ve been told Korean does too.

I’ve got a hacky proof of concept for the error index here. This same format can be used for rustdoc, though that will require more work dumping and reconsuming the strings.

Something I want to mention is that the current “error code” situation is IMO very suboptimal and I’ve been wanting to replace it for a while with something else, and there are better designs which also help with translation.

One thing that could work is reifying everything we want to display into ADTs, but it might get too verbose.
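A minimal sketch of what reifying diagnostics into ADTs might look like, assuming a hypothetical `Diag` enum and `Locale` trait (neither exists in rustc): each variant carries only the data its message needs and no text, and each locale is a separate `match` over the enum. Using the variant name as an identifier is likewise just one possible choice. The verbosity concern shows up directly: every message costs a variant plus one arm per locale.

```rust
/// Hypothetical language-independent diagnostic data: each variant carries
/// exactly the values a message needs, and no text.
enum Diag {
    UnusedVariable { name: String },
    MismatchedTypes { expected: String, found: String },
}

impl Diag {
    /// One possible stable identifier: the variant name.
    fn id(&self) -> &'static str {
        match self {
            Diag::UnusedVariable { .. } => "UnusedVariable",
            Diag::MismatchedTypes { .. } => "MismatchedTypes",
        }
    }
}

/// Each locale turns the data into text. Because `English::render` matches
/// exhaustively, adding a variant without a message is a compile error.
trait Locale {
    fn render(&self, diag: &Diag) -> String;
}

struct English;

impl Locale for English {
    fn render(&self, diag: &Diag) -> String {
        match diag {
            Diag::UnusedVariable { name } => format!("unused variable: `{}`", name),
            Diag::MismatchedTypes { expected, found } => {
                format!("mismatched types: expected `{}`, found `{}`", expected, found)
            }
        }
    }
}

fn main() {
    let diags = vec![
        Diag::UnusedVariable { name: "x".to_string() },
        Diag::MismatchedTypes { expected: "u32".to_string(), found: "i32".to_string() },
    ];
    for d in &diags {
        println!("[{}] {}", d.id(), English.render(d));
    }
}
```

The upside of this shape for translation is that the compiler, not a reviewer, catches a locale that is missing a message; the downside is that translators would be editing Rust code rather than plain message files.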

@nikomatsakis likely suggested something neat at some point but I don’t remember.

Seems like the plan in?:

I discussed this with eddyb out of band: this is part of what they meant, but they also wanted to replace error codes with better identifiers (perhaps the error struct name), which is a whole other discussion (and it doesn't help translation at all).

Just to share a point of experience WRT whether translations are useful or not, since there have been some voices of doubt: I'm co-organizing Rust meetups in Tokyo. The meetups are in Japanese, and my general feeling/observation about the participants is that while many people read English fairly proficiently, not all do, and many find using English effortful, so native material is a really helpful quality-of-life improvement for them. And then there's the younger audience, who definitely have trouble understanding English material.

Btw, there is already an official, published translation of the O'Reilly crab book, unofficial translations of both editions of TRPL, and an official published translation is coming out at some point. There's also an original book by the Japanese authors Keen, Kawano and Komatsu, comparable in size to the crab book: Amazon link.

Besides prose-style documentation, I think that most Japanese developers don't use StackOverflow very actively, preferring similar Japanese services such as Qiita or Teratail.

The point that I want to make is that at least in some language communities, translations are perceived as very useful and helpful. I know that this differs between communities: originally coming from Finland, I think that most Finnish developers would have a negative view of Finnish translations, seeing them as a liability (suffering from quality problems, getting out of date quickly, introducing terminology incompatible with the English documentation). I think that people discussing on this forum definitely have a self-selection bias.

I acknowledge that I might be biased, having a pretty good grasp on the English language, but this is my opinion on this initiative:

As a non-native programmer (Russian is native to me), I find it strange that people would ever prefer translated output. I guess it’s better than nothing. But, in my opinion, English has been a soft requirement for most programming jobs for quite a while now.

This is because, in my experience, you’re literally crippled as a programmer until you learn some English. Suppose the compiler and the standard library docs are translated into your native language; that does sometimes happen for very widely used languages. But StackOverflow is not translated into different languages. Third-party library docs are not translated into different languages. If you search for an error or a problem, you are 1000 times more likely to find a relevant result in English rather than your own language. Translations, even if they do exist, can’t be trusted to be up-to-date. News tends not to be translated. Third-party programming books, like Programming Rust, are unlikely to get a translation into your language. Etc, etc.

So my (very rough) estimation would be that a complete translation of the compiler and the standard library docs into your native language would make a developer who doesn’t know English a 50% developer rather than a 30% developer. Don’t get me wrong, that’s still super useful. But I’ve got to ask if it’s worth it, pardon my callousness. First of all, it definitely, in my honest opinion, wouldn’t be worth it if it inconveniences people who do know English, even if it’s not their native language. Second of all, is making people a little bit less ineffective worth the enormous effort that this would take? I’d rather they just have some more incentive to learn English. It would eliminate this disadvantage entirely and let them be a more integral part of the overall language ecosystem. Granted, I don’t know how many people like that there are; every programmer I’ve ever dealt with has had some knowledge of English, but that’s selection bias since most companies in my area consider English skill very strongly when hiring. But my guess would be that the percentage is not that significant?

Everybody is free to spend their effort where they want, of course, but I think that if anybody is hesitating, it would be better to put their effort into something else.

Edit:

After rereading what I’ve written, I might have come off a little too pessimistic. So let me clarify that I can definitely get behind translating the Book and the websites; it makes a lot of sense to make these as understandable as possible for people who have some grasp of English but would understandably find them easier and quicker to understand in their own native language. But I’m highly skeptical of translating the compiler messages. These tend to be formulaic, easily googlable, and easy enough to understand even if you know very little English. And I don’t think there are many people in programming who have no English skill at all.

https://ru.stackoverflow.com/ , https://ja.stackoverflow.com/ , https://pt.stackoverflow.com/, https://es.stackoverflow.com/ , https://www.zhihu.com/ , https://qiita.com/ , https://segmentfault.com/

There are some pretty vibrant communities out there.

There are tons of programmers who are not comfortable dealing with technical English, especially in China, Japan, Brazil, and (maybe?) Korea. "No English skill at all" is a red herring: one can know enough English to carry on a very basic conversation, but that's not the same thing as being able to work with tech documentation/tooling in English.

This isn't a matter of opinion: this is a fact. These programming communities often don't have much overlap with English-speaking communities, for obvious reasons, and thus it's sometimes hard to notice them, but they exist and are quite large.

This argument has no place on an official Rust forum.

https://ru.stackoverflow.com/ , https://ja.stackoverflow.com/ , https://pt.stackoverflow.com/, https://es.stackoverflow.com/ , https://www.zhihu.com/ , https://qiita.com/ , https://segmentfault.com/

There are some pretty vibrant communities out there.

Newest 'rust' Questions - Stack Overflow has 12,303 questions. Newest questions tagged [rust] - Stack Overflow на русском has 111 questions.

Doesn't seem all that vibrant to me?

This argument has no place on an official Rust forum.

Honest question: why not? Is this an "inclusiveness" thing? Is this about this part of the CoC:

  • We are committed to providing a friendly, safe and welcoming environment for all, regardless of level of experience, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, religion, nationality, or other similar characteristic.

I might not be reading this correctly, but these are all things people can't help being (and shouldn't be pressured into not being). Encouraging people to learn English is not in the same category as repressing people based on their sexual orientation or race. It seems like a generally useful and seemingly uncomplicated life skill to me. Like learning mathematics, or programming language theory.

I always assumed the opposite situation (at least from my own experience).

Technical text is largely about terminology (which you have to learn in any language). The language itself (grammar, tenses, etc.) is much simpler than in, e.g., literary, journalistic, or spoken styles, and you don't have to perceive it by ear or produce it yourself (talk).
So you can basically translate it word by word.

Holding a conversation is much harder.

(This doesn't apply to larger texts like the TRPL book: even though they are written in a technical style, they certainly read faster in one's native language.)
