Internationalization of crate metadata?

Alright, so I'm seeing a lot of points here that keep getting brought up in such discussions.

My perspective is that of someone who has spent a lot of time in the field of internationalization and has been leading most of the internationalization work for Rust in the last few years, including the discussions about internationalizing the standard library/compiler and about non-ASCII identifiers.

In such discussions, a bunch of dissenting points are brought up almost every time, often over and over again. Most of them are fallacious in some form or another.

Firstly, I kind of want to paint a picture of the people most served by internationalization in programming. While English has certainly become somewhat of a lingua franca amongst programmers, this is not the case globally. There are many countries in which advanced technical education for many fields exists in a language that is not English, and they end up producing a lot of programmers who are not fluent in English. South Korea, Japan, Taiwan, Brazil, and China are all good examples of such countries, to varying degrees.

For hopefully obvious reasons, there's far less visibility into this for primarily-English-speaking open source communities. For example, there are huge Chinese and Portuguese Rust communities, but they hang out on completely different venues. They exist -- and they're wonderful -- but we don't have much cross-communication with them.

More than "advanced technical communities that don't speak English", there's the way thornier situation of speakers of languages where that's not even an option. There are so many people excluded from programming because they don't speak English and their language is lacking in resources. Now, that's not a problem the Rust community can solve on its own. But we can absolutely take steps to reduce friction there; make sure we're not making the problem worse. This is directly in service of our goal of being more inclusive.

A bit of a personal anecdote: I went to college in a former British colony. Since it is a former British colony, English is relatively common amongst educated individuals, and most people regardless of education can understand very basic English, so my college primarily used English. Also since it is a former British colony, the resultant poverty and lack of infrastructure meant that many people did not have the opportunity to get a consistent education that might prepare them for a higher education program in English. A fair number of students were far less fluent in English and had a lot of trouble keeping up with the higher-level technical instruction. To the credit of the institution, it provided remedial classes for English, but this did not necessarily fix the problem (you really can't patch up fluency that quickly and easily). Many of these students got to a point where they could manage and ended up being successful, but it's still a pretty large barrier. I'm sure there are plenty of people who bounce off of this kind of constraint, or don't even try. It's a huge case of survivorship bias to look at the people in programming now and say "look, everyone here speaks English, what's the problem?".


Anyway, to address some specific points that keep cropping up in such discussions (not necessarily quoting any particular instance, and some of these have not been brought up yet):

"English is the lingua franca of programming"

No, it is not. It's a major language that programming is done in, but there are many programmers who do not speak English.

"I mostly see English programmers in this community/crates.io/etc, what's the problem?"

This is a case of selection/survivorship bias. If programming were less hostile to non-English speakers, we would have a more vibrant and diverse community.

"This creates more work for maintainers"

Maintainers can choose whether they want to do this. In my experience, such work is typically done by a different community member who wants their "language subcommunity" to be able to use the project. And yes, it's hard to keep up to date, which is why you can ask translators if they can commit to fixing up stuff when you update things (and tag them when you do so). If they can't, remove the translation.

"English isn't my mother tongue and I still prefer doing programming in English"

This is true for many people, and it's true for me. This doesn't mean it's true for everybody; the situation is different for each language. Furthermore, the reason behind this feeling is often a lack of good materials and vibrant communities for those languages: precisely the problem that internationalization helps to fix!

"This contributes to balkanization of the ecosystem"

Firstly, this is already an issue: the Chinese community writes crates that the primarily-English community doesn't use, and to some extent vice versa. Most people don't notice this, and it's fine.

Secondly, to drill down a little bit into this: This isn't really balkanization. The people who are enabled by internationalization would otherwise likely never contribute to the ecosystem in the first place. Internationalization enables access, and yes, some of those people will create artifacts that are less useful to you, but they would never have created those artifacts in the first place if they didn't have that access! Just because it's not useful to you doesn't mean it shouldn't exist.

Besides, this isn't a zero-sum game. If someone wishes to write a cool serialization framework that is written and documented in Portuguese, let them. There will be other serialization frameworks for you to use, and perhaps one day Rust will have the tooling support for crates to be fully documented in multiple languages.

This feature in particular reduces balkanization since it actually makes it so that these crates can exist on crates.io in a way that's accessible to speakers of multiple languages at once.

"It would be easier if we stuck to one language so everything evolves as one giant community/ecosystem"

There are reasons why this is an imperialistic viewpoint and really should not be entertained in this community. However, it's not always borne out of malice, and to address it whilst assuming good faith: Forcing everyone to speak a language to participate just leads to fewer people participating; it does not actually work. It's a barrier to entry more than anything else.
