Internationalization of crate metadata?

Almost all crates have their descriptions and README in English. I wonder — is that a problem that needs fixing? Should Cargo/crates-io support writing crate metadata in other languages?

I think such support would require declaring which language description and README are written in (if there's only one language), and support specifying multiple versions of the description/README in other languages. What would that look like?

14 Likes

README files are probably the easiest thing to tackle: we could have README.fr and README.es as a convention. I find localization of all other documentation to be a harder problem to address.

6 Likes

Since editors often detect the format of a file by its extension, it would be a shame to lose the .md extension.

Could we do README.es.md or README-es.md instead?

5 Likes

Would you expect Cargo to just search for these files and bundle them automatically?

I think that all READMEs should be listed in Cargo.toml, maybe with globs to simplify it:

readme = ["README.md", "README.*.md"]
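For illustration, here is a minimal sketch of how a tool might classify files under the `README.<lang>.md` naming idea from this thread. Everything in it is hypothetical: the `classify_readme` function, the `Readme` enum, and the rule that a tag is made of ASCII alphanumeric parts separated by `-` are assumptions for the example, not anything Cargo or crates.io implements today.

```rust
/// How a file name relates to the hypothetical README naming convention.
#[derive(Debug, PartialEq)]
enum Readme<'a> {
    /// Plain `README.md`, with no declared language.
    Default,
    /// `README.<lang>.md`, carrying the language tag from the file name.
    Localized(&'a str),
}

/// Classifies a file name, returning `None` if it is not a README
/// under the assumed convention.
fn classify_readme(file_name: &str) -> Option<Readme<'_>> {
    if file_name == "README.md" {
        return Some(Readme::Default);
    }
    // Peel off the fixed prefix and suffix; what remains is the tag.
    let tag = file_name.strip_prefix("README.")?.strip_suffix(".md")?;
    // Assumed tag shape: non-empty ASCII alphanumeric parts joined by '-',
    // e.g. "es" or "zh-Hans". This is a loose check, not full BCP 47.
    let valid = !tag.is_empty()
        && tag
            .split('-')
            .all(|part| !part.is_empty() && part.chars().all(|c| c.is_ascii_alphanumeric()));
    if valid {
        Some(Readme::Localized(tag))
    } else {
        None
    }
}

fn main() {
    for name in ["README.md", "README.es.md", "README.zh-Hans.md", "CHANGELOG.md"] {
        println!("{} -> {:?}", name, classify_readme(name));
    }
}
```

Keeping `.md` as the final extension means editors still detect the format, while the middle component carries the language.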

As a developer from Russia, I can say that I almost always prefer documentation in English even if there are docs in Russian (which isn't my native language, btw), because it is more accurate.

Also, I believe that most programmers know English at a level where reading documentation isn't much trouble.

Localization matters for new programmers who are just starting to learn, and I think we should support publishing localized learning resources for Rust; for example, it would be nice to localize the Rust book. Also, there isn't much point in localizing some crate if people who don't know English get stuck on the std docs and never start using crates.io in the first place.

Finally, IMHO, we should rather prioritize localizing the Book and std docs than crates documentation.

P.S. I think it is worth investigating whether crate developers would create these localizations even if they were able to. I certainly would never start doing this.

P.P.S. Wow, I found a Russian version of the Rust book: https://www.piter.com/product/programmirovanie-na-rust

19 Likes

whether crate developers would create these localizations even if they were able to

While I would be able to document both English and another language, I worry that external contributors would translate to other languages that I simply can't maintain, and those translations would quickly become out of date.

Maintaining a large number of translations needs a proper team, which most open source projects don't have, so I would probably stick to just English, which is what virtually everyone is already doing.

15 Likes

Consider that there are already some crates that have a README only in Chinese. These crates should have the ability to declare their language correctly.

There's a tricky aspect of Unicode: code points alone are not enough to display all text correctly. CJK languages in particular share some code points that look different in Chinese and Japanese, and rely on the HTML lang attribute (or regional fonts) to disambiguate them. So to display an all-Chinese README correctly, I need to set lang appropriately too.

17 Likes

A single language other than English is a good point, but that's a lot less work, since it can be done without having to support an arbitrary number of languages all at the same time.

I wonder, can't such READMEs be wrapped in a <div lang="…">…</div>? There's still a single README, everything works, and now it has the correct language set.

This isn't as discoverable and would be a pain for documentation comments, but it can be done today without any new additions.
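As a rough sketch of that workaround, a small helper could wrap an existing README in a `<div lang="…">` before publishing. The `wrap_with_lang` function is hypothetical, and it assumes the language tag is a trusted, plain value (it is not escaped). The blank lines around the content are deliberate: as far as I know, CommonMark treats `<div>` as an HTML block that ends at the next blank line, so the Markdown in between is still rendered as Markdown.

```rust
/// Wraps README text in a `<div>` carrying the given `lang` attribute.
/// Assumes `lang` is a trusted tag such as "zh-Hans"; no escaping is done.
fn wrap_with_lang(readme: &str, lang: &str) -> String {
    // Blank lines after the opening tag and before the closing tag let a
    // CommonMark renderer keep parsing the wrapped content as Markdown.
    format!(
        "<div lang=\"{}\">\n\n{}\n\n</div>\n",
        lang,
        readme.trim_end()
    )
}

fn main() {
    let readme = "# 标题\n\n这是一个示例。\n";
    print!("{}", wrap_with_lang(readme, "zh-Hans"));
}
```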

3 Likes

I think that translations of the standard library would still be valuable – and the standard library is important enough that many people would volunteer to translate it.

However, providing all the translations in the source code is not feasible, because every time an item is added or the English documentation of an item is changed, all the translations should be updated too. I don't think we want multiple pull requests to rust-lang/rust every time an item is added or modified, to update the various translations.

A better solution is to use a service like Crowdin, so translations can be entered conveniently, without requiring pull requests. Whenever an item's documentation changes, all its translations should be removed automatically, so they don't become stale.
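To make the "remove translations when the English text changes" rule concrete, here is a minimal in-memory sketch. The `DocStore` and `ItemDocs` types and their methods are invented for illustration; a real service like Crowdin has its own data model, which this does not attempt to mirror.

```rust
use std::collections::HashMap;

/// English documentation for one item plus its translations by language.
#[derive(Default)]
struct ItemDocs {
    english: String,
    translations: HashMap<String, String>,
}

/// Hypothetical store keyed by item path (e.g. "std::vec::Vec::push").
#[derive(Default)]
struct DocStore {
    items: HashMap<String, ItemDocs>,
}

impl DocStore {
    /// Sets the English text. If it differs from what was stored,
    /// every translation of that item is dropped so none can go stale.
    fn set_english(&mut self, item: &str, text: &str) {
        let docs = self.items.entry(item.to_string()).or_default();
        if docs.english != text {
            docs.english = text.to_string();
            docs.translations.clear();
        }
    }

    /// Adds a translation for a known item; returns false if the item is unknown.
    fn add_translation(&mut self, item: &str, lang: &str, text: &str) -> bool {
        match self.items.get_mut(item) {
            Some(docs) => {
                docs.translations.insert(lang.to_string(), text.to_string());
                true
            }
            None => false,
        }
    }

    /// Looks up the translation of an item for a language, if one is current.
    fn translation(&self, item: &str, lang: &str) -> Option<&str> {
        self.items.get(item)?.translations.get(lang).map(String::as_str)
    }
}

fn main() {
    let mut store = DocStore::default();
    store.set_english("Vec::push", "Appends an element to the back of a collection.");
    store.add_translation("Vec::push", "fr", "Ajoute un élément à la fin de la collection.");
    println!("{:?}", store.translation("Vec::push", "fr"));

    // Changing the English text invalidates the French translation.
    store.set_english("Vec::push", "Appends an element to the end of the vector.");
    println!("{:?}", store.translation("Vec::push", "fr"));
}
```

Dropping translations outright is the strictest policy; a gentler variant might keep the old text but flag it as outdated.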

1 Like

I have a lot of experience with translation and localization of free software for the Ukrainian language, as one of the leaders of a national Linux user group.

The major problem for translators is the additional burden. I need to choose between translating new projects/pages, updating older projects/pages, and my life. If each project publishes an update once a year on average, then I will be unable to support more than 365 projects, because I would need to work 24x7x365 to keep up with updates.

As I see it, the number of free projects that need translation has grown to hundreds of thousands, while the number of volunteers has dropped to a dozen, i.e. there are tens of thousands of free projects worth translating per volunteer.

We are concentrating our efforts on messages that users see every day, e.g. popular desktops and free software. Developer software is often ignored. My mom doesn't use a compiler.

Also, we share effort by using "translation memory" systems, which automatically propose similar translations to increase translator productivity, and automatic systems that fill blank translations with "fuzzy" translations. However, the applicability of these systems is limited by the high ambiguity of English text.

Currently, I'm trying to use AI (linear transformers) to create automatic translation from English with the help of translations into other, less ambiguous languages, to share effort between translators for different languages. Linear transformers can accept long sequences (thousands of words), so it's possible to supply the English text and 1-3 other existing translations to an AI translator, which may then produce a high-quality translation into the target language, if properly trained.

If someone works on a multilingual system for Rust documentation, please ensure that translations can be piped to an external AI translator or translation memory automatically and then merged back, similar to the msgmerge tool for .po files from gettext.
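To show the shape of that merge step, here is a deliberately simplified sketch. It is not how msgmerge itself is implemented: the real tool also does fuzzy matching and keeps obsolete entries around, while this hypothetical `merge_catalog` function only carries over exact matches and leaves everything else blank for a translator, translation memory, or external AI translator to fill in.

```rust
use std::collections::HashMap;

/// One catalog entry: the source string and its translation.
/// An empty `msgstr` means "not translated yet".
#[derive(Debug, PartialEq)]
struct Entry {
    msgid: String,
    msgstr: String,
}

/// Builds a fresh catalog from the current source strings (`template`),
/// reusing translations from `existing` where the source text is unchanged.
/// Strings that no longer appear in the template are simply dropped.
fn merge_catalog(template: &[&str], existing: &HashMap<String, String>) -> Vec<Entry> {
    template
        .iter()
        .map(|msgid| Entry {
            msgid: msgid.to_string(),
            // Exact match only; new or reworded strings get an empty msgstr.
            msgstr: existing.get(*msgid).cloned().unwrap_or_default(),
        })
        .collect()
}

fn main() {
    let mut existing = HashMap::new();
    existing.insert("Returns the length.".to_string(), "Повертає довжину.".to_string());
    existing.insert("Removed text.".to_string(), "Видалений текст.".to_string());

    let template = ["Returns the length.", "Clears the buffer."];
    for entry in merge_catalog(&template, &existing) {
        println!("{:?}", entry);
    }
}
```

The entries with an empty `msgstr` are exactly the ones that would be sent out to an external tool and merged back later.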

9 Likes

AFAICT English is the lingua franca of the software development world.

An argument could be made that not incorporating this feature would be exclusionary. But that argument cuts both ways: I don't understand e.g. Korean, and a project with only Korean docs would by definition mean I can't read or use it. So incorporating the feature has the same problem as not incorporating it in this regard.

Is there any concrete data showing that it's a problem?

In addition, even if this is pushed through, I expect the majority of crates to flat-out ignore any other languages. Perhaps some crates will have a README in another language, but only if the author(s) feel(s) like it. And therein lies the crux: writing documentation is already viewed as a necessary evil, and thus most developers won't be inclined to translate. And to be honest, as someone who already has a fair lack of time, I can really both sympathize and empathize with that.

2 Likes

I think having the ability to specify the main language would be a necessary outcome. For example, I've written code using libraries developed in and primarily written for the Chinese market, and while they were usable, they were not fun or sustainable.

You just made my point for me. That library you're talking about is unusable by anyone that can't read Chinese, and this proposal would provide the means to take that to the next level.

I do not believe that that kind of balkanization is desirable, or a trend that should actively be supported.

2 Likes

But to an extent we already do this? Helping other text render better is a good benefit compared to fears of balkanization, because it already happens. Even if a Korean project had an English translation, I wouldn't want to use it, because that just increased the difficulty of getting support fivefold.

I don't think your counterargument against being exclusionary makes much sense. Someone writing in another language having to wrap their text in lang tags is an unnecessary burden. Now, whether it's worth the implementation cost is debatable, but I don't think there should be ideological grounds against helping other people write code.

4 Likes

My argument isn't ideological. It's rational. It's essentially "as many crates as possible, documentation and all, should be accessible to as many people as possible". It's the essence of FOSS, and philosophically it's a utilitarian standpoint.

The core issue is the pre-existing issue of language, which is a de facto divider. But that won't be solved here, regardless of how this issue is settled.

So in lieu of that, I simply think it's best to maintain the status quo.

2 Likes

I'm ESL. I never seek out documentation in my mother tongue. I haven't used UIs in my mother tongue ever since I figured out how to avoid it. I find documentation in English to usually be of higher quality, and I wrote my code and documentation in English even before I ever set foot in an English-speaking country. I still think that enabling people to write code and documentation in their own language as much as possible should be a project-wide priority. There are more native speakers of Mandarin Chinese than there are people who have learned English as a second language. More than a billion people speak Hindi or Spanish. Not all of them speak English. There are >7.5 billion people on this planet. There are only 1.2 billion English speakers.

It is my personal belief that learning English is very important for developing within our industry. I consider it my duty to make sure that it isn't the only way. Rust is about (many things, but one of them is) inclusiveness. Making sure we have on-ramps for people of different levels of experience, backgrounds and desires is important for our goals. Making sure that a 10-year-old with a tenuous grasp of English can somehow start using Rust is within scope. Making sure that a Finnish engineer writing tax calculation for their local government can write arvonlisävero instead of value_added_tax in their code and write their documentation for fellow citizens in their own language is certainly in scope.

If we don't add ways to handle the fact that not everyone will operate in English, they will still do it, but in an ad-hoc way. And then it will be harder not only for the Finnish developer looking for their official tax calculation crate, but also for anyone who doesn't speak Finnish not being able to filter out crates like that. Having multiple language support might not mean homogeneously great documentation. It might even preclude it! But at least it will give people that are writing docs in other languages a way to give English speakers a short description explaining why this isn't for them (or asking for help to maintain it!) and a fighting chance to the handful of people that do want to spend the effort to have their documentation in multiple languages.

29 Likes

OK

I'm not sure I see a connection between your arguments here and "as many people as possible" unless you ignore the numbers @ekuber provides above.

Help me find a more charitable interpretation?

(Utilitarianism is an ideology.)

From a utilitarian viewpoint, I'm not sure where you're coming from. Encouraging people to leave their native language to share code is not a path to increasing the amount of software and its docs available in the world. It might be a way to curate the set of software to which one is exposed, but it will create strictly less overall code/docs/etc than a linguistically inclusive approach.

Maybe there's an argument that there will be more global value with a strong gravity well of English-first software, even if there will be less produced overall? I don't find this compelling, since as @ekuber points out there's a lot of software the world would find valuable that has little to no relation to English.

I can only rationalize this claim if I limit the utility considered to that of existing English speakers and those willing to join the club. Utilitarianism (if it's your thing) applies to the output and happiness of non-English speakers too.

Exactly! Rust can't solve the "problem" of many natural/human languages. All we can do is choose a response to the reality of the many ways we communicate.

To me, this seems a lot like the "why does Rust have 12 string types?" conversation. Yes, it's inconvenient to have your programming environment force consideration of reality. But at the end of the day, there are a lot of string encodings that Rust needs to interact with, and representing those variations faithfully is a core part of the problem domain of the language.

Similarly, the human language used to communicate is part of the core social problem domain of a code-sharing repository like crates.io. We can't not make a choice, we can only double down on the existing choices we made (prioritize English users over others) or make new choices.

11 Likes

Now I regret asking this question in an English-only forum. Everyone here by definition can communicate in English, and has managed to use Rust despite Rust lacking non-English resources. Of course lack of localization is not a problem for any one of you.

27 Likes

Alright, so I'm seeing a lot of points here that keep getting brought up in such discussions.

My perspective is that of a person who has spent a lot of time in the field of internationalization and has been leading most of the internationalization work for Rust in the last few years, including the discussions about internationalizing the standard library/compiler and non-ASCII identifiers.

In such discussions, a bunch of dissenting points are brought up almost every time, often over and over again. Most of them are fallacious in some form or the other.

Firstly, I kind of want to paint a picture of the people most served by internationalization in programming. While English has certainly become somewhat of a lingua franca amongst programmers, this is not the case globally. There are many countries in which advanced technical education for many fields exists in a language that is not English, and they end up producing a lot of programmers who are not fluent in English. South Korea, Japan, Taiwan, Brazil, and China are all good examples of such countries, to varying degrees.

For hopefully obvious reasons, there's far less visibility into this for primarily-English-speaking open source communities. For example, there are huge Chinese and Portuguese Rust communities, but they hang out on completely different venues. They exist -- and they're wonderful -- but we don't have much cross-communication with them.

More than "advanced technical communities that don't speak English", there's the way thornier situation of speakers of languages where that's not even an option. There are so many people excluded from programming because they don't speak English and their language is lacking in resources. Now, that's not a problem the Rust community can solve on its own. But we can absolutely take steps to reduce friction there; make sure we're not making the problem worse. This is directly in service of our goal of being more inclusive.

A bit of a personal anecdote: I went to college in a former British colony. As a former British colony, English is relatively common amongst educated individuals, and most people regardless of education can understand very basic English, so my college primarily used English. Also, as a former British colony, the resultant poverty and lack of infrastructure meant that many people did not have the opportunity to get a consistent education that might prepare them for a higher education program in English. There were a fair number of people who were far less fluent in English who had a lot of trouble keeping up with the higher level technical instruction. To the credit of the institution, it provided remedial classes for English, but this did not necessarily fix the problem (you really can't patch up fluency that quickly and easily). Many of these students managed to get to a point where they could manage and ended up being successful, but it's still a pretty large barrier. I'm sure there are plenty of people who bounce off of this kind of constraint, or don't even try. It's a huge case of survivorship bias to look at the people in programming now and say "look, everyone here speaks English, what's the problem?".


Anyway, to address some specific points that keep cropping up in such discussions (not necessarily quoting any particular instance, and some of these have not been brought up yet):

"English is the lingua franca of programming"

No, it is not. It's a major language that programming is done in, but there are many programmers who do not speak English.

"I mostly see English programmers in this community/crates.io/etc, what's the problem?"

This is a case of selection/survivorship bias. If programming were less hostile to non-English speakers, we would have a more vibrant and diverse community.

"This creates more work for maintainers"

Maintainers can choose whether they want to do this. In my experience, such work is typically done by a different community member who wants their "language subcommunity" to be able to use the project. And yes, it's hard to keep up to date, which is why you can ask translators if they can commit to fixing up stuff when you update things (and tag them when you do so). If not, remove it.

"English isn't my mother tongue and I still prefer doing programming in English"

This is true for many people, and it's true for me. This doesn't mean it's true for everybody; the situation is different for each language. Furthermore, the reason behind this feeling is often a lack of good materials and vibrant communities for those languages: precisely the problem that internationalization helps to fix!

"This contributes to balkanization of the ecosystem"

Firstly, this is already an issue: the Chinese community writes crates that the primarily-English community doesn't use, and to some extent vice versa. Most people don't notice this, it's fine.

Secondly, to drill down a little bit into this: This isn't really balkanization. The people who are enabled by internationalization would otherwise likely never contribute to the ecosystem in the first place. Internationalization enables access, and yes, some of those people will create artifacts that are less useful to you, but they would never have created those artifacts in the first place if they didn't have that access! Just because it's not useful to you doesn't mean it shouldn't exist.

Besides, this isn't a zero sum game. If someone wishes to write a cool serialization framework that is written and documented in Portuguese, let them. There will be other serialization frameworks for you to use, and perhaps one day Rust will have the tooling support for crates to be fully documented in multiple languages.

This feature in particular reduces balkanization since it actually makes it so that these crates can exist on crates.io in a way that's accessible to speakers of multiple languages at once.

"It would be easier if we stuck to one language so everything evolves as one giant community/ecosystem"

There are reasons why this is an imperialistic viewpoint and really should not be entertained in this community. However, it's not always borne out of malice, and to address it whilst assuming good faith: Forcing everyone to speak a language to participate just leads to fewer people participating; it does not actually work. It's a barrier to entry more than anything else.

35 Likes