Better support translations for The Book?

Over in

, I talked about making changes to the book. One unfortunate part of this work is that it will also require a lot of work from the community translations (http://doc.rust-lang.org/#community-translations). This is one reason that I wasn't really trying to encourage such things, as I knew I wasn't happy with the actual state of the book.

So, since we're re-thinking the book, and since this revision will be the last, big one: how can we help incorporate these translations into the official docs? Any thoughts or ideas?

I would say it depends on the kind of translations we expect: should they be accurate translations of the English book, or other books written in other languages?

In the first case, I think there would be a rather strong need for a way to easily keep track of the changes made to the English book, so that they could be listed for further translation. (For example, as a French speaker, it would be good if I could easily see which parts of the French book are likely outdated and need updating, so that I don’t need to read the whole git history of the book looking for the changes to the English book that have not yet been translated to French.) But I don’t know if such tools exist…

In the second case, each book is given much more freedom to manage itself. This is (I believe?) more or less the current situation, and it makes it difficult to refer to these books as “official”, as there is pretty much no guarantee that their contents are aligned with the English book.

https://github.com/kgv/rust_book_ru seems to track changes in the book and to update the translation regularly.

@mkpankov may know more details.

Anything in-tree should be a straight translation of the official book.

I suggest considering the use of services for community-driven translation. In particular, I would point out transifex.com and crowdin.com. Both of them support open source projects.

Pros:

  • Keeping track of whether the translation is up to date with the git repository becomes trivial (it is a native capability of these platforms).
  • Built-in translation glossary, approval, validation, discussion, etc.
  • User-friendly UI built specifically for translation (it is very easy to get involved in the translation process).
  • Various translation teams can be assembled in one place.

Cons:

  • Transferring existing translations onto the platform is not trivial (I developed a solution that creates a glossary file from the translated files and uploads it to the Crowdin platform).
  • We will have to write a special script for pulling translations from the platform back into the repository.
  • Lack of support for Markdown (but working with Markdown files as TXT files is fine).

P.S. I am an owner of https://www.transifex.com/rust/rust-book/ and https://crowdin.com/project/rust-lang. They are not in use right now, and I am ready to transfer them to the core team at any time.

see also

https://github.com/azerupi/mdBook/issues/5

Hello everyone! Sorry for inactivity, end of the year was tough.

I’ll try to outline where the Russian translation currently is, and what help we could use. I was one of the main authors of this translated version.

Overall, we have everything translated, and it was synchronized with the original at some point. I’ll now cover some specifics in several areas. Feel free to ask any questions.

Tracking changes

I validated that everything was synchronized with the 1.2 book in, like, Aug 15. Since then, other Russian community members (including @defuz) have sporadically updated parts of the book that were noticed to be outdated. Sometimes I create issues in our repo when I notice that big changes to some chapter are underway. The current update tracking process, I believe, is pretty chaotic. I don’t know if the current Russian book is fully updated to match the original.

We have rudimentary support for simpler tracking, but it lacks usability: there are files containing a git commit hash next to the Markdown files of the chapters. Each of these records the last revision of the original repo at which the translation was updated. I don’t know of any real usage of this, as the infrastructure for processing it is missing.

What would be better, I think, is to store the translated version in the same repo, or at least to tie the original and the translation together via git submodules. That way, noticing an out-of-date translation would be way easier.
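To make the marker-file idea concrete, here is a minimal sketch of the missing processing step. It is not the Russian project's tooling: the function name, the data layout and the sample hashes are all hypothetical. It assumes the hash stored next to each chapter is one of the upstream commits that touched that chapter, and that the per-chapter history is supplied oldest first (for instance from `git log --reverse --format=%H -- <file>`).

```python
from typing import Dict, List


def outdated_chapters(
    upstream_log: Dict[str, List[str]],
    synced_at: Dict[str, str],
) -> Dict[str, List[str]]:
    """Return, per chapter, the upstream commits made after the recorded sync point.

    upstream_log maps a chapter file to its upstream commit hashes, oldest first.
    synced_at maps a chapter file to the hash stored in its marker file.
    """
    pending: Dict[str, List[str]] = {}
    for chapter, commits in upstream_log.items():
        marker = synced_at.get(chapter)
        if marker in commits:
            # Everything after the marker still has to be translated.
            newer = commits[commits.index(marker) + 1:]
        else:
            # No marker, or an unknown hash: conservatively treat all commits as pending.
            newer = list(commits)
        if newer:
            pending[chapter] = newer
    return pending


# Hypothetical sample data; real hashes would be full git commit ids.
upstream_log = {
    "ownership.md": ["a1", "b2", "c3"],
    "lifetimes.md": ["a1", "d4"],
    "traits.md": ["e5"],
}
synced_at = {"ownership.md": "b2", "lifetimes.md": "d4"}
report = outdated_chapters(upstream_log, synced_at)
```

With this sample data, `report` lists one pending commit for `ownership.md` and one for `traits.md` (which has no marker at all), while `lifetimes.md` is considered up to date.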

Completeness and adequacy of translation

As I said, we have everything translated, in terms of volume. We also have a policy as to what should be translated and what shouldn’t - like, names of people that refer to Rust’s authors shouldn’t be translated. I can provide the details if necessary.

We don’t deviate from the structure of the book at any scale larger than a paragraph. Where we do sometimes deviate a bit is where we decided to paraphrase the original sentences because they were hard to read and fully understand in Russian, or were stylistically strange by the norms of Russian. We didn’t specifically rewrite any noticeable part of the book.

Also, we had changes reviewed and continue to review all changes. The review team is about 5 people. This is to underline that I’m not the sole judge of the style (and overall quality) of the translation :slightly_smiling:

There’s also one sweeping stylistic change: we decided to prefer a more formal style, with “we” instead of “I” for the author, which leans towards the style of university textbooks. I don’t mean we stripped out all the jokes or anything - it’s just the “feel” we changed.

Overall, we strove to not blindly “translate”, but to “localize” the content.

Technical aspects

We not only translated the book, but also provided PDF, Ebook and MOBI versions. This is done via Calibre and is pretty flaky. Different readers support the formats differently, and there were issues in the reader versions of the book emitted by Calibre that we couldn’t fix, because our fight with the styling Calibre applies wasn’t successful. We could really use help here.

We also have continuous deployment to GitHub Pages.

Organizational problems

The repo that currently holds the translation is owned by GitHub user kgv, who has been inactive since last spring. I tried to find some contact info but couldn’t. I have write access, along with some other people, but these are not the contributors who are most active today. Not having access to the repo’s settings, I thought of forking the repo to our organization, which currently hosts the repo of the Russian community site. I haven’t gotten to it yet.

My opinion on translation services

As the bulk of the book is already translated, we have a review setup, and we use GitHub with Travis to continuously deploy the book, I don’t think we could benefit from them. There are some really tricky problems in translating a technical text like TRPL - we had days of battles when reviewing changes. So I don’t think the barriers of git and GitHub are the bottleneck for those who would like to participate in the translation.

This should be the way the book is written in general, so I would take upstream patches to fix this.

Thanks so much for the report!

In two weeks I finish my exams and I will have a lot more time to work on mdBook. I would like to tackle the multi-language support. At the moment my idea of multi-language support is having 1 TOC and multiple source languages that render as one (HTML) book. I still have to think about how the TOC will be translated.

Let me know if there is anything I am neglecting about this issue.

When multi-language support is implemented in mdBook, this can be done more easily (for the new book). Also, the fact that the new book has its own repository will probably make tracking changes a lot easier.

There is also work being done in that department in mdBook, to provide an export function for pdf and ebook.

Why is having multiple TOCs in different languages not possible?

Yes, but we will still have to move the files to a single repo, and last time that was kind of an issue, as having something in the official repo means it's official. But our translation isn't (at least for now :slightly_smiling:)

I'll come over and try to share more details about our experience with Calibre.

Yes of course it is, but I thought it would be easier to manage 1 TOC. Changes in the TOC would be directly reflected in all languages instead of having to mirror it manually. But I could see this as a disadvantage too if the translation is lagging behind.

Also, one thing to consider is that if you have 1 TOC dictating the structure for all languages, you are assured of a 1:1 mapping between pages in all languages. So you could be on a specific page, change the language via a menu button, and land on the same page in the other language. Without the 1:1 mapping guarantee, it would be a lot more complex, if not impossible, to do that.
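A minimal sketch of what the 1:1 guarantee buys: with a single TOC, switching language is a pure path rewrite with no per-page lookup table. The `<lang>/<page>` URL layout and the function name here are assumptions for illustration only, not mdBook's actual output layout.

```python
from typing import Set


def switch_language(path: str, target: str, languages: Set[str]) -> str:
    """Swap the leading language segment of a book path such as 'en/ownership.html'."""
    current, _, page = path.partition("/")
    if current not in languages:
        raise ValueError(f"path does not start with a known language: {path!r}")
    if target not in languages:
        raise ValueError(f"unsupported language: {target!r}")
    # Because every language shares one TOC, the same page is assumed to exist everywhere.
    return f"{target}/{page}"


languages = {"en", "fr", "ru"}
french_page = switch_language("en/ownership.html", "fr", languages)
nested_page = switch_language("en/ch04/ownership.html", "ru", languages)
```

With separate TOCs per language, the last line of the function would instead need an explicit page-to-page table maintained by each translation team, which is the extra complexity mentioned above.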

@Azerupi

I don’t think that a common TOC for all translations is a good idea. Translations will, in any case, happen with a certain latency. When the structure of the book changes, we cannot immediately keep the chapters and sections consistent between the original and the translations.

I think that the translations may live in the same repository as the original book, but should be implemented as separate books. We can just cross-link them on the index pages; I think that will be enough.

I want to say more about Transifex, to explain its benefits. In my opinion, two of the biggest challenges for translators today are monitoring consistency between the original and the translation (especially for small changes), and the review process (partially solved by Reviewable.io, but I think we can do better). I want to tell you how Transifex solves these problems. The below is also true of other services that specialise in community translation.

Monitoring of consistency between translation and the original.

Transifex automatically maintains a paragraph-to-paragraph mapping between the original and the translation. When the original file changes in the repository, the service automatically pulls it and invalidates the corresponding paragraphs in the translation. In the interface this is displayed as “the resource is 98% translated”. Moreover, we can immediately see which places we need to update to bring the translation fully in line with the original.
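The general idea of paragraph-level invalidation can be sketched in a few lines. This is not how Transifex is implemented internally; it is an illustration that assumes paragraphs are separated by blank lines and that each translation is stored under a SHA-256 digest of the source paragraph it was made from. All names are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def split_paragraphs(text: str) -> List[str]:
    """Split a document into non-empty paragraphs on blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def digest(paragraph: str) -> str:
    return hashlib.sha256(paragraph.encode("utf-8")).hexdigest()


def translation_status(original: str, translated: Dict[str, str]) -> Tuple[float, List[str]]:
    """Return (percent translated, source paragraphs that need (re)translation).

    translated maps the digest of a source paragraph to its translation, so any
    edit to a source paragraph automatically invalidates the old translation.
    """
    paragraphs = split_paragraphs(original)
    stale = [p for p in paragraphs if digest(p) not in translated]
    done = len(paragraphs) - len(stale)
    percent = 100.0 * done / len(paragraphs) if paragraphs else 100.0
    return percent, stale


old_source = "Ownership is unique.\n\nBorrowing is temporary.\n\nLifetimes name scopes.\n\nTraits share behaviour."
# Pretend every paragraph of the old source has been translated.
store = {digest(p): f"<translation of: {p}>" for p in split_paragraphs(old_source)}

# Upstream edits one paragraph; only that paragraph becomes stale.
new_source = old_source.replace("Borrowing is temporary.", "Borrowing is temporary and checked.")
percent, stale = translation_status(new_source, store)
```

Here three of the four paragraphs still match their stored digests, so the resource shows as 75% translated and `stale` names the single paragraph to revisit.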

Review process

Today we use the Pull Request model in the Russian translation project to review new translations and improvements. In my opinion, such a model is good for code but is not well suited to text, especially translation.

Even if a suggested piece of translation is fairly large, it is submitted as a single PR. We have only 2 states for the entire PR:

  1. It is fully approved and merged into master in its entirety.
  2. It has comments requiring changes before merging.

Therefore, the review process lasts for several days and works on an “all or nothing” principle. In Transifex, translation, review and improvement happen continuously. Anyone can provide a translation or an improvement for any paragraph at any time, and the moderator can accept or reject it. Translation, review and discussion are combined in a single interface. I think that this model allows us to update the translation much faster.

Glossary

I also think that it is important to have a good glossary for translation. At least in Russian, many of the concepts (especially ones such as lifetime/ownership/borrowing) do not have a single established translation. It is important to use the same word throughout the entire book. Transifex highlights all the words that are in the glossary in the translation interface. This allows translators to use the same terminology.

Integration with GitHub

I also want to add that using Transifex does not mean that we should give up the GitHub repository. They can (and probably should) work in close integration. But it does mean that all changes to the translation should go through Transifex. We cannot accept changes to translations through both PRs and Transifex at the same time. All changes to the translation should be pulled automatically from Transifex into the repository.
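A glossary check of this kind can also be approximated outside any translation service. The sketch below is a hypothetical helper, not a Transifex feature: it flags source terms whose agreed translation does not appear in the translated paragraph. The Russian entries are illustrative word stems chosen for the example (stems, so that simple substring matching survives inflection), not the Russian team's actual glossary.

```python
from typing import Dict, List, Tuple


def glossary_issues(source: str, translation: str, glossary: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (source term, expected translation stem) pairs missing from the translation."""
    src = source.lower()
    dst = translation.lower()
    issues = []
    for term, approved_stem in glossary.items():
        # Only check terms that actually occur in the source paragraph.
        if term.lower() in src and approved_stem.lower() not in dst:
            issues.append((term, approved_stem))
    return issues


# Illustrative glossary: English term -> stem of the agreed Russian term.
glossary = {"ownership": "владени", "borrowing": "заимствован"}

source = "Ownership and borrowing are central to Rust."
consistent = "Владение и заимствование занимают центральное место в Rust."
inconsistent = "Собственность и заимствование занимают центральное место в Rust."

ok = glossary_issues(source, consistent, glossary)
problems = glossary_issues(source, inconsistent, glossary)
```

The second translation uses a different word for "ownership" than the glossary prescribes, so it is the only one reported. Substring matching is deliberately crude; a real tool would need proper tokenization and morphology handling.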

@defuz I know, but we would give up an awesome feature I think: Being able to change the language from a chapter and be redirected to the same chapter in the other language.

I made a little example in the tracking issue:

Let's take a hypothetical situation with the Rust book. Let's say I am reading a blog post and it references some chapter in the Rust book, for example the chapter about ownership. But English is not my main language and it would be a lot easier to understand the chapter in my native language. If we have 1 to 1 mapping on page / chapter level the user could then select his language (if it is supported) from a dropdown menu and he would land on the exact same page in his chosen language.

And to be honest, if you have different TOCs you essentially have different books. There is little gain to support that, other than being able to group all the translations in one directory and build them in one go.

I am discussing this at more length in the tracking issue. Feel free to come by and discuss it with us there to avoid hijacking this thread :wink:

I requested the Azerbaijani language; can you accept my request in Transifex?

@steveklabnik I would like to bring the translation topic back up.

Preamble. The Russian community, @mkpankov and other Russian contributors, has done a good job on the full translation. Now the Russian community has woken up again to update the Rust book to the 2018 edition.

This conversation says the main headache is 1) tracking the (relatively small) changes in the original English book and transferring them into the translated version, i.e. monitoring the consistency between the original and the translation. One more option is to track only the big changes between releases.

Plus 2) the internal review process for the translated text. I leave that internal but still important part out here; it is up to the translator team.

I would like to talk about aspect 1) only and suggest an approach that needs some changes in the process of writing the English version. I would like those changes to be "automatic", so they do not disturb the writers. I want to know your opinion of the suggested approach, and whether it would be possible to implement.

I will not talk about other related items such as a multi-language TOC and language menu, the Transifex service, the glossary, organizational problems, etc.

As I see it, a Transifex-like service is good for translators, but it means writers would have to replace GitHub with Transifex, which is unfortunately bad for the writers. The reasons are that it is a paid service and that there is no smooth import/export integration between the two services. So I leave that out too.

I would like to write about three aspects of a possible approach.

  1. The main idea of tracking changes using GitHub
  2. How we can update the writers' flow and how that can be achieved
  3. The translators' approach

1. Tracking changes: the main idea

The minimal identified translation item (MITI/MTI) can be a paragraph, a piece of code, a title, a subtitle, etc.: any relatively short piece of text, similar to a paragraph, that is separated from every other paragraph.

Every MITI carries meta-info (an anchor). This can be a unique identifier, say an arbitrary UUID. It helps to track each MITI between the original English text and the translations into other languages. The MITI info can also contain a simple hash/CRC code for the current MITI. That can be helpful later for quickly checking whether the MITI content has changed at some point in time.

The current Markdown specification doesn't have anything like metadata tags, so only a "comment" is suitable here for now.

The MITI data can be placed inside a comment tag before or after the identified paragraph (MITI).

So a possible example can look like:

@miti:abcd89-3456-edaffc-XXXXXX/4758697

Welcome to The Embedded Rust Book: An introductory book about using the Rust Programming Language on "Bare Metal" embedded systems, such as Microcontrollers.< !-- @miti:abcd89-3456-edaffc-XXXXXX/4758697 -- >

So it contains an arbitrary unique UUID plus a hash/CRC value.

  • The UUID makes it possible to track a MITI between language translations
  • The hash/CRC can be helpful for checking later whether the paragraph's main content was updated at some point

After the mdbook utility processes the MD files, those comments end up in the target HTML.
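As a rough sketch of how such markers could be generated and checked: the helper names are hypothetical, `zlib.crc32` is just one possible choice for the hash/CRC part, and the marker is assumed to be a well-formed HTML comment placed after the paragraph, with paragraphs separated by blank lines.

```python
import re
import zlib
from typing import List

# Matches markers of the form <!-- @miti:<id>/<crc> -->
MARKER = re.compile(r"<!--\s*@miti:(?P<id>[0-9A-Za-z-]+)/(?P<crc>\d+)\s*-->")


def crc(text: str) -> int:
    """CRC-32 of the paragraph text, ignoring surrounding whitespace."""
    return zlib.crc32(text.strip().encode("utf-8"))


def stamp(paragraph: str, miti_id: str) -> str:
    """Append a MITI marker carrying the id and the paragraph's current CRC."""
    return f"{paragraph.strip()}\n<!-- @miti:{miti_id}/{crc(paragraph)} -->"


def changed_items(markdown: str) -> List[str]:
    """Return the ids of paragraphs whose text no longer matches the CRC in their marker."""
    changed = []
    for block in markdown.split("\n\n"):
        match = MARKER.search(block)
        if match is None:
            continue  # paragraph without a marker: nothing to compare against
        body = MARKER.sub("", block)
        if crc(body) != int(match.group("crc")):
            changed.append(match.group("id"))
    return changed


doc = "\n\n".join([
    stamp("Welcome to the book.", "id-1"),
    stamp("Rust is fast.", "id-2"),
])
unchanged = changed_items(doc)

# A writer edits the second paragraph without touching its marker.
edited = doc.replace("Rust is fast.", "Rust is fast and safe.")
needs_translation = changed_items(edited)
```

A translation would carry the same ids, so the ids returned for the English text tell translators exactly which paragraphs to revisit. Re-stamping after each edit would have to be automated (for example in a pre-commit step) to keep the process invisible to writers.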

I am still very interested in having them, but I don't have any real ideas on how best to do this. At minimum, it feels blocked on https://github.com/rust-lang-nursery/mdBook/issues/5