I’m starting to think that maybe the Right Thing is for CStr(ing) to insist on being UTF-8, in the same way that str(ing) does. The only differences between CStr and str would then be that CStr is guaranteed to be nul-terminated and its len() may be O(n); conversely, str is not guaranteed to be nul-terminated, may contain interior U+0000, and its len() is O(1).
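To make that concrete, here is a hypothetical sketch of what such a contract could look like; UtfCStr and its methods are illustrative names and a deliberate simplification, not the actual std::ffi::CStr API:

```rust
// Hypothetical sketch of the proposed contract; `UtfCStr` is illustrative,
// not the real std::ffi::CStr.
struct UtfCStr {
    // Invariant: valid UTF-8, exactly one nul byte, and it is the last byte.
    bytes: Box<[u8]>,
}

impl UtfCStr {
    /// O(n), like C's strlen: walks to the terminating nul.
    fn len(&self) -> usize {
        self.bytes
            .iter()
            .position(|&b| b == 0)
            .expect("invariant: nul terminator is present")
    }

    /// Free conversion: the invariant already guarantees valid UTF-8,
    /// so no check is needed here, unlike today's CStr::to_str.
    fn as_str(&self) -> &str {
        let n = self.len();
        unsafe { std::str::from_utf8_unchecked(&self.bytes[..n]) }
    }
}
```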
This also clarifies the difference between CStr and OsStr. An OsStr's job is to faithfully round-trip whatever nonsense we got from the operating system, which means it is in an uncertain encoding. It may even be in more than one encoding: consider the pathname /Пользователей/暁美 ほむら/זֹהַר.pdf, which is perfectly valid as far as a Unix kernel is concerned, even though the first component is encoded in KOI8-R, the second in EUC-JP, and the third in ISO-8859-8. Conversions to printable strings therefore have to be checked. It seems likely that nul-termination will also, in practice, be part of OsStr's contract, since most of the APIs it would be used with do in fact take nul-terminated strings, but it might be convenient not to make that part of the official API contract; if nothing else, a hypothetical future all-Rust OS might like to equate OsStr with str.
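For what it's worth, the standard library's existing OsStr already expresses the "checked conversion" half of that; a minimal sketch:

```rust
// Minimal sketch of "conversions to printable strings have to be checked",
// using the existing std::ffi::OsStr API.
use std::ffi::OsStr;

fn display_path_component(component: &OsStr) {
    match component.to_str() {
        // Only valid UTF-8 gets through unchanged.
        Some(s) => println!("{}", s),
        // Anything else (KOI8-R, EUC-JP, raw bytes, ...) forces an explicit
        // decision, e.g. lossy replacement.
        None => println!("{}", component.to_string_lossy()),
    }
}
```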
(Is it feasible, I wonder, to make OsStr be UTF-16 on Windows? That would mesh well with a strict policy of using only the W interfaces.)
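Today that re-encoding happens at the call boundary instead; roughly, assuming the standard library's Windows-only extension trait:

```rust
// Sketch of handing an OsStr to a *W entry point with the Windows-only
// OsStrExt::encode_wide, which re-encodes to UTF-16 on the fly.
#[cfg(windows)]
fn to_wide_nul(s: &std::ffi::OsStr) -> Vec<u16> {
    use std::os::windows::ffi::OsStrExt;
    // Re-encode to UTF-16 code units and append the nul the W APIs expect.
    s.encode_wide().chain(std::iter::once(0)).collect()
}
```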
I agree that, at least for now, it makes sense to relegate legacy encoding support to crates.
While I agree that the wchar.h interfaces in particular should never be used, I hesitate to say that Rust should avoid every component of libc that may perform locale-aware text processing. The exception that comes to mind is getaddrinfo with AI_IDN: bypassing libc’s name resolver is a Bad Plan for most programs, and so is reimplementing IDNA, but AI_IDN does the IDNA conversion starting from the active locale’s encoding. (I might propose AI_UTF8 to the glibc people.)
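For illustration, a hedged sketch of what using AI_IDN from Rust might look like through the libc crate; AI_IDN is a glibc extension, so its value is written out here rather than assumed to be exported by libc, and the error handling is minimal:

```rust
// Sketch only: resolving a hostname through glibc's resolver with AI_IDN,
// via the `libc` crate.
use std::ffi::CString;
use std::ptr;

// glibc <netdb.h> extension: IDNA-encode `node`, starting from the active locale.
const AI_IDN: libc::c_int = 0x0040;

fn resolve_idn(host: &str) -> Result<(), libc::c_int> {
    let node = CString::new(host).expect("hostname contains no interior nul");
    let mut hints: libc::addrinfo = unsafe { std::mem::zeroed() };
    hints.ai_flags = AI_IDN;
    hints.ai_socktype = libc::SOCK_STREAM;

    let mut res: *mut libc::addrinfo = ptr::null_mut();
    let rc = unsafe { libc::getaddrinfo(node.as_ptr(), ptr::null(), &hints, &mut res) };
    if rc != 0 {
        return Err(rc); // gai_strerror would turn this into text
    }
    // ... walk the `ai_next` list here ...
    unsafe { libc::freeaddrinfo(res) };
    Ok(())
}
```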
No argument from me there, either.
I don’t remember for sure, but yes, that sounds right. It was a long time ago and it was also very poorly documented.