[pre-RFC] Deprecate and replace CStr/CString

To sum up, there are two cases: Windows, and non-UTF-8 Unix.

  • On Windows, some system APIs may expect strings encoded in the current code page. The code page can be changed, but not to UTF-8 or UTF-32, and not to UTF-16 except (apparently) in managed applications. Therefore there is always the potential for data loss, so the correct answer is to not use those functions at all, preferring the wchar_t-based ‘w’ versions. There should probably be a wchar_t variant of CString in the standard library; CString itself is still useful for libraries that use UTF-8 everywhere.
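To make the wchar_t point concrete, here is a minimal sketch of what such a wide C string might hold. The helper name `to_wide_nul` is hypothetical, not an existing API; `str::encode_utf16` is stable and cross-platform, and on Windows the resulting buffer is what you would hand to a ‘w’ function such as `CreateFileW`:

```rust
// Hypothetical helper: encode a Rust string as a NUL-terminated
// UTF-16 buffer, the representation Windows 'w' APIs expect.
fn to_wide_nul(s: &str) -> Vec<u16> {
    // NOTE: a real wide-CString type should reject interior NULs,
    // the way std::ffi::CString does for u8 strings.
    s.encode_utf16().chain(std::iter::once(0)).collect()
}

fn main() {
    let w = to_wide_nul("C:\\temp");
    // The buffer is terminated by a single trailing 0u16.
    assert_eq!(w.last(), Some(&0));
    println!("{:?}", w);
}
```

A real library type would also want the inverse direction (handling unpaired surrogates, which Windows permits), which is roughly what `OsString` already does on Windows via WTF-8.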

  • Non-UTF-8 Unix: if there is a problem, it is nowhere near limited to CString. OsStr is just an arbitrary collection of bytes, but OsStr::to_str and friends assume UTF-8; the same is true of what you get from env::args, File::read_to_string, and others. On the output side, io::Write::write_fmt assumes UTF-8, so plain old println! is broken in a non-UTF-8 locale. Properly supporting non-UTF-8 systems would require adding conversions in all of these places, which I suspect is not going to happen. If it does, it would be worth distinguishing between “theoretically current-C-encoding-encoded bag of bytes” C strings, as used by some libc functions, and “theoretically UTF-8-encoded bag of bytes”, as used by glib and other libraries. But to be clear, this only matters for display and user input; for all other purposes, you want to preserve the original binary blobs. (In theory it also matters for hardcoded strings such as standard path names, e.g. /dev/null. But I don’t think any non-ASCII-superset encoding, such as UTF-16 or EBCDIC, sees non-negligible use as a C locale, so it should be safe to encode those in ASCII.)
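The “bag of bytes” behaviour above can be demonstrated with the existing std API. This sketch assumes a Unix target (for `OsStrExt`); the byte 0xE9 stands in for data produced under a Latin-1 locale, where it is ‘é’ but is not valid UTF-8 on its own:

```rust
// Demonstration: OsStr holds arbitrary bytes, but the to_str family
// only succeeds when those bytes happen to be valid UTF-8.
#[cfg(unix)]
fn main() {
    use std::ffi::OsStr;
    use std::os::unix::ffi::OsStrExt;

    // 0xE9 is 'é' in Latin-1, but an invalid byte in UTF-8.
    let latin1 = OsStr::from_bytes(b"caf\xE9");

    // The UTF-8 check fails, so the lossless conversion refuses.
    assert!(latin1.to_str().is_none());

    // The lossy conversion substitutes U+FFFD; only the OsStr itself
    // still preserves the original binary blob.
    assert_eq!(latin1.to_string_lossy(), "caf\u{FFFD}");
    println!("original bytes: {:?}", latin1.as_bytes());
}

#[cfg(not(unix))]
fn main() {}
```

Note what is missing: nothing in std consults the locale to decode those bytes, which is exactly why supporting non-UTF-8 locales would mean adding conversions throughout the API rather than just to CString.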
