[pre-RFC] Deprecate and replace CStr/CString

To sum up, there are two cases: Windows, and non-UTF-8 Unix.

  • On Windows, some system APIs may expect strings encoded in the current code page. The code page can be changed, but not to UTF-8 or UTF-32, and not to UTF-16 except (apparently) in managed applications. Therefore there is always the potential for data loss, so the correct answer is to not use those functions at all, preferring the wchar_t-based ‘w’ versions. There should probably be a wchar_t variant of CString in the standard library; CString itself is still useful for libraries that use UTF-8 everywhere.
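To make the wchar_t point concrete, here is a minimal sketch of what such a wide C string might hold. The helper name `to_wide_nul` is hypothetical, not an existing API; `str::encode_utf16` is stable and cross-platform, and on Windows the resulting buffer is what you would hand to a ‘w’ function such as `CreateFileW`:

```rust
// Hypothetical helper: encode a Rust string as a NUL-terminated
// UTF-16 buffer, the representation Windows 'w' APIs expect.
fn to_wide_nul(s: &str) -> Vec<u16> {
    // NOTE: a real wide-CString type should reject interior NULs,
    // the way std::ffi::CString does for u8 strings.
    s.encode_utf16().chain(std::iter::once(0)).collect()
}

fn main() {
    let w = to_wide_nul("C:\\temp");
    // The buffer is terminated by a single trailing 0u16.
    assert_eq!(w.last(), Some(&0));
    println!("{:?}", w);
}
```

A real library type would also want the inverse direction (handling unpaired surrogates, which Windows permits), which is roughly what `OsString` already does on Windows via WTF-8.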

  • Non-UTF-8 Unix: if there is a problem, it is nowhere near limited to CString. OsStr is just an arbitrary collection of bytes, but OsStr::to_str and friends assume UTF-8; the same is true of what you get from env::args, File::read_to_string, and others. On the output side, io::Write::write_fmt assumes UTF-8, so plain old println! is broken in a non-UTF-8 locale. Properly supporting non-UTF-8 systems would require adding conversions in all of these places, which I suspect is not going to happen. If it does, it would be worth distinguishing between “theoretically current-C-encoding-encoded bag of bytes” C strings, as used by some libc functions, and “theoretically UTF-8-encoded bag of bytes”, as used by glib and other libraries. But to be clear, this only matters for display and user input; for all other purposes, you want to preserve the original binary blobs. (In theory it also matters for hardcoded strings such as standard path names, e.g. /dev/null. But I don’t think any non-ASCII-superset encoding, such as UTF-16 or EBCDIC, sees non-negligible use as a C locale, so it should be safe to encode those in ASCII.)
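The “bag of bytes” behaviour above can be demonstrated with the existing std API. This sketch assumes a Unix target (for `OsStrExt`); the byte 0xE9 stands in for data produced under a Latin-1 locale, where it is ‘é’ but is not valid UTF-8 on its own:

```rust
// Demonstration: OsStr holds arbitrary bytes, but the to_str family
// only succeeds when those bytes happen to be valid UTF-8.
#[cfg(unix)]
fn main() {
    use std::ffi::OsStr;
    use std::os::unix::ffi::OsStrExt;

    // 0xE9 is 'é' in Latin-1, but an invalid byte in UTF-8.
    let latin1 = OsStr::from_bytes(b"caf\xE9");

    // The UTF-8 check fails, so the lossless conversion refuses.
    assert!(latin1.to_str().is_none());

    // The lossy conversion substitutes U+FFFD; only the OsStr itself
    // still preserves the original binary blob.
    assert_eq!(latin1.to_string_lossy(), "caf\u{FFFD}");
    println!("original bytes: {:?}", latin1.as_bytes());
}

#[cfg(not(unix))]
fn main() {}
```

Note what is missing: nothing in std consults the locale to decode those bytes, which is exactly why supporting non-UTF-8 locales would mean adding conversions throughout the API rather than just to CString.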
