`into_string_lossy` methods for `CString` and `OsString`?

Rua · May 20, 2024, 5:20pm

The borrowed "FFI string" types CStr and OsStr have a pair of methods to convert them into UTF-8 strings: to_str and to_string_lossy. Both can borrow the original data; the latter returns a Cow, which becomes owned only when loss occurs and the string must be modified.

I noticed that the owned FFI strings CString and OsString are missing the matching pair of functions to convert them into an owned String. There is into_string, the owned counterpart of to_str, but there is no owned equivalent of to_string_lossy. Yes, these two types deref to CStr and OsStr and so have all of their methods, but that means you are creating a borrowed str from an owned FFI string. To get an owned String you have to use .to_string_lossy().into_owned(), which clones the data even when the conversion is lossless, so you end up with both the original string allocation and a new String.

What about adding into_string_lossy to do this? It would lossily convert an owned FFI string to an owned String. If the conversion is lossless, it just takes ownership of the internal Vec<u8> and puts that in the String, so it's not allocating. If the conversion is lossy, then it could still reuse the original allocation if the changes can be made in place, and would only make a new allocation if that isn't possible. In practice, the conversion will usually be lossless, making it more efficient than .to_string_lossy().into_owned().
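For illustration, the behaviour described above can be approximated on stable Rust for OsString. This is only a sketch: os_string_into_string_lossy is a hypothetical free function, not an existing std API, and unlike the proposal it always allocates in the lossy case instead of trying to patch the buffer in place.

```rust
use std::ffi::OsString;

/// Hypothetical stand-in for the proposed `OsString::into_string_lossy`.
fn os_string_into_string_lossy(s: OsString) -> String {
    // Fast path: `into_string` takes ownership and succeeds when the
    // contents are valid Unicode, so the common case needs no lossy copy.
    // On failure it hands the original `OsString` back as the error value.
    s.into_string()
        // Slow path: build a new `String`, replacing invalid sequences
        // with U+FFFD. This always allocates.
        .unwrap_or_else(|original| original.to_string_lossy().into_owned())
}
```

Called as os_string_into_string_lossy(OsString::from("hello")), the fast path is taken and no lossy copy is made.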

One instance where this would be useful is functions that query the OS and give you an OsString, like std::env::var_os. I want a String here, but generally with a lossy conversion rather than a fallible one, so std::env::var is not suitable.
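A rough sketch of that use case with today's API follows. The helper name var_lossy is made up for this example; only std::env::var_os, OsString::into_string and OsStr::to_string_lossy are real std items.

```rust
use std::env;

/// Hypothetical helper: like `std::env::var`, but invalid Unicode is
/// replaced with U+FFFD instead of producing an error.
fn var_lossy(key: &str) -> Option<String> {
    // `var_os` returns `None` when the variable is not set.
    let value = env::var_os(key)?;
    // Try the ownership-transferring conversion first and fall back to a
    // lossy copy only if the value is not valid Unicode.
    Some(
        value
            .into_string()
            .unwrap_or_else(|raw| raw.to_string_lossy().into_owned()),
    )
}
```

With an into_string_lossy method, the body would shrink to a single call on the value returned by var_os.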

CAD97 · May 21, 2024, 3:01am

You can accomplish the ownership transfer with the admittedly clunky .into_string().unwrap_or_else(|e| e.into_cstring().to_string_lossy().into_owned()) (that is the CString form; with OsString the error value is already the original OsString, so the into_cstring() step drops out). This does a copy in the lossy case, but arguably you will usually want that anyway, because shifting the buffer after each replacement splice is as expensive as, if not more expensive than, a full copy of the buffer into the fixed string.
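Spelled out as a function, the chain above looks roughly like this for CString. The function name c_string_into_string_lossy is hypothetical; the calls it makes (CString::into_string, IntoStringError::into_cstring, CStr::to_string_lossy) are existing std APIs.

```rust
use std::ffi::CString;

/// Hypothetical wrapper around the workaround chain for `CString`.
fn c_string_into_string_lossy(s: CString) -> String {
    s.into_string().unwrap_or_else(|e| {
        // `IntoStringError::into_cstring` returns the original `CString`,
        // so the bytes are still available for the lossy conversion.
        e.into_cstring().to_string_lossy().into_owned()
    })
}
```

For example, a CString holding the bytes b"a\xFFb" comes out as "a\u{FFFD}b", while valid UTF-8 input goes through the into_string path without a lossy copy.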

If I had to conjecture, the function's absence feels like a victim of "don't consume ownership you aren't going to use" API design. A "better" consumer in that school of thought might instead do the conversion as something like

match s.to_str() {
    Ok(_) => s.into_string().unwrap(),
    Err(_) => s.to_string_lossy().into_owned(),
}

to avoid unnecessary ownership transfer... but since that leaves drop-flag-reliant, inaccessible state consuming stack space for no reason, it's not really an ideal thing to write, even ignoring the likely repeated UTF-8 check.

So, tl;dr: I'm +1 on this; it would probably fit right in.

jrose · May 21, 2024, 4:45am

As a bonus, into_string_lossy can be done in a single pass, whereas all the alternatives need two, which can hurt if the non-Unicode data is at the end of the string. (That’s not an inherent limitation, I guess, just a consequence of the shape of the existing API.) This is unlikely to matter for the lengths of strings that usually end up in CString and OsString, but even so.
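To make the single-pass idea concrete, here is a sketch over a plain Vec<u8>. It is not how std implements anything: bytes_into_string_lossy is a hypothetical name, and the lossy path copies into a new buffer rather than editing in place. The point is only that each byte is validated once, by resuming after every error reported through Utf8Error::valid_up_to and Utf8Error::error_len.

```rust
/// Hypothetical single-validation-pass lossy conversion for raw bytes.
fn bytes_into_string_lossy(bytes: Vec<u8>) -> String {
    // Lossless case: `String::from_utf8` validates once and reuses the
    // allocation. On failure the error keeps the bytes and remembers
    // where validation stopped.
    let err = match String::from_utf8(bytes) {
        Ok(s) => return s,
        Err(e) => e,
    };
    let mut error = err.utf8_error();
    let mut rest: &[u8] = err.as_bytes();
    let mut out = String::with_capacity(rest.len());
    loop {
        let (valid, after) = rest.split_at(error.valid_up_to());
        // SAFETY: the first `valid_up_to()` bytes of `rest` were verified
        // as UTF-8 by the validation that produced `error`.
        out.push_str(unsafe { std::str::from_utf8_unchecked(valid) });
        out.push('\u{FFFD}');
        rest = match error.error_len() {
            // Skip the invalid sequence and continue after it.
            Some(len) => &after[len..],
            // Input ended in the middle of a character: nothing left.
            None => return out,
        };
        // Validate only the part that has not been looked at yet.
        match std::str::from_utf8(rest) {
            Ok(tail) => {
                out.push_str(tail);
                return out;
            }
            Err(e) => error = e,
        }
    }
}
```

The workaround chains, by contrast, validate the valid prefix once in into_string and again in to_string_lossy.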

kornel · May 21, 2024, 3:52pm

This would also be beneficial for code size, since the multi-step chains repeat the allocation and data copying code.

