This is why I didn’t want to bring the design of strffi up. I’m not saying CString needs to be replaced by SeaString. CString should be the “one size fits most people” knife that does the job 80+% of the time because it cuts stuff well enough. SeaString is the giant cabinet filled with every conceivable kind of blade with interchangeable handles and weights that crazy people use when they need to de-bone a rhinoceros.
I don’t think CString needs to support every encoding, and every structure. I do think it should support the most obvious one implied by its name. If that can’t be done, it should be replaced so that people don’t unknowingly use the wrong one.
To be clear: as much as I don’t like the incorrectness of CStr[ing], even less do I like the potential for people to unwittingly make a mistake that can easily go unnoticed. Given that there are numerous cases in Rust’s standard library that appear to make this very mistake, I don’t think that point can simply be ignored.
But it’s not an ambiguous code page in the sense that we don’t know what it is. C strings are encoded in the current multibyte encoding, unless you’re dealing with an exception like cairo’s text functions. Every example I’ve seen of Rust code that currently uses CStr[ing], creating them from Rust strings, wants the C strings to be in this encoding, but currently they aren’t.
Also, if a C function wants a string not in this encoding, these days the odds are that it wants UTF-8, which is also why I proposed adding a type specifically for that. If it’s not the current C runtime multibyte encoding or UTF-8, then that’s where you can break out the strffi box of toys.
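To make the failure mode concrete, here’s a minimal sketch showing that CString::new copies the Rust string’s UTF-8 bytes verbatim, with no transcoding against the C locale:

```rust
use std::ffi::CString;

fn main() {
    // "é" is two bytes in UTF-8 (0xC3 0xA9), but a single byte (0xE9)
    // in Latin-1.
    let c = CString::new("café").unwrap();

    // CString::new just copies the Rust string's UTF-8 bytes and appends
    // a NUL; it never consults the C locale or transcodes anything.
    assert_eq!(c.as_bytes(), b"caf\xc3\xa9");

    // A libc whose current multibyte encoding is Latin-1 would read those
    // five bytes back as "cafÃ©", not "café".
}
```

So the moment the environment’s encoding isn’t UTF-8, the bytes handed to libc are simply wrong.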
It’s just an abbreviation for the structure parameter to keep names from being ridiculously long. Technically, the closest analog to the current CString in strffi would be ZUtf8RString, which is a wrapper around SeaString<ZeroTerminated, UnvalidatedUtf8, RustAlloc>. Its selection had nothing to do with anything in Windows; it was just the first letter of “ZeroTerminated” and was distinct. For comparison, the same thing using slices for the structure would be SUtf8RString, but this is all kinda tangential. There’s a method to the madness.
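For anyone unfamiliar with that scheme, here’s a rough, hypothetical sketch of how the three type parameters compose; none of these types exist in a published crate, this is just illustrating the naming:

```rust
use std::marker::PhantomData;

// Hypothetical marker types for the three axes strffi parameterizes over.
// The names come from the discussion above; the real library may differ.
struct ZeroTerminated;   // structure: NUL-terminated
struct UnvalidatedUtf8;  // encoding: UTF-8, not checked on construction
struct RustAlloc;        // allocator: Rust's own

// One string type, generic over all three axes.
struct SeaString<Structure, Encoding, Alloc> {
    bytes: Vec<u8>,
    _marker: PhantomData<(Structure, Encoding, Alloc)>,
}

// The abbreviation scheme: Z(eroTerminated) + Utf8 + R(ustAlloc) + String.
type ZUtf8RString = SeaString<ZeroTerminated, UnvalidatedUtf8, RustAlloc>;

fn main() {
    let s: ZUtf8RString = SeaString {
        bytes: b"hello\0".to_vec(),
        _marker: PhantomData,
    };
    assert_eq!(s.bytes.last(), Some(&0u8));
}
```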
Speaking in terms of the proposed CString replacement for libstd, MbString would also be fine. For me, the important thing is not being in a situation where old documentation on the internet directs people to the wrong type. If more focused changes are made (such as deprecating just the problematic methods), then MbString wouldn’t be needed, and we could keep CString. Again, I just want to avoid people being bitten by making the “obvious” choice.
Neither of those says that “C strings” don’t have any particular encoding. The Python one in particular specifically says that Unicode strings will be transcoded using the current default encoding (though I don’t know how the Python runtime’s encoding and the C runtime’s encoding interact). Python also separates “unknown lump of 8-bit codes” into a totally distinct type code: et.
I feel like we’re talking at cross purposes, here. My primary concern is that the type called CString, which one might reasonably reach for when passing a Rust string to C, does not handle encoding that string, making it unfit for that purpose unless the current environment happens to be configured the right way.
To put it another way: I believe CString the type interprets the name “C string” too broadly, its interface is trivial to use incorrectly, and such incorrect use won’t cause any obvious problems short of a comprehensive audit.
I don’t think that’s reasonable. Of the ones I can recall off the top of my head, there’s ZMbCString, ZWCString, ZUtf8CString, ZUtf16CString, ZUtf16beCString, ZzMutf8CString, PbUtf16BstrString, PbRaw8BstrString, ZAnsiCString, ZOemCString, SAnsiCString, SOemCString, SWCString, SAnsiWinString, SWWinString, GoUtf8RString. Oh! And there’s bstrings (which are not BSTRs). And there are the twin console codepages on Windows, too.
This is why I wanted to build the toolbox instead of trying to enumerate all of them. If I left any one out, people might be forced to either try to push fixes upstream, or reimplement it themselves when 80% of the code probably already exists. And I absolutely do not think this level of complexity belongs in libstd.
If we’re talking about C (if you’re doing this in Rust, you do it with Rust strings in Unicode as far as I’m concerned), then my understanding is that all your char* strings should be in the current multibyte encoding (as set by setlocale). If you have strings in different encodings, and you try to concatenate them, then that’s a bug in your program that libc cannot be expected to deal with.
Again, I feel like we’re arguing about different things. I’m not concerned with turning CString into the be-all, end-all representation of foreign strings. I just want it to use the encoding that functions in libc are going to expect. Like, if someone constructs a path in Rust, and wants to pass it to the cairo function that saves a PNG to disk, they need to encode the path into the encoding libc expects.
Yes, it would be nice if they used entirely Unicode-aware APIs. That’s not always an option, though. If they’re binding cairo, and they’re on Windows, or for some reason running in an environment where setlocale returns something other than UTF-8… well, better to work as best it can, rather than cause weird behaviour or to have the image saved under a garbled filename.
That’s pretty much exactly what I think should be done. I view CStr[ing] as an “input/output” step. String manipulation in Rust code should be done using Rust strings in Unicode. Once you go to call a C function, you need to transcode the string to whatever that function is going to expect. It’s possible that’s UTF-8, or Latin-1, or that really old DOS codepage that ZIP uses, but the reasonable default would be the encoding that libc itself is using for all of the strings it deals with.
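As a sketch of that boundary step: the helper name below and its UTF-8-only stub are my assumptions, not a real std API, but they show where the transcoding would live.

```rust
use std::ffi::CString;

// Hypothetical helper: convert a Rust string into the C runtime's current
// multibyte encoding just before handing it to libc. A real version would
// consult the locale (e.g. via nl_langinfo(CODESET)) and transcode; this
// stub assumes the locale's codeset is UTF-8, so no transcoding is needed.
fn to_c_multibyte(s: &str) -> Option<CString> {
    // Fails only on interior NULs in this stub; a real version would also
    // fail on characters unrepresentable in the target encoding.
    CString::new(s).ok()
}

fn main() {
    let path = "image.png";
    let c_path = to_c_multibyte(path).expect("path not representable in the C encoding");
    // c_path.as_ptr() is what you'd then pass to something like
    // cairo_surface_write_to_png().
    assert!(!c_path.as_bytes().is_empty());
}
```

The point is that the conversion happens once, at the call boundary, and all actual string manipulation stays on the Rust/Unicode side.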