Planned CStr changes seem inefficient

In what situation do you have a Mutex<CStr> that you can extract the length from? Could you give an example of code that would be broken by making CStr thin?

You can’t construct a standard library Mutex that contains an unsized type, but a custom third-party Mutex could potentially allow it. Yes, it’s unlikely to actually happen in practice. But unsoundness in safe code is not acceptable, even if it’s unlikely.

You can, though I don't think there's any blessed way to do so with CStr (and its unspecified layout).

4 Likes

Yeah, I should have clarified “other than via unsizing”.

3 Likes

Currently the only (safe and stable) method to construct a DST is through unsizing coercion (via the Unsize and CoerceUnsized traits). While this works for slices (eg. [u8; 6] -> [u8]) and trait objects (eg. i16 -> dyn Debug), you cannot unsize to CStr as there is no sized equivalent (ie. there does not exist a T such that T: Unsize<CStr>).
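For reference, the two unsizing coercions mentioned above can be sketched as follows. The function names are made up for illustration, and the two-word pointer size is what current rustc produces rather than a documented guarantee.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Array -> slice: the length moves from the type into the pointer metadata.
fn unsize_array(arr: &[u8; 6]) -> &[u8] {
    arr
}

/// Concrete type -> trait object: the metadata is a vtable pointer.
fn unsize_to_dyn(x: &i16) -> &dyn Debug {
    x
}

/// Size of each kind of reference, measured in machine words.
fn fat_pointer_words() -> (usize, usize) {
    (
        size_of::<&[u8]>() / size_of::<usize>(),
        size_of::<&dyn Debug>() / size_of::<usize>(),
    )
}
```

There is no analogous function for CStr, because no sized type coerces to it.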

As CStr has an unspecified layout, and there is no way to get a *const CStr (without features like ptr_metadata or set_ptr_value), stable code cannot manipulate CStrs in the same way it can manipulate slices (which have a specified layout). Thus, I don't believe this is a breaking change, but I would be interested to see what code you think this will break.

1 Like

Yes, it is specified, it’s a null-terminated sequence of bytes. The layout of &CStr is not specified, but nor is that of &[u8].

Yes there is, just coerce from &CStr.

2 Likes

There wouldn't necessarily be any unsoundness since size_of_val could simply panic or abort in cases where the CStr is wrapped in an UnsafeCell. While this is a breaking change, the breakage is extremely limited, since you can't safely put CStr behind an UnsafeCell. And even if you did it wouldn't be very useful, since there is literally no way to get a &mut [u8] from it without relying on unstable implementation details (i.e. via pointer casts).
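As a small sketch of what size_of_val observes today (cstr_size is a hypothetical helper, not a std API): with the current fat-pointer representation the result comes straight from the pointer metadata, whereas a thin CStr would have to scan for the nul to answer the same question.

```rust
use std::ffi::CStr;
use std::mem::size_of_val;

/// Returns the in-memory size of the CStr stored in `bytes_with_nul`,
/// or None if the input is not a valid nul-terminated string.
fn cstr_size(bytes_with_nul: &[u8]) -> Option<usize> {
    let s = CStr::from_bytes_with_nul(bytes_with_nul).ok()?;
    // Counts the string bytes plus the trailing nul.
    Some(size_of_val(s))
}
```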

So I think that the "use-case" of putting CStr directly inside a Mutex or other interior-mutability constructs is completely useless, and can be completely discarded. There are no useful stable APIs you could provide that do more than just &CStr.

That said, maybe there's a usecase for a trailing CStr in a mutex, to access other fields. But in this case, you should just factor out the Sized fields and put them in the mutex outside the CStr.

So if trailing CStr inside an UnsafeCell is not useful and a plain CStr in an UnsafeCell is not useful, I don't see any use-case for putting CStr in an UnsafeCell.

Maybe as future-proofing, we should just start panicking in size_of_val for UnsafeCell<CStr> or UnsafeCell<StructWithCStrTail> after a crater run.

While looking for past material about this issue, I found this GitHub issue, and running a search for UnsafeCell<CStr> in all of GitHub gets 0 results, which is to say that if this is actually being used, it is exceedingly uncommon. We have made breaking changes for things that were used more than UnsafeCell<CStr> before.

1 Like

It seems that I was wrong, and that rustc does allow you to construct a raw pointer to a CStr. Stable code is allowed to cast from *mut [c_char] to *mut CStr, a cast which would break if the metadata of CStr was changed from usize to ().
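A minimal sketch of the cast in question, assuming the current implementation in which CStr wraps a slice (slice_ptr_to_cstr_ptr is a made-up name). It never dereferences the result, so it needs no unsafe, but it only type-checks while both pointers carry the same kind of metadata.

```rust
use std::ffi::{c_char, CStr};

/// Reinterprets a raw slice pointer as a raw CStr pointer.
/// Compiles only because both are currently length-carrying fat pointers.
fn slice_ptr_to_cstr_ptr(p: *mut [c_char]) -> *mut CStr {
    p as *mut CStr
}
```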

Even though the layout of a pointer to CStr is explicitly unspecified[1], it is observable on stable, and changing it would be breaking (in theory; I'm not sure how common code casting a slice pointer to a CStr pointer is in practice).


  1. From the CStr docs:

    Note that this structure does not have a guaranteed layout (the repr(transparent) notwithstanding) and should not be placed in the signatures of FFI functions. Instead, safe wrappers of FFI functions may leverage CStr::as_ptr and the unsafe CStr::from_ptr constructor to provide a safe interface to other consumers.
    https://doc.rust-lang.org/nightly/std/ffi/struct.CStr.html

Observable != semver breakage. Semver is all about the documented API guarantees, CStr has been documented to not have a guaranteed layout its entire existence so relying on that is not allowed.

4 Likes

AFAIK the intent was to keep this invariant. Why would that cost strlen scans?

from_bytes_until_nul effectively needs to do a strlen to ensure the nul byte is present, as does _with_nul to ensure only one nul byte. Currently, this length is kept in the fat pointer until the CStr is consumed (at least, if it doesn't go through as_ptr), but if this isn't the case, later use of the string will duplicate the strlen.
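To make the two constructors concrete, here is a short sketch (the wrapper names are hypothetical). Both have to inspect the bytes to validate the nul; with the current representation, the later to_bytes().len() call reads the cached length instead of scanning again.

```rust
use std::ffi::CStr;

/// Accepts a buffer with trailing data after the first nul.
fn parse_prefix(buf: &[u8]) -> Option<&CStr> {
    CStr::from_bytes_until_nul(buf).ok()
}

/// Accepts only buffers whose single nul is the last byte.
fn parse_exact(buf: &[u8]) -> Option<&CStr> {
    CStr::from_bytes_with_nul(buf).ok()
}

/// Length without the nul terminator.
fn len_after_parse(s: &CStr) -> usize {
    s.to_bytes().len()
}
```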

3 Likes