In what situation do you have a Mutex<CStr> that you can extract the length from? Could you give an example of code that would be broken by making CStr thin?
You can't construct a standard library Mutex that contains an unsized type, but a custom third-party Mutex could potentially allow it. Yes, it's unlikely to actually happen in practice, but unsoundness in safe code is not acceptable, even if it's unlikely.
You can, though I don't think there's any blessed way to do so with CStr (and its unspecified layout).
Yeah, I should have clarified "other than via unsizing".
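A minimal sketch of the "via unsizing" route, using only std: a sized Mutex<[u8; 4]> behind an Arc is coerced to Arc<Mutex<[u8]>>, and the slice length then lives in the Arc's fat-pointer metadata rather than inside the lock. The function name unsized_mutex is illustrative. Since no sized type unsizes to CStr, this route does not produce a Mutex<CStr>.

```rust
use std::sync::{Arc, Mutex};

/// Illustrative helper: builds a Mutex<[u8]> by unsizing a sized Mutex<[u8; 4]>.
fn unsized_mutex() -> Arc<Mutex<[u8]>> {
    // Arc<Mutex<[u8; 4]>> coerces to Arc<Mutex<[u8]>>;
    // the length 4 moves into the Arc's pointer metadata.
    Arc::new(Mutex::new([1u8, 2, 3, 4]))
}

fn main() {
    let m = unsized_mutex();
    // Locking yields a guard that derefs to [u8], so the length is available.
    let len = m.lock().unwrap().len();
    println!("the slice inside the mutex has {len} bytes");
}
```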
Currently the only (safe and stable) method to construct a DST is through unsizing coercion (via the Unsize and CoerceUnsized traits). While this works for slices (e.g. [u8; 6] -> [u8]) and trait objects (e.g. i16 -> dyn Debug), you cannot unsize to CStr, as there is no sized equivalent (i.e. there does not exist a T such that T: Unsize<CStr>).
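For illustration, here are the two coercions mentioned above on stable Rust, plus the checked constructor that has to stand in for CStr because no sized type unsizes to it. The function names as_slice, as_debug and as_cstr are illustrative; CStr::from_bytes_with_nul is the real std constructor.

```rust
use std::ffi::CStr;
use std::fmt::Debug;

/// Unsizes an array to a slice: the length becomes pointer metadata.
fn as_slice(array: &[u8; 6]) -> &[u8] {
    array
}

/// Unsizes a concrete type to a trait object: a vtable pointer becomes metadata.
fn as_debug(value: &i16) -> &dyn Debug {
    value
}

/// No T: Unsize<CStr> exists, so a &CStr comes from a checked constructor instead.
fn as_cstr(bytes: &[u8]) -> Option<&CStr> {
    CStr::from_bytes_with_nul(bytes).ok()
}

fn main() {
    let array = *b"hello\0";
    let slice = as_slice(&array);
    println!("slice of {} bytes", slice.len());
    println!("trait object prints {:?}", as_debug(&-7));
    println!("checked CStr: {:?}", as_cstr(slice));
}
```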
As CStr has an unspecified layout, and there is no way to get a *const CStr (without features like ptr_metadata or set_ptr_value), stable code cannot manipulate CStrs in the same way it can manipulate slices (which have a specified layout). Thus, I don't believe this is a breaking change, but I would be interested to see what code you think it would break.
Yes, it is specified: it's a null-terminated sequence of bytes. The layout of &CStr is not specified, but nor is that of &[u8].
Yes there is, just coerce from &CStr.
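As a sketch of that reply: a *const CStr is obtainable on stable with nothing more than the ordinary &T -> *const T coercion. The helper name raw_cstr is illustrative. Dereferencing the pointer back is sound here only because it was derived from a live &CStr.

```rust
use std::ffi::CStr;

/// Illustrative helper: produces a raw CStr pointer via the usual reference-to-pointer coercion.
fn raw_cstr(c: &CStr) -> *const CStr {
    c
}

fn main() {
    let c = CStr::from_bytes_with_nul(b"hi\0").unwrap();
    let p = raw_cstr(c);
    // SAFETY: p was derived from the live reference c, so it is valid to read.
    let back: &CStr = unsafe { &*p };
    println!("{:?}", back);
    // The raw pointer equals the original reference, metadata included.
    println!("same pointer: {}", std::ptr::eq(p, c));
}
```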
There wouldn't necessarily be any unsoundness, since size_of_val could simply panic or abort in cases where the CStr is wrapped in an UnsafeCell. While this is a breaking change, the breakage is extremely limited, since you can't safely put CStr behind an UnsafeCell. And even if you did, it wouldn't be very useful, since there is literally no way to get a &mut [u8] from it without relying on unstable implementation details (i.e. via pointer casts).
So I think that the "use case" of putting CStr directly inside a Mutex or other interior-mutability constructs is completely useless and can be discarded. There are no useful stable APIs you could provide that do more than just &CStr.
That said, maybe there's a use case for a trailing CStr in a mutex, to access the other fields. But in that case, you should just factor out the Sized fields and put them in the mutex, outside the CStr.
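A sketch of that refactoring, with hypothetical names (Entry, hits, record_hit): instead of a Mutex around a struct whose tail is a CStr, only the Sized, mutable part goes in the Mutex and the name stays outside it. An owned CString stands in for the trailing CStr so the example is constructible on stable.

```rust
use std::ffi::{CStr, CString};
use std::sync::Mutex;

/// Hypothetical record: the mutable, Sized state lives inside the Mutex,
/// while the immutable C-string name is kept outside it.
struct Entry {
    hits: Mutex<u64>,
    name: CString,
}

impl Entry {
    fn new(name: &CStr) -> Self {
        Entry {
            hits: Mutex::new(0),
            name: name.to_owned(),
        }
    }

    /// Locks only the Sized counter; the name is never behind the lock.
    fn record_hit(&self) -> u64 {
        let mut hits = self.hits.lock().unwrap();
        *hits += 1;
        *hits
    }

    /// Reading the name needs no locking at all.
    fn name(&self) -> &CStr {
        &self.name
    }
}

fn main() {
    let entry = Entry::new(CStr::from_bytes_with_nul(b"worker\0").unwrap());
    entry.record_hit();
    println!("{:?} hit {} times", entry.name(), entry.record_hit());
}
```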
So if a trailing CStr inside an UnsafeCell is not useful, and a plain CStr in an UnsafeCell is not useful, I don't see any use case for putting CStr in an UnsafeCell.
Maybe, as future-proofing, we should just start panicking in size_of_val for UnsafeCell<CStr> or UnsafeCell<StructWithCStrTail> after a crater run.
While looking for past material about this issue, I found this GitHub issue, and a search for UnsafeCell<CStr> across all of GitHub gets 0 results. In other words, if this is actually being used, it is exceedingly uncommon. We have made breaking changes before for things that were used more than UnsafeCell<CStr>.
It seems that I was wrong, and that rustc does allow you to construct a raw pointer to a CStr. Stable code is allowed to cast from *mut [c_char] to *mut CStr, a cast which would break if the metadata of CStr were changed from usize to ().
Even though the layout of a pointer to CStr is explicitly unspecified[1], it is observable on stable, and changing it would be breaking (in theory; I'm not sure how common code that casts a slice pointer to a CStr pointer is in practice).
[1] From the CStr docs (https://doc.rust-lang.org/nightly/std/ffi/struct.CStr.html): Note that this structure does not have a guaranteed layout (the repr(transparent) notwithstanding) and should not be placed in the signatures of FFI functions. Instead, safe wrappers of FFI functions may leverage CStr::as_ptr and the unsafe CStr::from_ptr constructor to provide a safe interface to other consumers.
Observable != semver breakage. Semver is all about the documented API guarantees; CStr has been documented to not have a guaranteed layout for its entire existence, so relying on that is not allowed.
AFAIK the intent was to keep this invariant. Why would that cost strlen scans?
from_bytes_until_nul effectively needs to do a strlen to ensure the nul byte is present, as does _with_nul to ensure there is only one nul byte. Currently, this length is kept in the fat pointer until the CStr is consumed (at least if it doesn't go through as_ptr), but if that weren't the case, later uses of the string would duplicate the strlen.
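To make the cost concrete, here is a sketch that models a thin C-string handle as a hypothetical ThinCStr holding only a pointer (not a std type, and not any proposed implementation). CStr::from_bytes_until_nul has to scan for the nul once; with the current fat &CStr, the resulting length is read back from the pointer metadata, whereas the thin handle has to rescan on every length query. The std docs for CStr::count_bytes already note that it is currently a constant-time cast but may become a length calculation in the future.

```rust
use std::ffi::{c_char, CStr};
use std::marker::PhantomData;

/// Hypothetical thin C-string handle: it stores only a pointer, modelling
/// a &CStr without length metadata. Not a std type.
#[derive(Clone, Copy)]
struct ThinCStr<'a> {
    ptr: *const c_char,
    _borrow: PhantomData<&'a CStr>,
}

impl<'a> ThinCStr<'a> {
    fn new(s: &'a CStr) -> Self {
        ThinCStr { ptr: s.as_ptr(), _borrow: PhantomData }
    }

    /// O(n): with no stored length, every query rescans for the nul.
    fn scan_len(self) -> usize {
        let mut n = 0;
        // SAFETY: ptr came from a live &'a CStr, so it is nul-terminated
        // and readable up to and including the nul.
        while unsafe { *self.ptr.add(n) != 0 } {
            n += 1;
        }
        n
    }
}

fn main() {
    let buf = b"hello\0trailing bytes";
    // First scan: from_bytes_until_nul has to locate the nul.
    let fat: &CStr = CStr::from_bytes_until_nul(buf).unwrap();
    // Current std implementation: the length comes from the fat pointer's metadata.
    let fat_len = fat.to_bytes().len();
    // Thin model: the same length has to be found again by a second scan.
    let thin_len = ThinCStr::new(fat).scan_len();
    println!("fat: {fat_len}, thin: {thin_len}");
}
```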