In what situation do you have a Mutex<CStr>
that you can extract the length from? Could you give an example of code that would be broken by making CStr
thin?
You can't construct a standard library Mutex
that contains an unsized type, but a custom third-party Mutex
could potentially allow it. Yes, it's unlikely to actually happen in practice. But unsoundness in safe code is not acceptable, even if it's unlikely.
You can, though I don't think there's any blessed way to do so with CStr
(and its unspecified layout).
Yeah, I should have clarified "other than via unsizing".
Currently the only (safe and stable) method to construct a DST is through unsizing coercion (via the Unsize
and CoerceUnsized
traits). While this works for slices (eg. [u8; 6] -> [u8]
) and trait objects (eg. i16 -> dyn Debug
), you cannot unsize to CStr
as there is no sized equivalent (ie. there does not exist a T
such that T: Unsize<CStr>
).
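As a rough illustration of the point above, here is a minimal sketch of the two unsizing coercions mentioned, next to the checked constructor that has to be used for CStr instead. The function names are made up for this example.

```rust
use std::ffi::CStr;
use std::fmt::Debug;

/// Unsizing coercion from an array to a slice: [u8; 6] -> [u8].
fn slice_len() -> usize {
    let array: [u8; 6] = *b"hello\0";
    let slice: &[u8] = &array; // the compiler inserts the unsizing coercion here
    slice.len()
}

/// Unsizing coercion from a concrete type to a trait object: i16 -> dyn Debug.
fn debug_text() -> String {
    let value: i16 = 7;
    let object: &dyn Debug = &value;
    format!("{:?}", object)
}

/// No sized type coerces to CStr, so a &CStr comes from a checked constructor.
fn cstr_len() -> usize {
    let bytes: &[u8; 6] = b"hello\0";
    let c: &CStr = CStr::from_bytes_with_nul(bytes).expect("exactly one trailing nul");
    c.to_bytes().len() // length without the terminator
}
```

There is no line such as `let c: &CStr = &array;` that would compile, which is the asymmetry being described.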
As CStr
has an unspecified layout, and there is no way to get a *const CStr
(without features like ptr_metadata
or set_ptr_value
), stable code cannot manipulate CStr
s in the same way it can manipulate slices (which have a specified layout). Thus, I don't believe this is a breaking change, but I would be interested to see what code you think this will break.
Yes, it is specified, it's a null-terminated sequence of bytes. The layout of &CStr
is not specified, but nor is that of &[u8]
.
Yes there is, just coerce from &CStr
.
There wouldn't necessarily be any unsoundness since size_of_val
could simply panic or abort in cases where the CStr
is wrapped in an UnsafeCell
. While this is a breaking change, the breakage is extremely limited, since you can't safely put CStr
behind an UnsafeCell
. And even if you did it wouldn't be very useful, since there is literally no way to get a &mut [u8]
from it without relying on unstable implementation details (i.e. via pointer casts).
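For context, this is roughly what stable code can observe today through size_of_val. The sketch assumes the current fat-pointer representation of &CStr, which the documentation does not guarantee, and the helper name is invented for the example.

```rust
use std::ffi::CStr;
use std::mem::size_of_val;

/// Returns (size reported by size_of_val, byte length including the nul).
/// With the current representation the two values agree; a thin CStr
/// could not answer the first one without scanning for the terminator.
fn observed_sizes(bytes: &[u8]) -> (usize, usize) {
    let c: &CStr = CStr::from_bytes_with_nul(bytes).expect("exactly one trailing nul");
    (size_of_val(c), c.to_bytes_with_nul().len())
}
```

Calling `observed_sizes(b"abc\0")` gives the same number twice on current compilers, as far as I can tell; that equality is an implementation detail, not a promise.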
So I think that the "use-case" of putting CStr
directly inside a Mutex
or other interior-mutability constructs is completely useless, and can be completely discarded. There are no useful stable APIs you could provide that do more than just &CStr
.
That said, maybe there's a use case for a trailing CStr
in a mutex, to access other fields. But in this case, you should just factor out the Sized
fields and put them in the mutex outside the CStr
.
So if trailing CStr
inside an UnsafeCell
is not useful and a plain CStr
in an UnsafeCell
is not useful, I don't see any use-case for putting CStr
in an UnsafeCell
.
Maybe as future-proofing, we should just start panicking in size_of
for UnsafeCell<CStr>
or UnsafeCell<StructWithCStrTail>
after a crater run.
While looking for past material about this issue, I found this GitHub issue, and running a search for UnsafeCell<CStr>
in all of GitHub returns 0 results. Which is to say, if this is actually being used, it is exceedingly uncommon. We have made breaking changes for things that were used more than UnsafeCell<CStr>
before.
It seems that I was wrong, and that rustc does allow you to construct a raw pointer to a CStr
. Stable code is allowed to cast from *mut [c_char]
to *mut CStr
, a cast which would break if the metadata of CStr
was changed from usize
to ()
.
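A minimal sketch of the kind of cast in question, assuming the current compiler behaviour where both pointer types carry a length as metadata. The function name is hypothetical, and the pointer is never dereferenced, since the layout behind it is unspecified.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Casts a slice pointer to a CStr pointer without dereferencing it.
/// This only compiles while both pointer types carry the same kind of
/// metadata; with a thin CStr the cast would be rejected.
fn cast_to_cstr_ptr(chars: &mut [c_char]) -> *mut CStr {
    let slice_ptr: *mut [c_char] = chars;
    slice_ptr as *mut CStr
}
```

The cast itself needs no unsafe block, which is why it counts as observable from safe, stable code.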
Even though the layout of a pointer to CStr
is explicitly unspecified[1], it is observable on stable, and changing it would be breaking (in theory; I'm not sure how common code casting a slice pointer to a CStr
pointer is in practice).
[1] From the
CStr
docs (https://doc.rust-lang.org/nightly/std/ffi/struct.CStr.html): Note that this structure does not have a guaranteed layout (the
repr(transparent)
notwithstanding) and should not be placed in the signatures of FFI functions. Instead, safe wrappers of FFI functions may leverage
CStr::as_ptr
and the unsafe
CStr::from_ptr
constructor to provide a safe interface to other consumers.
Observable != semver breakage. Semver is all about the documented API guarantees; CStr
has been documented to not have a guaranteed layout for its entire existence, so relying on that layout is not allowed.
AFAIK the intent was to keep this invariant. Why would that cost strlen
scans?
from_bytes_until_nul
effectively needs to do a strlen
to ensure the nul byte is present, as does _with_nul
to ensure only one nul byte. Currently, this length is kept in the fat pointer until the CStr
is consumed (at least, if it doesn't go through as_ptr
), but if this isn't the case, later use of the string will duplicate the strlen
.
Maybe from_bytes_until_nul
should have a fast path that checks whether the last byte is nul? I suspect that for many use cases, the last byte would usually or always be nul. And if CStr
is thin, then from_bytes_until_nul
would only need to check that there's a nul byte somewhere in the slice, not necessarily calculate where the first nul byte is.
That doesn't help with from_bytes_with_nul
though.
This doesn't work because from_bytes_until_nul
is documented to stop at the first nul byte.
If the first byte is a nul character, this method will return an empty
CStr
. If multiple nul characters are present, the CStr
will end at the first one.
So a scan is necessary. This also can't happen for from_bytes_with_nul
because it needs to check that there are no interior nul bytes.
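To make the documented behaviour concrete, here is a small sketch of what the two constructors accept. The wrapper functions are only there for illustration.

```rust
use std::ffi::CStr;

/// Length (without the nul) of the CStr that from_bytes_until_nul produces,
/// or None when the slice holds no nul byte at all.
fn until_nul_len(bytes: &[u8]) -> Option<usize> {
    CStr::from_bytes_until_nul(bytes).ok().map(|c| c.to_bytes().len())
}

/// Whether from_bytes_with_nul accepts the slice: it requires exactly one
/// nul byte, and that byte must be the last one.
fn with_nul_ok(bytes: &[u8]) -> bool {
    CStr::from_bytes_with_nul(bytes).is_ok()
}
```

For `b"ab\0cd\0"`, the first function stops at the first nul and reports a length of 2, while the second rejects the input because of the interior nul.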
In the case where &CStr
is thin, checking the last byte is sufficient if it's a nul byte, as it's guaranteed that a nul byte exists, even if later use of the string will stop at an earlier nul. This is also a somewhat common trick in C, to write to a buffer, put a nul in the last byte, then use a pointer to the buffer as a nul terminated array pointer, which will either be the written array or terminated at the buffer size.
That this trick doesn't work for _with_nul
was acknowledged.
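A sketch of the fast path being discussed, written as a free function because a thin CStr does not exist today. The name `has_terminator` is hypothetical, and this is not how the standard library currently validates its input.

```rust
/// Hypothetical check for a thin CStr constructor: it only needs to know
/// that some nul byte exists, not where the first one is.
fn has_terminator(bytes: &[u8]) -> bool {
    // Fast path: a trailing nul already guarantees termination.
    if bytes.last() == Some(&0) {
        return true;
    }
    // Slow path: scan for a nul anywhere in the slice.
    bytes.contains(&0)
}
```

With a fat CStr the fast path would not be enough, since the length stored in the pointer has to match the position of the first nul.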
Won't this break deallocation of a CStr allocated with a Rust allocator? If there are any internal nul bytes and CStr is thin, how would you know what length of allocation to pass?
Unlike malloc/free, Rust allocators expect the length on deallocation, after all.
EDIT: In fact, couldn't you cause unsoundness with Rust allocated thin CStr by simply writing an internal nul byte in the string, without any usage of unsafe?
Since CStr
does not own the allocation this should not be a problem. CString
has the safety invariant that there are no internal nul bytes. How would you write a nul in a CStr
or CString
without unsafe?
Ah you are right, there appears to be no safe way to write into a CString. And since CStr is borrowed I guess that solves that. I withdraw my concern.
CStr
is not borrowed. &CStr
is (immutably) borrowed.
If CString
tracks allocation size separately, there's no problem handing out a &mut CStr
to everything up to the first NUL. If the allocation size is only known from the location of the only NUL, then it can't hand out &mut CStr
s.
Currently it seems that CString is a Box<[u8]>
internally. Which is a fat pointer and thus tracks size.
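A quick way to check both halves of that claim from stable code, as a sketch. The helper names are invented, and the two-word size of Box<[u8]> is what current compilers produce rather than something I would treat as a documented guarantee.

```rust
use std::ffi::CString;
use std::mem::size_of;

/// True when Box<[u8]> is two words wide (data pointer plus length).
fn boxed_slice_is_fat() -> bool {
    size_of::<Box<[u8]>>() == 2 * size_of::<usize>()
}

/// CString::new rejects interior nul bytes, so the only nul is the terminator.
fn accepts(bytes: &[u8]) -> bool {
    CString::new(bytes.to_vec()).is_ok()
}

/// Number of bytes owned by a CString, including the trailing nul.
fn owned_len(text: &str) -> usize {
    CString::new(text)
        .expect("no interior nul")
        .into_bytes_with_nul()
        .len()
}
```

So the owning type knows its allocation size independently of where the nul sits, which is the property deallocation relies on.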