Planned CStr changes seem inefficient

In what situation do you have a Mutex<CStr> that you can extract the length from? Could you give an example of code that would be broken by making CStr thin?

You can't construct a standard library Mutex that contains an unsized type, but a custom third-party Mutex could potentially allow it. Yes, it's unlikely to actually happen in practice. But unsoundness in safe code is not acceptable, even if it's unlikely.

You can, though I don't think there's any blessed way to do so with CStr (and its unspecified layout).

Yeah, I should have clarified "other than via unsizing".

Currently the only (safe and stable) method to construct a DST is through unsizing coercion (via the Unsize and CoerceUnsized traits). While this works for slices (e.g. [u8; 6] -> [u8]) and trait objects (e.g. i16 -> dyn Debug), you cannot unsize to CStr as there is no sized equivalent (i.e. there does not exist a T such that T: Unsize<CStr>).
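To make the two coercions named above concrete, here is a minimal sketch. The function names are made up for illustration; only the coercions themselves come from the post. Note that nothing comparable can be written for CStr, because no sized type unsizes to it.

```rust
use std::fmt::Debug;

/// Unsizing an array reference to a slice reference: `&[u8; 6]` -> `&[u8]`.
/// (Hypothetical helper name, for illustration only.)
fn array_to_slice(array: &[u8; 6]) -> &[u8] {
    array // implicit unsizing coercion at the return site
}

/// Unsizing a boxed concrete value to a boxed trait object:
/// `Box<i16>` -> `Box<dyn Debug>`.
fn boxed_debug(value: i16) -> Box<dyn Debug> {
    Box::new(value) // implicit unsizing coercion at the return site
}

/// Formats the value through the trait object to show the coercion is usable.
fn debug_string(value: i16) -> String {
    format!("{:?}", boxed_debug(value))
}
```

Calling `array_to_slice(&[1, 2, 3, 4, 5, 6])` yields a slice whose length lives in the fat pointer, which is the same kind of metadata `&CStr` carries today.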

As CStr has an unspecified layout, and there is no way to get a *const CStr (without features like ptr_metadata or set_ptr_value), stable code cannot manipulate CStrs in the same way it can manipulate slices (which have a specified layout). Thus, I don't believe this is a breaking change, but I would be interested to see what code you think this will break.

Yes, it is specified: it's a null-terminated sequence of bytes. The layout of &CStr is not specified, but nor is that of &[u8].

Yes there is, just coerce from &CStr.

There wouldn't necessarily be any unsoundness since size_of_val could simply panic or abort in cases where the CStr is wrapped in an UnsafeCell. While this is a breaking change, the breakage is extremely limited, since you can't safely put CStr behind an UnsafeCell. And even if you did it wouldn't be very useful, since there is literally no way to get a &mut [u8] from it without relying on unstable implementation details (i.e. via pointer casts).

So I think that the "use-case" of putting CStr directly inside a Mutex or other interior-mutability constructs is completely useless, and can be completely discarded. There are no useful stable APIs you could provide that do more than just &CStr.

That said, maybe there's a use-case for a trailing CStr in a mutex, to access other fields. But in this case, you should just factor out the Sized fields and put them in the mutex outside the CStr.

So if trailing CStr inside an UnsafeCell is not useful and a plain CStr in an UnsafeCell is not useful, I don't see any use-case for putting CStr in an UnsafeCell.

Maybe as future-proofing, we should just start panicking in size_of for UnsafeCell<CStr> or UnsafeCell<StructWithCStrTail> after a crater run.

While looking for past material about this issue, I found this GitHub issue, and running a search for UnsafeCell<CStr> in all of GitHub gets 0 results. Which is to say, if this is actually being used, it is exceedingly uncommon. We have made breaking changes for things that were used more than UnsafeCell<CStr> before.

It seems that I was wrong, and that rustc does allow you to construct a raw pointer to a CStr. Stable code is allowed to cast from *mut [c_char] to *mut CStr, a cast which would break if the metadata of CStr was changed from usize to ().
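A minimal sketch of the cast being described, assuming a current toolchain where a pointer to CStr carries slice-style length metadata. The helper names are hypothetical, and the pointer is never dereferenced, so no unsafe is needed. The sketch relies on exactly the unspecified layout under discussion and would stop compiling if the metadata became ().

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Casts a raw slice pointer to a raw `CStr` pointer without dereferencing it.
/// Assumption: this is accepted only while both pointer types carry the same
/// kind of metadata (a `usize` length), as they do at the time of writing.
fn slice_ptr_to_cstr_ptr(ptr: *mut [c_char]) -> *mut CStr {
    ptr as *mut CStr
}

/// Reports whether `&CStr` currently occupies two machine words (data pointer
/// plus length). This observes an implementation detail, not a guarantee.
fn cstr_ref_is_fat() -> bool {
    std::mem::size_of::<&CStr>() == 2 * std::mem::size_of::<usize>()
}
```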

Even though the layout of a pointer to CStr is explicitly unspecified[1], it is observable on stable, and changing it would be breaking (in theory; I'm not sure how common code casting a slice pointer to a CStr pointer is in practice).


  1. From the CStr docs:

    Note that this structure does not have a guaranteed layout (the repr(transparent) notwithstanding) and should not be placed in the signatures of FFI functions. Instead, safe wrappers of FFI functions may leverage CStr::as_ptr and the unsafe CStr::from_ptr constructor to provide a safe interface to other consumers.
    https://doc.rust-lang.org/nightly/std/ffi/struct.CStr.html

Observable != semver breakage. Semver is all about the documented API guarantees. CStr has been documented to not have a guaranteed layout for its entire existence, so relying on that layout is not allowed.

AFAIK the intent was to keep this invariant. Why would that cost strlen scans?

from_bytes_until_nul effectively needs to do a strlen to ensure the nul byte is present, as does _with_nul to ensure only one nul byte. Currently, this length is kept in the fat pointer until the CStr is consumed (at least, if it doesn't go through as_ptr), but if this isn't the case, later use of the string will duplicate the strlen.

Maybe from_bytes_until_nul should have a fast path that checks whether the last byte is nul? I suspect that for many use cases, the last byte would usually or always be nul. And if CStr is thin, then from_bytes_until_nul would only need to check that there's a nul byte somewhere in the slice, not necessarily calculate where the first nul byte is.

That doesn't help with from_bytes_with_nul though.

This doesn't work because from_bytes_until_nul is documented to stop at the first nul byte.

If the first byte is a nul character, this method will return an empty CStr. If multiple nul characters are present, the CStr will end at the first one.

So a scan is necessary. This also can't happen for from_bytes_with_nul because it needs to check that there are no interior nul bytes.
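The documented behavior of the two constructors can be checked with a small sketch. The wrapper names are hypothetical; CStr::from_bytes_until_nul (stable since Rust 1.69) and CStr::from_bytes_with_nul are the real standard library functions.

```rust
use std::ffi::CStr;

/// Returns the bytes (without the terminator) that `from_bytes_until_nul`
/// keeps, or `None` if the input contains no nul byte at all.
fn until_nul(bytes: &[u8]) -> Option<&[u8]> {
    CStr::from_bytes_until_nul(bytes).ok().map(|c| c.to_bytes())
}

/// Returns whether `from_bytes_with_nul` accepts the input, i.e. whether the
/// only nul byte is the final byte.
fn with_nul_ok(bytes: &[u8]) -> bool {
    CStr::from_bytes_with_nul(bytes).is_ok()
}
```

For the input `b"ab\0cd\0"`, `until_nul` stops at the first nul and keeps `b"ab"`, while `with_nul_ok` rejects the same input because of the interior nul.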

In the case where &CStr is thin, checking the last byte is sufficient if it's a nul byte, as it's guaranteed that a nul byte exists, even if later use of the string will stop at an earlier nul. This is also a somewhat common trick in C, to write to a buffer, put a nul in the last byte, then use a pointer to the buffer as a nul terminated array pointer, which will either be the written array or terminated at the buffer size.
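A rough sketch of that argument, using hypothetical helpers that are not part of std: for a thin &CStr, validation would only need to establish that some nul exists inside the buffer, while the logical length is whatever a later strlen-style scan finds.

```rust
/// Hypothetical validity check for a *thin* `CStr`: only the existence of a
/// nul byte inside the buffer matters, not its position.
fn contains_nul(bytes: &[u8]) -> bool {
    // Fast path: a trailing nul already proves a terminator exists.
    if bytes.last() == Some(&0) {
        return true;
    }
    // Slow path: scan the whole buffer for any nul byte.
    bytes.contains(&0)
}

/// The length a later strlen-style scan would observe: the index of the
/// first nul byte, if there is one.
fn strlen_within(bytes: &[u8]) -> Option<usize> {
    bytes.iter().position(|&b| b == 0)
}
```

With the buffer `b"ab\0cd\0"`, the fast path accepts it after looking at one byte, and the string later reads as two bytes long because use stops at the earlier nul.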

That this trick doesn't work for _with_nul was acknowledged.

Won't this break deallocation of the CStr if allocated with a Rust allocator? If there are any internal nul bytes and CStr is thin, how would you know what length of allocation to pass?

Unlike malloc/free, Rust allocators expect the length on deallocation, after all.

EDIT: In fact, couldn't you cause unsoundness with Rust allocated thin CStr by simply writing an internal nul byte in the string, without any usage of unsafe?

Since CStr does not own the allocation this should not be a problem. CString has the safety invariant that there are no internal nul bytes. How would you write a nul in a CStr or CString without unsafe?

Ah you are right, there appears to be no safe way to write into a CString. And since CStr is borrowed I guess that solves that. I withdraw my concern.

CStr is not borrowed. &CStr is (immutably) borrowed.

If CString tracks allocation size separately, there's no problem handing out a &mut CStr to everything up to the first NUL. If the allocation size is only known from the location of the only NUL, then it can't hand out &mut CStrs.

Currently it seems that CString is a Box<[u8]> internally, which is a fat pointer and thus tracks size.
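Both CString properties mentioned in this exchange can be observed through the public API. The following is a small sketch with hypothetical helper names; CString::new, NulError::nul_position and as_bytes_with_nul are the real std items.

```rust
use std::ffi::CString;

/// Returns the position of the nul byte that makes `CString::new` reject the
/// input, or `None` if the input is accepted.
fn interior_nul_position(bytes: &[u8]) -> Option<usize> {
    CString::new(bytes).err().map(|e| e.nul_position())
}

/// Returns the number of bytes the `CString` reports, including the
/// terminator it appends, or `None` if the input is rejected.
fn stored_len_with_nul(bytes: &[u8]) -> Option<usize> {
    CString::new(bytes)
        .ok()
        .map(|s| s.as_bytes_with_nul().len())
}
```

Safe construction rejects `b"a\0b"` outright, and for `b"ab"` the owned string knows its full length of 3 bytes without rescanning for the terminator.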