Pre-RFC: Make `*CStr` a Thin Pointer

  • Make it thin
  • Keep it fat
0 voters

Summary

Make *CStr a thin pointer via extern type (RFC 1861). CStr::from_ptr() will become zero-cost, while CStr::to_bytes() will incur a length calculation.

Motivation

The CStr type was introduced in RFC 592 during Rust 1.0-alpha as a replacement of the slice type [c_char], where one of the motivations was

… in order to construct a slice (or a dynamically sized newtype wrapping a slice), its length has to be determined, which is unnecessary for the consuming FFI function that will only receive a thin pointer. …

However, Rust at that time only supported three kinds of dynamic-sized types: str, [T] and trait objects, where all of them become fat pointers when referenced. An attempt to introduce DST with thin pointer was made as RFC 709, but due to time constraint close to the release of 1.0, it was postponed and kept as a low-priority issue.

Thus the implementation of CStr chose to wrap a [c_char] and provides the following FIXME:

pub struct CStr {
    // FIXME: this should not be represented with a DST slice but rather with
    //        just a raw `c_char` along with some form of marker to make
    //        this an unsized type. Essentially `sizeof(&CStr)` should be the
    //        same as `sizeof(&c_char)` but `CStr` should be an unsized type.
    inner: [c_char]
}

Fast forward to 2017, extern type (RFC 1861) was introduced to represent opaque FFI types which are fairly popular in C as a way to hide implementation detail. These types have unspecified size in the public interface, and also are represented as thin pointers. The extern type RFC was accepted and implemented as an unstable feature in Rust 1.23.

With the introduction of extern type, suddenly we have a way to fix the FIXME by changing the inner slice into such extern type:

extern {
    type CStrInner;
}
#[repr(C)]
pub struct CStr {
    inner: CStrInner,
}

Thus this RFC is proposed to gauge interest if we really want to fix this issue, and sort out potential unsafety before merging into the standard library.

Guide-level explanation

The main implication of making *CStr thin is that the length is no longer stored alongside the pointer. Some signficant changes are:

  • CStr becomes #[repr(C)] and its pointer type should be compatible with char* in C.
  • CStr::from_ptr becomes free.
  • CStr::to_bytes and other getter methods now require length calculation.

Fortunately the documentation of std::ffi::CStr already included tons of warnings about future changes, so we could assume users not relying on these performance characteristics in code.

Reference-level explanation

An implementation of such change is available as the thin_cstr crate, and the source code is available at https://github.com/kennytm/thin_cstr.

The change only affects the unsized CStr type. The owned CString type will not be modified.

Drawbacks

Assuming the C string has length n,

Function Before After
from_ptr O(n) O(1)
from_bytes_with_nul O(n) O(n)
from_bytes_with_nul_unchecked O(1) O(1)
as_ptr O(1) O(1)
to_bytes O(1) O(n)
to_bytes_with_nul O(1) O(n)
to_str O(n) O(n)
to_string_lossy O(n) O(n)
into_c_string O(1) O(n)

Here, only CStr::from_ptr has become a zero-cost function, all other methods either still have the same cost or become even slower. One particular issue is CStr::into_c_string, which was stabilized in 1.20 but without the performance warning.

In rustc alone, most use of CStr will immediately convert it to a byte-slice or string, which gives no performance advantage or disadvantage. Even worse, if we create the &CStr via CStr::from_bytes_with_nul, the length calculation cost will be doubled.

let s = CStr::from_ptr(last_error).to_bytes();

Rationale and alternatives

The main rationale of this RFC is that *CStr being fat was considered a bug. An obvious alternative is "not do this", accepting a fat *CStr as a feature. In this case, we would modify the documentation and get rid of all mentions of potential performance changes.

We currently use extern type as this is the only way to get a thin DST. Extern types currently implements none of the standard auto traits (Send, Sync, Freeze, UnwindSafe, RefUnwindSafe), while a [c_char] slice implements all of them. Currently Freeze cannot be manually implemented as it is private in libcore. Furthermore, it means whenever a new auto-trait is introduced (probably by third-party), it will need to be manually implemented for CStr. If this semantics of extern type cannot be modified, we may need to consider reviving the custom DST RFC (RFC 1524) for more control.

Unresolved questions

How to make the thin CStr implement Freeze.

7 Likes

It’s awesome that this works now! We always intended on doing this eventually, so AFAIK the libs team would all sign off on such a change as this.

1 Like

I’m excited about and in favor of this change, but we will need to do a crater run for code which is currently transmuting a CStr or otherwise observing its representation, right?

2 Likes

We generally don’t predicate changes to the layout of a struct on a crater run.

1 Like

Sure but it seems much more likely for someone to be relying on the repr of CStr than most std types because of the analogy to str and slice which have a well known and stable representation.

Compatibility with #[repr(C)] is super valuable for me. Big +1

1 Like

I do think a crater run is needed, but that would be review-detail :smile:. The change is necessarily insta-stable, so the regular beta crater run will tell us if this introduced any breakage as well.

1 Like

What does this mean for Box<CStr>? Does it remain fat?

It will be thin too, due to how extern type was implemented.


Edit for details: In the current situation as well as RFC 1524 (custom DST), the metadata is unaffected by the container type, i.e. once *CStr is thin, so are Box<CStr>, Rc<CStr>, Arc<CStr>, external::smart::Pointer<CStr> etc. If we want to make &CStr, *CStr thin but keep Box<CStr>, Rc<CStr> etc fat, that would require much more complicated changes.

@kennytm I think an interesting question here is whether Box<CStr> would stay functionally equivalent to CString as it currently is. And whether it wuold make sense to change CString in much a same way (i.e. so that it would be essentially a Box<CStr>).

1 Like

Interesting question, although CString is named like String, it does not contain any push_c_str like functions that allow it to grow or shrink, making it really equivalent to Box<CStr> in terms of functionality. However, I think that it is better to introduce CString::push_c_str() and similar methods (will need another RFC), instead of making it equivalent to Box<CStr>.

1 Like

Hm, @kennytm, does this change mean that CStr ABI is guaranteed to match *const char? That means that we can finally remove CString::new("foo").unwrap().as_ptr() footgun for good!

The original idea is by @eddyb: https://github.com/rust-lang/rfcs/pull/1642#issuecomment-276684186.

The latest instance of the footgun in the wild: https://www.reddit.com/r/rust/comments/7h4blh/missing_first_letter_when_sending_cstring_with/

1 Like

While it means Rust’s *CStr is compatible with C’s char*, I don’t think this will prevent the foot gun at all as the problem is the CString is being dropped. So even if you could write &*CString::new(...).unwrap() it is no safer than calling .as_ptr() directly.

2 Likes

I don’t think that Reddit post is an instance of that footgun. In the linked code, the str_len function takes a *mut c_char and passes it to CString::from_raw, and the CString then frees the allocation on drop. That’s a common thing to want to do in FFI, but then the JS code then passes the same pointer to ptrToString and accesses the freed memory, so the JS is as much at fault as the Rust code.

In other words, the str_len function would be fine as long as it were named str_free, and the JS code didn’t use the freed memory. I have a hard time imagining a change to the language that could prevent that bug.

1 Like

Hm, I would think it is safer: you'll get a &'a CStr which makes the compiler check that the temporary CString is alive while 'a is alive. That is, it seems to me that the first snippet bellow is UB, while the second one is fine:

printf(CString::new("hello").unwrap().as_ptr())
printf(CString::new("hello").unwrap().as_ref())

Ouch, you are absolutely right, thanks for pointing that out!

During discussions around the time CStr was introduced, the prevailing idea was that CString should not become a mutable container in parallel to String/Vec, but be convertible to and from these.

One more thing that needs to delay implementation of this. PR #46108 is going to introduce the DynSized trait, which will not be implemented for extern types, as required by RFC #1861.

The DynSized trait in the PR is currently just a marker trait, one cannot manually implement it and specify how the size and alignment can be computed (these are still handled in the compiler backend). This will break stable code that works today:

struct CStrTail {
    header: u8,
    tail: CStr, // will compile-error if CStr: !DynSized
}

So we need to be able to supply a custom DynSized implementation as a prerequisite:

impl DynSized for CStr {
    fn align_of_val(&self) -> usize { 1 }
    fn size_of_val(&self) -> usize { self.to_bytes_with_nul().len() }
}
3 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.