Rust char and C char32_t ABI compatibility


#1

In reference to @Gankro’s tweet: Currently, the ABI of Rust char is not documented.

What steps are needed in order to document that Rust char is ABI-compatible with C char32_t (for values that are valid for Rust’s char)?

This should be pretty non-controversial considering that:

  • Rust char transmutes to and from u32, indicating that it has to have the same representation as u32.
  • C11 and C17 define char32_t as the same type as uint_least32_t, so char32_t clearly has unsigned behavior when passed in registers on 64-bit platforms.
  • C++ defines char32_t as a distinct type that has the same size, signedness and alignment as uint_least32_t.
  • uint_least32_t can have a width other than 32 bits only if the C implementation doesn’t provide 32-bit-wide integers. Rust is already incompatible with such (theoretical AFAIK) C implementations. Therefore, in C implementations that Rust is compatible with, uint_least32_t is uint32_t.
  • Rust u32 is ABI-compatible with uint32_t.
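
For anyone who wants to poke at the first point, here is a minimal sketch of the Rust-side checks. The helper names are made up for illustration, and matching size and alignment is only a necessary condition for ABI compatibility, not a proof of it (it says nothing about how the value is passed in registers).

```rust
use std::mem::{align_of, size_of, transmute};

/// Hypothetical helper: true when `char` and `u32` agree on size and alignment.
fn char_layout_matches_u32() -> bool {
    size_of::<char>() == size_of::<u32>() && align_of::<char>() == align_of::<u32>()
}

/// Hypothetical helper: reinterprets a `char` as its `u32` bit pattern.
fn char_to_bits(c: char) -> u32 {
    // This direction is sound for every `char`, because every valid `char`
    // is also a valid `u32`. `c as u32` gives the same result without `unsafe`.
    unsafe { transmute::<char, u32>(c) }
}
```

The reverse direction (u32 to char) is the one where invalid values matter, so it is deliberately left out here.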

#2

It looks like you’ve done the legwork that Gankro refers to in their tweet. Rust’s char type is defined as having the same representation as u32, except that values that are not Unicode scalar values (surrogates and anything above 0x10FFFF) are undefined behavior (that is, we’ve already guaranteed that aspect of our ABI). How does the invalid-values caveat impact ABI compatibility? I know that for other types in Gankro’s chart (&T and &mut T) the listed C/C++ type can have values that would be UB for the Rust type.


#3

I’d expect invalid values to be as UB as transmuting a u32 with an invalid value to char.
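
If the C side can’t be trusted to send only valid values, one way to sidestep the UB is to take u32 at the boundary and validate before producing a char. A rough sketch, with hypothetical function names (nothing here is an existing API apart from char::from_u32):

```rust
/// Hypothetical conversion for a raw `char32_t` value received from C.
/// `char::from_u32` returns `None` for surrogates (0xD800..=0xDFFF)
/// and for anything above 0x10FFFF.
pub fn char_from_char32(raw: u32) -> Option<char> {
    char::from_u32(raw)
}

/// Hypothetical FFI entry point. The parameter is declared as `u32`
/// rather than `char`, so an invalid value coming from C is just an
/// ordinary integer here and never an invalid `char`.
pub extern "C" fn is_valid_rust_char(raw: u32) -> bool {
    char_from_char32(raw).is_some()
}
```

Declaring the parameter as char instead would move the validity requirement onto the C caller, which is the situation being discussed above.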