Possibility of making &str FFI safe

I apologise if this is a naive question, but how feasible would it be to make &str FFI-safe, and would it be a good idea?

I am aware it is not actually a struct, but essentially making &str

#[repr(C)]
struct str {
    pointer: usize,
    len: usize,
}

work in this way so that it can be reliably passed through FFI to an equivalent C struct. I am aware you can just turn the string into a C string and then read it in C, but I would rather avoid null-terminated strings.

Is this thought silly? Does it feel pointless? Would it be a nightmare to implement? Would it also be worth doing for array slices? Would array slices come for free?

IIRC strings in Rust are UTF-8 and the char type isn't necessarily a byte, whereas the encoding of C strings is not specified and a char is always a byte. There's a lot more that str does under the hood than provide a length-specified string.

You can already get a *const u8 out of an &str, which is an FFI-compatible type, so I don’t think the hypothetical freedom of implementation for C’s char is really an obstacle here. In C23 there’s a named type char8_t for “UTF-8 code unit”, but it’s just defined as a standard alias for unsigned char. That would be close enough if we wanted to make &str FFI-compatible, as only a slight step beyond making &[u8] FFI-compatible. (I agree that plain char, which might be signed, doesn’t quite match up, however.)
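
A minimal sketch of the "pointer plus length" route that already works today. The callee `count_a` and the wrapper `count_a_in` are hypothetical names; `count_a` is written in Rust here only so the example is self-contained, and it stands in for a C function declared as taking a `const uint8_t *` and a `size_t`.

```rust
use std::slice;

/// Hypothetical callee standing in for a C function that accepts a
/// (pointer, length) pair instead of a nul-terminated string.
/// Returns how many bytes are equal to b'a'.
///
/// # Safety
/// `ptr` must point to `len` readable bytes.
pub unsafe extern "C" fn count_a(ptr: *const u8, len: usize) -> usize {
    let bytes = unsafe { slice::from_raw_parts(ptr, len) };
    bytes.iter().filter(|&&b| b == b'a').count()
}

/// Splits a `&str` into its FFI-compatible parts and passes them across.
pub fn count_a_in(s: &str) -> usize {
    // `str::as_ptr` and `str::len` are stable; the length is in bytes, not chars.
    unsafe { count_a(s.as_ptr(), s.len()) }
}
```

Calling `count_a_in("banana")` passes the six UTF-8 bytes without copying them or adding a terminator.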

Yep, yep. That makes a good bit of sense. Thank you kindly

Using usize for the pointer is wrong, as that loses provenance. It has to be a pointer / reference type.
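
For illustration, here is a sketch of a user-defined wrapper that keeps a real pointer field. `StrRef` is a hypothetical name, not a standard library type, and the assumption is that the C side declares a matching struct such as `struct { const uint8_t *ptr; size_t len; }`.

```rust
use std::marker::PhantomData;
use std::{slice, str};

/// Hypothetical FFI view of a `&str` with a pointer field rather than `usize`,
/// so pointer provenance is preserved.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct StrRef<'a> {
    ptr: *const u8,
    len: usize,
    // Zero-sized marker that ties the view to the borrowed string's lifetime.
    _borrow: PhantomData<&'a str>,
}

impl<'a> StrRef<'a> {
    pub fn new(s: &'a str) -> Self {
        StrRef { ptr: s.as_ptr(), len: s.len(), _borrow: PhantomData }
    }

    pub fn as_str(self) -> &'a str {
        // Safety: the fields are private and `new` is the only constructor in
        // Rust, so they describe valid UTF-8 borrowed for 'a. A value filled in
        // by C code would need validating with `str::from_utf8` instead.
        unsafe { str::from_utf8_unchecked(slice::from_raw_parts(self.ptr, self.len)) }
    }
}
```

The field order here is fixed by `#[repr(C)]` on the wrapper itself, so it does not depend on how the compiler lays out `&str` internally.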

I don't believe the field order in a thick pointer in Rust is guaranteed stable, though I would be surprised if it ever changes.

Another consideration: If you ever want to use the str in C, it probably needs to be null-terminated
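
For comparison, a small sketch of the nul-terminated route using the standard library's `CString`. The helper name `to_c_string` is made up for this example.

```rust
use std::ffi::{CString, NulError};

/// Copies `s` into a newly allocated, nul-terminated buffer.
/// Fails if `s` contains an interior NUL byte, because C code would then
/// see a shorter string than Rust does.
pub fn to_c_string(s: &str) -> Result<CString, NulError> {
    CString::new(s)
}
```

The resulting `CString` can be handed to C with `as_ptr()`, at the cost of an allocation and a copy that the pointer-plus-length approach avoids.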

Last I checked it is not, though the standard library does rely on it in a few places.

https://github.com/rust-lang/rfcs/pull/3775

The standard library can generally assume this kind of stuff because it's shipped together with the compiler, so if the compiler ever changed the representation the stdlib would be updated at the same time.

I'm aware, but ironically enough the locations where I've encountered it have no local checks to ensure the layout is correct. I can't imagine it wouldn't fail CI, but that is not the best-case scenario, in which it would simply fail to compile.

I don't think Rust should make str FFI safe, because it would be a footgun for FFI novices: they might think that just using str in an extern "C" function would magically convert it into/from a nul-terminated string, instead of realizing they would need that struct on the C side.

I guess it may be fine to have str as FFI safe but with a lint that explains this possible pitfall enabled by default.

AFAIK Rust generally avoids having lints like that that are basically "yes I have read the rules" lints rather than actual problems in your code that would be applicable no matter your experience level.

The general idea is that you shouldn't need to allow a lint every time you use some feature (unless using the feature is basically always a bug, like std::mem::uninitialized() is).

Not inherently. As long as you know it isn't, many functions can work with it. printf can print it with %.*s for instance.

Isn't this subject to ABI rules? For example, a &str argument may be passed in a wide register, as a pointer, two registers, or passed on the stack depending on things like argument position and such which may be different than a bare sequence of int and const char* arguments.

Also, for the specific example given, %.*s requires the precision be an int, so truncation may occur.
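
A sketch of the %.*s approach that guards against that truncation. It calls C's `snprintf` rather than `printf` so the result can be inspected, and it assumes a target where the C standard library is linked by default (typical Unix-like platforms). The wrapper name `format_via_c` is hypothetical.

```rust
use std::convert::TryFrom;
use std::os::raw::{c_char, c_int};

unsafe extern "C" {
    // C standard library `snprintf`; assumed to be available at link time.
    fn snprintf(buf: *mut c_char, size: usize, fmt: *const c_char, ...) -> c_int;
}

/// Formats `s` through C's `%.*s`, which does not need a nul terminator.
/// Returns `None` if the byte length does not fit in a C `int`.
pub fn format_via_c(s: &str) -> Option<String> {
    // `%.*s` takes its precision as an `int`, so refuse lengths that would truncate.
    let precision = c_int::try_from(s.len()).ok()?;
    // Room for every byte of `s` plus the terminator that snprintf appends.
    let mut buf = vec![0u8; s.len() + 1];
    let written = unsafe {
        snprintf(
            buf.as_mut_ptr() as *mut c_char,
            buf.len(),
            b"%.*s\0".as_ptr() as *const c_char,
            precision,
            s.as_ptr() as *const c_char,
        )
    };
    if written < 0 {
        return None;
    }
    // An interior NUL in `s` makes C stop early, so `written` may be below `s.len()`.
    buf.truncate(written as usize);
    String::from_utf8(buf).ok()
}
```

Passing a subslice such as `&"hello world"[..5]` shows the point: the bytes after the slice are not a terminator, yet C only reads the five bytes the precision allows.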

Part of this effort would be deciding on a lowering for &str specifically in extern "C" functions*, and not just embedded in structs, without necessarily affecting the representation in Rust-ABI functions. If we wanted to say “this is always passed the same way as struct { char8_t *start; size_t length; } would be”, we could do that (and by “we” I mean “the FFI working group”).

* and C-unwind, and any other ABI expected to match up with some kind of C or C++ header declaration.