FFI types with unsized arrays


#1

Many language features that I need for winapi are now either unstable (unions, alignment) or have an accepted RFC (packing). However, there are a few things that still need to be taken care of. The big thing that I still really need is a way to express a struct whose last field is an unsized array, as well as unions whose variants are such structs.

To give an example of such a type that I am having difficulty representing in Rust:

typedef struct _CM_NOTIFY_EVENT_DATA {
    CM_NOTIFY_FILTER_TYPE    FilterType;
    DWORD                    Reserved;
    union {
        struct {
            GUID    ClassGuid;
            WCHAR   SymbolicLink[ANYSIZE_ARRAY];
        } DeviceInterface;
        struct {
            GUID    EventGuid;
            LONG    NameOffset;
            DWORD   DataSize;
            BYTE    Data[ANYSIZE_ARRAY];
        } DeviceHandle;
        struct {
            WCHAR   InstanceId[ANYSIZE_ARRAY];
        } DeviceInstance;
    } u;
} CM_NOTIFY_EVENT_DATA, *PCM_NOTIFY_EVENT_DATA;

Each of those arrays with a size of ANYSIZE_ARRAY is really an unsized array whose size is typically known only at runtime. There are several things I need from a language feature designed to solve this problem:

  1. The syntax to define such structs and make it clear that the last field is an unsized array.
  2. A way to heap allocate such types given a length for the array, and also have them automatically freed sanely.
     • (Optional) A way to allocate such types on the stack given a compile time length for the array.
  3. Is !Sized so that you can’t accidentally pass it around by value or do things like mem::swap.
  4. Has a thin pointer. This is essential for FFI purposes.
  5. Given a pointer to such a type from a C api as well as the length of the array (often from a field in the type itself), a way to obtain a (possibly mutable) slice to the array. The API to do so will almost definitely have to be unsafe, but that’s fine.
  6. Given the length of the array, a way to calculate the total size of the object, and vice versa. Some FFI apis ask for or provide the size of the object rather than the array length.
     • Assuming you have offsetof you can perform the calculations like so: object_size = offsetof(array) + array_length * sizeof(array_element) and array_length = (object_size - offsetof(array)) / sizeof(array_element).
  7. Needs to work even when the struct is a variant in a union, or the last field of another struct (or any nested combination thereof). Makes the above point a bit harder.
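
For illustration, the two size formulas in the list above can be sketched with std::mem::offset_of! (stable since Rust 1.77). This is only a sketch under assumptions: the Guid and DeviceHandle definitions are hypothetical Rust mirrors of the C declarations, the trailing array is written as [u8; 1] because ANYSIZE_ARRAY is defined as 1 in the Windows headers, and object_size and array_length are made-up helper names.

```rust
use std::mem::{offset_of, size_of};

// Hypothetical mirror of the 16-byte Windows GUID type.
#[repr(C)]
#[allow(dead_code)]
struct Guid {
    data1: u32,
    data2: u16,
    data3: u16,
    data4: [u8; 8],
}

// Hypothetical mirror of the DeviceHandle variant. The `[u8; 1]` stands in
// for `BYTE Data[ANYSIZE_ARRAY]`; the real length is only known at runtime.
#[repr(C)]
#[allow(dead_code)]
struct DeviceHandle {
    event_guid: Guid,
    name_offset: i32,
    data_size: u32,
    data: [u8; 1],
}

/// object_size = offsetof(array) + array_length * sizeof(array_element)
/// Returns None if the arithmetic would overflow.
fn object_size(array_length: usize) -> Option<usize> {
    array_length
        .checked_mul(size_of::<u8>())?
        .checked_add(offset_of!(DeviceHandle, data))
}

/// array_length = (object_size - offsetof(array)) / sizeof(array_element)
/// Returns None if the object is too small to even hold the fixed fields.
fn array_length(object_size: usize) -> Option<usize> {
    let tail_bytes = object_size.checked_sub(offset_of!(DeviceHandle, data))?;
    Some(tail_bytes / size_of::<u8>())
}
```

With this layout the array starts at byte 24, so a 10-element array gives a 34-byte object. Note that size_of::<DeviceHandle>() is not useful here, since it includes the one placeholder element plus trailing padding.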

Please, if you can come up with a way to solve this problem, I will be very grateful. This one hole in the language makes a huge amount of FFI vastly more painful than it has to be.

Remember, this needs to work with FFI, so the layout of the structs cannot be changed, and the pointers must be thin. In addition, some structs will know their size, some will know the length of their arrays, and some will know neither with that information only available separately!
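
To make the thin-pointer requirement concrete, here is roughly what has to be written by hand today when the length arrives separately. It is a sketch, not a proposal: DeviceInstance is a hypothetical Rust mirror of the C struct with the array declared as [u16; 1], and instance_id is a made-up accessor name.

```rust
use std::ptr;
use std::slice;

// Hypothetical mirror of the DeviceInstance variant. The pointer to it is
// thin, and the layout matches C, but the type system believes the array
// has exactly one element.
#[repr(C)]
struct DeviceInstance {
    instance_id: [u16; 1],
}

/// Builds a slice over the trailing array from a thin pointer and a length
/// that the caller obtained elsewhere.
///
/// # Safety
/// `event` must point into a single live allocation that really holds `len`
/// consecutive `u16` values starting at `instance_id`, and that memory must
/// not be mutated for the lifetime `'a`.
unsafe fn instance_id<'a>(event: *const DeviceInstance, len: usize) -> &'a [u16] {
    unsafe {
        // Go through the raw pointer (not a `&DeviceInstance`) so the
        // resulting slice is not limited to the single declared element.
        let first = ptr::addr_of!((*event).instance_id) as *const u16;
        slice::from_raw_parts(first, len)
    }
}
```

Nothing stops a caller from passing the wrong length or from copying a DeviceInstance by value and silently truncating it, which is the footgun the requirements above are meant to remove.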


#2

You can do

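struct DeviceHandle {
    event_guid: i32,   // placeholder; the real GUID is 16 bytes
    name_offset: i32,  // LONG is 32 bits on Windows
    data_size: u32,    // DWORD is an unsigned 32-bit integer
    data: [u8]
}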

but it looks like this can’t go in a union right now. This works like any other unsized object, in that a pointer to it is a fat pointer containing its length. IIRC these types are possibly a bit broken.

edit: you can construct a reference to such a type like so:

#[derive(Debug)]
struct DeviceInterface {
    class_guid: u16,
    symbolic_link: [u16]
}

fn main() {
    let x: &DeviceInterface;
    let y = [1u16, 2, 3, 4, 5, 6];
    let len = 5usize;
    let y = &y as *const [u16; 6];
    x = unsafe {std::mem::transmute((y, len))};
    println!("{:?}", x);
}

So I would say:

  1. The syntax is already here
  2. Use a box to allocate some bytes & do some transmuting as above
  3. This struct is !Sized
  4. You can always cast an object to a thin pointer by doing &x as *const _ as *const u8.
  5. You could do transmute on (*const u8, len)
  6. mem::size_of_val
  7. Doesn’t work with unions yet
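
As a side note on points 4 and 5, transmuting a (pointer, length) tuple depends on tuple layout, which Rust does not specify. A sketch that avoids this, assuming the same placeholder DeviceInterface type, builds the fat pointer with std::ptr::slice_from_raw_parts and a pointer cast. The helper names device_interface_from_raw and as_thin are hypothetical.

```rust
use std::ptr;

#[repr(C)]
#[derive(Debug)]
struct DeviceInterface {
    class_guid: u16, // placeholder type, as in the post above
    symbolic_link: [u16],
}

/// Thin pointer + tail length -> reference to the unsized struct.
///
/// # Safety
/// `thin` must be aligned for `DeviceInterface` and point to memory holding
/// the header plus `tail_len` elements, valid and unmodified for `'a`.
unsafe fn device_interface_from_raw<'a>(
    thin: *const u16,
    tail_len: usize,
) -> &'a DeviceInterface {
    // A `*const [u16]` and a `*const DeviceInterface` carry the same kind of
    // metadata (an element count), so the cast keeps `tail_len` as the
    // length of `symbolic_link`.
    let fat = ptr::slice_from_raw_parts(thin, tail_len) as *const DeviceInterface;
    unsafe { &*fat }
}

/// Fat reference -> thin pointer, dropping the length (point 4).
fn as_thin(x: &DeviceInterface) -> *const u8 {
    x as *const DeviceInterface as *const u8
}
```

Called on the array [1u16, 2, 3, 4, 5, 6] with a tail length of 5, this yields class_guid == 1 and a five-element symbolic_link. It does not address the FFI concern that the pointer to the type itself is still fat.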

#3

@retep998 Sounds like you need extern type / OpaqueData first and foremost? That would give you the bare minimum needed to do the rest with unsafe hacks.

Second, we should allow unsized last fields (DSTs or OpaqueData) in enums and unsized unions.

Third, use generics so you can swap the OpaqueData for array types. (Const generics should help too.) That should take care of stack allocation.

Fourth, box syntax with array initializers ought to allow you to heap allocate with a fat pointer; let’s fix this if it doesn’t. Throw away the size afterwards, though deallocation won’t be nice.
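
A rough sketch of that heap-allocation point on stable Rust, using Box::new plus an unsizing coercion instead of box syntax. Packet and the three helper functions are hypothetical names, and the array length here is still a compile-time constant, so this only covers part of what was asked for.

```rust
use std::ptr;

#[repr(C)]
#[derive(Debug)]
struct Packet<T: ?Sized> {
    len: u8,
    data: T,
}

/// Allocates with a sized array, then coerces to the slice-tailed form.
/// The resulting Box is a fat pointer and frees itself normally.
fn boxed_packet() -> Box<Packet<[u8]>> {
    let sized: Box<Packet<[u8; 4]>> = Box::new(Packet { len: 4, data: [2, 3, 5, 6] });
    sized // unsizing coercion: Packet<[u8; 4]> -> Packet<[u8]>
}

/// Throws the length away to get a thin pointer suitable for FFI.
fn into_thin(b: Box<Packet<[u8]>>) -> *mut u8 {
    Box::into_raw(b) as *mut u8
}

/// Rebuilds the fat Box so the allocation can be freed.
///
/// # Safety
/// `thin` must come from `into_thin`, and `len` must be the original tail
/// length, otherwise the deallocation layout will be wrong.
unsafe fn from_thin(thin: *mut u8, len: usize) -> Box<Packet<[u8]>> {
    let fat = ptr::slice_from_raw_parts_mut(thin, len) as *mut Packet<[u8]>;
    unsafe { Box::from_raw(fat) }
}
```

The from_thin step is the part where deallocation is not nice: the length has to be recovered from somewhere (here it could be read back from the len field) before the memory can be released.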

Finally, the holy grail here is custom DSTs that allow you to use existing DSTs as final fields—Rust will generate the code to get size of contained DST fields by subtracting size of sized fields from whatever the size_of_val method returns.

(You would still probably want generics in that last case to optimize for static stack allocation, but some VLA thing would be nice too.)


As an aside, virtually all of this is needed for things like network packets too—the needs are broad.

Also, and I think this is a really crucial point, while these features may be advanced for library authors, they lend themselves to far and away the easiest, most intuitive, and safest interfaces downstream. I really think we should therefore prioritize, rather than deprioritize, them as part of our ergonomics push.


#4

Isn’t this already possible? Unless I’m misunderstanding you

#[derive(Debug)]
struct Dst {
    len: u8,
    data: [u8]
}

#[derive(Debug)]
struct Container {
    tag: u8,
    dst: Dst
}

fn main() {
    let x: &Container;
    let bytes = [1u8, 2, 3, 4, 5, 6];
    x = unsafe {std::mem::transmute((&bytes, 4usize))};
    println!("x = {:?}", x);
    println!("sizeof(x) = {}", std::mem::size_of_val(x));
    println!("sizeof(x.dst) = {}", std::mem::size_of_val(&x.dst));
    println!("sizeof(x.dst.data) = {}", std::mem::size_of_val(&x.dst.data));
}

The only thing missing is use with unions

edit: In safe rust, without transmute:

#[derive(Debug)]
struct Dst<T: ?Sized> {
    len: u8,
    data: T
}

#[derive(Debug)]
struct Container<T: ?Sized> {
    tag: u8,
    dst: Dst<T>
}

fn main() {
    let x = Container { tag: 1, dst: Dst { len: 4, data: [2, 3, 5, 6] } };
    let x: &Container<[u8]> = &x;
    println!("x = {:?}", x);
    println!("sizeof(x) = {}", std::mem::size_of_val(x));
    println!("sizeof(x.dst) = {}", std::mem::size_of_val(&x.dst));
    println!("sizeof(x.dst.data) = {}", std::mem::size_of_val(&x.dst.data));
}

/* Prints:
x = Container { tag: 1, dst: Dst { len: 4, data: [2, 3, 5, 6] } }
sizeof(x) = 6
sizeof(x.dst) = 5
sizeof(x.dst.data) = 4
*/

#5

@djzin Your example has a [u8] as the last field which currently causes the type to be a DST. The problem with DSTs is that they require fat pointers and are thus ABI incompatible with C FFI.

Having to make sure I never use pointers to that type, but always a pointer to some other fake alternate type just to get a thin pointer, along with casting, and making sure every single user of winapi remembers to do the same, is a friggin massive footgun and a huge annoyance.

Also, imagine I am given a thin pointer by an external API which I need to convert to an actual pointer to the type, and let’s say the size of the object is stored inside the object itself (this happens a lot in winapi). How would I construct the fat pointer to the object? I need the size to create a pointer to the object, but I can’t get the size until I have a pointer that I can access the size through!

Current DSTs with fat pointers simply do not work for FFI purposes.

@Ericson2314 One problem with custom DSTs is that sometimes the object doesn’t know its own size. If I get a thin pointer to such an object and I get the size of the object separately, how would stuff like size_of_val work?


#6

The dynamic part must always be the tail, right? So I would first create a pointer where that part is assumed to have length 0, then read the size through it and create the correctly sized fat pointer.

(That is, if there were some kind of API like unsafe fn dst_from_raw(ptr, size).)
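
The two-step idea can be approximated today with std::ptr::slice_from_raw_parts, without any new API. This is a sketch under assumptions: Record is a made-up type whose len field stores the number of tail bytes, and record_from_thin is a hypothetical stand-in for the dst_from_raw mentioned above.

```rust
use std::ptr;

// Made-up example type: `len` stores the number of bytes in `data`.
#[repr(C)]
struct Record {
    len: u32,
    data: [u8],
}

/// Thin pointer -> fat reference, reading the length out of the object.
///
/// # Safety
/// `thin` must be 4-byte aligned and point to a valid header followed by
/// `len` bytes; the memory (including padding up to a multiple of 4 bytes)
/// must stay valid and unmodified for `'a`.
unsafe fn record_from_thin<'a>(thin: *const u8) -> &'a Record {
    unsafe {
        // Step 1: pretend the tail is empty. Only the header is touched, and
        // only through a raw pointer, so no oversized reference exists yet.
        let header_only = ptr::slice_from_raw_parts(thin, 0) as *const Record;
        let len = (*header_only).len as usize;

        // Step 2: rebuild the fat pointer with the real tail length.
        let full = ptr::slice_from_raw_parts(thin, len) as *const Record;
        &*full
    }
}
```

This covers the case where the object stores its own length. It does not help when the size is only available separately, and the pointer to Record is still fat.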


#7

Ah, then for this type you don’t need custom DSTs. When given the size, you can swap the OpaqueData argument for [u8] and (sadly) build the fat pointer by hand. So you will want two thin-to-fat functions, depending on how the length is given, and one fat-to-thin function.

Rust should also support its existing DSTs-by-final-field better, but at least once you have the fat pointer it’s easy to get the array length from it the obvious way.
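
A sketch of what those three functions might look like with today's slice-tailed DSTs. All names here (Header, DeviceHandle, from_thin_with_len, from_thin_with_size, to_thin) are hypothetical, the GUID is approximated as [u32; 4], and the tail is [u8] so that its offset equals size_of::<Header>().

```rust
use std::mem::size_of;
use std::ptr;

// Fixed-size prefix: 16 + 4 + 4 = 24 bytes, 4-byte aligned.
#[repr(C)]
#[allow(dead_code)]
struct Header {
    event_guid: [u32; 4], // stand-in for GUID
    name_offset: i32,
    data_size: u32,
}

#[repr(C)]
struct DeviceHandle {
    header: Header,
    data: [u8],
}

/// Thin-to-fat when the array length is known.
fn from_thin_with_len(thin: *const u8, len: usize) -> *const DeviceHandle {
    ptr::slice_from_raw_parts(thin, len) as *const DeviceHandle
}

/// Thin-to-fat when only the total object size in bytes is known.
/// Returns None if the size is smaller than the fixed header.
fn from_thin_with_size(thin: *const u8, object_size: usize) -> Option<*const DeviceHandle> {
    let len = object_size.checked_sub(size_of::<Header>())?;
    Some(from_thin_with_len(thin, len))
}

/// Fat-to-thin: drop the length again before handing the pointer to C.
fn to_thin(fat: *const DeviceHandle) -> *const u8 {
    fat as *const u8
}
```

One caveat with this approach: size_of_val on the resulting DST rounds up to the struct's alignment (a 29-byte C object reports 32 bytes), so it is not a reliable way to get the size that the C API expects.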