Bindgen and C++


#1

So yesterday I tried using bindgen on a C++ library, and as my luck would have it, immediately ran into a bug. Wondering if @lang_subteam has any opinions on that.

TL;DR: if a C++ class or struct has a non-trivial copy constructor or destructor, it must be passed to callees by pointer, even if it would otherwise fit into a register. Rust does not know about this distinction, so boom…
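Until Rust can express this, one manual workaround is to declare the parameter on the Rust side as the pointer the C++ ABI actually passes. This is a sketch under assumptions: it targets an Itanium-style C++ ABI (e.g. x86-64 Linux), and it imagines a hypothetical C++ function along the lines of `size_t take_bar(Bar b);`, where `Bar` has a user-provided destructor. A Rust `extern "C"` function stands in for the C++ callee so the example runs on its own. Which side runs the C++ destructor afterwards is itself ABI-specific and is not modeled here.

```rust
// Same layout as the C++ struct: a single size_t.
#[repr(C)]
#[derive(Debug)]
pub struct Bar {
    pub data: usize,
}

// Hypothetical stand-in for the C++ `size_t take_bar(Bar b)`.
// The parameter is declared as a pointer because, for a type with a
// non-trivial destructor, that pointer is what the ABI really passes.
pub unsafe extern "C" fn take_bar(b: *mut Bar) -> usize {
    // SAFETY: the caller guarantees `b` points to a live, initialized Bar.
    unsafe { (*b).data }
}

fn main() {
    // The caller materializes a temporary and passes its address,
    // mirroring what a C++ compiler does for such types.
    let mut tmp = Bar { data: 42 };
    // SAFETY: `tmp` is alive and initialized for the whole call.
    let result = unsafe { take_bar(&mut tmp) };
    println!("{}", result);
}
```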


#2

I would probably be fine with a #[repr] that’s the opposite of #[repr(transparent)], but without affecting how you can use the value, only the ABI.


#3

There is going to be more weird shit like this. I think independent toggles for each ABI quirk, defined in terms of the assembly-level effects rather than the language-level rules that trigger them, are going to be both easier to work with and easier to implement. So for this case, keep using #[repr(C)] for the data layout, but add another annotation that means exactly “pass this by invisible reference”, in those words:

#[repr(C)]
#[pass_by(invisible_reference)]
#[derive(Debug)]
pub struct Bar {
    pub data: usize,
}

#4

Okay, sounds like people are mostly in favor of a new attribute (as opposed to, say, overloading the Copy trait). Now we just need to bikeshed what it should be called :smile:

IMO, there are two ways to go about this:

  • Emphasize the FFI by-pointer semantics. Possible names: #[repr(pass_by_pointer)], #[pass_by_pointer], #[pass_by(pointer)], etc.
      ◦ Pros: Obvious about what it does.
      ◦ Cons: Possibly too specific. Are there other ways an ABI could deal with such objects?

  • Emphasize type properties. Possible names: #[repr(non_pod)], #[repr(significant_address)], #[non_pod], #[significant_address], etc.
      ◦ Pros: May be useful in other contexts (though I am not sure right now what those would be).
      ◦ Cons: Less obvious about what it does. One would need to consult the reference to find out what specific FFI effects it has.


#5

#[repr(C++)]?

I prefer annotations that describe the goal rather than how to achieve it. I would rather not need to know C++’s low-level quirks in order to use Rust properly.


#6

Not every C++ type (not even every class) has these FFI semantics, so it would be highly misleading.


#7

(previous message deleted)

OK, I didn’t realise the compiler can’t figure out it’s a POD struct.


#8

My point was that for this particular quirk, #[repr(C++)] does not provide enough information for the compiler to make the “in-register vs by-pointer” decision. It needs to know whether the corresponding C++ type is POD or not, which can only be determined by looking at the original C++ header.
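As a rough illustration of the information the header has to supply, here is a classification function a binding generator could run per class. `CxxClassInfo`, `PassMode`, and `pass_mode` are hypothetical names invented for this sketch, and the rule only approximates the Itanium C++ ABI’s “non-trivial for the purposes of calls” test; MSVC and other ABIs use different criteria.

```rust
// Hypothetical summary of facts extracted from the C++ header for one class.
#[derive(Debug, Default, Clone, Copy)]
pub struct CxxClassInfo {
    pub nontrivial_copy_ctor: bool,
    pub nontrivial_move_ctor: bool,
    pub nontrivial_dtor: bool,
    pub all_copy_and_move_ctors_deleted: bool,
}

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum PassMode {
    // Classified like an ordinary C struct (registers or stack copy).
    ByValue,
    // Caller creates a temporary and passes its address instead.
    InvisibleReference,
}

// Approximates the Itanium rule; other ABIs differ.
pub fn pass_mode(info: &CxxClassInfo) -> PassMode {
    let nontrivial = info.nontrivial_copy_ctor
        || info.nontrivial_move_ctor
        || info.nontrivial_dtor
        || info.all_copy_and_move_ctors_deleted;
    if nontrivial {
        PassMode::InvisibleReference
    } else {
        PassMode::ByValue
    }
}

fn main() {
    // A class like `struct Bar { size_t data; ~Bar(); }` has a user-provided destructor.
    let bar = CxxClassInfo { nontrivial_dtor: true, ..Default::default() };
    let pod = CxxClassInfo::default();
    println!("{:?} {:?}", pass_mode(&bar), pass_mode(&pod));
}
```

Note that nothing in a Rust-side #[repr(C++)] marker would carry these four booleans; they have to come from the C++ declarations.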


#9

I think we might be getting ahead of ourselves here: there’s no guarantee that any part of the C++ calling convention is compatible with the C calling convention. So really I think Rust should add extern "C++" before adding any attributes to designate what class of C++ type a struct is.


#10

I think this is infeasible and also the wrong way of thinking about FFI.

For something like #[repr(C++)] to be feasible, Rust’s type system would have to be able to represent every detail of C++'s type system. That is not only not a reasonable design goal for the language, it’s not even a desirable design goal; we actively want to be able to look at a C++ type-system feature and say “no, that is a bad feature and we are not copying it.”

#[repr(C)] is feasible only because C’s type system is simple enough that it does make sense to clone every detail of it, and you’ll note that there are still lacunae, e.g., last I checked, transparent unions of structures were still not available, making it impossible to work with certain system APIs in Rust.

ABI-level annotations like the proposed #[pass_by(invisible_reference)] are feasible for the same reason #[repr(C)] is feasible: the possibility space is simple and orthogonal. There are only so many ways to pass arguments around, and only so many ways to lay out aggregate types in memory. (I would like to see Rust grow something like Ada’s representation clauses, in the long term.) But they’re also the right way to think about FFI, because they give you direct control over the properties that actually matter for interop. If you observe that

struct Bar {
    size_t data;
    ~Bar() { data = 0; }
};

needs to be passed by invisible reference, you can just toggle that property on the Rust shim structure; you don’t have to know why pass-by-value is unwanted.

It’s true that low-level annotations don’t directly solve the problem of FFI interaction with C++. Something, or someone, has to track all the details of the C++ ABI and work out which annotations to apply to which shim structures. To some extent my strategy just moves the complexity from rustc to bindgen, but I’d argue that bindgen is the right place for it.
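A minimal sketch of what moving the complexity into bindgen could look like: the generator, not rustc, decides which shims get the ABI toggle and writes it into the generated source, where a user can inspect or edit it. `emit_shim` is a hypothetical helper, not part of bindgen, and `#[pass_by(invisible_reference)]` is only the attribute proposed in this thread, so the emitted text would not compile with today’s rustc.

```rust
// Hypothetical generator step: render a Rust shim for a C++ class,
// adding the proposed ABI toggle only when the class needs it.
pub fn emit_shim(name: &str, fields: &[(&str, &str)], invisible_reference: bool) -> String {
    let mut out = String::new();
    out.push_str("#[repr(C)]\n");
    if invisible_reference {
        // Proposed attribute from this thread; not accepted by rustc today.
        out.push_str("#[pass_by(invisible_reference)]\n");
    }
    out.push_str(&format!("pub struct {} {{\n", name));
    for (field, ty) in fields {
        out.push_str(&format!("    pub {}: {},\n", field, ty));
    }
    out.push_str("}\n");
    out
}

fn main() {
    // The C++ Bar has a user-provided destructor, so the generator
    // decides it needs the invisible-reference toggle.
    print!("{}", emit_shim("Bar", &[("data", "usize")], true));
}
```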


#11

How am I supposed to know that a type should be passed by pointer instead of by value? What if on some platform non-POD classes are still passed by value? What about when platforms disagree on what POD is? Windows, for example, uses the C++03 definition of POD to determine the calling convention even when you’re compiling cutting-edge C++17. Does the POD status of a given type affect its ABI elsewhere, or only whether it is passed by value to functions? Does that apply both when it is passed as a parameter and when it is returned? What about the position of the parameter in the function signature?


#12

By reading the ABI spec. In most cases, though, figuring this out would be bindgen’s job.

Unlikely, because a non-trivial C++ constructor might stash a pointer to the object in some global list (which, as I understand it, is the whole reason for this exception in the ABI rules).
However, possible ABI differences between targets are why I am leaning towards the #[significant_address] attribute. The target would be free to decide how to deal with those.
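To make the stashed-pointer concern concrete, the sketch below has a hypothetical `register` method play the role of a C++ constructor that records `this` in a global list. Once the value is bitwise-copied to a new location (here a heap box; passing it in registers has the same effect, since the callee ends up with its copy somewhere else), the recorded address no longer points at the live object. `Tracked` and `REGISTRY` are invented for illustration.

```rust
use std::sync::Mutex;

// Stand-in for a C++ global list of live objects; addresses stored as usize.
static REGISTRY: Mutex<Vec<usize>> = Mutex::new(Vec::new());

#[repr(C)]
pub struct Tracked {
    pub data: usize,
}

impl Tracked {
    // Plays the role of a constructor that records `this` somewhere global.
    pub fn register(&self) -> usize {
        let addr = self as *const Tracked as usize;
        REGISTRY.lock().unwrap().push(addr);
        addr
    }
}

fn main() {
    let t = Tracked { data: 1 };
    let registered = t.register();
    // A bitwise move to new storage: the object now lives elsewhere,
    // but the registry still holds the old stack address.
    let moved = Box::new(t);
    let current = &*moved as *const Tracked as usize;
    println!("registered {:#x}, now at {:#x}", registered, current);
    assert_ne!(registered, current);
}
```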

Yep, so maybe using C++-isms like “POD” is not the best idea.

Possibly. Though, like I said, I can’t think of other uses right now.

I think the answer is “yes”.

This is determined by ABI rules for the particular platform.
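On the return side, assuming an Itanium-style ABI, a function that returns a non-trivial class by value actually receives a hidden pointer to caller-provided storage and constructs the result there. The sketch hand-lowers a hypothetical C++ `Bar make_bar(size_t data)` this way, again with a Rust `extern "C"` function standing in for the C++ callee. Where the hidden pointer sits among the arguments is itself decided by the platform’s ABI rules; putting it first here is an assumption of the sketch.

```rust
use std::mem::MaybeUninit;

#[repr(C)]
#[derive(Debug)]
pub struct Bar {
    pub data: usize,
}

// Hypothetical stand-in for `Bar make_bar(size_t data)`. The result is
// not returned in registers; the callee writes it into the slot whose
// address the caller supplies as an extra argument.
pub unsafe extern "C" fn make_bar(ret: *mut Bar, data: usize) {
    // SAFETY: the caller passes a valid, writable, properly aligned slot.
    unsafe { ret.write(Bar { data }) };
}

fn main() {
    // The caller reserves uninitialized storage for the return value.
    let mut slot = MaybeUninit::<Bar>::uninit();
    // SAFETY: `slot` is writable and correctly aligned for Bar.
    unsafe { make_bar(slot.as_mut_ptr(), 7) };
    // SAFETY: make_bar fully initialized the slot.
    let bar = unsafe { slot.assume_init() };
    println!("{:?}", bar);
}
```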


#13

Well, someone has to know, and it’s probably easier to keep bindgen up to date with all of the platform variations you describe than rustc proper. And it’s definitely easier to deal with a case where the toolchain gets it wrong, if the wrongness is manifest in (possibly generated) source code that you can edit.


#14

Just found this document by Agner Fog. Looks like passing complex objects by-pointer is pretty much universal (section 7.1).


#15

Of course, we could put bindgen in charge of transforming function signatures in the presence of non-POD objects. But this would be a breaking change for bindgen.