Bindgen and C++

vadimcn · June 24, 2017, 4:33am

So yesterday I tried using bindgen on a C++ library, and as my luck would have it, immediately ran into a bug. Wondering if @lang_subteam has any opinions on that.

TL;DR: if a C++ class or struct has a non-trivial copy constructor or destructor, it must be passed to callees by pointer, even if it would otherwise fit into a register. Rust does not know about this distinction, so boom…
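To make the mismatch concrete, here is a minimal sketch in Rust. It assumes an Itanium-style C++ ABI. The names `Bar`, `cxx_consume`, `fits_in_register` and `call_consume` are hypothetical, and the "C++ callee" is simulated by a Rust `extern "C"` function so that the snippet is self-contained.

```rust
use std::mem::size_of;

/// Hypothetical Rust mirror of a C++ class that holds one word of data
/// and has a user-provided destructor.
#[repr(C)]
#[derive(Debug)]
pub struct Bar {
    pub data: usize,
}

/// `Bar` is one machine word, so a plain `extern "C" fn(Bar)` declaration
/// would normally pass it in a register.
pub fn fits_in_register() -> bool {
    size_of::<Bar>() <= size_of::<usize>()
}

/// Stand-in (assumption) for the C++ callee `size_t consume(Bar)`.
/// Because the C++ type is non-trivial, the callee really receives a
/// pointer to a temporary that the caller owns.
pub extern "C" fn cxx_consume(bar: *mut Bar) -> usize {
    // SAFETY (assumed contract): the caller passes a valid, exclusive pointer.
    unsafe { (*bar).data }
}

/// Hand-written lowering of a by-value call: materialize the temporary,
/// then pass its address instead of the value.
pub fn call_consume(value: usize) -> usize {
    let mut tmp = Bar { data: value };
    cxx_consume(&mut tmp)
}
```

A binding that declared the callee as taking `Bar` by value would put the data where the C++ side expects an address, which is the bug described above.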

eddyb · June 24, 2017, 9:58am

I would probably be fine with a #[repr] that’s the opposite of #[repr(transparent)], but without affecting how you can use the value, only the ABI.

zackw · June 24, 2017, 2:20pm

There is going to be more weird shit like this. I think independent toggles for each ABI quirk, defined in terms of the assembly-level effects rather than the language-level rules that trigger them, are going to be both easier to work with and easier to implement. So for this case, keep using #[repr(C)] for the data layout, but add another annotation that means exactly “pass this by invisible reference”, in those words:

#[repr(C)]
#[pass_by(invisible_reference)]
#[derive(Debug)]
pub struct Bar {
    pub data: usize,
}

vadimcn · June 24, 2017, 9:35pm

Okay, sounds like people are mostly in favor of a new attribute (as opposed to, say, overloading the Copy trait). Now we just need to bikeshed what it should be called.

IMO, there are two ways to go about this:

1. Emphasize the FFI by-pointer semantics. Possible names: #[repr(pass_by_pointer)], #[pass_by_pointer], #[pass_by(pointer)], etc.
   Pros: Obvious about what it does.
   Cons: Possibly too specific. Are there other ways an ABI could deal with such objects?
2. Emphasize type properties. Possible names: #[repr(non_pod)], #[repr(significant_address)], #[non_pod], #[significant_address], etc.
   Pros: May be useful in other contexts (though I am not sure right now what those would be).
   Cons: Less obvious about what it does. One would need to consult the reference to find out what specific FFI effects it has.

kornel · June 24, 2017, 9:59pm

#[repr(C++)]?

I prefer annotations that describe what is the goal, rather than how to achieve it. I would prefer to not need to know the low-level quirks C++ has in order to use Rust properly.

vadimcn · June 24, 2017, 10:23pm

Not every C++ type (not even every class) has these FFI semantics, so it would be highly misleading.

kornel · June 24, 2017, 11:16pm

(previous message deleted)

OK, I didn’t realise the compiler can’t figure out it’s a POD struct.

vadimcn · June 24, 2017, 11:28pm

My point was that for this particular quirk, #[repr(C++)] does not provide enough information for the compiler to make the “in-register vs by-pointer” decision. It needs to know whether the corresponding C++ type is POD or not, which can only be determined by looking at the original C++ header.

parched · June 25, 2017, 12:20pm

I think we might be getting ahead of ourselves here. There’s no guarantee that any part of the C++ calling convention is compatible with the C calling convention. So really I think Rust should add extern "C++" before adding any attributes to designate what class of C++ type a struct is.

zackw · June 25, 2017, 3:37pm

I think this is infeasible and also the wrong way of thinking about FFI.

For something like #[repr(C++)] to be feasible, Rust's type system would have to be able to represent every detail of C++'s type system. That is not only not a reasonable design goal for the language, it's not even a desirable design goal; we actively want to be able to look at a C++ type-system feature and say "no, that is a bad feature and we are not copying it."

#[repr(C)] is feasible only because C's type system is simple enough that it does make sense to clone every detail of it, and you'll note that there are still lacunae, e.g. last I checked transparent unions of structures were still not available, making it impossible to work with certain system APIs in Rust.

ABI-level annotations like the proposed #[pass_by(invisible_reference)] are feasible for the same reason #[repr(C)] is feasible: the possibility space is simple and orthogonal. There are only so many ways to pass arguments around, and only so many ways to lay out aggregate types in memory. (I would like to see Rust grow something like Ada's representation clauses, in the long term.) But they're also the right way to think about FFI, because they give you direct control over the properties that actually matter for interop. If you observe that

struct Bar {
    size_t data;
    ~Bar() { data = 0; }
};

needs to be passed by invisible reference, you can just toggle that property on the Rust shim structure; you don't have to know why pass-by-value is unwanted.

It's true that low-level annotations don't directly solve the problem of FFI interaction with C++. Something, or someone, has to track all the details of the C++ ABI and work out which annotations to apply to which shim structures. To some extent my strategy just moves the complexity from rustc to bindgen, but I'd argue that bindgen is the right place for it.
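As a rough illustration of the bookkeeping a binding generator would take on, here is a sketch of the decision it would have to make per type. It assumes a simplified Itanium-style rule (a non-trivial copy constructor, move constructor, or destructor forces the by-reference path); other targets may use different criteria. `CxxClassTraits`, `PassMode` and `pass_mode` are hypothetical names, not part of bindgen.

```rust
/// Facts about a C++ class that a generator could read from the header.
/// All field names are hypothetical.
#[derive(Debug, Clone, Copy, Default)]
pub struct CxxClassTraits {
    pub nontrivial_copy_ctor: bool,
    pub nontrivial_move_ctor: bool,
    pub nontrivial_dtor: bool,
}

/// How an argument of the class is passed at the ABI level.
#[derive(Debug, PartialEq, Eq)]
pub enum PassMode {
    /// Follow the ordinary C rules for the struct layout.
    ByValue,
    /// The caller passes the address of a temporary it owns.
    ByInvisibleReference,
}

/// Simplified Itanium-style classification (assumption: deleted
/// constructors and other corner cases are ignored here).
pub fn pass_mode(traits: CxxClassTraits) -> PassMode {
    if traits.nontrivial_copy_ctor || traits.nontrivial_move_ctor || traits.nontrivial_dtor {
        PassMode::ByInvisibleReference
    } else {
        PassMode::ByValue
    }
}
```

With a function like this, the generator would emit the by-reference annotation only for types that need it, and a wrong answer would show up in generated source that can be edited by hand.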

retep998 · June 25, 2017, 9:04pm

How am I supposed to know that a type should be pass by pointer instead of pass by value? What if on some platform non-POD classes are still passed by value? What about when platforms disagree on what POD is? Windows for example goes by the C++03 rules of POD to determine the calling convention even when you’re using cutting edge C++17. Does the POD status of a given type affect its ABI elsewhere or only whether it is passed by value in functions? Does that apply to it both being passed in as a parameter and also being returned? What about the position of the parameter in the function signature?

vadimcn · June 25, 2017, 9:50pm

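> How am I supposed to know that a type should be pass by pointer instead of pass by value?

By reading the ABI spec. In most cases, though, it would be bindgen's job to figure this out.

> What if on some platform non-POD classes are still passed by value?

Unlikely, because a non-trivial C++ constructor might stash a pointer to the object in some global list (which, as I understand it, is the whole reason for this exception in the ABI rules).
However, possible ABI differences between targets are why I am leaning towards the #[significant_address] attribute. The target would be free to decide how to deal with those.

> What about when platforms disagree on what POD is?

Yep, so maybe using C++'isms like "POD" is not the best idea.

> Does the POD status of a given type affect its ABI elsewhere or only whether it is passed by value in functions?

Possibly. Though, like I said, I can't think of other uses right now.

> Does that apply to it both being passed in as a parameter and also being returned?

I think the answer is "yes".

> What about the position of the parameter in the function signature?

This is determined by the ABI rules for the particular platform.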

zackw · June 25, 2017, 9:52pm

Well, someone has to know, and it's probably easier to keep bindgen up to date with all of the platform variations you describe than rustc proper. And it's definitely easier to deal with a case where the toolchain gets it wrong, if the wrongness is manifest in (possibly generated) source code that you can edit.

vadimcn · June 25, 2017, 9:55pm

Just found this document by Agner Fog. Looks like passing complex objects by-pointer is pretty much universal (section 7.1).

vadimcn · June 26, 2017, 7:41pm

Of course, we could put bindgen in charge of transforming function signatures in the presence of non-POD objects. But this would be a breaking change for bindgen.
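A sketch of what such a transformation could produce, for both a by-value parameter and a by-value return. This is not bindgen's actual output. `cxx_make_bar`, `cxx_consume`, `make_bar` and `consume` are hypothetical names, the C++ side is simulated in Rust so the snippet runs on its own, and the position of the hidden return slot is an assumption that a real generator would take from the target's ABI.

```rust
use std::mem::MaybeUninit;

/// Hypothetical mirror of the non-POD C++ class.
#[repr(C)]
#[derive(Debug)]
pub struct Bar {
    pub data: usize,
}

/// Stand-in for C++ `Bar make_bar(size_t)` after lowering: the result is
/// written through a hidden pointer supplied by the caller.
pub extern "C" fn cxx_make_bar(out: *mut Bar, value: usize) {
    // SAFETY (assumed contract): `out` points to writable, aligned storage
    // for one `Bar`.
    unsafe { out.write(Bar { data: value }) }
}

/// Stand-in for C++ `size_t consume(Bar)` after lowering: the argument
/// arrives as a pointer to a caller-owned temporary.
pub extern "C" fn cxx_consume(arg: *mut Bar) -> usize {
    // SAFETY (assumed contract): `arg` is valid for reads.
    unsafe { (*arg).data }
}

/// Safe wrapper that keeps the by-value shape of the C++ signature.
pub fn make_bar(value: usize) -> Bar {
    let mut slot = MaybeUninit::<Bar>::uninit();
    cxx_make_bar(slot.as_mut_ptr(), value);
    // SAFETY: `cxx_make_bar` initialized the slot above.
    unsafe { slot.assume_init() }
}

/// Safe wrapper for the by-value parameter.
pub fn consume(mut bar: Bar) -> usize {
    cxx_consume(&mut bar)
}
```

Existing users who call the raw declarations directly would see the pointer-taking signatures change, which is where the breakage would come from.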

