Pre-RFC: Aarch64 Indirect Return ABI

(Most recent version on GitHub)


Summary

Introduce a new target-specific calling convention, aarch64-indirect-return, which passes the first argument as if it were an indirect result location pointer (that is, through the x8 register). This would make it possible for Rust to interoperate with C++ functions that return non-trivial C++ objects on aarch64.

Motivation

Currently, because the C++ ABI is very similar to the C ABI on most platforms, it is possible to interoperate with C++ code from Rust via carefully crafted #[repr(C)] types and extern "C" functions.

One tricky case is functions returning non-trivial C++ objects (trivial objects use the C ABI). For the purposes of the ABI, C++ considers any object that has a non-trivial copy or move constructor or a non-trivial destructor to be non-trivial:

struct NonTrivialObject {
    int data = 42;
    ~NonTrivialObject();
};

NonTrivialObject get_object();

When such an object is returned from a function, the Itanium ABI requires the caller to allocate storage for the returned object and then pass a pointer to that storage to the callee. Most platforms pass this pointer as a hidden argument before all other arguments and before this, if present. Therefore, it's possible to write an ABI-compatible declaration for such a function in Rust:

unsafe extern "C" {
    fn get_object(result_location: *mut NonTrivialObject);
}
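A minimal sketch of the caller side on a platform where the hidden pointer is an ordinary first argument. No C++ library is linked here, so get_object is a Rust stand-in with the same signature (an assumption made purely for illustration), and call_get_object is a hypothetical helper name:

```rust
use std::ffi::c_int;
use std::mem::MaybeUninit;

#[repr(C)]
pub struct NonTrivialObject {
    pub data: c_int,
}

// Stand-in for the C++ `get_object`, written in Rust so the sketch links
// without a C++ toolchain. Real bindings would declare it in an
// `unsafe extern "C"` block instead.
pub unsafe extern "C" fn get_object(result_location: *mut NonTrivialObject) {
    // Construct the object directly in the caller-provided storage.
    unsafe { result_location.write(NonTrivialObject { data: 42 }) };
}

// Hypothetical helper: the caller allocates the storage and passes its address.
pub fn call_get_object() -> c_int {
    let mut slot = MaybeUninit::<NonTrivialObject>::uninit();
    unsafe {
        get_object(slot.as_mut_ptr());
        // Access the object through a reference; it is never moved.
        slot.assume_init_ref().data
    }
}
```

The object stays at the address the callee constructed it at. Running the C++ destructor would need a separate binding, which this sketch leaves out.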

aarch64, however, uses register x8 for this, which otherwise does not participate in argument passing. Because of this, it is not possible to call or implement get_object on aarch64 in Rust.

As a workaround, it is possible to use either C++ or assembly wrappers that conform to the extern "C" calling convention.

C++ shims add a dependency on a C++ compiler and complicate the build process. Assembly shims are tricky to get right. Both approaches are unergonomic and become more complex once the user wants to implement such a function or call it through a function pointer.

Guide-level explanation

This ABI is only useful for interfacing with C++ on aarch64 platforms.

When a non-trivial C++ object is returned from a function, the Itanium ABI mandates that the caller pass a pointer to the location where the object is to be constructed, as a hidden parameter before all other parameters, including this. On top of that, the aarch64 ABI mandates (see section "GC++ABI §3.1.3 Return Values") that this pointer be passed in the Indirect Result Location Register (x8), which is otherwise unused for parameter passing.

The extern "aarch64-indirect-return" calling convention makes the compiler pass the first function parameter as if it was that hidden result location pointer, causing it to be allocated to the x8 register. It is otherwise equivalent to extern "C" calling convention.

This ABI requires the function to return () and the first argument to be a mutable pointer.

This calling convention also has an unwinding equivalent: extern "aarch64-indirect-return-unwind".

Example Usage

For the following declaration of function get_object:

struct NonTrivialObject {
    int data = 42;
    ~NonTrivialObject();
};

NonTrivialObject get_object();

Use these Rust definitions to call it:

#[repr(C)]
struct NonTrivialObject {
    data: c_int,
}

unsafe extern "aarch64-indirect-return" {
    fn get_object(result_location: *mut NonTrivialObject);
}

It is also possible to implement a function compatible with this declaration:

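unsafe extern "aarch64-indirect-return" fn get_object(result_location: *mut NonTrivialObject) {
    unsafe {
        // Write into the caller-provided storage without reading or
        // dropping whatever uninitialized bytes are already there.
        result_location.write(NonTrivialObject {
            data: 42,
        });
    }
}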

Reference-level explanation

The extern "aarch64-indirect-return" calling convention has the following requirements, which are checked by the Rust compiler:

  • The function must not be an async function or a generator
  • The function must be marked unsafe
  • The return value has to be ()
  • The first argument has to be a thin mutable pointer
  • The calling convention must only be used when targeting aarch64
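The checks listed above can be sketched over a simplified model of a function signature. The Signature and FirstArg types and the check_indirect_return function are hypothetical names invented for this illustration; rustc's real data structures and diagnostics look different:

```rust
/// Kind of the first parameter in this simplified model.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FirstArg {
    /// The function has no parameters at all.
    Missing,
    /// A thin mutable pointer such as `*mut T` with `T: Sized`.
    ThinMutPtr,
    /// Anything else, including wide pointers such as `*mut [u8]`.
    Other,
}

/// Hypothetical, simplified view of a function signature.
pub struct Signature {
    pub is_async_or_generator: bool,
    pub is_unsafe: bool,
    pub returns_unit: bool,
    pub first_arg: FirstArg,
    pub target_arch: &'static str,
}

/// Returns one message per violated requirement; empty means accepted.
pub fn check_indirect_return(sig: &Signature) -> Vec<&'static str> {
    let mut errors = Vec::new();
    if sig.is_async_or_generator {
        errors.push("function must not be async or a generator");
    }
    if !sig.is_unsafe {
        errors.push("function must be marked unsafe");
    }
    if !sig.returns_unit {
        errors.push("return type must be ()");
    }
    if sig.first_arg != FirstArg::ThinMutPtr {
        errors.push("first argument must be a thin mutable pointer");
    }
    if sig.target_arch != "aarch64" {
        errors.push("calling convention is only available on aarch64");
    }
    errors
}
```

Collecting every violation instead of stopping at the first one is a design choice of this sketch, not something the proposal specifies.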

The first argument of an extern "aarch64-indirect-return" function has to be passed through the x8 register, as if it were an indirect result location pointer.

It can be implemented by marking the first argument to the function with an sret LLVM attribute, or using another equivalent mechanism provided by the backend.

Drawbacks

This feature does add work for alternative Rust implementations/code generators.

Due to limited applicability (C++ FFI without shims on aarch64), it is going to benefit a small fraction of Rust users.

Rationale and alternatives

Alternative: marking types as non-trivial

An alternative approach would be to introduce an attribute #[repr(non_trivial)] that you would put on types in addition to #[repr(C)] to signify that they should be passed through a pointer regardless of their size. This would take effect both in return value and argument positions, leading to function signatures closer to those of C++:

struct NonTrivialObject {
    int data = 42;
    ~NonTrivialObject();
};

NonTrivialObject get_object();
void pass_object(NonTrivialObject arg);

#[repr(C, non_trivial)]
struct NonTrivialObject {
    data: c_int,
}

unsafe extern "C" {
    fn get_object() -> NonTrivialObject;
    fn pass_object(object: NonTrivialObject);
}

This approach, however, may lead to unsoundness when the object is not trivially movable. There is no mechanism in Rust to call the move constructor on object move, but calling the above-defined bindings does require moving the objects.

The aarch64-indirect-return-based approach is explicit about the placement of the returned object and does not rely on Rust moves. This provides a way to soundly work with objects that are not trivially movable.

Alternative: a marker for an argument instead of whole function

Another alternative might be to mark the function argument with a marker that would be effectively equivalent to LLVM's sret:

#[repr(C)]
struct NonTrivialObject {
    data: c_int,
}

unsafe extern "C" {
    fn get_object(#[sret] return_storage: *mut NonTrivialObject);
}

The attribute would have to become a new part of the function's signature, requiring sweeping changes throughout the compiler for the benefit of a single target.

Alternative: a target-independent "C-indirect-return" calling convention

Since the sret LLVM attribute (and its equivalents in other backends) exists not only for aarch64 but for all targets, it is possible to implement this calling convention as a target-independent one.

This can, in theory, allow users to make API bindings without #[cfg(target_arch = "aarch64")] conditional blocks, since on other platforms the convention will be equivalent to extern "C".

However, as per the tracking issue for unsupported_calling_conventions (cdecl, stdcall, fastcall) (rust-lang/rust issue #137018), there seems to be a negative sentiment towards target-specific calling conventions with fallbacks on other platforms.

Prior art

bindgen supports C++ FFI to some extent. It does this by generating ABI-compatible function definitions and linking to them. Supporting this calling convention (or some other way to pass non-trivial C++ objects) is a prerequisite for this approach to work on aarch64. However, bindgen will not be able to use it right away: it currently cannot determine whether a type is non-trivial, because libclang does not expose that information (see "Wrong ABI used for small C++ classes on Linux", rust-lang/rust-bindgen issue #778).

The cxx crate also provides C++ interop, but instead of linking directly to the C++ functions, it generates C wrappers around them. It also does not allow passing non-trivial objects by value, instead requiring either indirection or conversion to C types.

Unresolved questions

  • Is the name aarch64-indirect-return a good fit?

Future possibilities

This feature is relatively isolated and limited in scope, so it is not expected that this feature will be extended in the future.

Maybe it should be called extern "C++" instead and do the right thing for all architectures rather than being specific to aarch64?

Edit: You already suggested something like that.

I don't see it as a target-specific calling convention with fallback on other platforms. I see it as the calling convention that C++ would use, which just so happens to be representable with extern "C" on some targets.

Devil's advocate: a macro (or possibly a proc-macro) could automate this. Something to add to the alternatives considered, with an explanation as to why not.

This is very unfortunate. It makes it harder to write cross platform bindings. It would be much better to have an "indirect c++ return" ABI which would map to the appropriate ABI on x86-64 Linux, x86-64 Windows, ARM Linux/Windows/Mac etc.

I see this is mentioned in the alternatives, but as a user I disagree. If cxx generates the bindings, sure (but less code means lower compile time, so not having to do logic on the target triple would be preferable).

And if you need a macro to manage this anyway, why not have the macro generate inline asm directly?

I like your C++ ABI suggestion, but I guess you would need some attribute to specify if a type is trivial or not. That is not easy to know on the Rust side otherwise.

Right

That's a good idea. I actually have been trying to make a macro that would do it. It can go pretty far, thanks to recently stabilized naked fns, but it's still full of caveats and requires restricting the set of function signatures it will work with.

Honestly, I wouldn't recommend this. Getting calling conventions right is hard even for rustc, and a proc-macro wouldn't benefit from ABI fixes made in rustc. In addition, to know the right calling convention you need to know the full layout of all arguments. A proc-macro can't get that without forcing your users to define the structs/enums/unions used as arguments inside the proc-macro invocation and then manually recomputing the layout inside the proc-macro.

Pretty much what I wanted to illustrate^^

Would a special pointer type Sret<T> be possible? It could have a trivial conversion to a Pin pointer. I don't know if this is more or less complicated than the special attribute for a parameter.

So the first argument to the function would then be the Sret<T>

This would allow building a struct with move constructor in a function that is called by the ffi function and prevent accidentally moving it.

You can't Pin a pointer. It doesn't implement Deref. If you mean a Pin reference however, that isn't going to work as before the function returns the Sret isn't initialized yet.

Ah, I forgot: it would have to be a Pin<&mut MaybeUninit<T>>, which isn't actually useful.

The goal of "cross platform C++ bindings" is probably still out of reach, because MSVC passes this pointer and indirect result location in different order than Itanium ABI :/. You wouldn't be able to interface with methods using same Rust function declarations with just extern "C++-indirect-return". Also, x86-targeting MSVC passes this in ecx, so you would need to use extern "thiscall", which is only available on x86.

To be cross-platform you could make a 3D cube of calling conventions answering the questions:

  • Are we calling a method (needs to pass this in ecx on x86 MSVC)?
  • Are we doing an indirect return (needs to pass the result location in x8 on aarch64 + swap argument order if calling a method on MSVC)?
  • Can the function unwind?

8 portable C++ calling conventions in total.

Also, tomorrow we might get a new platform doing something else weirdly and we would need to add another dimension. Feels a bit risky to include all of this complexity into rustc, considering its usual stability guarantees.

If we limit ourselves to Itanium ABI-based platforms though, we will be able to get away with four calling conventions (C, C-unwind, C++-indirect-return and C++-indirect-return-unwind).
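The counts in this post can be checked with a small enumeration. This is only a sketch: CppConvention and the function names are made up for illustration, and the ABI strings are the ones named above, not conventions that exist in rustc today:

```rust
use std::collections::BTreeSet;

/// One point in the hypothetical "cube" of portable C++ conventions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct CppConvention {
    pub is_method: bool,
    pub indirect_return: bool,
    pub can_unwind: bool,
}

/// Enumerates every combination of the three yes/no questions.
pub fn all_conventions() -> Vec<CppConvention> {
    let mut out = Vec::new();
    for is_method in [false, true] {
        for indirect_return in [false, true] {
            for can_unwind in [false, true] {
                out.push(CppConvention { is_method, indirect_return, can_unwind });
            }
        }
    }
    out
}

/// On Itanium-based platforms the method axis needs no special handling,
/// so only the other two axes pick the ABI name.
pub fn itanium_abi_name(c: CppConvention) -> &'static str {
    match (c.indirect_return, c.can_unwind) {
        (false, false) => "C",
        (false, true) => "C-unwind",
        (true, false) => "C++-indirect-return",
        (true, true) => "C++-indirect-return-unwind",
    }
}

/// Collapses the cube to the distinct ABI names needed on Itanium platforms.
pub fn distinct_itanium_abis() -> BTreeSet<&'static str> {
    all_conventions().into_iter().map(itanium_abi_name).collect()
}
```

Here all_conventions() yields the eight combinations of the cube, and distinct_itanium_abis() collapses them to the four names listed for Itanium-based platforms.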

This is fundamentally based on a hack of trying to express invisible parts of the C++ ABI in terms of the C ABI, which evidently isn't always possible. The solution doesn't even make the hack work in general, but only patches one broken case on one platform, with no guarantee that there won't be any more broken cases on other platforms.

Rust's extern "C" is for all platforms, but this solution is very narrowly targeted. It's not even an aarch64-wide problem, since the solution is not appropriate for aarch64-pc-windows-msvc.

C++ also diverges from C when passing non-trivial types via function arguments. Luckily the difference happens to be expressible using C on the current platforms, but this isn't guaranteed. Imagine if also function parameters needed to be hacked this way:

extern "novelarch-arg1-is-non-trivial" fn(_: *const NonTrivialByValue);

You'd get a microsyntax or combinatoric explosion when trying to define all ABIs of all parameters and return types inside the ABI name.

And the whole concept of using C ABI for C++ ABI exposes ABI details as explicit and semantically different argument and return types in Rust.

Rust's extern "Rust" fn get_object() -> Object technically works as extern "C" fn get_object(*mut Object), but it doesn't need the return type written as an argument.

I'd prefer extern "C++" with some #[repr(C++, non_trivial)] on the struct, or at least attr on the return type:

extern "C++" {
   fn get_object() -> NonTrivialObject;
   // or
   fn get_object() -> #[cpp_non_trivial] NonTrivialObject;
}

This would allow one definition to work on all platforms, and wouldn't require developers to know what each compiler's C++ ABI looks like through C's lens on each platform.

Would you then consider the recently stabilized extern "thiscall" to be a hack?

It exists to deal with a very similar problem: to interoperate with C++ MSVC on i686 target, where member functions expect this argument to be passed in ecx register. It exists to patch one broken case of interoperating with C++ code on i686-*-windows-msvc targets.

Should this be grounds for not supporting calls to such functions at all, in search of a greater, all-encompassing solution?

I'd prefer it too, but it won't work with non-trivially-movable objects unless Rust gets move constructors, which, AFAIU, isn't likely to happen. To be sound, indirectly passed objects have to have explicit pointer parameters. (This is also mentioned in the alternatives section of the starting post.)

It is a good goal to strive for. But maybe rustc isn't the level at which you would want to implement this?

With the proposed calling convention it would already be possible to implement a decently cross-platform macro providing C++ FFI, which would use #[cfg] to engage extern "C", extern "thiscall" and extern "aarch64-indirect-return" and swap arguments (for i686 MSVC) as needed:

extern_cpp! {
  fn do_something(arg: i32) -> i64;
  fn do_something_member(#[this] this: *mut AnotherObject, arg: i32) -> i64;
  fn get_object(#[sret] result_storage: *mut NonTrivialObject);
  fn get_object_member(#[sret] result_storage: *mut NonTrivialObject, #[this] this: *mut AnotherObject);
}
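The selection logic such a macro would need can be sketched as a plain function. Everything here is hypothetical: the Target and Lowering types, the lower function, and the three-way target split are simplifications based only on the rules stated in this thread, and other targets (for example aarch64 MSVC) are deliberately left out:

```rust
/// Simplified target classes, limited to the cases discussed in the thread.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Target {
    /// Itanium C++ ABI on anything other than aarch64.
    ItaniumOther,
    /// Itanium C++ ABI on aarch64 (result location goes in x8).
    ItaniumAarch64,
    /// 32-bit x86 MSVC (`this` goes in ecx).
    I686Msvc,
}

/// How a C++ function would be declared on the Rust side.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Lowering {
    /// ABI string to put after `extern`.
    pub abi: &'static str,
    /// Whether `this` must precede the result pointer in the parameter list.
    pub this_before_result: bool,
}

/// Picks the ABI string and parameter order for one function.
pub fn lower(target: Target, has_this: bool, has_sret: bool) -> Lowering {
    match target {
        // Hidden result pointer is an ordinary first argument.
        Target::ItaniumOther => Lowering { abi: "C", this_before_result: false },
        // Only indirect returns need the proposed convention.
        Target::ItaniumAarch64 => Lowering {
            abi: if has_sret { "aarch64-indirect-return" } else { "C" },
            this_before_result: false,
        },
        // Methods need thiscall; with an indirect return the order is swapped.
        Target::I686Msvc => Lowering {
            abi: if has_this { "thiscall" } else { "C" },
            this_before_result: has_this && has_sret,
        },
    }
}
```

For get_object_member above, this sketch picks "aarch64-indirect-return" on aarch64, and "thiscall" with this moved ahead of the result pointer on i686 MSVC.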

The hack isn't in having a calling convention different from SysV, nor a calling convention that is useful only on one platform, but in changing types of arguments in Rust (on the source code level) to match implementation details of a particular calling convention.

The hack is in using one language feature intentionally incorrectly (declaring a wrong return type) to compensate for the lack of another language feature (the C++ ABI). Instead of digging deeper into the hack and adding more features for working with intentionally incorrectly declared return types, Rust should gain the ability to understand the calling convention when the types are declared in their semantically correct, straightforward way (defining what the function is doing, not how it's implemented through the lens of a C ABI).

If there was a Z-- language that had an ABI almost identical to C's, except pointers had to be passed in floating point registers, then IMHO extern fn take_ptr(f64) would be a hack, and extern fn take_ptr(#[ptr] f64) would be digging deeper into the hack. OTOH extern "Zmm" fn take_ptr(*const u8) would be fine, even if codegen of all the options was identical.

Function pointers and calling conventions operate on a higher level of abstraction. Some ABIs pass structs by value or by pointer depending on struct size, but you don't conditionally define fn args as integers or pointers to explain that to Rust.

If you're proposing to change the language, you don't need to hack around the inability to change the internals. Calling non-C functions through C functions that happen to be close enough is something you'd do if you had to implement it without changing the compiler. But if the compiler implements it, it doesn't need users to provide a workaround syntax.

I don't think Rust adding native support for a C++ calling convention can solve this problem on its own. In Rust returning a value from a function is semantically a trivial move (memcpy). This is fundamentally incompatible with a C++ type that's not trivially movable.

You somehow need to enable the caller to pass a place to the function where the output of the function will be stored, and a pointer as first parameter is the easiest and safest way to do that.
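To illustrate why the place matters, here is a minimal sketch with a self-referential type standing in for a C++ object that is not trivially movable. SelfReferential, construct_in_place and self_pointer_is_valid are hypothetical names used only for this example:

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Stand-in for a C++ object whose constructor stores a pointer to itself.
#[repr(C)]
pub struct SelfReferential {
    pub data: i32,
    pub data_ptr: *const i32,
}

/// Builds the object directly at `slot`, so the self-pointer refers to the
/// final location. `slot` must be valid for writes and properly aligned.
pub unsafe fn construct_in_place(slot: *mut SelfReferential) {
    unsafe {
        // Take field addresses without creating references to
        // uninitialized memory.
        let data_addr = ptr::addr_of_mut!((*slot).data);
        data_addr.write(42);
        ptr::addr_of_mut!((*slot).data_ptr).write(data_addr as *const i32);
    }
}

/// The caller owns the place; the object is used where it was built.
pub fn self_pointer_is_valid() -> bool {
    let mut slot = MaybeUninit::<SelfReferential>::uninit();
    unsafe {
        construct_in_place(slot.as_mut_ptr());
        let obj = slot.assume_init_ref();
        ptr::eq(obj.data_ptr, &obj.data) && obj.data == 42
    }
}
```

Had the object been returned by value instead, Rust would be free to memcpy it to a new address, after which data_ptr could point at the old location.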

I do agree that simply using extern "C" with the result as first parameter on platforms where that happens to be compatible with C++ is an undesirable hack.

I'd go with something like:

unsafe extern "C++-result-reference" {
    fn get_object(result_location: *mut NonTrivialObject);
}

or

unsafe extern "C++" {
    #[ResultByReference]
    fn get_object(result_location: *mut NonTrivialObject);
}

The attribute has the advantage that a single extern block can be used both for functions that return an object and for functions that return void. But it means that an attribute influences the effective ABI, and I don't think Rust currently does anything like that.

Rust syntax actually allows us to place attributes on arguments, so fn get_object(#[ResultByReference] result_location: *mut NonTrivialObject); would work. AFAIK there is no precedent for using them for anything that isn't related to cfg-gates, though.

Placement new

This problem is quite similar to placement new, so researching what approaches were proposed for that could be interesting.

One important difference is that placement new is primarily intended as an optimization, while avoiding the copy is safety critical here.

Guaranteed copy elision

Guaranteed copy elision could solve this problem as well, keeping the intuitive signature where the result is returned from the function.

  • On its own it will result in very subtle safety critical code, since it's easy to hit a case where it doesn't apply
  • Features like unmovable or unsized types could make this less unsafe (if these are added for other reasons, like better async/pinning or extern types)
  • ptr::write and MaybeUninit::new wouldn't be compatible with copy elision without further hacks, since the value is moved into it. So the common use-case of writing the result into an uninitialized location would be difficult or even impossible.
  • Implementing a function using this convention in Rust would be difficult, since there is currently no way to obtain a pointer to the result. NRVO might solve this.
  • It would need the #[repr(non_trivial)] marker OP proposed

Overall this doesn't feel like a solution that will be usable any time soon.


I prefer having it on the function. The single result parameter would need to be in a specific position anyway (always first or always last), so having the attribute on the parameter wouldn't add any flexibility.

At most I'd add a warn-by-default lint that the parameter should be called result.

On second thought, simply adding a C++ ABI might not work, since I assume that C++ can apply this mechanism on top of different C ABIs, like C and fastcall.

Having the ResultByReference marker on function without a marker on the type isn't compatible with generic return types, since the type determines if the data is passed by reference or by value. Luckily those are almost never used in bindings.

I'm not sure that is correct. As I understand it, the Linux kernel use case is about correctness: they need self-referential data structures on the stack that have to be initialized in place as self-referential.