Better C++ interoperability


#1

Hi there!

I’d like to provoke some discussion about what can be done for better interoperability between Rust and C++, especially when templates are involved. I think everybody around here is going to agree that easy Rust/C++ interop would be a boon for Rust adoption among the current C++ users.

State of the art

To recap, I know of three projects that currently offer some level of C++ interop:

  • rust-bindgen can generate Rust FFI declarations for C functions and structures based on a C header. There is no direct C++ support; however, one can manually create C wrappers around a C++ API.
  • rust-bindgen/sm-hacks adds some support for C++ classes.
  • rust-cpp allows inserting C++ statements inline in Rust code.

[In my reckoning, the last of these is the most promising approach, since it does not involve trying to express in Rust the C++ features that have no direct Rust equivalents (constructors, overloading, inheritance, etc.). The opposite, i.e. expressing Rust constructs in C++, seems much easier.]

Sadly, none of the above allow using C++ templates directly.

Well, what can be done? The ideas I’ve seen to date fall roughly into two categories:

Early (explicit) specialization

The idea here is to require the developer to specify all template specializations upfront. The code generator can then create wrappers for all of the requested specializations.
Optionally, such instantiations can be bound together via a common trait (in a manner similar to this), so that at the Rust level users would see a semblance of a generic type rather than N distinct types.

The downside, of course, is that the set of specializations is closed. If the author of the Rust crate wrapping that C++ library neglected to provide a specialization for your favorite type, well, tough luck…
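A minimal sketch of the trait-based binding, assuming the code generator emits one monomorphic wrapper per requested element type. All names here (`CppVec`, `CppVecElem`, `cpp_vec_sum_i32`, `cpp_vec_sum_f64`) are hypothetical, and the "wrappers" are plain Rust stand-ins so the block compiles without a C++ toolchain; in a real binding they would be extern "C" functions around, say, `std::accumulate` over a `std::vector<T>`.

```rust
// Hypothetical stand-ins for pre-generated extern "C" wrappers, one per
// requested specialization of some C++ template.
fn cpp_vec_sum_i32(data: &[i32]) -> i32 {
    data.iter().sum()
}
fn cpp_vec_sum_f64(data: &[f64]) -> f64 {
    data.iter().sum()
}

/// Implemented only for element types whose C++ instantiation was generated.
pub trait CppVecElem: Copy {
    fn sum(data: &[Self]) -> Self;
}

impl CppVecElem for i32 {
    fn sum(data: &[Self]) -> Self {
        cpp_vec_sum_i32(data) // dispatch to the i32 instantiation
    }
}

impl CppVecElem for f64 {
    fn sum(data: &[Self]) -> Self {
        cpp_vec_sum_f64(data) // dispatch to the f64 instantiation
    }
}

/// Generic facade: users see one type instead of N distinct wrappers.
pub struct CppVec<T: CppVecElem> {
    items: Vec<T>,
}

impl<T: CppVecElem> CppVec<T> {
    pub fn new() -> Self {
        CppVec { items: Vec::new() }
    }
    pub fn push(&mut self, x: T) {
        self.items.push(x);
    }
    pub fn sum(&self) -> T {
        T::sum(&self.items)
    }
}

// `CppVec<u8>` is rejected at compile time: no `CppVecElem` impl exists for u8.
```

The closed set shows up directly as a trait-bound error: a downstream user who wants `CppVec<u8>` cannot get it without the wrapper crate adding another specialization.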

Late template specialization

The other end of the spectrum is to delay wrapper generation until after type inference and codegen, so that we’d know exactly what needs to be instantiated:

  1. Allow foreign functions to be generic.
  2. Monomorphize FFI fns by generating a new extern symbol for each type parameter combination encountered.
  3. After all Rust codegen is done, generate C wrappers for each FFI monomorphization encountered in step 2.
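Steps 2 and 3 can be sketched as a small codegen helper, assuming the compiler hands the plugin the list of generic-FFI monomorphizations it encountered. `FfiInstance`, the `cpp_len` function, the `__`-joined symbol naming and the single `size()` operation are hypothetical illustrations, not what the prototype actually does; a real scheme would also need collision-free mangling of type names.

```rust
use std::collections::BTreeSet;

/// One monomorphization of a hypothetical generic foreign fn, recorded during
/// Rust codegen (step 2), e.g. `cpp_len::<i32>` over `std::vector`.
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct FfiInstance {
    generic_fn: String,     // name of the generic foreign fn
    cpp_template: String,   // C++ template it wraps
    type_args: Vec<String>, // C++ spellings of the type arguments
}

/// Replace characters that cannot appear in a C identifier.
fn sanitize(t: &str) -> String {
    t.chars()
        .map(|c| if c.is_ascii_alphanumeric() { c } else { '_' })
        .collect()
}

impl FfiInstance {
    /// Assumed naming scheme: one distinct extern symbol per type-argument combination.
    fn symbol(&self) -> String {
        let args: Vec<String> = self.type_args.iter().map(|t| sanitize(t)).collect();
        format!("{}__{}", self.generic_fn, args.join("__"))
    }
}

/// Step 3: after codegen, emit one extern "C" wrapper per distinct instance.
fn emit_cpp_wrappers(seen: &[FfiInstance]) -> String {
    let unique: BTreeSet<&FfiInstance> = seen.iter().collect(); // dedup
    let mut out = String::new();
    for inst in unique {
        out.push_str(&format!(
            "extern \"C\" size_t {}(const void* v) {{ return static_cast<const {}<{}>*>(v)->size(); }}\n",
            inst.symbol(),
            inst.cpp_template,
            inst.type_args.join(", ")
        ));
    }
    out
}
```

The emitted text is a fragment (it assumes `<cstddef>` and `<vector>` are included); in the pipeline described above it would be appended to a generated .cpp file and compiled by the system C++ compiler.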

[If anybody is curious how this looks in practice, I actually went ahead and implemented the above (in a somewhat hacky way) in rustc and in rust-cpp.]

The biggest issue with this seems to be the “infectiousness” of generic FFI: if a generic function that internally depends on a generic FFI API is exported from the current crate, downstream crates will be able to cause new monomorphizations of the FFI stub to be created, and will thus depend on the codegen plugin, the C++ compiler, and the C++ library all being available to generate code for those new monomorphizations.

So…
Thoughts/comments/suggestions?
How many people would consider early specialization sufficient for most purposes, and how many think that we need to go the whole way?


#2

To be honest, I think that Rust/C++ interop is impossible, as there is no name mangling standard for C++. Also, there is no structure that could simulate C++ classes.

D has C++ interop, and I’ve heard that they have a really hard time managing it. For me, there are too many drawbacks in C++ interop to consider it a “nice feature”.


#3

Theoretically this is true, but in practice each platform has a de facto standard mangling, or else C++ shared libraries would be impossible to manage (e.g. on Linux, everyone has to do what g++ does).


#4

This is the case in theory (every implementation can have its own ABI), but in practice there are essentially only two ABIs that disproportionately matter, as far as I’m aware: the Itanium ABI and MSVC. Kind of like x86 and ARM among ISAs. The Windows world is standardized on the MSVC ABI, and essentially the whole rest of the world (*nix) on the Itanium ABI. So from a 10,000-meter view, if you support those two you’ve got almost everything covered, though I’m sure there are a lot of subtleties. And you can delegate to Clang for that support. (Someone correct me if this is inaccurate, but this is the impression I gained the last time I looked into it.)
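To give a feel for the Itanium side, here is a toy Rust mangler covering only free functions in the global namespace with by-value builtin parameters. Everything else (namespaces, classes, templates, references, substitution compression) is deliberately omitted, and that omitted part is exactly what you would want Clang to handle. Under these rules `int foo(int)` becomes `_Z3fooi`; for comparison, MSVC spells the same `__cdecl` declaration `?foo@@YAHH@Z`. The `Builtin` enum and `itanium_mangle` function are illustrative names, not an existing crate's API.

```rust
/// Builtin by-value parameter types this toy mangler understands.
#[derive(Clone, Copy)]
enum Builtin {
    Bool,
    Char,
    Int,
    Long,
    Float,
    Double,
}

/// Itanium ABI single-letter codes for the builtin types above.
fn builtin_code(t: Builtin) -> &'static str {
    match t {
        Builtin::Bool => "b",
        Builtin::Char => "c",
        Builtin::Int => "i",
        Builtin::Long => "l",
        Builtin::Float => "f",
        Builtin::Double => "d",
    }
}

/// Mangled name of a free function in the global namespace:
/// `_Z` + <length><identifier> + parameter type codes.
/// The return type is not encoded for non-template functions.
fn itanium_mangle(name: &str, params: &[Builtin]) -> String {
    let mut s = format!("_Z{}{}", name.len(), name);
    if params.is_empty() {
        s.push('v'); // an empty parameter list is encoded as a single `void`
    } else {
        for &p in params {
            s.push_str(builtin_code(p));
        }
    }
    s
}
```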


#5

You are correct, but this is not a problem. Because C++ libraries consist in large part of uncompiled code (headers), sometimes 100% of them, the involvement of a C++ compiler in Rust/C++ interop is a given. We can rely on it to deal with mangling and the ABI for us.

Consider, for example, how rust-cpp approaches this:
The syntax extension gathers all inline C++ snippets and puts them in a .cpp file, wrapping each in a function with a “C” calling convention. On the Rust side, it generates FFI stubs matching these wrappers. The .cpp file gets passed through the system C++ compiler, and the output is linked with the LLVM output.
If the C++ compiler on hand happens to be a compatible version of Clang, we could even ask it to generate LLVM IR instead of an object file, then pass this IR module to LLVM along with Rust’s IR and thus get cross-language inlining.
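The rust-cpp pipeline described above boils down to generating two matching pieces of text per snippet. A minimal sketch of that step, assuming each snippet’s parameter and return types are already known in both languages; `Snippet`, `Param` and the `__cpp_snippet_N` symbol naming are hypothetical and not rust-cpp’s actual internals. Compiling the .cpp file and linking are left out.

```rust
/// One parameter, spelled in both languages.
struct Param {
    name: &'static str,
    cpp_ty: &'static str,
    rust_ty: &'static str,
}

/// An inline C++ snippet collected by the (hypothetical) syntax extension.
struct Snippet {
    id: usize,
    params: Vec<Param>,
    ret_cpp: &'static str,
    ret_rust: &'static str,
    body: &'static str, // C++ statements; must `return` a value
}

impl Snippet {
    fn symbol(&self) -> String {
        format!("__cpp_snippet_{}", self.id)
    }

    /// C++ side: the snippet wrapped in a function with C linkage.
    fn cpp_wrapper(&self) -> String {
        let ps: Vec<String> = self.params.iter().map(|p| format!("{} {}", p.cpp_ty, p.name)).collect();
        format!("extern \"C\" {} {}({}) {{ {} }}", self.ret_cpp, self.symbol(), ps.join(", "), self.body)
    }

    /// Rust side: the matching FFI declaration.
    fn rust_decl(&self) -> String {
        let ps: Vec<String> = self.params.iter().map(|p| format!("{}: {}", p.name, p.rust_ty)).collect();
        format!("fn {}({}) -> {};", self.symbol(), ps.join(", "), self.ret_rust)
    }
}

/// Produce the generated .cpp file and the Rust extern block for all snippets.
fn generate(snippets: &[Snippet]) -> (String, String) {
    let mut cpp = String::from("#include <cstdint>\n");
    let mut rust = String::from("extern \"C\" {\n");
    for s in snippets {
        cpp.push_str(&s.cpp_wrapper());
        cpp.push('\n');
        rust.push_str("    ");
        rust.push_str(&s.rust_decl());
        rust.push('\n');
    }
    rust.push_str("}\n");
    (cpp, rust)
}
```

Because both sides are derived from the same `Snippet`, the signatures cannot drift apart; the C linkage is what lets the Rust side avoid C++ mangling entirely.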

IMO, the interesting question at this point is "How do we deal with C++ templates?"
Edit: I am implicitly assuming that we’d want templates to be mapped to Rust generics.


#6

@vadimcn I was looking at your fork earlier today, and I think you’re trying a solution for code generation with specialized generics similar to what I had in mind (https://gist.github.com/mystor/7df6fa1b7e81c6c45c8e is my super-duper early pre-pre-RFC, with lots of things still to be ironed out, describing the ideas which I think you partially implemented; @eddyb was also talking with me about it before). I also have concerns about the infectious nature of the plugin requirement, and I don’t feel super comfortable with it.

I think that a monomorphic rust-cpp style plugin has its place, especially for establishing Rust <-> C++ interop in existing code, though it would need to be much more stable and less buggy around edge cases like generics: if it touches generics, it should barf and reject the type, whereas right now, when passing an opaque reference to C++, it probably does Bad Things™. Getting templates right, however, is a hard problem. It definitely isn’t solvable without some type of infectious specialization :-/

Finally, with regard to getting Clang to generate IR and performing cross-language linking: I was originally planning to do that (in fact, I was planning to link rust-cpp directly against Clang, so that I could guarantee that the backing LLVM versions matched between rustc and Clang as part of the build process), but it would mean some seriously crappy build times and would require the user to have a Rust clone to ensure that I had the right LLVM version, etc., so I didn’t. Maybe at some point in the future that optimization could happen, but I’m in no rush to implement it anymore.


#7

Would you care to elaborate on what’s so hard about templates from your POV?

As I mentioned, infectiousness does worry me, but if that’s what it takes to have best-in-class C++ interop, maybe it’s worth it. I’d love to hear what the core team thinks of this.

Re IR: I view IR generation as an advanced feature, which you would use only when performance is paramount, kinda like -C lto is now. Since this would be an advanced feature, IMO it isn’t super-important to make it easy to use.


#8

@mystor: btw, for the “early specialization” option I was imagining something like this


#9

I’m not concerned about the need for a C++ compiler at all. A C++ compiler is not an unreasonable thing to install for developers who are building native code and were able to install rustc.

Everybody who has some C++ code to integrate with Rust is going to have a C++ toolchain already, and Cargo packages that wrap C++ libraries via other means need a C++ compiler anyway.

If Rust could share LLVM parts with Clang’s cpp that would be even better.


#10

Calling C++ is a useful thing, which I need too, but for now I use the following ways to interoperate:

  1. Wrap C++ with a C ABI (some libs already have one)
  2. Rewrite the lib in Rust (if it’s small, though I feel I could be braver with this approach)
  3. Connect the interoperating parts via common IPC (TCP/IP, pipes, etc.)

I don’t think it’s a good idea to invest time in solving this. C++ has a wild ABI that reminds me of COM, DCOM, CORBA and others, where an ABI was used as a rich interoperability technology, an approach that was eventually killed by networking. It leads to versioning hell, which goes a little deeper than mangling trouble.


#11

The only thing that I really want to see in the Rust compiler is the ability to assemble and link just as GCC and Clang do. This would be really helpful. But C++ interop is something that will, IMO, only bring the gods of Chaos into this peaceful language of ours.


#12

My own personal feelings are that I’m all for boosting Rust’s interop with other languages, systems, runtimes, etc., while also keeping in mind the impact on the compiler and language itself. For example, developing this as a codegen plugin seems like a great idea, as it allows fast iteration outside of the compiler while also keeping a clear list of what’s needed in the compiler to support this kind of plugin.

I don’t have too many opinions on early vs. late specialization, but I would hazard a guess that the “infectious nature” may not matter too much given Rust’s conventional compilation model. If the standard library required this, for example, that would be a non-starter, because that binary is distributed all over and would impose extra requirements. Most builds go through Cargo, however, where binaries are never shipped anywhere, and imposing requirements on a downstream crate basically just means that a few seconds later you’ll be required to run the same C++ compiler again. Basically, because compiles are relatively self-contained, I’d guess that it may not be too much of an issue.

On a related note, the idea of cross-language inlining has definitely been brought up before and is certainly a desired feature! I think Servo wanted it a while ago for small function calls into SpiderMonkey, and we’ve wanted it in the past for small calls into LLVM as well. If that kind of support could be bolstered (or brought into existence) as part of this work, that’d be awesome!


#13

I suppose a lot can be learned from SWIG experience: http://www.swig.org/Doc3.0/SWIGDocumentation.html#SWIGPlus

Because of its complexity and the fact that C++ can be difficult to integrate with itself let alone other languages, SWIG only provides support for a subset of C++ features. Fortunately, this is now a rather large subset.


#14

Hmm. So, I agree that improving our C++ interop story would be great. The hard part is figuring out just which set of features we can support and which we can’t. I had hoped at some point to read more into what D does (I believe they have defined a subset of C++ where interop works pretty smoothly), and perhaps SWIG would be a good place to look as well.

As far as supporting templates go, I don’t see any way around the “infectious plugin” problem, except of course baking it into rustc (the ultimate infectious plugin). It’s worth keeping in mind that the plugin is only infectious if you fail to encapsulate the C++ library, as well. Hard to say what will be the common case.

I’m curious to dig into this plugin a bit more. This is a good example of a plugin that goes beyond syntax extensions in terms of its requirements. What does it need precisely from rustc? It clearly cannot run as a pre-pass, unless you declare in your source a kind of list of all the instantiations you require. We certainly could just standardize a means for rustc to inform the plugin of what is needed. Alternatively, I wonder though if it could run without deep integration with rustc as a post-pass by just scraping off unresolved symbols from a resulting artifact or something like that. I guess I should look at your example prototypes.
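The post-pass variant could look roughly like this: scan the undefined symbols of the Rust output (as printed by a tool such as `nm -u`, one `U <symbol>` per line) and decode the ones that name generic FFI stubs. This is only a sketch under a loudly hypothetical convention, namely that rustc would leave such stubs as undefined symbols named `__cpp_generic_<func>__<T1>__<T2>...`; `Needed` and `needed_instantiations` are illustrative names, and platform symbol prefixes (such as the extra leading underscore on macOS) are not handled.

```rust
/// A request for a C++ instantiation, recovered from an undefined symbol.
#[derive(Debug, PartialEq)]
struct Needed {
    func: String,
    type_args: Vec<String>,
}

/// Hypothetical naming convention for unresolved generic FFI stubs.
const PREFIX: &str = "__cpp_generic_";

/// Scan `nm -u`-style output and return the instantiations the plugin must provide.
fn needed_instantiations(nm_output: &str) -> Vec<Needed> {
    nm_output
        .lines()
        .filter_map(|line| {
            // Last token is the symbol, the one before it is the symbol kind.
            let mut toks = line.split_whitespace().rev();
            let sym = toks.next()?;
            let kind = toks.next()?;
            if kind != "U" {
                return None; // defined symbol, nothing to generate
            }
            let rest = sym.strip_prefix(PREFIX)?; // not a generic FFI stub
            let mut parts = rest.split("__");
            let func = parts.next()?.to_string();
            Some(Needed { func, type_args: parts.map(str::to_string).collect() })
        })
        .collect()
}
```

The appeal of this shape is that it needs nothing from rustc beyond a stable symbol convention; the cost is that type information has to survive a round trip through symbol names.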


#15

There is a kind of third way: specialize to a trait type, possibly with a bit of dynamic glue like Any.

This would be similar to how Vala handles its generics. In Vala, a generic gets a type descriptor that has a pointer to a ref/copy function (corresponding to Rust’s Clone::clone), a pointer to an unref/delete function (corresponding to Rust’s Drop::drop), and pointers to any method functions the template refers to. Values of the parameter type are handled as generic void * pointers. Well, that is pretty much the same thing as Rust’s boxed trait type.

While C++ templates can generate very different code for different substitutions, in the end I’d expect that for most of them it should be possible to instantiate them once for such a dynamic type, which would handle the rest with dynamic dispatch. It would not be as efficient as specific instantiations, but it would remain generic without requiring more and more C++ code to be compiled.
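A rough sketch of the Rust half of this idea, assuming a Vala-like descriptor holding clone and drop function pointers. `TypeDescriptor`, `ErasedVec` and `DynVec` are hypothetical names; `ErasedVec` is written in Rust here only to stand in for a C++ template instantiated once over `void*`, which would receive the `#[repr(C)]` descriptor and call through it.

```rust
use std::ffi::c_void;
use std::marker::PhantomData;

/// Vala-style type descriptor passed to the (single) C++ instantiation.
#[repr(C)]
pub struct TypeDescriptor {
    clone: unsafe extern "C" fn(*const c_void) -> *mut c_void, // cf. Clone::clone
    drop: unsafe extern "C" fn(*mut c_void),                   // cf. Drop::drop
}

unsafe extern "C" fn clone_thunk<T: Clone>(p: *const c_void) -> *mut c_void {
    // SAFETY: caller guarantees `p` points to a live `T`.
    let v = unsafe { &*(p as *const T) };
    Box::into_raw(Box::new(v.clone())) as *mut c_void
}

unsafe extern "C" fn drop_thunk<T>(p: *mut c_void) {
    // SAFETY: caller guarantees `p` was produced by `clone_thunk::<T>`.
    drop(unsafe { Box::from_raw(p as *mut T) });
}

/// Stand-in for a C++ container instantiated once, for `void*` elements.
struct ErasedVec {
    desc: TypeDescriptor,
    items: Vec<*mut c_void>,
}

impl Drop for ErasedVec {
    fn drop(&mut self) {
        for &p in &self.items {
            unsafe { (self.desc.drop)(p) } // release each element via the descriptor
        }
    }
}

/// Typed Rust facade: any `T: Clone` works, no new C++ instantiation needed.
pub struct DynVec<T> {
    inner: ErasedVec,
    _marker: PhantomData<T>,
}

impl<T: Clone> DynVec<T> {
    pub fn new() -> Self {
        let desc = TypeDescriptor { clone: clone_thunk::<T>, drop: drop_thunk::<T> };
        DynVec { inner: ErasedVec { desc, items: Vec::new() }, _marker: PhantomData }
    }

    /// Stores a heap copy made through the descriptor's clone function.
    pub fn push(&mut self, value: &T) {
        let p = unsafe { (self.inner.desc.clone)(value as *const T as *const c_void) };
        self.inner.items.push(p);
    }

    pub fn get(&self, i: usize) -> Option<&T> {
        self.inner.items.get(i).map(|&p| unsafe { &*(p as *const T) })
    }
}
```

The inefficiency mentioned above is visible here: every element is a separate heap allocation reached through an indirect call, in exchange for never compiling another C++ instantiation.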

Also, doing similar dynamic monomorphization of Rust interfaces seems like a possible way to bind them to other languages that don’t have generics (like Python, Perl, Ruby, Java, etc.), which would allow Rust to act as a kind of common middle layer for combining multiple other languages, which I think might be very useful.


#16

Yes, and I’d love to hear whether the community thinks this problem is worth tackling. For now I am assuming it is.

My prototype is not entirely implementable with plugins, sadly. It still required a change of the language to allow foreign functions to be generic. The other part is a post-trans plugin that enumerates all generated monomorphizations of foreign generics and produces the corresponding C++ instantiations for them.

Unfortunately, that is not enough. Even if you embed the C++ compiler itself into rustc, in order to generate additional instantiations you’ll need [at least the headers of] the original C++ library. I am afraid the latter requirement is not a reasonable one.

My current thinking is that the following compromise might be acceptable:

  • The “wrapper” crate (i.e. the original Rust library wrapping a C++ library) should be able to pre-generate a number of C++ template instantiations as the author sees fit.
  • Downstream crates can use this crate without incurring any extra dependencies as long as they stay within this pre-generated subset of types. Preferably, this should be enforced by the type system, for example via trait implementations for the allowed combinations of types (as I tried to illustrate here).
  • Downstream crates may request new instantiations to be generated, but in this case they opt in to being dependent on the plugin, the C++ compiler and the C++ library.

Example: let’s say you set out to create a wrapper around std::map.
So you go on to define a generic Rust wrapper type, CppMap<K,V>, and use the cpp! macro in implementations of its methods to call out to std::map.

You also specify that the underlying template should be pre-instantiated for (K,V) in (i32,i32), (String, String), (i32, String) … and so on.
Downstream crates may freely use CppMap with those combinations of type parameters. To make sure of that, there is a trait CppMapParams, which is implemented only for those type combinations, and CppMap is constrained on implementations of this trait.

Later, somebody else creates a new type Foo in their crate, and really would like to use it as a key in CppMap. So they reference the “cpp” plugin from their crate, then write something like cpp_impl!( CppMap for (Foo,i32), (Foo, String), ...), which kicks off generation of the additional std::map instantiations. It also implements CppMapParams for (Foo,i32), (Foo,String), etc.
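A single-file sketch of this arrangement, with `CppMap` constrained on `(K, V): CppMapParams`. The `cpp_map_params!` macro is a hypothetical stand-in for `cpp_impl!`: in the real design it would also emit and compile the `std::map<K, V>` wrappers, whereas here it only adds the trait impls, and a `BTreeMap` stands in for the C++ storage. One caveat worth hedging: because tuples are not local types, Rust’s orphan rules would reject a downstream `impl CppMapParams for (Foo, i32)` when the trait lives in the wrapper crate, so a cross-crate version would likely need a different trait shape (for example, a trait keyed on `K` and implemented for `Foo`).

```rust
use std::collections::BTreeMap;

/// Implemented only for (K, V) pairs that have a generated std::map instantiation.
pub trait CppMapParams {
    /// Hypothetical: C++ spelling of the instantiation backing this pair.
    const CPP_TYPE: &'static str;
}

/// Hypothetical stand-in for `cpp_impl!`; here it only implements the trait.
macro_rules! cpp_map_params {
    ($(($k:ty, $v:ty) => $cpp:expr),* $(,)?) => {
        $(impl CppMapParams for ($k, $v) {
            const CPP_TYPE: &'static str = $cpp;
        })*
    };
}

/// Generic wrapper, usable only with instantiated (K, V) combinations.
pub struct CppMap<K, V>
where
    (K, V): CppMapParams,
{
    inner: BTreeMap<K, V>, // stand-in for a pointer to a C++ std::map
}

impl<K: Ord, V> CppMap<K, V>
where
    (K, V): CppMapParams,
{
    pub fn new() -> Self {
        CppMap { inner: BTreeMap::new() }
    }
    pub fn insert(&mut self, k: K, v: V) {
        self.inner.insert(k, v);
    }
    pub fn get(&self, k: &K) -> Option<&V> {
        self.inner.get(k)
    }
    pub fn cpp_type() -> &'static str {
        <(K, V) as CppMapParams>::CPP_TYPE
    }
}

// Wrapper crate: the pre-generated subset.
cpp_map_params!(
    (i32, i32) => "std::map<int32_t, int32_t>",
    (String, String) => "std::map<std::string, std::string>",
    (i32, String) => "std::map<int32_t, std::string>",
);

// "Downstream" type opting in to a new instantiation.
#[derive(PartialEq, Eq, PartialOrd, Ord)]
pub struct Foo(u32);
cpp_map_params!((Foo, i32) => "std::map<Foo, int32_t>");

// `CppMap<Foo, String>` would not compile: no impl for (Foo, String).
```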


#17

Why is that unreasonable? That seems pretty typical to me, actually, of working with C++ templates. But also, since we currently build all Rust code from source, if you have the headers available to start, you’ll have them available the whole way through.


#18

You are right, if everything is being built from source, this is not a problem. But at least on Windows, distribution of pre-compiled libraries is not uncommon, and this is the scenario I was worried about.
If this isn’t something that Rust aims to support, the late template instantiation option is the way to go, no doubt.


#19

I assume that these pre-compiled libraries must contain the header files in some form, at least if the libraries expose templates that can be instantiated in an open-ended way (versus to some fixed set of types). Basically, C++ has the same problem we do here, since they also monomorphize.

As far as Rust goes, we certainly aim to stabilize the ABI (eventually), and that should make distribution of binaries possible. However, there are a lot of advantages to building from source, so my expectation is that this will remain the primary way of distributing libraries. Basically there are a lot of modes one might use when compiling, and if you use binaries, you have to make these choices on behalf of your consumer: e.g., what target? debug info or not? what allocator? can we do LTO? do you want landing pads? overflow checks? etc. That said, to have the benefits, you don’t necessarily have to distribute SOURCE, it might also be serialized MIR or something similar (which would be roughly analogous to java .class files).


#20

For C++ - yes. Rust, however, abstracts a lot of that away by storing AST for generics in crate metadata, so from the user’s point of view Rust crates are self-contained. With C++ in the mix though, this abstraction breaks.
So yes, Rust ergonomics wouldn’t be worse than that of a comparable C++ library, but IMO the bar could be slightly higher.

What are your thoughts regarding generic foreign functions? Would something like this be considered for inclusion in the language for the sake of C++ template interoperability? (Note that this feature is useless on its own; it depends on a codegen plugin.)