This code is currently rejected:
extern "C" {
    fn foo<T>(x: T);
}
This makes sense: without having access to the body of foo, the compiler cannot monomorphize the generic at compile time.
However, there are a number of reasons you might want to call a generic function external to Rust (key among them FFI with higher-level languages like C++). This cannot be made to work with Rust's current instantiation paradigm.
Link-time monomorphization
All modern C++ compilers implement templates the same way we implement generics: when compiling a translation unit (i.e., a crate codegen unit, for us), the template is used to generate a concrete function which is marked as a weak symbol (or COMDAT, depending on the linker). At link time, exactly one implementation is linked into the binary, since they're all identical.
However, some less popular/older compilers did it differently: they would emit relocations for the mangled symbols of the concrete functions, but only instantiate them at link time. This has a large number of downsides, including that a lot of diagnostics might be put off until link time, rather than surfacing when the call site is compiled. Yikes!
However, this gives us a way we could make "generic externs" work. When a generic extern-declared function is called, the compiler emits a relocation for it, and places its name somewhere (e.g. in an object section .rust.missing_monos, or whatever). It would be the responsibility of the user to look in this somewhere, parse the names, and then go and perform monomorphization manually, generating object files that contain the requested symbols. For example, in the C++ case, this might mean shelling out to clang to trigger template instantiation. This is similar to how you must provide object files containing extern'ed symbols at link time.
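To make the "look in this somewhere and parse the names" step concrete, here is a minimal sketch of the bookkeeping a user-side tool might do. It assumes, purely for illustration, that the section contents have already been extracted as text with one requested symbol per line; the function name missing_monos and that line format are hypothetical, not anything rustc emits today.

```rust
use std::collections::BTreeSet;

/// Returns the requested symbols that no provided object file defines yet.
///
/// `section_text` stands in for the contents of the hypothetical
/// `.rust.missing_monos` section, ASSUMED here to hold one symbol per line.
/// `provided` is the list of symbols the user's object files already define.
fn missing_monos(section_text: &str, provided: &[&str]) -> Vec<String> {
    let provided: BTreeSet<&str> = provided.iter().copied().collect();

    // Several call sites may request the same instantiation, so collect the
    // requests into a set to deduplicate them (this also sorts them).
    let requested: BTreeSet<&str> = section_text
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .collect();

    requested
        .into_iter()
        .filter(|sym| !provided.contains(sym))
        .map(String::from)
        .collect()
}
```

Whatever this returns is the set of instantiations the user still has to generate (for C++, by asking clang to instantiate the corresponding templates) before the final link can succeed.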
The nitty-gritty
There are a lot of moving pieces to what is a pretty controversial idea. Here's how I see this implemented in practice:
- extern block fns may have type and const parameters in addition to lifetime parameters, and their where clauses may include these parameters.
- When such a function is called, the compiler will perform the usual analysis against its signature (checking bounds, expanding associated types, etc). However, rather than instantiating a generic template, it simply emits a relocation to the mangled name. Such monomorphizations are deferred.
- Upon completion of emitting the .rlib, rustc emits further data (be it in the .rlib with a sanctioned way of extracting it, or in a sidecar file) that describes monomorphizations that were deferred. Each monomorphization is two parts:
  - Some machine-readable description of the parameters. We'd need to come up with some mini-language to describe things like Foo<i32, false, Bar { x: -1 }>. This may become a problem when closure types or dyns get involved.
  - A linker symbol, which is the symbol in the relocation. This will normally be the mangled name, but its contents are formally unspecified. Tooling external to rustc should not have to parse mangled names.
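As a rough illustration of what the two-part record could look like on the tooling side, here is one possible in-memory model. Every name in it (Arg, DeferredMono, the variant names) is made up for this sketch; it only covers type, bool, integer and struct-valued const arguments, and it says nothing about closures or dyn types.

```rust
use std::fmt;

/// One generic argument of a deferred monomorphization (hypothetical model).
#[derive(Debug, Clone, PartialEq)]
enum Arg {
    /// A type, identified by path, possibly with generic arguments of its own.
    Type { path: String, args: Vec<Arg> },
    /// A `bool` const argument.
    Bool(bool),
    /// An integer const argument.
    Int(i128),
    /// A struct-valued const argument with named fields.
    Struct { path: String, fields: Vec<(String, Arg)> },
}

impl fmt::Display for Arg {
    /// Renders the argument in Rust-like surface syntax.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Arg::Type { path, args } => {
                write!(f, "{path}")?;
                if !args.is_empty() {
                    let parts: Vec<String> = args.iter().map(|a| a.to_string()).collect();
                    write!(f, "<{}>", parts.join(", "))?;
                }
                Ok(())
            }
            Arg::Bool(b) => write!(f, "{b}"),
            Arg::Int(i) => write!(f, "{i}"),
            Arg::Struct { path, fields } => {
                let parts: Vec<String> =
                    fields.iter().map(|(name, value)| format!("{name}: {value}")).collect();
                write!(f, "{path} {{ {} }}", parts.join(", "))
            }
        }
    }
}

/// The two parts of a deferred monomorphization: what was requested, and the
/// opaque linker symbol that must end up defined in some object file.
#[derive(Debug, Clone, PartialEq)]
struct DeferredMono {
    instance: Arg,
    symbol: String,
}

/// Builds the `Foo<i32, false, Bar { x: -1 }>` example from the text.
/// The symbol is a placeholder, since its contents are formally unspecified.
fn example() -> DeferredMono {
    let bar = Arg::Struct {
        path: "Bar".to_string(),
        fields: vec![("x".to_string(), Arg::Int(-1))],
    };
    let i32_ty = Arg::Type { path: "i32".to_string(), args: vec![] };
    DeferredMono {
        instance: Arg::Type {
            path: "Foo".to_string(),
            args: vec![i32_ty, Arg::Bool(false), bar],
        },
        symbol: "opaque_symbol_0".to_string(),
    }
}
```

The point of keeping the symbol as a plain opaque string next to a structured description is that external tools can match on the description and simply copy the symbol, without ever decoding it.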
At link time, the caller must provide .a files that include the requested symbols. Failure to do so is an ungraceful link error. Tooling could read the list of deferred monomorphizations and produce compatible Rust (or another language) code that would compile to objects with the correct symbols.
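Here is a small sketch of the "tooling produces compatible Rust code" idea for the foo<T>(x: T) declaration from the top of the post. It is only a text generator: the Request record, the shim naming scheme and the foo_impl generic that the shims forward to are all hypothetical, and the generated source would still need to be compiled into an archive by some other step. It emits the classic export_name attribute; my understanding is that the 2024 edition wants this spelled as an unsafe attribute, so treat the exact attribute syntax as an assumption too.

```rust
/// A deferred instantiation of `foo<T>(x: T)` (hypothetical record shape).
struct Request<'a> {
    /// Rust spelling of the type argument `T`, e.g. "i32".
    ty: &'a str,
    /// Opaque linker symbol that the caller's relocation refers to.
    symbol: &'a str,
}

/// Emits Rust source text defining one exported function per request.
///
/// Each generated function forwards to `foo_impl::<T>`, a hand-written
/// generic that is ASSUMED to exist in the crate the source is compiled into.
fn generate_shims(requests: &[Request<'_>]) -> String {
    let mut out = String::new();
    for (i, req) in requests.iter().enumerate() {
        // `{:?}` quotes and escapes the symbol so it forms a valid string literal.
        out.push_str(&format!("#[export_name = {:?}]\n", req.symbol));
        out.push_str(&format!(
            "pub extern \"C\" fn foo_shim_{i}(x: {ty}) {{\n    foo_impl::<{ty}>(x)\n}}\n\n",
            ty = req.ty
        ));
    }
    out
}
```

Because the shim name is just a counter and the exported name comes straight from the record, the generator never has to understand how the symbol was mangled.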
Why Bother?
The main issue is that there is no way to write a generic function in Rust that calls out into external code (well, in a way that couldn't be achieved with a normal function). This is a problem for doing seamless interop with languages that have Rust-like monomorphization, even though it's otherwise completely feasible (with post-monomorphization errors, but that's a given with anything that involves externs).
There are a lot of problems with this approach. The top ones that come to mind are:
- Designing the machine-readable mini-language of type parameters. We should not stabilize mangling to make this work.
- We're pretty screwed for closure types. I don't think there's a way to make this work.
I'm sure others will point out other technical issues. I don't think this approach adds any new problems that weren't already present with the current way we link foreign code (e.g. cc), except that Cargo build scripts don't have a good way of getting this information. This seems orthogonal, and something that can potentially be sorted out with a "post-compile" or "link-time" version of build.rs.