In this post, I propose instantiating all generic functions with dynamic dispatch, relying on inlining for case-by-case optimization. Thanks to Rust’s type system, we don’t even need to type-check multiple times.
The benefits: much smaller binaries and code size, and compile times comparable to C/Go programs.
The drawbacks: dynamically dispatched calls are harder (slower) to inline, and the resulting code is overall less optimized than directly instantiated code. We would also end up with a more complicated ABI.
First, handling functions:
- If the argument is (a combination of) traits: pass a `&move` pointer, as with unsized rvalues. We may need multiple vtables (a new ABI).
- If the argument is a generic struct: in addition to passing the struct as before, pass vtables for its generic arguments.
- If the return type can vary in size: create a companion function that computes the size and alignment. If it’s a trait method, include this function in the vtable. The caller allocates space for the return value with `alloca` (see the sketch after this list).
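To make this concrete, here is a minimal hand-written sketch of what such a lowering could look like. All names (`TypeVtable`, `identity_dyn`, `vtable_for`) are illustrative rather than real compiler output, and a real implementation would `alloca` the return slot from the layout stored in the vtable instead of using a statically sized buffer:

```rust
use std::alloc::Layout;

// One "type argument" passed at runtime instead of being monomorphized away.
struct TypeVtable {
    // the companion size/alignment information for the caller
    layout: Layout,
    // type-erased clone: reads a value at `src`, writes a copy to `dst`
    clone_into: unsafe fn(src: *const u8, dst: *mut u8),
}

// `fn identity<T: Clone>(x: &T) -> T`, lowered: the varying-size return
// value goes through a caller-allocated out-pointer.
unsafe fn identity_dyn(vt: &TypeVtable, x: *const u8, out: *mut u8) {
    (vt.clone_into)(x, out);
}

fn vtable_for<T: Clone>() -> TypeVtable {
    TypeVtable {
        layout: Layout::new::<T>(),
        clone_into: |src, dst| unsafe {
            (dst as *mut T).write((*(src as *const T)).clone())
        },
    }
}

fn main() {
    let vt = vtable_for::<String>();
    assert_eq!(vt.layout, Layout::new::<String>());
    let x = String::from("hello");
    // Stand-in for `alloca(vt.layout)`: a statically sized slot.
    let mut out = std::mem::MaybeUninit::<String>::uninit();
    unsafe {
        identity_dyn(&vt, &x as *const String as *const u8,
                     out.as_mut_ptr() as *mut u8);
        println!("{}", out.assume_init()); // prints "hello"
    }
}
```

Note that `vtable_for` here is still generic; in the proposed scheme the compiler would synthesize these vtables, and only one copy of `identity_dyn` would ever be compiled.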
Second, handling generic structs:
- We include in the vtable a function that computes the size and alignment of each member whose size can vary.
- Alternatively, a function that computes the offset of each member whose offset can vary.

In both cases there would probably be complex runtime code to compute the layout, which is bad for inlining. Additionally, it’s harder to preserve things like the null-pointer optimization.
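For illustration, here is what the "compute layout at runtime" part might look like for a hypothetical `struct Pair<T, U> { a: T, b: U }`, using `std::alloc::Layout`. Note that this fixes the field order (like `#[repr(C)]`), whereas rustc is normally free to reorder fields:

```rust
use std::alloc::Layout;

// Runtime layout for `Pair<T, U>` when T's and U's layouts are only
// known dynamically (e.g. fetched from vtables). Returns the overall
// layout plus both field offsets.
fn pair_layout(t: Layout, u: Layout) -> (Layout, usize, usize) {
    let (partial, off_a) = Layout::new::<()>().extend(t).unwrap();
    let (full, off_b) = partial.extend(u).unwrap();
    (full.pad_to_align(), off_a, off_b)
}

fn main() {
    let (layout, off_a, off_b) =
        pair_layout(Layout::new::<u8>(), Layout::new::<u32>());
    assert_eq!((layout.size(), layout.align()), (8, 4));
    assert_eq!((off_a, off_b), (0, 4));
}
```

Every field access would have to go through offsets computed this way, which is exactly the inlining concern above.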
Above is my rough idea for achieving polymorphic codegen in all cases. Would it be worthwhile to implement? How does it compare to just boxing everything? Alternatively, should we make this a type-system property (like enhanced unsized rvalues) instead?
A relevant talk about the generics implementation in Swift, which also tries to hit the sweet spot of “by-value + dynamic dispatch by default, with optional monomorphization + separate compilation”: https://www.youtube.com/watch?v=ctS8FzqcRug
I think something like this could be useful as a mechanism to reduce compile times while developing code.
In other words, a Rust REPL could use something like this to avoid work that we don’t care about until it is time to benchmark or ship something. As far as I’m concerned, this doesn’t even have to do with the language itself but can be an implementation detail in rustc.
There are many practical implementation issues with such a compilation strategy, even though it’s conceivable (as Swift demonstrates). However, for (near-future) Rust there’s also a problem that I fear might make it infeasible to avoid monomorphization completely: type parameters can influence constant calculations (e.g., `size_of`, `align_of`, associated constants), and these constants can in turn influence which impls are instantiated and used. While currently not even things like `fn foo<T>() -> [u8; size_of::<T>()]` are implemented, we’ve already accepted basic constant generics as a feature, and future extensions such as combining them with specialization seem plausible as well.
So, for example, we might end up in a situation where a generic function contains a line like `NAryTuple::<T, T::CONST>::print(...)`, and not only are there a dozen completely different hand-written implementations of `print` that this might call, some of these impls might not even be known at the time this line is compiled, so you can’t even put them all in one neat vtable.
> Type parameters can influence constant calculations (e.g., `size_of`, `align_of`, associated constants)
The first two are already available through the vtable, in the form of `size_of_val` and `align_of_val`. Associated constants... I don't know, but maybe we can put/compute them in the vtable too.
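For reference, these two already work through the vtable today:

```rust
use std::fmt::Debug;

fn main() {
    let x: Box<dyn Debug> = Box::new([1u64, 2, 3]);
    // Both values are read from the vtable at runtime
    // (expected results shown for a typical 64-bit target).
    assert_eq!(std::mem::size_of_val(&*x), 24);
    assert_eq!(std::mem::align_of_val(&*x), 8);
}
```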
As for specialization, this idea works if we limit it to "always applicable impls", where you always have enough data at runtime to check whether a specialization applies.
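To illustrate what "enough data at runtime" might mean, here is a rough stable-Rust analogue of such an applicability check, using `TypeId` as a stand-in for whatever the vtable would actually carry. A real implementation would bake the decision into the vtable rather than branching on every call:

```rust
use std::any::{Any, TypeId};
use std::fmt::Debug;

// Base ("always applicable") behaviour, with a runtime check for one
// known specialization.
fn print_dyn<T: Debug + Any>(x: &T) {
    if TypeId::of::<T>() == TypeId::of::<String>() {
        // the specialized impl applies
        let s = (x as &dyn Any).downcast_ref::<String>().unwrap();
        println!("specialized: {s}");
    } else {
        // fall back to the generic base impl
        println!("generic: {x:?}");
    }
}

fn main() {
    print_dyn(&42);                 // generic: 42
    print_dyn(&String::from("hi")); // specialized: hi
}
```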
For the `NAryTuple` case: I'm not sure I understand. Writing/macro-generating a bunch of impls is already not great for the compiler, so I guess that's not an issue. Can you elaborate on "some of these impls might not even be known at the time this line is compiled"?
I wouldn’t want to see this happen by default, but I would love to have it as an option when attempting to drastically reduce binary size, even at the expense of performance.
As a historical note, we used to have such an implementation. It was horrifically complicated. I think it might go better now, as we have MIR and other things that let us be a bit more careful. On the other hand, we also have a lot of features that rely on monomorphization (some have come up in this thread; I don't have a full list in my head).
I would be interested here, but to my mind this is not "low-hanging fruit" when it comes to improving compile speed. I feel like MIR optimizations are more likely in that regard, as well as projects like Cranelift (née Cretonne) (see e.g. this previous thread).
Yes, that's why I didn't cite the existence of these constants in general as a problem. However, this solution moves any calculations involving them to run-time, which means they aren't available e.g. when constructing vtables.
Sorry, to make explicit what I was thinking of:
```rust
// base case, "always applicable" if we extend that notion to const generics
impl<T, const N: usize> NAryTuple<T, N> { ... }

// a bunch of specializations for specific values of N
impl<T> NAryTuple<T, 1> { ... }
impl<T> NAryTuple<T, 2> { ... }
impl<T> NAryTuple<T, 3> { ... }
// ...
```
Now, `NAryTuple::<T, T::CONST>::print` refers to the specialization for the value of `N = T::CONST` if it exists, and to the generic base impl otherwise. To implement this via vtable-passing, we need a unified vtable that gives us the `print` implementation for every `N` (since the caller doesn't necessarily know which values of `N` the callee wants to use, it can't create and pass a vtable precisely for this value). Normally (I believe even with type specialization involved) a unified vtable contains a relatively efficient flat array of precisely the methods that exist, and anyone using the vtable knows at which offset each method is. This doesn't work in this case because any crate can add a new specialization for a new constant value for a specific type, e.g.:
```rust
struct Foo;

impl NAryTuple<Foo, 47> { ... }
```
While I suppose you could put a more involved data structure into the vtable (e.g. a search tree with the concrete specializations in the leaves), this makes virtual calls much more involved and expensive.
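For concreteness, a rough sketch of such a vtable with a sorted table searched at call time (all names illustrative). Beyond the per-call lookup cost, it still leaves open who gets to build the table once downstream crates add specializations:

```rust
// The "vtable" carries the specializations known when it was built,
// plus the generic fallback.
struct NAryVtable {
    specialized: &'static [(usize, fn())], // sorted by N
    generic_print: fn(),
}

// Every virtual call pays for a search instead of a flat array index.
fn lookup_print(vt: &NAryVtable, n: usize) -> fn() {
    match vt.specialized.binary_search_by_key(&n, |&(k, _)| k) {
        Ok(i) => vt.specialized[i].1,
        Err(_) => vt.generic_print,
    }
}
```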
Perhaps a more feasible variant of this, one we have oft discussed but never implemented, would be to analyze the MIR of each fn and determine to what extent it relies on its type parameters.
To start, we might look for things that do not rely at all on any particulars of the type. Something like the `len` method of `Vec` (or even `Vec` iterators) might fit the bill (particularly since we know that `Vec<T>` requires `T: Sized`). In those cases, we can collapse monomorphizations.
Once we have the base case working, we can probably extend it. I imagine there are many functions that only care about the size/alignment of their type parameters.
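As a sketch, here is how such an analysis might classify a few hypothetical functions:

```rust
// Fully collapsible: the body never inspects T, so every
// monomorphization compiles to identical machine code.
fn count<T>(xs: &[T]) -> usize {
    xs.len()
}

// Collapsible per size/align class: the body depends on T only through
// size_of, so instantiations can be shared across types of equal layout.
fn byte_len<T>(xs: &[T]) -> usize {
    xs.len() * std::mem::size_of::<T>()
}

// Not collapsible as-is: calls a trait method, so each T needs its own
// instantiation (or a vtable, under the scheme discussed upthread).
fn print_all<T: std::fmt::Debug>(xs: &[T]) {
    for x in xs {
        println!("{:?}", x);
    }
}
```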
We’ve had some PRs take a stab at this, but it’d be good to figure out a planned impl strategy beforehand, I think, since most of those ran aground before really getting started.
Actually, before trying to implement this, what I would like to see is a PR that just figures out when this transformation would be possible. Then we can try to gather some numbers -- and we can also argue about the various details. =)
I think monomorphization is a form of partial evaluation, given that we have some sort of dependent types (we do, as in `[u8; 10]`). What needs to be done is to gather all the hidden parameters (sizes, vtables, ...) implied by the types into a fully desugared function signature, and then partially evaluate over the statically decidable values.
Monomorphization annotations could enforce that a specific method doesn’t use a specific type argument (or const argument, once we have const generics). If you annotate a method that way and then use something that relies on the specified type/const argument, you get a compilation error. There are other possible annotations along the same lines.
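Something along these lines, as a purely hypothetical sketch (no such attribute exists today, so it is shown as a comment):

```rust
// Hypothetical annotation: it would assert that the body uses T only
// through its trait bound, never through its layout, so one shared
// instantiation suffices.
//
// #[polymorphic(T)]
fn describe<T: std::fmt::Debug>(x: &T) -> String {
    // OK under the annotation: T is used only via the Debug bound.
    format!("{:?}", x)
    // `std::mem::size_of::<T>()` here would be a compile error.
}
```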
Yes, and I have no objection to such annotations, but if we want to have a real impact on compilation time, we have to infer automatically. It’s not reasonable to expect people to use such annotations outside of specialized cases.
then `foo` has a vtable for `T: Debug`, but `bar` needs a vtable for `Vec<T>: Debug`. So we have to allocate one of those and create it dynamically; I think we can use the stack, though. And yes, in the case of specialization, this could of course require doing trait resolution at runtime, which would require reifying all kinds of information. I think without specialization we can handle it statically.
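For context, a minimal version of the scenario being discussed:

```rust
use std::fmt::Debug;

// If `foo` only carries a vtable for `T: Debug`, the call to `bar`
// needs a *different* vtable, for `Vec<T>: Debug`, which would have to
// be built at runtime from the one for `T` (possibly on the stack, as
// suggested above).
fn bar<T: Debug>(v: Vec<T>) {
    println!("{:?}", v);
}

fn foo<T: Debug>(x: T) {
    bar(vec![x]); // needs `Vec<T>: Debug`, derived from `T: Debug`
}

fn main() {
    foo(42);
}
```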
Could this MIR analysis be reused for outlining functions? I.e., turning code like:
```rust
fn myfunk(a: impl AsRef<str>) {
    let b = a.as_ref();
    // ...
    // long function body that does not depend on the type of `a`,
    // but still gets recompiled for each monomorphization
    // ...
}
```
into:
```rust
fn inner_myfunk(b: &str) {
    // ...
    // long function body that gets compiled once
    // ...
}

fn myfunk(a: impl AsRef<str>) {
    // small stub that gets recompiled for each monomorphization
    inner_myfunk(a.as_ref())
}
```
In addition to the implementation strategy, I think the idea of “dynamic generics” opens some doors to language features.
For example, today we have to restrict `std::any::Any` to `'static` types, because lifetimes are erased but RTTI needs to be static. However, if RTTI can be created dynamically, we can remove this limit.
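To make the current restriction concrete:

```rust
use std::any::Any;

// `Any` is only implemented for `T: 'static`, so borrowed data cannot
// be type-erased this way today.
fn describe(x: &dyn Any) {
    match x.downcast_ref::<String>() {
        Some(s) => println!("a String: {s}"),
        None => println!("something else"),
    }
}

fn main() {
    describe(&String::from("hi")); // fine: `String` is 'static
    let owned = String::from("hi");
    let _borrowed: &str = &owned;
    // `describe(&_borrowed)` would not compile: a `&str` borrowing from
    // a local is not 'static, so it does not implement `Any` here.
}
```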
Furthermore, with dynamic RTTI we could relax the current object-safety rules, as most of the reasons for those rules no longer apply:
- "static" methods (methods without a receiver): call them through a runtime type identifier.
- "generic" methods: pass runtime type identifiers.
- methods with `Self` bounds: perform runtime checks where static analysis is not enough.
So, as a first step, maybe we can add a global annotation to enable dynamic RTTI? With dynamic RTTI, generic functions just desugar into signatures with a set of “type arguments” and a set of “value arguments”.
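A hand-written sketch of that desugaring (all names illustrative):

```rust
// Source:
//
//     fn pick<T: PartialOrd>(a: &T, b: &T) -> bool { *a < *b }
//
// Desugared: the type parameter becomes an explicit runtime value
// carrying layout information plus entries for the trait bounds.
struct DynType {
    size: usize,
    align: usize,
    // entry for the `PartialOrd` bound: type-erased `lt`
    lt: unsafe fn(*const u8, *const u8) -> bool,
}

// Compiled exactly once, regardless of how many types it is used with.
unsafe fn pick_dyn(t: &DynType, a: *const u8, b: *const u8) -> bool {
    (t.lt)(a, b)
}

fn dyn_type_of<T: PartialOrd>() -> DynType {
    DynType {
        size: std::mem::size_of::<T>(),
        align: std::mem::align_of::<T>(),
        lt: |a, b| unsafe { *(a as *const T) < *(b as *const T) },
    }
}
```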