Idea: Would Rust benefit from explicitly polymorphic generics?

Trequetrum · February 2, 2023, 5:55pm

Currently, when you see a generic type in Rust, implementation considerations are left entirely up to the compiler so long as the correct behavior is implemented for each type.

Rustc attempts to determine when fewer copies of functions are necessary and avoid making those copies - known as "polymorphization". As a result of polymorphization, items collected during monomorphization cannot be assumed to be monomorphic.

While it's great that this idea of polymorphization can be extended to more advanced cases, I think it might (emphasis on might) be worthwhile to surface some syntax for explicitly polymorphic code. Code that looks and is typed generically, but which is required to have only a single implementation.

Doing this will open up the full power of type-level generics for Trait Objects so long as some constraints are met. Certainly there are many cases where recognizing/enforcing polymorphism would be difficult (or undesirable), but I think there's a fairly expressive set of cases where it can be done and might be worth the effort.

The simplest example might be:

// polymorphic
fn pointer_identity<T>(v: &T) -> &T {
    v
}

Here we have a function that is generic, but there's really only one possible implementation. We don't need to know the size, alignment, or offsets for the generic type T. This is clearly a bit simplistic, but any situation in which you're managing pointers to types without de-referencing the pointer is a candidate.

If you're writing code with trait objects, this could happen quite a bit (though there's currently no object safe generic methods). Composing dyn closures should be doable polymorphically. If this were expressible in Rust, you could write fully generic code without any monomorphization.

fn pointer_compose<T, R, S>(
    a: &dyn Fn(R) -> S,
    b: &dyn Fn(T) -> R,
) -> Box<dyn Fn(T) -> S> {
    Box::new(|v| a(b(v)))
}

The size of T, R, and S will all matter at some point, but I think they don't matter here. Everything captured by this closure has a known size and the closure is boxed so it's known too.

Why does that matter? Well, it opens up trait objects to generics which means code is properly co-located and accessible on type-erased structs. There are situations where code size matters, like compiling to wasm.

It might also make compilation easier in some cases if the compiler doesn't need to consider the call sites for some generic code. If your API's generics are all polymorphic, users of your library don't need to compile new monomorphic versions of generic functions.

What are the downsides?

This might fragment the rust ecosystem a bit. Writing libraries which have a given size regardless of use (no monomorphization) seems laudable but may come at the cost of performance even in situations where monomorphization wouldn't have contributed to code size.

Pretty much everything generic could be done at least two ways. One with fat pointers and pointers and one with generics compiled into concrete types. Would we see an explicitly polymorphic implementation of std::Vec that only stores pointers?

It's yet another thing to learn and yet another choice to make as you design your software.

Update: What about struct definitions?

Another thing to consider under the same umbrella is that for this to expand how useful it might be, it would be beneficial to be able to have generic structs with a single implementation.

For example, consider the follow struct that carries two references. (I've dropped lifetime annotations here to avoid the extra clutter)

struct KnownSizeGenericStruct<T,R> {
    p1: &T,
    p2: &R,
    count: u8 
}

The way this struct is laid out in memory doesn't depend on T or R directly. A polymorphic function should be able to create a KnownSizeGenericStruct for any chosen T or R. Code that later dereferences p1 or p2 would need to know, but that's not a problem.

impl <T,R> KnownSizeGenericStruct<T,R> {
    fn print_sizes(&self) {
        println!(
            "p1: {}, p2: {}", 
            size_of_val(self.p1), 
            size_of_val(self.p2)
        );
    }
}

bjorn3 · February 2, 2023, 8:06pm

Note that polymorphization isn't even enabled by default yet. It also doesn't support polymorphization in many cases where it actually would be possible.

SkiFire13 · February 2, 2023, 8:30pm

Trequetrum:

fn pointer_compose<T, R, S>(
    a: &dyn Fn(R) -> S,
    b: &dyn Fn(T) -> R,
) -> Box<dyn Fn(T) -> S> {
    Box::new(|v| a(b(v)))
}
The size of T, R, and S will all matter at some point, but I think they don't matter here. Everything captured by this closure has a known size and the closure is boxed so it's known too.

Don't you need to know at least R in order to know a and b's calling conventions and thus generate the closure's code? For example a might take R in some registers but b returns it in some other registers, or maybe it returns it on the stack.

Also, though I'm not sure if it's possible, the way S is returned might be different between a fn(T) -> S and a fn(R) -> S, or the way T is takes might differ between a fn(T) -> S and a fn(T) -> R.

Trequetrum · February 2, 2023, 8:41pm

Yeah, I see. This may not work because a and b are used within the closure. Perhaps that's not the best example. Hmmm! I'll think about it a bit more and do some reading.

Does returning a generic struct instead have the same issues? At this point you'd need the size of T,R,&S, but perhaps boxing them solves that problem (As, in, it's not turtles all the way down).

chrefr · February 3, 2023, 8:57am

I still prefer the compiler to do it automatically.

tczajka · February 3, 2023, 10:05am

This already exists. For example this is a runtime polymorphic function:

fn hello_world(writer: &mut dyn std::fmt::Write) {
    writer.write_str("hello world").unwrap();
}

eggyal · February 3, 2023, 10:43am

Agreed. At best maybe an attribute that hints to the compiler that a polymorphic implementation is preferred, similar to how #[inline] is only a hint but not a guarantee?

CAD97 · February 3, 2023, 10:46am

The idea is to make polymorphic functions with monomorphic interfaces. The function itself works via dynamic dispatch, but the return type retains whatever type information was fed into it rather than being unsized. Kind of the opposite to the unsizing coercion; a resizing coercion.

The canonical example is fn identity<T>(it: &T) -> &T { it }; this function can have a single shared instantiation because it doesn't actually depend on the concrete type of T, but it can still promise to return the same reference it was given, and thus the caller can remember and continue using the known concrete type even though it's been passed through a function that temporarily forgot what the concrete type was.

Another trivially simple case is marker tag types which aren't utilized by the function but just forwarded unused into the return type.

This is essentially the option of treating type generics like lifetime parameters, with the cost of them acting like lifetime parameters and selecting behavior based on them being impossible. (If you want to, you'd need to pass in the specification information separately.) They're fully erased at runtime but still maintained and propagated as part of the type system.

The extent of how useful such a feature would be is a bit unclear, since the restrictions required to enable truly transparent full polymorphization of monomorphic interfaces are significant^[1], but it's certainly of potential use. I have a crate (erasable) where one of its two main purposes is to assist with the implementation of polymorphic data structures.

And even the restrictions required for nontransparent polymorphization — I've written this previously as fn f<dyn T: Trait>(_: &T) -> &T — are decently restrictive, in that the return type must be legally able to unsize from T to dyn Trait, such that it can also be "resized". ↩︎

tczajka · February 3, 2023, 11:01am

As far as I can tell, this is already the syntax that Rust has. If you have a function parametrized by <T>, then the type T is a parameter to the function. It could (with an appropriate ABI) always be implemented with a single function that takes T as an extra parameter. The identity function is simply a special case that makes no use of that extra parameter.

Monomorphization is an optimization. Similarly, the compiler could make two separate versions of a function that takes an i32 parameter, if it knows that function is always called with either 3 or 5.

This is same as in Standard ML. In you have a polymorphic function in Standard ML, some compilers (smlnj) will compile it to one function, while others (mlton) will compile it to multiple copies by monomorphizing.

Trequetrum · February 3, 2023, 1:20pm

Do you have any insight on why this optimization isn't lifted in the context of object safety?

CAD97 · February 3, 2023, 1:21pm

Yes, as is polymorphization, as noted in the OP; this is about explicitly requesting polymorphization and having the compiler ensure it's possible, as opposed to the standard generics interface which, while not strictly requiring monomorphization, is heavily biased towards it via all the non object-safe options.

This is I think a bit of a simplification of the full reality.

Lifetime parameters aren't extra parameters, they're fully removed before code generation. Proper language supported polymorphization would work the same way; a single version of the function exists at the end of the semantic frontend after monomorphization. The backend is of course free to duplicate the function and devirtualize it, but the point of using the feature is for cases where having a single polymorphic path is likely beneficial and saving the backend the effort of merging functions.

This feature is fundamentally about removing capabilities from what you can do in the surface language (when using the feature) such that the compiler can offer better properties about the generated code than when you're using the unrestricted scope of generics. It's not about new capabilities nor ergonomic improvements, but more influence over the choices the compiler makes. Obviously the language can't really offer any guarantees about what code generation looks like, but it absolutely can offer the knobs to turn to influence what choices are easy to make efficiently.

What the feature would do can be described in terms of source transform; using an arbitrary syntax,

pub fn example<dyn T: Bounds>(x: MentionsA<'_, T>) -> MentionsB<'_, T> {
    /* body */
}

becomes roughly

pub fn example<T: Bounds>(x: MentionsA<'_, T>) -> MentionsB<'_, T> {
    let _1: MentionsA<'_, dyn '_ + Bounds> = convert_unsize(x);
    let _2: MentionsB<'_, dyn '_ + Bounds> = _example(_1);
    unsafe { convert_resize(_2) }
}

fn _example(x: MentionsA<'_, dyn '_ + Bounds>) -> MentionsB<'_, dyn '_ + Bounds> {
    /* body */
}

Trequetrum · February 3, 2023, 1:26pm

Sure, the question becomes whether it's useful to have guarantees around when that will happen. Right now traits with generic methods are not object safe. Before the compiler could possibly accept polymorphic methods as object safe, it would need to know which functions are polymorphic in a backwards compatible way.

Trequetrum · February 3, 2023, 1:55pm

I suspect we have our vocabulary mixed up a bit.

I wouldn't consider that a polymorphic function. hello_world only works for a single input type (that type being a trait object). This explicitly has lost the type information about which struct implemented the trait.

It feels a bit polymorphic to the caller because rust will convert structs into trait object auto-magically for us. But really, the polymorphic feeling of such a function is that there is a well defined set of types that can be converted before being passed in. The type that a trait object was created from is lost in the conversion.

Aloso · February 3, 2023, 4:15pm

That's because there's an implied T: Sized bound. If we made T: ?Sized, then we would need 2 instantiations, one for Sized types and one for !Sized types. This is still better than having an instantiation for every single T though.

jrose · February 3, 2023, 5:38pm

For comparison, Swift has this (generic functions are ABI-polymorphic, with monomorphization as an optimization), and Gankra has a good article about how it works in Rust terms.

WorldSEnder · February 3, 2023, 6:10pm

A very similar thing was discussed in Idea/Pre-RFC: Runtime-erased types and reflections around object safety.

Trequetrum · February 3, 2023, 8:26pm

That's a very cool and informative read!

I think we can get a neutered version of what Swift has with considerably less work (though I suspect it would still be a lot of work), while leaving open the possibility to expand it further. As described, what swift is doing isn't easy and may not even fit within Rust's design philosophies, but can we make progress?

Gives me hope that this is possible. The post sort of expands on why/how fat pointers already get us part of the way!

CAD97 · February 3, 2023, 11:45pm

Two isn't strictly enough yet either. You need an instantiation for each unsized pointee "kind" which has a different type for its pointee metadata.

Because the two available unsized kinds currently are slices (usize metadata) and trait objects (roughly &'static Vtable), it feels like they could be treated the same, but some exotic platform ABIs do pass pointers and pointer-sized integers differently, and there's the possibility of introducing new kinds of unsized pointees in the future.

bjorn3 · February 4, 2023, 12:00am

For the rust abi we can decide to treat them identically on all platforms if necessary, even on those where the plarform ABI distinguishes between them. And for other abi's we can exploit the fact that we don't guarantee any particular layout for fat pointers and as such can use a layout containing two "pointers" even for slices.

dlight · February 6, 2023, 3:40am

I always kind of hated that in order to switch from static dispatch to dynamic dispatch you would need to switch from generics to trait objects, because the change is quite involved and may require large changes to the codebase.

I would like to both have some form of guaranteed polymorphic instantiation for traits like AsRef, AsMut and Into (like an automatic momo, performed by the compiler itself), but also just freely instantiate generics monomorphically or polymorphically in my application code.

And no, depending on compiler optimizations is not enough, because they can be brittle (small changes in the code may undo them)

Topic		Replies	Views
[Idea] Type erasure in Rust through parametricity language design	7	4283	March 25, 2019
Idea: polymorphic baseline codegen compiler	22	3261	March 25, 2019
Idea/Pre-RFC: Runtime-erased types and reflections around object safety language design	15	822	October 30, 2022
[Idea]Levity and Levity polymorphism? language design	16	1250	August 8, 2020
Dynamic polymorphism – alternative designs discussion language design	12	1962	December 31, 2020

Idea: Would Rust benefit from explicitly polymorphic generics?

Related topics