Tables-next-to-code optimization for boxed closures?


#1

Calling a closure trait object works the same as calling any method on any trait object, as you can see here. You dereference the vtable pointer (at some offset) to get a pointer to the call function, and then execute an indirect call. This double indirection could be replaced with a single indirection, using an optimization similar to GHC’s “Tables next to code”. You replace the vtable pointer with a pointer directly to the call function, and then store the rest of the vtable (size, align, drop glue) immediately before it in .text.
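To make the double indirection concrete, here is a rough hand-rolled model of a closure trait object. Everything in it (`VTable`, `RawClosure`, `AddOffset`, the field order) is a hypothetical stand-in, since rustc's real vtable layout is unspecified. It only shows where the extra load sits.

```rust
use std::mem::{align_of, size_of};

// Hypothetical vtable: drop glue, size, align, then the single method.
// The field order is an assumption for illustration only.
struct VTable {
    drop: unsafe fn(*mut ()),
    size: usize,
    align: usize,
    call: unsafe fn(*const (), i32) -> i32,
}

// Hypothetical fat pointer: data pointer plus vtable pointer.
struct RawClosure {
    data: *mut (),
    vtable: &'static VTable,
}

// The "captured environment" of a closure like `move |x| x + offset`.
struct AddOffset {
    offset: i32,
}

unsafe fn add_offset_call(data: *const (), x: i32) -> i32 {
    let this = unsafe { &*(data as *const AddOffset) };
    x + this.offset
}

// In this model the drop slot also frees the heap allocation.
unsafe fn add_offset_drop(data: *mut ()) {
    drop(unsafe { Box::from_raw(data as *mut AddOffset) });
}

static ADD_OFFSET_VTABLE: VTable = VTable {
    drop: add_offset_drop,
    size: size_of::<AddOffset>(),
    align: align_of::<AddOffset>(),
    call: add_offset_call,
};

impl RawClosure {
    fn new(offset: i32) -> Self {
        let data = Box::into_raw(Box::new(AddOffset { offset })) as *mut ();
        RawClosure { data, vtable: &ADD_OFFSET_VTABLE }
    }

    fn call(&self, x: i32) -> i32 {
        // Step 1: load the `call` slot from the vtable (a memory read).
        let f = self.vtable.call;
        // Step 2: indirect call through the pointer that was just loaded.
        unsafe { f(self.data, x) }
    }
}

impl Drop for RawClosure {
    fn drop(&mut self) {
        unsafe { (self.vtable.drop)(self.data) }
    }
}

fn main() {
    let c = RawClosure::new(10);
    println!("{}", c.call(5));
    println!("size={} align={}", c.vtable.size, c.vtable.align);
}
```

Under the proposed scheme, the second word of the fat pointer would be the address of `add_offset_call` itself, so step 1 disappears, and `size`, `align` and `drop` would be read at negative offsets from that address. Plain Rust has no way to express that placement, so the sketch models only the current two-step call.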

Has this optimization been considered for Rust? Clearly it’s more important in GHC, where essentially everything you do is a call to a boxed closure. But Rust could benefit too, and it seems like a relatively straightforward change. It could apply to any trait object with a single method (ignoring drop).

One issue is that this involves intermingling code and (read-only) data in .text, which may upset LLVM or some linkers. I think GHC added support to LLVM specifically for this use case.


#2

I think this optimisation would work in general for any trait with a single-method vtable, which would avoid the need to have Fn-specific logic.

edit: ugh could have sworn that wasn’t there earlier :stuck_out_tongue:


#3

That’s what I said in the penultimate paragraph :slight_smile:

edit: 'sall good


#4

You want LLVM's prefix data, and you can use it today. From the LLVM Language Reference:

Prefix data is data associated with a function which the code generator will emit immediately before the function’s entrypoint. The purpose of this feature is to allow frontends to associate language-specific runtime metadata with specific functions and make it available through the function pointer while still allowing the function pointer to be called.

To access the data for a given function, a program may bitcast the function pointer to a pointer to the constant’s type and dereference index -1. This implies that the IR symbol points just past the end of the prefix data. For instance, take the example of a function annotated with a single i32,