Rustc Optimization: Whole Program Devirtualization

Hello everyone,

Back in 2019, Matt Godbolt did an entertaining talk about compiler optimization, and briefly mentioned something that really caught my attention: speculative devirtualization in C++.

I found this exciting, because of Rust programs like this:

/********* devirt-lib crate: src/ *********/
pub trait Printable {
    fn print_line(&self, handle: &mut dyn std::io::Write) -> std::io::Result<()>;

pub struct PrintableArray(Vec<Box<dyn Printable + 'static>>);

impl PrintableArray {
    pub fn new() -> PrintableArray {

    pub fn push(&mut self, item: impl Printable + 'static) {

    pub fn print_all(&self, handle: &mut dyn std::io::Write) -> std::io::Result<()> {
        self.0.iter().try_fold(handle, |handle, item| {
        }).map(|_| ())

/********* devirt-test crate: src/ **********/
use std::io::Write;

use devirt_lib::PrintableArray;

struct Point {
    pub x: f32,
    pub y: f32,

impl devirt_lib::Printable for Point {
    fn print_line(&self, handle: &mut dyn Write) -> std::io::Result<()> {
        writeln!(handle, "Point: {{ {}, {} }}", self.x, self.y)

fn main() -> std::io::Result<()> {
    let mut vec: PrintableArray = PrintableArray::new();
    vec.push(Point { x: 10.0, y: 20.0 });
    vec.push(Point { x: 20.0, y: 40.0 });
    vec.push(Point { x: 30.0, y: 60.0 });
    vec.push(Point { x: 40.0, y: 80.0 });
    vec.print_all(&mut std::io::stdout())

That type erasure across a crate boundary looks just like a C++ virtual function call. In fact, here is the assembly for loop inside print_all, which is inlined into the main function (produced on a recent nightly):

    6f50:       add    r15,0xfffffffffffffff0
    6f54:       add    rbx,0x10
    6f58:       add    rbp,0x10
    6f5c:       cmp    rbp,r12
    6f5f:       je     6f89  # loop exit
    6f61:       mov    rdi,QWORD PTR [rbx-0x10]
    6f65:       mov    rax,QWORD PTR [rbx-0x8]
    6f69:       call   QWORD PTR [rax]  # virtual call
    6f6b:       lea    rbp,[rbx-0x10]
    6f6f:       mov    rax,QWORD PTR [rbp+0x8]
    6f73:       mov    rsi,QWORD PTR [rax+0x8]
    6f77:       test   rsi,rsi
    6f7a:       je     6f50  # loop top
    6f7c:       mov    rdx,QWORD PTR [rax+0x10]
    6f80:       mov    rdi,QWORD PTR [rbx-0x10]
    6f84:       call   r13
    6f87:       jmp    6f50  # loop top

It looks exactly the same, and could benefit from speculative devirtualization, which LLVM calls Whole Program Devirtualization (WPD).

At the time of the presentation in 2019, Matt said that LLVM was still working on it. A quick grep of the LLVM monorepo suggests it was implemented in LLVM 10, with additional improvements in LLVM 11.

At first, I hoped that LLVM would implement it, and Rust would get it for free. Unfortunately, reading the LLVM RFC and several of the patches, some explicit support by the compiler is needed. It must handle LTO item visibility in certain ways in order for the pass to trigger.

Given the amount of work that Rust has done on LTO, I imagine it is not an easy change. If it was, I would have written this as a GitHub issue instead.

My post here is to just to ask if this is on anyone's radar, and see if people think it's worth it for Rust.


Looks interesting, definitely sounds like something that could be beneficial to Rust!

There’s an open issue for this, and there are a few rough notes there on how it might be implemented.


Thanks for the pointer, @nathanwhit! I couldn't find that when I searched. I will give that a follow.