Hello everyone,
Back in 2019, Matt Godbolt did an entertaining talk about compiler optimization, and briefly mentioned something that really caught my attention: speculative devirtualization in C++.
I found this exciting, because of Rust programs like this:
/********* devirt-lib crate: src/lib.rs *********/
pub trait Printable {
fn print_line(&self, handle: &mut dyn std::io::Write) -> std::io::Result<()>;
}
pub struct PrintableArray(Vec<Box<dyn Printable + 'static>>);
impl PrintableArray {
pub fn new() -> PrintableArray {
PrintableArray(Vec::new())
}
pub fn push(&mut self, item: impl Printable + 'static) {
self.0.push(Box::new(item))
}
pub fn print_all(&self, handle: &mut dyn std::io::Write) -> std::io::Result<()> {
self.0.iter().try_fold(handle, |handle, item| {
item.print_line(handle)?;
Ok(handle)
}).map(|_| ())
}
}
/********* devirt-test crate: src/main.rs **********/
use std::io::Write;
use devirt_lib::PrintableArray;
struct Point {
pub x: f32,
pub y: f32,
}
impl devirt_lib::Printable for Point {
fn print_line(&self, handle: &mut dyn Write) -> std::io::Result<()> {
writeln!(handle, "Point: {{ {}, {} }}", self.x, self.y)
}
}
fn main() -> std::io::Result<()> {
let mut vec: PrintableArray = PrintableArray::new();
vec.push(Point { x: 10.0, y: 20.0 });
vec.push(Point { x: 20.0, y: 40.0 });
vec.push(Point { x: 30.0, y: 60.0 });
vec.push(Point { x: 40.0, y: 80.0 });
vec.print_all(&mut std::io::stdout())
}
That type erasure across a crate boundary looks just like a C++ virtual function call. In fact, here is the assembly for loop inside print_all
, which is inlined into the main
function (produced on a recent nightly):
6f50: add r15,0xfffffffffffffff0
6f54: add rbx,0x10
6f58: add rbp,0x10
6f5c: cmp rbp,r12
6f5f: je 6f89 # loop exit
6f61: mov rdi,QWORD PTR [rbx-0x10]
6f65: mov rax,QWORD PTR [rbx-0x8]
6f69: call QWORD PTR [rax] # virtual call
6f6b: lea rbp,[rbx-0x10]
6f6f: mov rax,QWORD PTR [rbp+0x8]
6f73: mov rsi,QWORD PTR [rax+0x8]
6f77: test rsi,rsi
6f7a: je 6f50 # loop top
6f7c: mov rdx,QWORD PTR [rax+0x10]
6f80: mov rdi,QWORD PTR [rbx-0x10]
6f84: call r13
6f87: jmp 6f50 # loop top
It looks exactly the same, and could benefit from speculative devirtualization, which LLVM calls Whole Program Devirtualization (WPD).
At the time of the presentation in 2019, Matt said that LLVM was still working on it. A quick grep of the LLVM monorepo suggests it was implemented in LLVM 10, with additional improvements in LLVM 11.
At first, I hoped that LLVM would implement it, and Rust would get it for free. Unfortunately, reading the LLVM RFC and several of the patches, some explicit support by the compiler is needed. It must handle LTO item visibility in certain ways in order for the pass to trigger.
Given the amount of work that Rust has done on LTO, I imagine it is not an easy change. If it was, I would have written this as a GitHub issue instead.
My post here is to just to ask if this is on anyone's radar, and see if people think it's worth it for Rust.