Will it ever be possible to optimize away trivial futures?

eddyb · July 22, 2022, 2:15am

If you look for declare in the LLVM IR, you'll see these futures_* functions:

<futures_executor::local_pool::ThreadNotify as futures_task::arc_wake::ArcWake>::wake_by_ref
futures_task::waker_ref::WakerRef::new_unowned
futures_executor::enter::enter
<futures_task::waker_ref::WakerRef as core::ops::deref::Deref>::deref
<futures_executor::enter::Enter as core::ops::drop::Drop>::drop
<futures_executor::enter::EnterError as core::fmt::Debug>::fmt

All of these functions are being imported from already compiled machine code in upstream crates, and without turning on LTO (or changing the code to use #[inline]) they can't be optimized away.

For some of them it makes sense not to get cross-crate codegen (like the fmt::Debug impl for the error), but not for most of them.
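As a rough sketch of the #[inline] point above: the module below is only a stand-in for an upstream crate, and `Flag`, `get_opaque` and `get_inlinable` are made-up names for illustration. Inside a single crate the difference doesn't actually show up, so read it as a picture of what a downstream crate would see, not as a reproduction.

```rust
// Stand-in for an already compiled upstream crate (hypothetical names).
mod upstream {
    pub struct Flag(pub bool);

    impl Flag {
        // Non-generic, no #[inline]: across a crate boundary (and without
        // LTO) a caller would only see a `declare` for this in its LLVM IR.
        pub fn get_opaque(&self) -> bool {
            self.0
        }

        // With #[inline] the body is made available to downstream crates,
        // so LLVM can inline it and fold the result.
        #[inline]
        pub fn get_inlinable(&self) -> bool {
            self.0
        }
    }
}

// Downstream-style callers: both return the same value, but only the
// second one would be foldable to a constant across a crate boundary.
pub fn uses_opaque() -> bool {
    upstream::Flag(true).get_opaque()
}

pub fn uses_inlinable() -> bool {
    upstream::Flag(true).get_inlinable()
}
```

Generic functions don't need the attribute for this purpose, since they get instantiated in the crate that uses them anyway.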

Looking through them, though, it seems like #[inline] is needed on these futures_task::waker_ref::WakerRef functions:

<futures_task::waker_ref::WakerRef as core::ops::deref::Deref>::deref
futures_task::waker_ref::WakerRef::new_unowned
Honestly, every function in that module should get #[inline]; ironically, the only one that already has it (waker_ref) doesn't need it, because it's already generic.
In particular, these block optimizations because they obfuscate access to a Waker, which contains a vtable, so any virtual calls through the Waker held in a WakerRef will be completely unknown to LLVM.
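To make the "obfuscated vtable" part more concrete, here's a minimal sketch with hand-rolled stand-ins. `MiniWaker`, `MiniVTable` and `MiniWakerRef` are hypothetical types, not the real std or futures_task ones, and the wake function returns a value only so the effect is observable.

```rust
use std::ops::Deref;

/// Hypothetical stand-in for a waker's vtable: a single function pointer.
pub struct MiniVTable {
    wake_by_ref: fn(usize) -> usize,
}

/// Hypothetical stand-in for `Waker`: some data plus a vtable pointer.
pub struct MiniWaker {
    data: usize,
    vtable: &'static MiniVTable,
}

fn add_one(data: usize) -> usize {
    data + 1
}

// A constant global: loads from it can be folded once LLVM can see
// which vtable a given waker points at.
static ADD_ONE_VTABLE: MiniVTable = MiniVTable { wake_by_ref: add_one };

impl MiniWaker {
    #[inline]
    pub fn wake_by_ref(&self) -> usize {
        // The "virtual call": an indirect call through the vtable.
        (self.vtable.wake_by_ref)(self.data)
    }
}

/// Hypothetical stand-in for `WakerRef`: a wrapper holding the waker.
pub struct MiniWakerRef {
    waker: MiniWaker,
}

impl MiniWakerRef {
    // If this lived in another crate without #[inline], the caller could
    // not tell that `vtable` is always `&ADD_ONE_VTABLE`.
    #[inline]
    pub fn new_unowned(data: usize) -> Self {
        MiniWakerRef {
            waker: MiniWaker { data, vtable: &ADD_ONE_VTABLE },
        }
    }
}

impl Deref for MiniWakerRef {
    type Target = MiniWaker;

    // Same story here: an opaque `deref` hides which waker comes back.
    #[inline]
    fn deref(&self) -> &MiniWaker {
        &self.waker
    }
}

pub fn wake_through_ref() -> usize {
    let waker_ref = MiniWakerRef::new_unowned(41);
    // Auto-deref to MiniWaker, then the indirect call through the vtable.
    waker_ref.wake_by_ref()
}
```

With everything visible, the indirect call in `wake_through_ref` has a known target; make `new_unowned` or `deref` opaque and that knowledge is gone.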

While the rest could use it, they seem to be all leaves (i.e. likely not blocking optimizations):

<futures_executor::local_pool::ThreadNotify as futures_task::arc_wake::ArcWake>::wake_by_ref
- sets an AtomicBool to true (and optionally Thread::unparks)
futures_executor::enter::enter
- sets a thread_local! Cell<bool> to true
<futures_executor::enter::Enter as core::ops::drop::Drop>::drop
- sets the same thread_local! Cell<bool> back to false
While the AtomicBool or thread_local! state could in theory be important for optimizations, it can only block optimizing out branches, which is a much smaller issue than the blocked devirtualization.
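For reference, the enter/drop pair described above boils down to something like the following. This is a simplified sketch based only on the description in this post, not the actual futures_executor source; the `ENTERED` name and the `is_entered` helper are made up for illustration.

```rust
use std::cell::Cell;

thread_local! {
    // The per-thread "already inside an executor" flag.
    static ENTERED: Cell<bool> = Cell::new(false);
}

/// Guard that resets the flag when dropped.
pub struct Enter {
    _private: (),
}

#[derive(Debug)]
pub struct EnterError;

/// Sets the thread-local flag to true, failing if it was already set.
pub fn enter() -> Result<Enter, EnterError> {
    ENTERED.with(|flag| {
        if flag.get() {
            Err(EnterError)
        } else {
            flag.set(true);
            Ok(Enter { _private: () })
        }
    })
}

impl Drop for Enter {
    fn drop(&mut self) {
        // Sets the same flag back to false.
        ENTERED.with(|flag| flag.set(false));
    }
}

/// Hypothetical helper so the flag can be observed from outside.
pub fn is_entered() -> bool {
    ENTERED.with(|flag| flag.get())
}
```

The only thing a caller's optimizer loses by not seeing into these is the ability to prove which way the `if flag.get()` style branches go, i.e. nothing involving a vtable.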

Rust doesn't have any closed-world assumptions (at least yet AFAIK) wrt virtual calls, so devirtualization is only const-prop (if we include "constant-folding loads from constant globals" under the same umbrella).

And the only things that can block const-prop are either true dynamism (e.g. switching executors at runtime) or hidden static knowledge. Always look for the latter, because it's much more common to forget an #[inline] (and/or mislead LLVM into thinking some value can mutate, though I don't see that here), especially when there's no hint of true dynamism from the source code.
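Here's a small sketch of the "no hidden static knowledge" case, using only std. `block_on_ready`, `trivial` and the no-op vtable are illustrative names, and this is a poll-once helper under the assumption that the future is immediately ready, not a real executor.

```rust
use std::future::Future;
use std::pin::pin;
use std::ptr;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

unsafe fn noop_clone(_: *const ()) -> RawWaker {
    RawWaker::new(ptr::null(), &NOOP_VTABLE)
}

unsafe fn noop(_: *const ()) {}

// The vtable is a constant global defined right here, so every call
// through it is a candidate for const-prop based devirtualization.
static NOOP_VTABLE: RawWakerVTable = RawWakerVTable::new(noop_clone, noop, noop, noop);

/// Polls `fut` exactly once with a no-op waker.
/// Returns `None` if the future wasn't immediately ready.
pub fn block_on_ready<F: Future>(fut: F) -> Option<F::Output> {
    // SAFETY: all vtable functions ignore the data pointer and do nothing,
    // so the RawWaker contract is trivially upheld.
    let waker = unsafe { Waker::from_raw(RawWaker::new(ptr::null(), &NOOP_VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}

/// A trivial future: no true dynamism anywhere in sight.
pub fn trivial() -> Option<u32> {
    block_on_ready(async { 1 + 2 })
}
```

Nothing here goes through an opaque upstream function, so in principle LLVM has all it needs to reduce `trivial` to a constant; whether it actually does is something to check in the IR rather than assume.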
