If you look for `declare` in the LLVM IR, you'll see these `futures_*` functions:
<futures_executor::local_pool::ThreadNotify as futures_task::arc_wake::ArcWake>::wake_by_ref
futures_task::waker_ref::WakerRef::new_unowned
futures_executor::enter::enter
<futures_task::waker_ref::WakerRef as core::ops::deref::Deref>::deref
<futures_executor::enter::Enter as core::ops::drop::Drop>::drop
<futures_executor::enter::EnterError as core::fmt::Debug>::fmt
All of these functions are being imported from already compiled machine code in upstream crates, and without turning on LTO (or changing the code to use `#[inline]`) they can't be optimized away.
For some of them it makes sense not to get cross-crate codegen (like the `fmt::Debug` impl for the error), but not for most of them.
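To make the distinction concrete, here is a minimal single-file sketch. The `upstream` module and its three functions are hypothetical and only stand in for a separately compiled dependency; the comments describe what I'd expect without LTO, assuming the compiler doesn't decide on its own to make a small function cross-crate inlinable.

```rust
// `upstream` stands in for a separately compiled dependency (hypothetical names).
mod upstream {
    // Non-generic, no attribute: compiled once in the defining crate. A
    // downstream crate would normally only see a `declare` for it unless LTO is on.
    pub fn plain(x: u32) -> u32 {
        x + 1
    }

    // `#[inline]` makes the body available for codegen in downstream crates.
    #[inline]
    pub fn hinted(x: u32) -> u32 {
        x + 1
    }

    // Generic: instantiated in whichever crate picks `T`, so the body is
    // visible there without any attribute.
    pub fn generic<T: Into<u32>>(x: T) -> u32 {
        x.into() + 1
    }
}

fn main() {
    // All three compute the same value; they differ only in where the body gets compiled.
    println!(
        "{} {} {}",
        upstream::plain(1),
        upstream::hinted(1),
        upstream::generic(1u8)
    );
}
```

In a real multi-crate build, only `plain` would be expected to show up as a `declare` in the downstream LLVM IR.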
Looking through, though, it seems like `#[inline]` is needed for `futures_task::waker_ref::WakerRef`:
<futures_task::waker_ref::WakerRef as core::ops::deref::Deref>::deref
futures_task::waker_ref::WakerRef::new_unowned
- honestly, every function in that module should get `#[inline]`; ironically, the only one that already has it (`waker_ref`) doesn't need it because it's already generic
- in particular, these block optimizations because they obfuscate access to a `Waker`, which contains a vtable, so any virtual calls through the `Waker` held in a `WakerRef` will be completely unknown to LLVM
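As a rough illustration of the shape of the fix (not the actual `futures_task` code), here is a simplified sketch. `BorrowedWaker` and `Notify` are hypothetical names, and the wrapper just borrows a `Waker` rather than reproducing how `WakerRef` stores it; the point is only that the non-generic constructor and `Deref::deref` carry `#[inline]`, so a caller in another crate can see straight through to the `Waker`.

```rust
use std::ops::Deref;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Wake, Waker};

/// Hypothetical, simplified stand-in for `WakerRef`: it only borrows a `Waker`.
pub struct BorrowedWaker<'a> {
    waker: &'a Waker,
}

impl<'a> BorrowedWaker<'a> {
    // Non-generic constructor: without `#[inline]` a downstream crate would
    // only see an opaque call here.
    #[inline]
    pub fn new(waker: &'a Waker) -> Self {
        BorrowedWaker { waker }
    }
}

impl Deref for BorrowedWaker<'_> {
    type Target = Waker;

    // Same story for the accessor: inlining it exposes the `Waker` (and its
    // vtable pointer) to the caller's optimizer.
    #[inline]
    fn deref(&self) -> &Waker {
        self.waker
    }
}

/// Hypothetical notifier that just records that it was woken.
pub struct Notify {
    woken: AtomicBool,
}

impl Wake for Notify {
    fn wake(self: Arc<Self>) {
        self.woken.store(true, Ordering::SeqCst);
    }
}

fn main() {
    let notify = Arc::new(Notify { woken: AtomicBool::new(false) });
    let waker = Waker::from(notify.clone());

    // The virtual call goes through the borrowed wrapper via `Deref`.
    let borrowed = BorrowedWaker::new(&waker);
    borrowed.wake_by_ref();

    assert!(notify.woken.load(Ordering::SeqCst));
}
```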
While the rest could use it, they seem to be all leaves (i.e. likely not blocking optimizations):
<futures_executor::local_pool::ThreadNotify as futures_task::arc_wake::ArcWake>::wake_by_ref
- sets an `AtomicBool` to `true` (and optionally `Thread::unpark`s)
futures_executor::enter::enter
- sets a `thread_local!` `Cell<bool>` to `true`
<futures_executor::enter::Enter as core::ops::drop::Drop>::drop
- sets the same `thread_local!` `Cell<bool>` back to `false`
- while the `AtomicBool` or `thread_local!` state could in theory be important for optimizations, it can only block optimizing out branches, which is a much smaller issue compared to the devirtualization blocking
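For reference, the `enter`/`Enter` pair described above boils down to something like the following sketch. This is an assumed reconstruction from the behavior listed here, not the `futures_executor` source: the `ENTERED` name, the empty `EnterError`, and the `#[inline]` attributes are all illustrative.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical name for the per-thread "already inside an executor" flag.
    static ENTERED: Cell<bool> = Cell::new(false);
}

#[derive(Debug)]
pub struct EnterError;

/// Guard that marks the current thread as entered until it is dropped.
pub struct Enter {
    _private: (),
}

// Leaf function: flips the thread-local flag to `true`, or fails if it is already set.
#[inline]
pub fn enter() -> Result<Enter, EnterError> {
    ENTERED.with(|flag| {
        if flag.get() {
            Err(EnterError)
        } else {
            flag.set(true);
            Ok(Enter { _private: () })
        }
    })
}

impl Drop for Enter {
    // Leaf function: flips the same flag back to `false`.
    #[inline]
    fn drop(&mut self) {
        ENTERED.with(|flag| flag.set(false));
    }
}

fn main() {
    let guard = enter().expect("first enter on this thread succeeds");
    assert!(enter().is_err()); // nested enter is rejected while the guard lives
    drop(guard);
    assert!(enter().is_ok()); // the flag was reset by `Drop`
}
```

Both functions only touch a flag, so at worst a missing `#[inline]` keeps a branch on that flag from being folded away.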
Rust doesn't have any closed-world assumptions (at least yet AFAIK) wrt virtual calls, so devirtualization is only const-prop (if we include "constant-folding loads from constant globals" under the same umbrella).
And the only things that can block const-prop are either true dynamism (e.g. switching executors at runtime) or hidden static knowledge. Always look for the latter, because it's much more common to forget an `#[inline]` (and/or mislead LLVM into thinking some value can mutate, though I don't see that here), especially when there's no hint of true dynamism from the source code.
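A small sketch of what "hidden static knowledge" means in practice, with hypothetical names throughout (`Shape`, `Square`, `visible`, `hidden`). Both functions return the same constant trait object; the second uses `#[inline(never)]` plus `std::hint::black_box` to approximate, within a single file, a call into an already compiled upstream crate. Whether LLVM actually devirtualizes the first call depends on optimization level and compiler version, so treat the comments as expectations rather than guarantees.

```rust
use std::hint::black_box;

trait Shape {
    fn area(&self) -> u32;
}

struct Square(u32);

impl Shape for Square {
    fn area(&self) -> u32 {
        self.0 * self.0
    }
}

static UNIT: Square = Square(3);

// Body visible to the caller: once inlined, the vtable pointer is a known
// constant, so the `area` call can be const-propagated into a direct call.
#[inline]
fn visible() -> &'static dyn Shape {
    &UNIT
}

// Stand-in for a function whose body the caller cannot see. The value is
// just as static as above, but that knowledge is hidden from the optimizer.
#[inline(never)]
fn hidden() -> &'static dyn Shape {
    let shape: &'static dyn Shape = &UNIT;
    black_box(shape)
}

fn main() {
    // Same result either way; only the generated code is expected to differ.
    println!("{} {}", visible().area(), hidden().area());
}
```

There is no true dynamism in either path, which is exactly the situation where a leftover indirect call in the output is worth investigating.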