Will it ever be possible to optimize away trivial futures?

akriegman · July 21, 2022, 7:26pm

I'm just curious if an optimization like this is possible, or if there's some reason why it's theoretically not practical. Consider this code:

use futures::executor::block_on;

async fn foo_async() -> i32 {
    return 5;
}

pub fn foo() -> i32 {
    return block_on(foo_async());
}

Theoretically this could be a very small program, but if you compile this on Rust playground and look at the assembly or llvm-ir output, you still get a very large program. Are optimizations for this type of thing coming, or is this just not doable?

jessa0 · July 21, 2022, 8:10pm

Most of that assembly comes from block_on, which has a functioning Waker implementation. If you use noop_waker it optimizes down to a single mov: Playground Link.
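For readers without the playground link, here is a minimal std-only sketch of the idea. It is not jessa0's actual code: the hand-rolled `noop_waker` below is a stand-in for `futures::task::noop_waker`, and `poll_once` is a hypothetical helper name. The future is polled exactly once with a waker that does nothing, so there is no real wake-up machinery left for the optimizer to see through.

```rust
use std::future::Future;
use std::pin::pin;
use std::ptr;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Vtable whose clone/wake/wake_by_ref/drop entries all do nothing.
// (Assumption: a stand-in for futures::task::noop_waker.)
static NOOP_VTABLE: RawWakerVTable = RawWakerVTable::new(noop_clone, noop, noop, noop);

unsafe fn noop_clone(_: *const ()) -> RawWaker {
    RawWaker::new(ptr::null(), &NOOP_VTABLE)
}

unsafe fn noop(_: *const ()) {}

fn noop_waker() -> Waker {
    // SAFETY: every vtable function ignores the data pointer and has no
    // side effects, so the RawWaker contract is trivially upheld.
    unsafe { Waker::from_raw(RawWaker::new(ptr::null(), &NOOP_VTABLE)) }
}

/// Hypothetical helper: polls `fut` once and returns its output if it was
/// immediately ready, or `None` if it returned `Pending`.
pub fn poll_once<F: Future>(fut: F) -> Option<F::Output> {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(out) => Some(out),
        Poll::Pending => None,
    }
}

async fn foo_async() -> i32 {
    5
}

pub fn foo() -> i32 {
    // foo_async has no await points, so a single poll completes it.
    poll_once(foo_async()).expect("foo_async should be ready on first poll")
}
```

Unlike block_on, this cannot wait for a future that is genuinely pending, which is exactly why it is so much easier to optimize.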

jrose · July 21, 2022, 9:24pm

I think OP’s question is still relevant. The general API might indeed need to block, but a particular caller has a Future that’s already Ready. Inlining will get us to the example in the original post, and it would be great™ if the optimizer could go further.

Turning the problem around (and pushing it back towards users instead of internals), is there anything in block_on that could be implemented differently to handle that? Is it something all Wakers have to do for themselves? Given the signature of poll it’s hard to not make the Waker ahead of time, but maybe block_on should poll with a no-op waker first and then poll again with a real one?
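The "poll with a no-op waker first" idea could look roughly like the sketch below. This is not how futures-rs implements block_on; `block_on_two_phase`, `NoopWake`, `ThreadWake`, and `YieldOnce` are hypothetical names, and the real waker is a simple thread-parking one built on `std::task::Wake`.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Waker that does nothing (hypothetical; used only for the first poll).
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Waker that unparks the thread blocked in `block_on_two_phase`.
struct ThreadWake(Thread);
impl Wake for ThreadWake {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Hypothetical two-phase block_on: try a cheap poll first, and only build
/// the real waker if the future turns out to be pending.
pub fn block_on_two_phase<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);

    // Phase 1: no-op waker. Note that Waker::from(Arc<_>) still allocates;
    // a real fast path would want an allocation-free no-op waker.
    let noop = Waker::from(Arc::new(NoopWake));
    let mut noop_cx = Context::from_waker(&noop);
    if let Poll::Ready(out) = fut.as_mut().poll(&mut noop_cx) {
        return out;
    }

    // Phase 2: poll again right away with a real waker, so the future
    // re-registers it, then park between polls.
    let waker = Waker::from(Arc::new(ThreadWake(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        thread::park();
    }
}

/// Test future: returns Pending on the first poll, Ready on the second.
pub struct YieldOnce(pub bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}
```

The cost of this design is one extra poll for every future that is not immediately ready, and whether that trade is worth it is the open question in this reply.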

CAD97 · July 21, 2022, 9:55pm

I'm not surprised that LLVM struggles to optimize async completely away; the waker API necessarily involves going through an erased vtable, so inlining requires devirtualization.

Ultimately, it's very likely that when you run a future it will involve at least one wait, so optimizing for const propagation is pretty unnecessary in the general case.

eddyb · July 22, 2022, 2:15am

If you look for declare in the LLVM IR, you'll see these futures_* functions:

<futures_executor::local_pool::ThreadNotify as futures_task::arc_wake::ArcWake>::wake_by_ref
futures_task::waker_ref::WakerRef::new_unowned
futures_executor::enter::enter
<futures_task::waker_ref::WakerRef as core::ops::deref::Deref>::deref
<futures_executor::enter::Enter as core::ops::drop::Drop>::drop
<futures_executor::enter::EnterError as core::fmt::Debug>::fmt

All of these functions are being imported from already compiled machine code in upstream crates, and without turning on LTO (or changing the code to use #[inline]) they can't be optimized away.

Some of them make sense to not get cross-crate codegen (like the fmt::Debug impl for the error), but not most of them.

Looking through though, seems like #[inline] is needed for futures_task::waker_ref::WakerRef:

- <futures_task::waker_ref::WakerRef as core::ops::deref::Deref>::deref
- futures_task::waker_ref::WakerRef::new_unowned
- honestly, every function in that module should get #[inline]; ironically, the only one that already has it (waker_ref) doesn't need it because it's already generic
- in particular, these block optimizations because they obfuscate access to a Waker, which contains a vtable, so any virtual calls through the Waker held in a WakerRef will be completely unknown to LLVM
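As a rough illustration of the kind of change being suggested, the sketch below marks the small non-generic accessors of a wrapper type with #[inline]. `BorrowedWaker` is a hypothetical type that only loosely plays the role of WakerRef; it is not the futures-rs definition. The cross-crate effect described in the post only shows up when the wrapper lives in an upstream crate, which a single-file example cannot demonstrate.

```rust
use std::ops::Deref;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Wake, Waker};

/// Hypothetical wrapper around a borrowed Waker (stand-in for WakerRef).
pub struct BorrowedWaker<'a> {
    inner: &'a Waker,
}

impl<'a> BorrowedWaker<'a> {
    // Non-generic: per the post, without #[inline] (or LTO) a downstream
    // crate only sees an opaque call here.
    #[inline]
    pub fn new(inner: &'a Waker) -> Self {
        Self { inner }
    }
}

impl Deref for BorrowedWaker<'_> {
    type Target = Waker;

    // With the body visible, LLVM can see which Waker (and therefore which
    // vtable) a later virtual call goes through.
    #[inline]
    fn deref(&self) -> &Waker {
        self.inner
    }
}

// Small waker that records whether it was woken.
struct Flag(AtomicBool);
impl Wake for Flag {
    fn wake(self: Arc<Self>) {
        self.0.store(true, Ordering::SeqCst);
    }
}

/// Wakes through the wrapper and reports whether the flag was set.
pub fn wake_through_wrapper() -> bool {
    let flag = Arc::new(Flag(AtomicBool::new(false)));
    let waker = Waker::from(flag.clone());
    let borrowed = BorrowedWaker::new(&waker);
    borrowed.wake_by_ref(); // virtual call reached via Deref
    flag.0.load(Ordering::SeqCst)
}
```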

While the rest could use it, they seem to be all leaves (i.e. likely not blocking optimizations):

- <futures_executor::local_pool::ThreadNotify as futures_task::arc_wake::ArcWake>::wake_by_ref
  - sets an AtomicBool to true (and optionally calls Thread::unpark)
- futures_executor::enter::enter
  - sets a thread_local! Cell<bool> to true
- <futures_executor::enter::Enter as core::ops::drop::Drop>::drop
  - sets the same thread_local! Cell<bool> back to false
- while the AtomicBool or thread_local! state could in theory be important for optimizations, it can only block optimizing out branches, which is a much smaller issue compared to the devirtualization blocking

Rust doesn't have any closed-world assumptions (at least yet AFAIK) wrt virtual calls, so devirtualization is only const-prop (if we include "constant-folding loads from constant globals" under the same umbrella).

And the only things that can block const-prop are either true dynamism (e.g. switching executors at runtime) or hidden static knowledge. Always look for the latter, because it's much more common to forget an #[inline] (and/or mislead LLVM into thinking some value can mutate, though I don't see that here), especially when there's no hint of true dynamism from the source code.

eddyb · July 22, 2022, 10:40pm

Out of curiosity I checked the rust-lang/futures-rs repo and saw:

Thanks, @xfix, that was quick!

system · October 20, 2022, 10:41pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
