I realize Rust has many benefits thanks to its borrow checker, but I hope it's clear that the true power Rust brings is async/await.
As you probably know, the way this is done in C++ is terrible (usually with hacky callbacks); I have a whole blurb about this @ Evolution of asynchronous programming - Balanced Thoughts.
Rust's future state machine, which allows allocation on the stack, is next-level - it can make async programming in embedded systems magical
Unfortunately, there are a couple of small things in future state machine creation that result in binary bloat (especially important in embedded programming). Most of this is around APIs of the form "we made this API async just in case you need to wait on something, but often you don't and the code is simply non-blocking"; Rust does poorly at optimizing that latter case.
Here are the two big examples of this that I came across in my short time coding in Rust (my hope is to make Matter, the smart home standard, shift to using Rust):
rust-lang/rust issue, opened 24 Jul 2019:
Generator optimization in #60187 by @tmandry reused generator locals, but arguments are still duplicated whenever they are used across yield points.
For example ([playground](https://play.rust-lang.org/?version=nightly&mode=release&edition=2018&gist=af3e5e9233c45d6397c7b4b3e671f092)):
```rust
// async/await is stable on modern Rust; the original issue predates
// stabilization and used `#![feature(async_await)]`.
async fn wait() {}

async fn test(arg: [u8; 8192]) {
    wait().await;
    drop(arg);
}

fn main() {
    println!("{}", std::mem::size_of_val(&test([0; 8192])));
}
```
Expected: 8200
Actual: __16392__
When futures are passed in as arguments, the overall future size can grow exponentially ([playground](https://play.rust-lang.org/?version=nightly&mode=release&edition=2018&gist=02a3e685de2b2bf868733e40e56e192b)):
```rust
// (Originally gated on `#![feature(async_await)]`; async/await is now stable.)
async fn test(_arg: [u8; 8192]) {}

async fn use_future(fut: impl std::future::Future<Output = ()>) {
    fut.await
}

fn main() {
    println!(
        "{}",
        std::mem::size_of_val(&use_future(use_future(use_future(use_future(use_future(
            use_future(use_future(use_future(use_future(use_future(test(
                [0; 8192]
            ))))))
        ))))))
    );
}
```
Expected: 8236
Actual: __8396796__
I didn't find any existing note on this, but given how commonly arguments are used, I think it would be useful to include them in the optimization.
rust-lang/rust issue, opened 04 Feb 2026:
Trying to get `core::future::Ready<>` handling to be more optimal (this is useful when supplying an async API that is smart enough, at compile time, not to take the extra overhead if the API isn't actually async).
_Originally posted by @gmarcosb in [#62958](https://github.com/rust-lang/rust/issues/62958#issuecomment-3807931515)_
I tried to optimize with the following tricks in the hope that it would work (see [full usage in commit](https://github.com/project-chip/rs-matter/pull/366/changes/11ab6a1219848812f48d7d6ad818baec2d77280f)); unfortunately, even in this straightforward case, the Rust compiler still generates all the "cruft" for the state machine:
```rust
// Note: the `default fn` below relies on (unstable) specialization and
// therefore requires a nightly compiler.
#[inline(always)]
pub const fn extract_ready_check<const B: bool>(_: ReadyCheck<B>) -> bool {
    B
}

#[macro_export]
macro_rules! process_maybe_async {
    ($source:expr) => {
        match $source {
            fut => {
                #[allow(unused_imports)]
                // use $crate::dm::{MaybeReady, IsReady, IsNotReady, NotReadyFallback};
                use $crate::dm::{MaybeReady, IsReady, IsNotReady};
                let is_ready: bool = $crate::dm::extract_ready_check((&&fut).get_check());
                // even with a check for `is_ready` vs `true`, there's still binary size bloat!
                if true {
                    fut.get_ready()
                } else {
                    fut.await
                }
            }
        }
    };
}

pub struct ReadyCheck<const B: bool>;

pub trait MaybeReady: core::future::Future {
    fn get_ready(self) -> Self::Output;
}

// 1. The general case: any Future
impl<F: core::future::Future> MaybeReady for F {
    default fn get_ready(self) -> Self::Output {
        const {
            panic!("This future is not a Ready<T> type!");
        }
    }
}

// 2. The specialized case: specifically Ready<T>
impl<T> MaybeReady for core::future::Ready<T> {
    fn get_ready(self) -> T {
        self.into_inner()
    }
}

pub trait IsReady<T> {
    #[inline(always)]
    fn get_check(&self) -> ReadyCheck<true> {
        ReadyCheck
    }
}
impl<T> IsReady<T> for &&core::future::Ready<T> {}

pub trait IsNotReady<T> {
    fn get_check(&self) -> ReadyCheck<false> {
        ReadyCheck
    }
}
impl<T, F: core::future::Future<Output = T>> IsNotReady<T> for &F {}
```
I then have:
```rust
#[inline(always)]
fn fn_1(o: &SomeObject) -> core::future::Ready<String> {
    core::future::ready(o.do_stuff())
}

#[inline(always)]
fn fn_2(o: &SomeObject, s: String) -> core::future::Ready<String> {
    core::future::ready(o.do_other(s))
}
```
I would expect **A**:
```rust
async fn my_example_a(o: &SomeObject) -> String {
    let s = process_maybe_async!(fn_1(o));
    process_maybe_async!(fn_2(o, s))
}
```
To result in the **exact same binary** as **B**:
```rust
async fn my_example_b(o: &SomeObject) -> String {
    let s = o.do_stuff();
    o.do_other(s)
}
```
But it doesn't; instead, it results in roughly the same binary bloat (surprisingly, even more) as **C**:
```rust
async fn my_example_c(o: &SomeObject) -> String {
    let s = fn_1(o).await;
    fn_2(o, s).await
}
```
See the full compiling code [in the playground's ASM output](https://play.rust-lang.org/?version=nightly&mode=release&edition=2024&gist=868616cb1526de786b7f47c81899a115), where the binary delta between the 3 methods is clear.
Of course it would be nice if A, B, and C all resulted in the exact same compilation; but at a minimum, with all the hints given, A and B should result in the same binary, and preferably with `if is_ready {` as well rather than just `if true {`.
I've also brought this up on the Rust internals Discourse: https://internals.rust-lang.org/t/async-await-optimizations-could-make-language-even-more-powerful/23973
This has been brought up previously, too:
I'm just curious if an optimization like this is possible, or if there's some reason why it's theoretically not practical. Consider this code:
```rust
use futures::executor::block_on;

async fn foo_async() -> i32 {
    return 5;
}

pub fn foo() -> i32 {
    return block_on(foo_async());
}
```
Theoretically this could be a very small program, but if you compile this on Rust playground and look at the assembly or llvm-ir output, you still get a very large program. Are optimizations for this type of thing co…
josh replied (February 4, 2026):
Those are definitely optimizations we'd want to see happen, if someone is up for working on them.
This is a problem that we have been eyeing for a solution. #135527 could be a way to alleviate some of the pain. We have been thinking about cooperating with the codegen backend(s) better, so that better code can be emitted when the backend understands a coroutine dialect.
For that reason, we are proposing a project goal to enable a survey and experimentation.
I've touched base with @dingxiangfei2009 (we're colleagues) and we're going to see if I can't help him make some magic and get these optimized; stay tuned!