Background
In other languages, asynchronous programming is usually implemented with completion callbacks: when a sub-task finishes, logical control flow returns to the direct requester by calling that requester's continuation function. The callback is invoked either directly (in which case the physical call-stack grows) or deferred onto a task queue to be run in the next event-loop iteration (in which case the physical call-stack "resets"). In either case, the async call-stack is never realized on the physical call-stack. This is the approach used by libuv, Boost.Asio and JavaScript; JavaScript's Promise is an abstraction over this model, but the underlying principle is the same.
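To make the callback model concrete, here is a rough sketch in plain Rust (the Continuation type and the sub_task/requester functions are made up for illustration; this is not libuv or Boost.Asio code): the sub-task hands its result straight to the requester's continuation, so the async call-stack only ever exists as a chain of captured closures.

// Sketch of the completion-callback model: the sub-task invokes the
// requester's continuation directly when it finishes, so no root task ever
// has to re-poll a chain of futures.
type Continuation = Box<dyn FnOnce(i64)>;

fn sub_task(k: Continuation) {
    // ... do some work, then hand the result straight back to the requester.
    k(42);
}

fn requester(k: Continuation) {
    sub_task(Box::new(move |result| {
        // Control returns here directly (or via the event loop's task queue);
        // either way, only this one continuation runs.
        k(result + 1);
    }));
}

fn main() {
    requester(Box::new(|result| println!("{}", result)));
}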
Rust's Future, however, follows a different model. When a sub-task finishes, it is the root task, not the direct requester, that is woken. The root task then has to poll its sub-futures recursively until it reaches the actual point of re-entry, realizing the async call-stack on the physical call-stack in the process. Each wake-up therefore costs time linear in the depth of the async call-stack.
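The cost can be seen from how poll is forwarded. Below is a minimal sketch (the Nested wrapper is hypothetical, not tokio or futures internals): every level of nesting does nothing except forward poll to the future it is awaiting, so a single wake-up of the root task re-enters poll once per level of the async call-stack.

// Hypothetical wrapper that mirrors what an .await point does during poll:
// it simply forwards the poll to the inner future.
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

struct Nested<F>(Pin<Box<F>>);

impl<F: Future> Future for Nested<F> {
    type Output = F::Output;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        // A wake-up of the root task re-enters here once per nesting level
        // before the leaf future that actually made progress is reached.
        self.0.as_mut().poll(cx)
    }
}

fn main() {
    // Wrap a ready future 1000 times; polling the root walks all 1000 levels.
    let mut fut: Box<dyn Future<Output = i32> + Unpin> = Box::new(Box::pin(async { 0 }));
    for _ in 0..1000 {
        fut = Box::new(Nested(Box::pin(fut)));
    }
    println!("{}", futures::executor::block_on(fut));
}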
Example
Here is an example demonstrating the behavior:
use futures::future::{BoxFuture, FutureExt};
use std::time::Instant;

fn recurse(n: isize) -> BoxFuture<'static, isize> {
    async move {
        tokio::task::yield_now().await;
        if n > 0 {
            1 + recurse(n - 1).await
        } else {
            0
        }
    }
    .boxed()
}

async fn run_test(n: isize) {
    let start = Instant::now();
    recurse(n).await;
    println!("{}", (Instant::now() - start).as_micros());
}

#[tokio::main]
async fn main() {
    for i in 0..500 {
        run_test(i * 10).await;
    }
}
And here is a JavaScript equivalent:
const { performance } = require('perf_hooks');
const fs = require('fs');

const fd = fs.openSync('perf_node.txt', 'w');

async function recurse(n) {
    await Promise.resolve().then(() => {});
    if (n > 0) {
        return 1 + await recurse(n - 1);
    } else {
        return 0;
    }
}

async function run_test(n) {
    const start = performance.now();
    await recurse(n);
    fs.writeSync(fd, (performance.now() - start) * 1000 + '\n');
}

(async () => {
    for (let i = 0; i < 500; ++i) {
        await run_test(i * 10);
    }
})();
Performance
Wall-time comparison (unit: microseconds):
Notice the quadratic growth in Rust's approach versus the linear growth in JavaScript's. The Rust example can be fixed by spawning a new root task for each recursive call and immediately joining it (sketched below), but that feels like a hack and would probably incur more overhead than a completion-callback-based Future design would. Rust's current Future design does minimize dynamic-allocation costs, but it inevitably introduces other costs like this one. Any thoughts on this?
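For reference, here is a rough sketch of that workaround, assuming tokio (recurse_spawned is just a name for this variant): each recursive call is spawned as its own root task and its JoinHandle is awaited immediately, so a wake-up only ever crosses one level instead of the whole chain.

use futures::future::{BoxFuture, FutureExt};

fn recurse_spawned(n: isize) -> BoxFuture<'static, isize> {
    async move {
        tokio::task::yield_now().await;
        if n > 0 {
            // spawn makes the sub-call a root task of its own; awaiting the
            // JoinHandle polls only that handle, not the nested future chain.
            1 + tokio::spawn(recurse_spawned(n - 1)).await.unwrap()
        } else {
            0
        }
    }
    .boxed()
}

#[tokio::main]
async fn main() {
    println!("{}", recurse_spawned(1000).await);
}

Each level now pays for a task spawn and a cross-task wake-up instead, which is presumably where the extra overhead mentioned above comes from.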
Thanks.