Are you thinking about an executor that recognizes when its threads are blocked and spawns new ones, or a mechanism to tell the executor that a future is doing heavy work on the CPU?
The first could result in a lot of threads, which effectively just add overhead, so I think it would have to be the latter. Then there is the question of how to handle futures that start out I/O-heavy and, as soon as that I/O completes, do heavy CPU work, like in your example.
There might be a need to indicate to the executor whether the code after a .await
is (or can be) CPU-heavy or is always light on CPU usage. That way the executor wouldn't have to guess when deciding whether a future should be polled now (mainly I/O) or wait (because the CPU is already busy).
I think that'd be relevant in the "read lots of files and compute hashes" use-case you mentioned:
Let's say the executor has 100 open futures it can poll and 4 cores/8 threads or so. All futures are currently waiting for their I/O to complete; once polled, 50 of them would just continue waiting, while the other 50 would start CPU-heavy computation. Suppose the executor (8 threads for execution) is currently processing 7 CPU-heavy futures, so the CPU is already busy. It can now do one of the following:
- Wait until one of the CPU-heavy futures finishes, thereby blocking all other futures (like the UI ones) that would finish quickly, OR
- Poll any of the ready futures and hope it is an I/O-bound one and not another one with heavy CPU usage.
Without knowing the difference between those futures there is no way to choose correctly. So unless I'm mistaken, you'd need something like the following (either written manually or added automatically by the compiler):
async fn do_something() {
    let data1 = read_io().await;
    // Suggested syntax: mark that CPU-heavy work follows this await point
    let data2 = read_io().await_then_compute;
    // Alternative syntax suggestion (this syntax unfortunately conflicts with futures that return functions)
    let data2 = read_io().await(estimated_compute_time_that_follows_or_priority);
    let hash = compute_something(data1, data2);
}
The compiler could convert this into a state machine that exposes a will_do_compute_if_ready() -> usize
function or something similar, indicating that if this future is polled while its I/O is ready, it will take a while to execute, maybe with some estimate of how long it thinks it takes.
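As a rough sketch of what that generated hook could look like (the trait and method names here are purely hypothetical, nothing like this exists today):

use std::future::Future;

// Hypothetical trait the compiler (or a library) could implement on the
// generated state machine; the name and the unit are made up for illustration.
trait ComputeHint: Future {
    // Rough estimate (say, in microseconds) of the CPU work that will run
    // if this future is polled while its pending I/O has already completed.
    // Returning 0 means "polling me is cheap".
    fn will_do_compute_if_ready(&self) -> usize;
}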
With this knowledge the executor could look at the futures ready to be polled and decide which one to poll first if the CPU is already busy.
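For example, the polling decision could then look roughly like this (just a sketch under assumptions; the ready-queue representation and the saturation check are made up, not any real executor API):

type TaskId = usize;

// `ready` holds (estimated compute hint, task id) pairs for futures whose
// I/O has completed and which could be polled right now.
fn pick_next(
    ready: &[(usize, TaskId)],
    cpu_busy_workers: usize,
    total_workers: usize,
) -> Option<TaskId> {
    if cpu_busy_workers >= total_workers.saturating_sub(1) {
        // CPU is (nearly) saturated: poll the future that advertises the
        // least follow-up compute, so I/O-bound tasks keep making progress.
        ready.iter().min_by_key(|&&(hint, _)| hint).map(|&(_, id)| id)
    } else {
        // Plenty of CPU headroom: a plain FIFO pick is fine.
        ready.first().map(|&(_, id)| id)
    }
}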
It's a bit like back when there was one big computer for an entire university or company and you had to tell it how long you expected your job to take, so that it could schedule smaller jobs earlier and stop a job once it reached its limit. It might make sense to take some ideas from that time, as that was effectively a kind of cooperative multi-tasking, too.