Make thread::sleep() panic in async context

Take a look at generators. That's a more general feature that backs the compiler's current async implementation and is aimed at general coroutines. In particular, you can have different types for yield and return, there is no Poll::Pending anywhere in sight (which doesn't make sense for coroutines), and there is also no Waker parameter (which exists specifically to efficiently handle blocking operations). You can also pass arbitrary data into the Generator::resume call, so the coroutine has bidirectional communication with its caller. Finally, you don't need any async runtime to deal with generators; they can easily be driven from a simple loop.

I mean, I don't know what exactly happens in your code, but in general it's easy to do with worker threads and channels between them. You should really consider whether you're pulling in unnecessary complexity. One legitimate use case for async would be an interaction with an embedded interpreter (like Lua or Julia) that cannot be accessed from other threads.
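A minimal sketch of the worker-threads-and-channels approach, using only `std::sync::mpsc`. The two stages (squaring, then a running sum) and the `pipeline` function are hypothetical stand-ins for whatever your tasks actually compute.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical two-stage pipeline: stage 1 squares each input,
/// stage 2 emits a running sum. Each stage is a dedicated worker thread.
fn pipeline(inputs: Vec<u64>) -> Vec<u64> {
    let (to_square, square_rx) = mpsc::channel::<u64>();
    let (to_sum, sum_rx) = mpsc::channel::<u64>();
    let (to_out, out_rx) = mpsc::channel::<u64>();

    let squarer = thread::spawn(move || {
        // The loop ends once every sender for `square_rx` has been dropped.
        for x in square_rx {
            to_sum.send(x * x).unwrap();
        }
    });
    let summer = thread::spawn(move || {
        let mut total = 0;
        for x in sum_rx {
            total += x;
            to_out.send(total).unwrap();
        }
    });

    for x in inputs {
        to_square.send(x).unwrap();
    }
    // Closing the first channel lets each stage finish in turn.
    drop(to_square);

    let results: Vec<u64> = out_rx.iter().collect();
    squarer.join().unwrap();
    summer.join().unwrap();
    results
}
```

Shutdown needs no extra signalling: when a stage's loop ends, its own sender is dropped, which ends the next stage's loop.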

They didn't say it "makes sense". They said "it generally only makes sense if ...". I read it as "can't stop you from using tokio this way, but at least try to limit the damage". They follow it with "Putting any network IO on the same runtime would be a pretty bad idea", and I/O is the primary reason to write async code. If you're not doing I/O, then you're likely using async as poor man's coroutines, as I said above.

It's likely that people use tokio this way just because they don't have any significantly better option, can't (or don't want to) write a special-purpose runtime for the coroutine case, and tokio is popular while better-suited runtimes are not (you may already be pulling in tokio to provide a web API).

That seems limiting. I want communication between different tasks via a channel, not just with the caller. That's precisely what async/await gives me.

I have thousands of tasks at the same time, with some finishing and new ones being created. I don't want the overhead of creating thousands of threads constantly and cooperative multitasking is perfect for me.

Isn't this what async is all about?

I can only think of one reason to use an async runtime for CPU-bound computations, and I'm pretty sure that is not your use case. If you place a yield_now in your CPU-bound function every so often, you can trivially cancel the computation in response to an I/O event. For example, if you are computing the next best move in a chess engine, you don't know whether you will be able to finish the analysis at a given depth, so you may want to abort if it isn't finished after some amount of time.
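A std-only sketch of that pattern, with a hand-rolled `yield_now` and a toy busy-polling executor that gives up at a deadline. `search`, `run_until` and the busy-work loop are hypothetical stand-ins; with tokio you would use `tokio::task::yield_now` and wrap the future in `tokio::time::timeout` instead.

```rust
use std::future::Future;
use std::hint::black_box;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::time::Instant;

/// Hand-rolled equivalent of a runtime's `yield_now`: Pending once, then Ready.
struct YieldNow {
    yielded: bool,
}

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            return Poll::Ready(());
        }
        self.yielded = true;
        cx.waker().wake_by_ref(); // ask to be polled again
        Poll::Pending
    }
}

fn yield_now() -> YieldNow {
    YieldNow { yielded: false }
}

/// Hypothetical iterative-deepening search; returns the deepest depth completed.
async fn search(max_depth: u32) -> u32 {
    let mut completed = 0;
    for depth in 1..=max_depth {
        // Stand-in for analysing the position at `depth`.
        black_box((0..10_000u64).sum::<u64>());
        completed = depth;
        yield_now().await; // cancellation point
    }
    completed
}

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` until it finishes or `deadline` passes; dropping the future cancels it.
fn run_until<F: Future>(fut: F, deadline: Instant) -> Option<F::Output> {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return Some(out);
        }
        if Instant::now() >= deadline {
            return None;
        }
    }
}
```

Because the search yields after every depth, the executor gets a chance to check the deadline and drop the future; without the `yield_now().await`, the whole search would run inside a single `poll` call no matter what.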

Other than that, rayon is definitely what you want. Yeah, tokio has a work-stealing thread pool too, but it is just not meant for that.

In what way exactly is tokio a bad runtime for this?

It may well be; I'm curious. Maybe I should use or write a different runtime.

That sounds like you need a thread pool. Take a look at rayon's implementation.
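A minimal fixed-size pool in the classic shared-queue style, std only: N long-lived workers pull boxed jobs from one channel behind a `Mutex`. `ThreadPool` and `execute` are hypothetical names, and this is only a sketch; rayon is considerably more sophisticated (it uses per-thread work-stealing deques rather than one shared queue).

```rust
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Fixed number of worker threads sharing one job queue.
struct ThreadPool {
    sender: Option<Sender<Job>>,
    workers: Vec<JoinHandle<()>>,
}

impl ThreadPool {
    fn new(size: usize) -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers = (0..size)
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // The lock is held only while taking a job, not while running it.
                    let job = receiver.lock().unwrap().recv();
                    match job {
                        Ok(job) => job(),
                        Err(_) => break, // queue closed: the pool is shutting down
                    }
                })
            })
            .collect();
        ThreadPool { sender: Some(sender), workers }
    }

    fn execute(&self, f: impl FnOnce() + Send + 'static) {
        self.sender.as_ref().unwrap().send(Box::new(f)).unwrap();
    }
}

impl Drop for ThreadPool {
    fn drop(&mut self) {
        drop(self.sender.take()); // close the queue so idle workers exit
        for worker in self.workers.drain(..) {
            worker.join().unwrap();
        }
    }
}
```

Note that a job that blocks, for example on a channel receive, occupies one of the N workers until it returns.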

I believe that doesn't work. When a thread is stuck waiting for something, rayon is not going to reuse the thread to do some other work in the meantime. Or am I missing something?

You'd have an even worse problem if an async task gets stuck waiting for something. You'd block the entire runtime's thread, and async runtimes typically use either a single thread, or thread per core.

With a thread pool, you can have a thousand simultaneous threads. If one of them is stuck, it shouldn't be a big deal. And didn't you say above that you have no issue of a thread sleeping for a while?

By the way, note that async runtimes also use a thread pool to make non-blocking calls to blocking system functions. You can hit the same issue, with all threads in the pool waiting for their syscalls to complete; it's just much less likely.

No no, by "stuck waiting" I mean waiting for data from another task. I await on an async channel. This doesn't block the entire runtime.
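To make the mechanism concrete without pulling in a runtime, here is a deliberately tiny single-threaded executor plus a one-shot channel, std only. Every name (`Executor`, `oneshot`, `demo`) is hypothetical, and real runtimes such as tokio are far more involved; the sketch only shows that awaiting an empty channel returns `Poll::Pending`, the receiver stores its `Waker`, and the executor moves on to another task on the same thread.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Waking a task just puts its id back on the run queue.
struct TaskWaker { id: usize, queue: Arc<Mutex<VecDeque<usize>>> }

impl Wake for TaskWaker {
    fn wake(self: Arc<Self>) {
        self.queue.lock().unwrap().push_back(self.id);
    }
}

/// Single-threaded executor: one OS thread, many tasks.
struct Executor {
    tasks: Vec<Option<Pin<Box<dyn Future<Output = ()>>>>>,
    queue: Arc<Mutex<VecDeque<usize>>>,
}

impl Executor {
    fn new() -> Self {
        Executor { tasks: Vec::new(), queue: Arc::new(Mutex::new(VecDeque::new())) }
    }

    fn spawn(&mut self, fut: impl Future<Output = ()> + 'static) {
        let task: Pin<Box<dyn Future<Output = ()>>> = Box::pin(fut);
        self.queue.lock().unwrap().push_back(self.tasks.len());
        self.tasks.push(Some(task));
    }

    /// Polls runnable tasks until the run queue is empty.
    fn run(&mut self) {
        loop {
            // Separate statement, so the queue lock is released before polling.
            let next = self.queue.lock().unwrap().pop_front();
            let Some(id) = next else { break };
            let Some(task) = self.tasks[id].as_mut() else { continue }; // already finished
            let waker = Waker::from(Arc::new(TaskWaker { id, queue: Arc::clone(&self.queue) }));
            if task.as_mut().poll(&mut Context::from_waker(&waker)).is_ready() {
                self.tasks[id] = None;
            }
        }
    }
}

/// One-shot channel: the receiver is a future that parks its waker while empty.
struct Shared<T> { value: Option<T>, waker: Option<Waker> }
struct Sender<T>(Rc<RefCell<Shared<T>>>);
struct Receiver<T>(Rc<RefCell<Shared<T>>>);

fn oneshot<T>() -> (Sender<T>, Receiver<T>) {
    let shared = Rc::new(RefCell::new(Shared { value: None, waker: None }));
    (Sender(Rc::clone(&shared)), Receiver(shared))
}

impl<T> Sender<T> {
    fn send(self, value: T) {
        let mut shared = self.0.borrow_mut();
        shared.value = Some(value);
        if let Some(waker) = shared.waker.take() {
            waker.wake(); // re-queue the waiting task
        }
    }
}

impl<T> Future for Receiver<T> {
    type Output = T;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let mut shared = self.0.borrow_mut();
        match shared.value.take() {
            Some(value) => Poll::Ready(value),
            None => {
                shared.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

/// Two tasks on one thread: the consumer awaits a channel, the producer still runs.
fn demo() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let (tx, rx) = oneshot::<u32>();
    let mut executor = Executor::new();
    let consumer_log = Rc::clone(&log);
    executor.spawn(async move {
        let value = rx.await; // suspends this task only; the thread moves on
        consumer_log.borrow_mut().push(format!("consumer: received {value}"));
    });
    let producer_log = Rc::clone(&log);
    executor.spawn(async move {
        producer_log.borrow_mut().push("producer: sending 42".to_string());
        tx.send(42);
    });
    executor.run();
    let entries = log.borrow().clone();
    entries
}
```

The consumer is spawned first, yet the producer's log entry comes first: the consumer's `await` handed the thread back instead of blocking it.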

I could do that, but it seems like unnecessary overhead if I only need as many threads as there are CPUs.

Well, in some cases, but not if it's a computation that will take a long time. In that case I don't want to waste CPU cycles waiting for it; I'd rather await and have the runtime pick another task to make progress on. If I block the thread, I might even deadlock forever!

FWIW, switchyard is an existing async runtime specifically intended for compute workloads. (I have not yet tried using it for anything.)


"Optimal" is a value decision. Rust should never stop someone from implementing bubble sort because quicksort exists.

For you, sure. But others might find other uses for it; see the generator/coroutine use cases. And last I checked, neither you, nor I, nor any other single person here has the power to say what specific Rust features are "for".


For example: to me, the biggest benefit of async is the possibility of select! — that any combination of async operations (multiple event sources) can be combined as alternatives,[1] whereas when using plain threads in a complex application, you often have to create threads just to run specific blocking calls that offer no multiplexing and no timeout (which, in edge cases, might be effectively leaked if the event they are blocking on never happens).

This is orthogonal to whether async offers improved performance — it's about simplicity, composability, and correctness.
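A std-only sketch of what `select!` does at its core, under simplifying assumptions: `Select`, `Either` and `Timeout` are hypothetical names, `Timeout` busy-polls instead of registering with a real timer, and `block_on` is a toy busy-polling executor. In practice you would reach for `tokio::select!` or the `futures` crate.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::time::{Duration, Instant};

/// Which of the two raced futures finished first.
#[derive(Debug, PartialEq)]
enum Either<A, B> {
    Left(A),
    Right(B),
}

/// Two-way select: polls both futures and resolves with whichever is ready first.
struct Select<A, B> {
    a: Pin<Box<A>>,
    b: Pin<Box<B>>,
}

fn select<A: Future, B: Future>(a: A, b: B) -> Select<A, B> {
    Select { a: Box::pin(a), b: Box::pin(b) }
}

impl<A: Future, B: Future> Future for Select<A, B> {
    type Output = Either<A::Output, B::Output>;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if let Poll::Ready(v) = self.a.as_mut().poll(cx) {
            return Poll::Ready(Either::Left(v));
        }
        if let Poll::Ready(v) = self.b.as_mut().poll(cx) {
            return Poll::Ready(Either::Right(v));
        }
        Poll::Pending
    }
}

/// Busy-polling stand-in for a runtime's sleep/timeout future.
struct Timeout {
    at: Instant,
}

impl Future for Timeout {
    type Output = ();
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if Instant::now() >= self.at {
            Poll::Ready(())
        } else {
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: polls in a loop until the future completes.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}
```

Whichever future loses is simply dropped. That is how an event that never fires stops costing anything, and it is also where the cancellation-safety caveat in the footnote comes from.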


My take on the original topic: I think a good tool to introduce would be a lint that flags blocking calls in async. A lint is suitable because it is sometimes correct to ignore it (e.g. for a mutex where all uses are "fast" but still might technically block waiting for each other).

The problem with implementing this is that ideally the set of functions detected would be auto-trait-like: any fn that calls thread::sleep should also count as a blocking function. This makes it a part of function signatures, increasing language complexity in a way we don't have a solution for yet.


  1. (Cancellation safety is of course a problem which async introduces here.)


Please don't make normal functions panic unnecessarily. I've used sleep() in the past when testing and trying various things out. Even though it's not something I'd ever do in production, I'd be incredulous if sleep() panicked.

Use a linter or something less intrusive to detect unnecessary sleeps. But don't halt my program.


That's why I specified that it should be possible to opt out locally. Outer contexts, like a latency-optimized async runtime, turn the flag on, but if you know this specific operation is fine to block, you just turn it off locally. You should only do so if you understand the problem and have thought it through.
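A minimal sketch of the proposed opt-out mechanism, assuming a per-thread flag. Since `std::thread::sleep` itself cannot be changed here, `checked_sleep` is a hypothetical wrapper standing in for the proposed behavior; `forbid_blocking` is what a runtime would wrap around task polling, and `allow_blocking` is the local, more verbose opt-out. All of these names are made up for illustration.

```rust
use std::cell::Cell;
use std::thread;
use std::time::Duration;

thread_local! {
    // Hypothetical per-thread "no blocking here" flag, set by an async runtime.
    static BLOCKING_FORBIDDEN: Cell<bool> = Cell::new(false);
}

/// Hypothetical stand-in for the proposed `thread::sleep` behavior.
fn checked_sleep(dur: Duration) {
    if BLOCKING_FORBIDDEN.with(|flag| flag.get()) {
        panic!("thread::sleep called where blocking is forbidden; use allow_blocking to opt out");
    }
    thread::sleep(dur);
}

/// Sets the flag to `value` while `f` runs and restores the previous value
/// afterwards, including when `f` panics (the guard restores it in `Drop`).
fn with_flag<R>(value: bool, f: impl FnOnce() -> R) -> R {
    struct Restore(bool);
    impl Drop for Restore {
        fn drop(&mut self) {
            BLOCKING_FORBIDDEN.with(|flag| flag.set(self.0));
        }
    }
    let _restore = Restore(BLOCKING_FORBIDDEN.with(|flag| flag.replace(value)));
    f()
}

/// What a latency-sensitive runtime would wrap around polling its tasks.
fn forbid_blocking<R>(f: impl FnOnce() -> R) -> R {
    with_flag(true, f)
}

/// The local, deliberately more verbose opt-out.
fn allow_blocking<R>(f: impl FnOnce() -> R) -> R {
    with_flag(false, f)
}
```

Because the old value is restored on drop, scopes nest correctly. Note that this only catches calls that go through the wrapper, not a sleep buried deeper in some library's call stack.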

They're clearly functions that are intended to block. Mutex::lock() would be controversial, but I don't think it's expected to block on the happy path.

True, it is (thread-)global state that significantly affects behavior, which is a big downside. I believe its benefit, making common mistakes visible early, outweighs its cost, but people may disagree.

I think it's a good example why it should fail early by default but can be opt-out locally with more verbose API.

I have two concerns:

  1. What if std::thread::sleep is buried deep down in the callstack?
  2. What is the migration policy?

First, consider kanal. It provides fast sync and async channels. All of its functions go through a lock that may sleep (see source).

How would one use, e.g., the try_send method? Does one need to mess with the global flag around the call? Or is it the responsibility of the library author? What if the author does not care about async (and so doesn't raise the flag correctly)?

Second, adding a panic to std::thread::sleep is a breaking change. At minimum, it will definitely break kanal's tests. Furthermore, it might break some private code. So we cannot change what std::thread::sleep does.

We might try to deprecate std::thread::sleep, but it is complicated.

Does it call thread::sleep even when using the async channel?

Yes, it does. To be precise, there is only a single channel type, and SyncSender and AsyncSender are newtypes over shared Internal channel data.

try_send is defined here and implemented here. This function always acquires the mutex, which in turn may sleep under high contention.
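To illustrate how a `try_` method can still end up in `thread::sleep`, here is a hypothetical spin-then-sleep mutex with a hypothetical `try_send` built on top of it. This is not kanal's actual implementation; it only shows the general pattern of a lock that backs off by sleeping under contention.

```rust
use std::cell::UnsafeCell;
use std::collections::VecDeque;
use std::ops::{Deref, DerefMut};
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Duration;

/// Hypothetical spin-then-sleep mutex (illustrative, not kanal's code).
struct BackoffMutex<T> {
    locked: AtomicBool,
    data: UnsafeCell<T>,
}

// SAFETY: `data` is only reachable through a `Guard`, and `locked` admits one guard at a time.
unsafe impl<T: Send> Sync for BackoffMutex<T> {}

impl<T> BackoffMutex<T> {
    fn new(data: T) -> Self {
        BackoffMutex { locked: AtomicBool::new(false), data: UnsafeCell::new(data) }
    }

    fn lock(&self) -> Guard<'_, T> {
        let mut spins = 0;
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            if spins < 100 {
                spins += 1;
                std::hint::spin_loop();
            } else {
                // Contended: put the whole OS thread to sleep. On an async
                // worker thread this stalls every task scheduled there.
                thread::sleep(Duration::from_micros(10));
            }
        }
        Guard { mutex: self }
    }
}

struct Guard<'a, T> {
    mutex: &'a BackoffMutex<T>,
}

impl<T> Deref for Guard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        unsafe { &*self.mutex.data.get() }
    }
}

impl<T> DerefMut for Guard<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        unsafe { &mut *self.mutex.data.get() }
    }
}

impl<T> Drop for Guard<'_, T> {
    fn drop(&mut self) {
        self.mutex.locked.store(false, Ordering::Release);
    }
}

/// Hypothetical bounded-queue `try_send`: "non-blocking" in its API, but it
/// still has to take the lock, which may sleep.
fn try_send(queue: &BackoffMutex<VecDeque<u32>>, capacity: usize, value: u32) -> bool {
    let mut q = queue.lock();
    if q.len() >= capacity {
        return false;
    }
    q.push_back(value);
    true
}
```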

That depends on how it got stuck. If it's a rayon blocking call, like waiting for the second half of a join to complete, then yes the thread will go into work-stealing in the meantime. But if it blocks on something outside of rayon, like I/O or thread::sleep, then you're just blocked -- rayon doesn't have any opportunity to preempt that.

Basically, in my use case I want to pipeline my computation. I use channels (multi-consumer channels specifically). It might look something like this:

Task n:

  • compute compute
  • let a = output of task n-3 (wait)
  • compute compute
  • let b = output of task n-2 (wait)
  • compute compute
  • let c = output of task n-1 (wait)
  • compute compute
  • broadcast the output of task n
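For comparison, here is the same dependency shape with plain OS threads, roughly what the thread-based suggestions in this thread amount to. It is a sketch under assumptions: `compute` is a hypothetical stand-in for "compute compute", and `Slot` is a hypothetical write-once broadcast cell built on `Mutex` + `Condvar`. It works, but every in-flight task is its own blocked OS thread, which is exactly the overhead async is meant to avoid.

```rust
use std::sync::{Condvar, Mutex};
use std::thread;

/// Hypothetical write-once cell that any number of readers can wait on.
struct Slot {
    value: Mutex<Option<u64>>,
    ready: Condvar,
}

impl Slot {
    fn new() -> Self {
        Slot { value: Mutex::new(None), ready: Condvar::new() }
    }

    fn publish(&self, v: u64) {
        *self.value.lock().unwrap() = Some(v);
        self.ready.notify_all();
    }

    /// Blocks the calling OS thread until the value has been published.
    fn wait(&self) -> u64 {
        let guard = self.ready.wait_while(self.value.lock().unwrap(), |v| v.is_none()).unwrap();
        let value: Option<u64> = *guard;
        value.expect("wait_while only returns once the value is set")
    }
}

/// Hypothetical stand-in for "compute compute".
fn compute(x: u64) -> u64 {
    x.wrapping_mul(31).wrapping_add(7)
}

/// Task i waits on the outputs of tasks i-3, i-2 and i-1, then broadcasts its own.
fn run_pipeline(n: usize) -> Vec<u64> {
    let slots: Vec<Slot> = (0..n).map(|_| Slot::new()).collect();
    thread::scope(|s| {
        for i in 0..n {
            let slots = &slots;
            s.spawn(move || {
                let mut acc = compute(i as u64);
                for back in [3, 2, 1] {
                    if i >= back {
                        acc = compute(acc ^ slots[i - back].wait());
                    }
                }
                slots[i].publish(acc);
            });
        }
    });
    slots.iter().map(Slot::wait).collect()
}
```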

If you want rayon to stay busy during those waits, we would need some kind of rayon-enhanced channel. There's also a deadlock risk though -- if task X blocks and work-steals task Y, which needs X's output, then they'll be in the wrong order on the stack and unable to complete. We don't have a good design to avoid this kind of problem yet...


Isn't async exactly the mechanism designed to solve all these problems? There are async-enhanced channels, and async runtime's job is to make sure that those tasks that are not stuck get to run next.
