Make thread::sleep() panic in async context

Calling thread::sleep() in async context is bad, we should fail it by default. Or should we?

It's common for beginners (for the async or the Rust in general) to call some blocking functions in async context and wonder why it's messed up without any clue. Ideally those functions should return Err instantly which describes that this function should not be called in this context, unless the caller explicitly passed some i_know_what_im_doing_now flag. But it's too late to fix those function signatures. The only way remaining to blame caller is panicking.

What I want to propose is:

  1. Expose a new thread local flag from stdlib with some scary named APIs which is off by default.
  2. Declare that modifying this flag is a breaking change.
  3. Make functions which are intended to be blocked panic early with good message if this flag is on. Consider it's not a breaking change.

With this change async runtimes can turn the flag while running its tasks. Some advanced use cases may expliticly turn the flag off during its specific operation.

I know, it sounds pretty dangerous. And that's why I omitted anything subject to bikeshed here. Should we do this? What do you think?

1 Like

use lint, it could raise an warning if you use thread::sleep() in async context.

panic does not solve the problem.

(I know custom lint could solve such problem, but I don't know how to define such custom lint.)

6 Likes

Is this always the case? I'm not sure. I could imagine some sort of co-routines where async isn't used for real "concurrency" but just to yield control flow. In such a context, truly sleeping might be desirable. But I haven't really thought this through.

3 Likes

thread::sleep() isn't in any way worse than just hogging up a thread with a long intensive computation with no opportunity to yield. The latter is even easier. thread::sleep can at least be linted against project-wise using the #![deny(clippy::disallowed_methods)] lint.

1 Like

Lints would definitely improve the situation. It can't prevent every cases as it doesn't perform global analysis, but to prevent basic mistakes is really nice and possibly good enough so we don't need any runtime barriers like I proposed.

Yes, panic does not solve problem. It only make problems visible and debuggable. That's why we recommend to .unwrap() instead of to ignore them and even have compiler warnings for unused Results.

Unfortunately most popular async runtimes are not optimized for this use cases. It is bad for them. If you make another async runtime which is optimized on blocking operations, which would be really nice, you don't need to touch the "don't block" flag. If blocking is needed anyway, which should be considered as advanced use case I mentioned in original post, you can opt-out the flag.

Which also is bad like thread::sleep(). Only reason why thread::sleep() is worse is that it seems too simple, common and innocent. But general purpose async runtimes tend to behave weird on long blocking operations.

What I'm focusing here is beginner experience. I can't expect them to follow instructions more than "run cargo clippy and fix all warnings". Whatever we do for them, it should be turned on by default and can be opt-out for advanced use cases. It would be nice if that lint is turned on by default on async context.

You could elaborate on that? Is it due to the runtime(s)? Or due to their use cases?

What I mean is something like this:

async fn inner() {
    for i in 1..=3 {
        println!("step{i}");
        std::thread::sleep(std::time::Duration::from_millis(1000));
        my_little_runtime::yield_task(format!{"Hello {i}"}).await;
    }
}

async fn outer() {
    inner().await;
}

fn main() {
    my_little_runtime::run(outer(), || {
        // ideally I would somehow get the value passed to `yield_task` as an
        // argument here
        println!("yielded");
    });
}

(Playground)

My point is: Being in an async context doesn't necessarily mean that we're doing green-threading or something like that. Maybe it's just a coroutine with no concurrency at all.

But a runtime can be. Futures in the language where designed around being mostly agnostic of runtimes, and there is no default runtime shipped with std. Making a std function panic should only be done if it is wrong to use in all runtimes, which it isn't. A more accurate design would be an extension point that involves the async runtime in thread::sleep, but I'm curious how one would find out if a function is inside an async context in the first place (to acquire some handle to the runtime)?

True. It's not a fundamental property of the async Rust, just a common property of popular runtimes (for good reasons of course). My original suggestion address this problem by not forcing async runtimes to set the flag. But the lint can't know if the runtime has such property which makes this approach less appealing...

It's mainly because they're userland thread scheduler. OS schedulers do its maintenance works regularly by freezing currently-running thread via interrupt. But userland async schedulers can only do so on .await point. It's worth mentioning that many of them are highly influenced by the Go runtime which can trust its compiler automatically insert await points regularly during compilation.

If/when keyword generics gets off the ground, it would be incredible to have a generic version of thread::sleep which chooses the correct version depending on context. It seems very likely that thread::sleep should continue do what it does (sleep the thread and not panic in any context), but in the case that a generic version exists it can be easily linted towards.

Until then lints are probably the best we're going to get.

Long computation at least does something productive and in some situations may be difficult to avoid. thread::sleep is just a mistake and a waste.

1 Like

How do you plan to detect that a function is in fact called in an async context? Also, if we're doing thread::sleep(), why not File::open? Why not TcpStream::write? Why not UdpSocket::send? The list could go on pretty far.

2 Likes

thread::sleep deschedules the current thread and allows other threads to run. It is an opposite of "waste", unless you're optimizing for latency.

In any case, it is very disruptive when functions panic for any reason. Plenty of code tries to enforce the no-panic approach. The last thing I want is to have perfectly reasonable stdlib functions suddenly start panicking.

3 Likes

It's a waste also in terms of resource usage. It keeps OS threads occupied for pathologically long time in the async runtime, which means that the runtime will need to spawn more OS threads to remain responsive or will be starved of threads that can make progress in async tasks. On busy servers latency spikes/runtime stalls like this can cause work to pile up, which increases memory usage. Starved tasks can time out, which wastes work already done, and when these tasks are retried, it's more wasted effort. Besides, having decent latency in an async runtime is often important too, and lags are disruptive too.

I can think of a few reasonable cases where thread::sleep would be called from an async function. It should definitely not panic!

An async function might call a non-async function that uses short thread::sleep as part of polling something -- coming from another thread, or from a device, etc. That function should definitely not stop working properly just because it's called from an async function further up the chain.

One might be using thread::sleep as a placeholder for a CPU-heavy computation that will be implemented later.

Somebody just learning Rust might be using thread::sleep for experiments to understand how things work.

Someone might want to put a whole single threaded async runtime on hold because there are better things for code to do in a different thread rather than doing the async stuff.

thread::sleep should do what it advertises, not panic.

6 Likes

That's a specific scenario. Busy servers that are latency sensitive have certain constraints. Not all async code is for busy latency sensitive servers with many users. One shouldn't break basic functionality for everybody in order to help with debugging some specific use case.

1 Like

Those are all true, but that's also true for any kind of blocking call, and for any compute-intensive operation. Neither of those should be used inside of an async task, they should be delegated to a thread pool. thread::sleep isn't in any way worse than them.

That's definitely a bug. You would block your async runtime entirely while that function waits for completion. It should be rewritten to use non-blocking calls. That's part of the problem: you need to track not only direct blocking calls inside of async blocks, but also any transitive calls through sync functions.

You shouldn't do CPU-heavy operations inside of async tasks, see above.

Well, it's nice to experiment, but the topic started specifically with someone learning Rust and calling thread::sleep by mistake. It would be nice to avoid that error.

There are valid reasons to stop a single-threaded runtime, but those decisions should be made by the runtime itself, which runs normal sync code. Putting blocking methods inside of async code is a violation of layering and a footgun. That said, the runtime may provide specific "sleep the runtime" calls for specific cases, that just shouldn't be the error-prone thread::sleep.

On that point I agree. Adding panics is an API break, because your users may depend on lack of panics for correctness, and you silently change it from under them. Now, Rust gives precious little control over the lack of panics, but thread::sleep certainly doesn't have valid reasons to change its panic behaviour.

Overall, the issue of misused stdlib APIs is something which should be solved by Clippy lints, not changing the API from under its users.

One specific reason where one may want to make blocking calls in async code is where one uses the current async machinery as a poor man's substitute for proper coroutines, which are unsupported on current Rust. If I use async just as a way to access the state machine compiler, I don't care that the thread blocks (I would use different threads for different tasks anyway).

That said, that use case would certainly require a different runtime than tokio or async-std to run properly. Since the proposal is to give the runtime itself the option to set the panic flag, it is orthogonal.

So what? Maybe I'm OK with blocking my async runtime while the function waits for completion. If that only takes 0.1% of my total computation time, that's perfectly fine.

So you're telling me I have to optimize my code, even if I don't care about the performance of this particular part because it doesn't bother me? That's not how optimization should work. What if that code is from some other library, and it works fine for me -- I have to rewrite the library because you're calling it "a bug"?

Smart people disagree.

1 Like

If you don't care about optimization, then why are you writing async code? Async is basically an optimization for dealing with IO-bound tasks.

Part of the Rust's philosophy is to encourage correct and performant code at the language level. It shouldn't be easier to do a suboptimal thing than an optimal one.

You have linked your question. Which part of that thread specifically is "smart people disagree"?

I mean, you can write all kind of weird code if you really want to. You can write a webserver in Bash, or in a Rust proc macro. Doesn't make it a good idea, or something that the language should support.

2 Likes

As far as I can tell, async implements standard coroutines. I don't understand what is "poor man's" about them.

Because my code is asynchronous -- I want one function to wait for data from another function.

I do care about performance, but that doesn't mean it's a bug if I don't optimize everything. I optimize the parts that are performance bottlenecks.

Alice, who I understand is a core tokio developer, said in that thread that running CPU-heavy computations in a Tokio run-time "makes sense", and that my use case is "reasonable".