Pre-RFC Local Wakers

Currently, all wakers in Rust are Send + Sync. This means every waker implementation must be thread-safe, which rules out useful optimizations in thread-per-core runtimes.

The following API additions are proposed to support local wakers.

LocalWaker

LocalWaker would be a struct analogous to Waker, but without the Send + Sync trait bounds. Just like thread-safe wakers, it would be constructed from a RawWaker and a RawWakerVTable.

Context

Context would get two additional methods: local_waker() to get a LocalWaker, and set_local_waker(&mut self, waker: &LocalWaker) to set it.

If a local waker isn't set, Context will use the Waker it was given at construction to create a LocalWaker. This way, all runtimes would support local wakers by default, while having the ability to specialize if they want to. Opting out of Waker is also possible if needed, by just panicking on wake().
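To make the fallback concrete, here is a minimal sketch of how such a Context could behave. Everything named here (ContextSketch, LocalWakerSketch, LocalRef, demo) is a hypothetical stand-in written for illustration, not the proposed std API; in particular the real LocalWaker would be built from a RawWaker rather than a closure.

```rust
use std::cell::Cell;
use std::rc::Rc;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Wake, Waker};

/// Hypothetical stand-in for the proposed `LocalWaker`; the `Rc` makes it `!Send`.
pub struct LocalWakerSketch(Rc<dyn Fn()>);

impl LocalWakerSketch {
    pub fn new(f: impl Fn() + 'static) -> Self {
        LocalWakerSketch(Rc::new(f))
    }
    pub fn wake_by_ref(&self) {
        (self.0)()
    }
}

/// What `local_waker()` hands out in this sketch: the specialized local waker
/// if one was set, otherwise the thread-safe waker standing in for it.
pub enum LocalRef<'a> {
    Local(&'a LocalWakerSketch),
    Fallback(&'a Waker),
}

impl LocalRef<'_> {
    pub fn wake_by_ref(&self) {
        match self {
            LocalRef::Local(local) => local.wake_by_ref(),
            LocalRef::Fallback(waker) => waker.wake_by_ref(),
        }
    }
}

/// Hypothetical model of the extended `Context`.
pub struct ContextSketch<'a> {
    waker: &'a Waker,
    local: Option<&'a LocalWakerSketch>,
}

impl<'a> ContextSketch<'a> {
    pub fn from_waker(waker: &'a Waker) -> Self {
        ContextSketch { waker, local: None }
    }
    pub fn set_local_waker(&mut self, local: &'a LocalWakerSketch) {
        self.local = Some(local);
    }
    pub fn waker(&self) -> &'a Waker {
        self.waker
    }
    pub fn local_waker(&self) -> LocalRef<'a> {
        match self.local {
            Some(local) => LocalRef::Local(local),
            None => LocalRef::Fallback(self.waker),
        }
    }
}

/// Usage: returns (wakes seen by the thread-safe waker, wakes seen by the local waker).
pub fn demo() -> (usize, usize) {
    struct Shared(AtomicUsize);
    impl Wake for Shared {
        fn wake(self: Arc<Self>) {
            self.0.fetch_add(1, Ordering::SeqCst);
        }
    }
    let shared = Arc::new(Shared(AtomicUsize::new(0)));
    let waker = Waker::from(shared.clone());

    let local_hits = Rc::new(Cell::new(0usize));
    let hits = local_hits.clone();
    let local = LocalWakerSketch::new(move || hits.set(hits.get() + 1));

    let mut cx = ContextSketch::from_waker(&waker);
    cx.local_waker().wake_by_ref(); // no local waker set: falls back to `waker`
    cx.set_local_waker(&local);
    cx.local_waker().wake_by_ref(); // now goes through the local waker only
    (shared.0.load(Ordering::SeqCst), local_hits.get())
}
```

In this model a runtime that never calls set_local_waker still serves local_waker() callers, which is the "supported by default" property described above.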

LocalWake (possibly)

LocalWake would be a trait analogous to Wake, that would use Rc instead of Arc. It would look roughly like this:

pub trait LocalWake {
    fn wake(self: Rc<Self>);
    fn wake_by_ref(self: &Rc<Self>) {
        self.clone().wake();
    }
}
impl<W: LocalWake + 'static> From<Rc<W>> for LocalWaker {
    fn from(waker: Rc<W>) -> LocalWaker { /* .. */ }
}
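
As a rough illustration of how the trait could be used, the sketch below backs a stand-in LocalWaker with a closure. The LocalWaker struct, Counter and demo here are assumptions made for the example; an actual implementation would presumably go through Rc::into_raw and a RawWakerVTable and avoid the extra allocation this version makes.

```rust
use std::cell::Cell;
use std::rc::Rc;

pub trait LocalWake {
    fn wake(self: Rc<Self>);
    fn wake_by_ref(self: &Rc<Self>) {
        self.clone().wake();
    }
}

/// Hypothetical stand-in for the proposed `LocalWaker`.
/// Holding an `Rc` makes it neither `Send` nor `Sync`.
pub struct LocalWaker {
    wake_by_ref: Rc<dyn Fn()>,
}

impl LocalWaker {
    pub fn wake_by_ref(&self) {
        (self.wake_by_ref)()
    }
    pub fn wake(self) {
        (self.wake_by_ref)()
    }
}

impl<W: LocalWake + 'static> From<Rc<W>> for LocalWaker {
    fn from(waker: Rc<W>) -> LocalWaker {
        // Sketch only: type-erase through a closure instead of a raw vtable.
        LocalWaker {
            wake_by_ref: Rc::new(move || LocalWake::wake_by_ref(&waker)),
        }
    }
}

/// Example implementor: counts wakes with a plain `Cell`, no atomics needed.
pub struct Counter(pub Cell<u32>);

impl LocalWake for Counter {
    fn wake(self: Rc<Self>) {
        self.0.set(self.0.get() + 1);
    }
}

/// Usage: wakes three times and returns the resulting count.
pub fn demo() -> u32 {
    let counter = Rc::new(Counter(Cell::new(0)));
    let waker = LocalWaker::from(counter.clone());
    waker.wake_by_ref();
    waker.wake_by_ref();
    waker.wake();
    counter.0.get()
}
```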

Drawbacks

Supporting both local wakers and thread-safe wakers would likely require two allocations instead of one, which disincentivizes specialization. This could lead runtimes to pick one and stick with it. However, nothing prevents a runtime from letting its users customize this behavior to their needs.

Also, if a runtime decides to support local wakers only, then it is going to be incompatible with most futures in the ecosystem, since most would only use waker().

Example

This is a use case example that shows the kinds of things we would be able to do with local wakers, which are just too expensive or too difficult to do with thread-safe wakers.

Let's say that we want to implement a join! macro that doesn't poll spuriously. We might want to give each joined future a separate waker, so we can tell which futures were woken.

We might try something like this:

pub struct JoinWaker {
    task_waker: Cell<Option<LocalWaker>>,
    // this tells which futures have been woken
    flags: Cell<u64>,
}

This join macro would be limited to up to 64 futures, since there are only 64 flags. Then we can create an array of raw waker vtables and use them to construct a different waker for each future. Each waker vtable will flip a different flag when woken.

// each raw waker vtable would flag a different bit.
// and they would all wake the task_waker
const JOIN_RAW_WAKER_VTABLES: [RawWakerVTable; 64] = /* .. */; 

It would also be necessary to replace the task_waker on every poll, since we are not guaranteed to always be given the same waker. Therefore, we store it in a Cell, so we can replace it on poll.
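
The bookkeeping described so far could look roughly like the sketch below. LocalWaker is again a closure-backed stand-in, and wake_index, set_task_waker, take_woken and demo are hypothetical names; wake_index plays the role of what the vtable for a given future would do when woken.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for the proposed `LocalWaker`.
pub struct LocalWaker(Rc<dyn Fn()>);

impl LocalWaker {
    pub fn new(f: impl Fn() + 'static) -> Self {
        LocalWaker(Rc::new(f))
    }
    pub fn wake_by_ref(&self) {
        (self.0)()
    }
}

pub struct JoinWaker {
    task_waker: Cell<Option<LocalWaker>>,
    // this tells which futures have been woken
    flags: Cell<u64>,
}

impl JoinWaker {
    pub fn new() -> Self {
        JoinWaker { task_waker: Cell::new(None), flags: Cell::new(0) }
    }

    /// Called at the start of every poll, since the task waker may have changed.
    pub fn set_task_waker(&self, waker: LocalWaker) {
        self.task_waker.set(Some(waker));
    }

    /// What the vtable for future number `index` (0..64) would do on wake.
    pub fn wake_index(&self, index: u32) {
        self.flags.set(self.flags.get() | (1u64 << index));
        // `Cell` hands out no references, so take the waker, use it, put it back.
        if let Some(waker) = self.task_waker.take() {
            waker.wake_by_ref();
            self.task_waker.set(Some(waker));
        }
    }

    /// Returns the indices woken since the last call and clears the flags.
    pub fn take_woken(&self) -> Vec<u32> {
        let bits = self.flags.replace(0);
        (0..64u32).filter(|i| bits & (1u64 << *i) != 0).collect()
    }
}

/// Usage: wake futures 10 and 3, then ask which ones need polling.
pub fn demo() -> (Vec<u32>, u32, Vec<u32>) {
    let task_wakes = Rc::new(Cell::new(0u32));
    let counter = task_wakes.clone();
    let join = JoinWaker::new();
    join.set_task_waker(LocalWaker::new(move || counter.set(counter.get() + 1)));
    join.wake_index(10);
    join.wake_index(3);
    let first = join.take_woken();
    (first, task_wakes.get(), join.take_woken())
}
```

No lock and no atomic operation is involved anywhere in this version, which is the point of the example.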

If we wanted to make this kind of macro today, we would need to write it like this:

pub struct JoinWaker {
    task_waker: Mutex<Option<Waker>>,
    flags: AtomicU64,
}

On every call to wake and to poll we would need to lock the mutex, and the flags now need to be atomic, which they did not need to be before. If our runtime implementation uses a thread-per-core architecture, this amount of unnecessary synchronization might be a deal breaker.
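
For comparison, here is a sketch of the thread-safe version that compiles on stable Rust today. It is one possible way to write it, not a reference implementation: instead of an explicit 64-element array it generates one vtable per flag index with a const generic, and Slot, slot_waker, flag_and_wake, new_shared, take_flags and demo are all hypothetical names.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{RawWaker, RawWakerVTable, Wake, Waker};

pub struct JoinWaker {
    task_waker: Mutex<Option<Waker>>,
    flags: AtomicU64,
}

impl JoinWaker {
    pub fn new_shared() -> Arc<Self> {
        Arc::new(JoinWaker { task_waker: Mutex::new(None), flags: AtomicU64::new(0) })
    }
    /// Called on every poll: requires taking the lock.
    pub fn set_task_waker(&self, waker: &Waker) {
        *self.task_waker.lock().unwrap() = Some(waker.clone());
    }
    /// Returns the woken bits and clears them.
    pub fn take_flags(&self) -> u64 {
        self.flags.swap(0, Ordering::AcqRel)
    }
    fn flag_and_wake(&self, index: u32) {
        self.flags.fetch_or(1u64 << index, Ordering::AcqRel);
        // Clone under the lock, wake outside of it.
        let task = self.task_waker.lock().unwrap().clone();
        if let Some(waker) = task {
            waker.wake();
        }
    }
}

/// Hypothetical helper: `Slot::<I>::VTABLE` flags bit `I` (must be below 64).
struct Slot<const I: u32>;

impl<const I: u32> Slot<I> {
    const VTABLE: RawWakerVTable =
        RawWakerVTable::new(Self::clone_raw, Self::wake, Self::wake_by_ref, Self::drop_raw);

    unsafe fn clone_raw(data: *const ()) -> RawWaker {
        // The pointer came from `Arc::into_raw`, so bump the strong count.
        unsafe { Arc::increment_strong_count(data as *const JoinWaker) };
        RawWaker::new(data, &Self::VTABLE)
    }
    unsafe fn wake(data: *const ()) {
        // Reclaim ownership so this waker's reference is released afterwards.
        let this = unsafe { Arc::from_raw(data as *const JoinWaker) };
        this.flag_and_wake(I);
    }
    unsafe fn wake_by_ref(data: *const ()) {
        let this = unsafe { &*(data as *const JoinWaker) };
        this.flag_and_wake(I);
    }
    unsafe fn drop_raw(data: *const ()) {
        unsafe { Arc::decrement_strong_count(data as *const JoinWaker) };
    }
}

/// Builds the waker handed to joined future number `I`.
pub fn slot_waker<const I: u32>(join: &Arc<JoinWaker>) -> Waker {
    let data = Arc::into_raw(Arc::clone(join)) as *const ();
    // SAFETY: the vtable functions above uphold the RawWaker contract for this pointer.
    unsafe { Waker::from_raw(RawWaker::new(data, &Slot::<I>::VTABLE)) }
}

/// Usage: returns (woken bits, task wake count, remaining strong count).
pub fn demo() -> (u64, usize, usize) {
    struct Task(AtomicUsize);
    impl Wake for Task {
        fn wake(self: Arc<Self>) {
            self.0.fetch_add(1, Ordering::SeqCst);
        }
    }
    let task = Arc::new(Task(AtomicUsize::new(0)));
    let join = JoinWaker::new_shared();
    join.set_task_waker(&Waker::from(task.clone()));
    let w0 = slot_waker::<0>(&join);
    let w5 = slot_waker::<5>(&join);
    w5.wake_by_ref();
    w0.clone().wake();
    drop((w0, w5));
    (join.take_flags(), task.0.load(Ordering::SeqCst), Arc::strong_count(&join))
}
```

Every wake here pays for an atomic read-modify-write plus a mutex lock, even when all the wakers live on a single thread.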

Prerequisites

(NOT A CONTRIBUTION)

For executors that only want to support single-threaded operation, it's a bit awkward to create a Context from a Waker that panics, and then set the LocalWaker. A constructor for Context which takes a LocalWaker but not a Waker would probably be good as well.

I think the drawbacks are overstated also:

On needing two allocations: this isn't necessarily true. A Waker is usually already an Arc. If the type has a separate code path that benefits from knowing it's on the same thread it was created on, it can use that without creating another allocation. You just increment the ref count and create the local waker from the same RawWaker.

On incompatibility with most futures in the ecosystem: that's the case right now, but a large portion of them could move to using wake_local, and if the API were stable there'd be no downside for them. Also, executors and reactors seem to ship together and aren't necessarily compatible with other libraries right now (i.e., tokio reactors don't work unless you're running on a tokio executor). A library with only a single-threaded executor would probably have to ship its own reactor primitives as well, but that's already how other executor libraries tend to work.

I like this proposal and would love to see an RFC for it.

For executors that only want to support single-threaded operation, it's a bit awkward to create a Context from a Waker that panics, and then set the LocalWaker. A constructor for Context which takes a LocalWaker but not a Waker would probably be good as well.

Do you think this should panic on waker() or on wake()?

Also, while thinking about the implementation for local_waker(), I realized Context always returns a reference. Considering the case where a Context is not specialized, I think that the best implementation for this should be to cast the &Waker into a &LocalWaker. The layout for Waker currently looks like this:

#[repr(transparent)]
pub struct Waker {
    waker: RawWaker,
}

If LocalWaker looked the same, then casting the reference should be fine.
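
A small sketch of that cast, using stand-in types rather than the real ones: RawSketch, WakerSketch, LocalWakerSketch, as_local and demo are all hypothetical, and the sketch assumes the local type is also #[repr(transparent)] over the same single non-zero-sized field.

```rust
use std::marker::PhantomData;

/// Stand-in for `RawWaker` (the real one holds a data pointer and a vtable).
pub struct RawSketch {
    pub id: usize,
}

/// Stand-in for the thread-safe `Waker`.
#[repr(transparent)]
pub struct WakerSketch {
    raw: RawSketch,
}

/// Stand-in for `LocalWaker` with the same layout.
#[repr(transparent)]
pub struct LocalWakerSketch {
    raw: RawSketch,
    // Zero-sized marker: makes the type !Send and !Sync without changing layout.
    _not_send: PhantomData<*const ()>,
}

impl WakerSketch {
    pub fn new(id: usize) -> Self {
        WakerSketch { raw: RawSketch { id } }
    }

    /// Reinterprets a thread-safe waker reference as a local waker reference.
    pub fn as_local(&self) -> &LocalWakerSketch {
        // SAFETY: both types are `repr(transparent)` over `RawSketch`, so they
        // share a layout, and the cast only drops the Send + Sync guarantees.
        unsafe { &*(self as *const WakerSketch as *const LocalWakerSketch) }
    }
}

impl LocalWakerSketch {
    pub fn id(&self) -> usize {
        self.raw.id
    }
}

/// Usage: the local view sees the same underlying data, and the sizes match.
pub fn demo() -> (usize, bool) {
    let waker = WakerSketch::new(7);
    let local = waker.as_local();
    let same_size =
        std::mem::size_of::<WakerSketch>() == std::mem::size_of::<LocalWakerSketch>();
    (local.id(), same_size)
}
```

The cast only goes in one direction: a thread-safe waker can always be viewed as a local one, but not the reverse.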

A panic when attempting to retrieve the waker from the context seems significantly better; the stack trace will contain the call to .waker(). Waiting until .wake() to panic means the panic instead happens on whichever worker thread made progress, with little way to track where the bad waker came from.


If a common case is an Arc waker with a fast path for local wakes, it would probably make sense to add a local_wake[_by_ref] to the Wake trait. The one difficulty is that turning that into a normal waker to create the context from then has no way to communicate those methods through, at least unless they are added to the normal vtable as well.

There's essentially 2 possible ways to structure the context that I see:

  • 1: Store (Option<&Waker>, Option<&LocalWaker>). Wakers can be data-specialized for thread locality. .waker() panics for threadlocked context.
    • 1.b: always have a threadsafe waker, but it's a dummy panicking waker for threadlocked context.
  • 2: Store a single &Waker. Wakers cannot be data-specialized for thread locality. Waker has local_wake methods called through .local_waker(). Threadlocked wakers must still clone() and drop() threadsafe, but panic on normal wake().
    • 2.b: store a flag in the context (or on the waker vtable) to indicate threadlocked wakers, and panic in .waker() if it's set. Threadlocked wakers do not need threadsafe ownership.

Either 2.b or 1.a (that is, option 1 without the dummy waker of 1.b) would be what I would go for, personally. It depends on whether I expect anyone would want to independently track local-optimized waker ownership instead of just being able to optimize the actual wake operation. There's no real difference for local-only wakers.

... Why do I expect someone to make a "quantum rc" which has both a threadsafe part and a threadlocked part.


Completely independently, the new(ish) unstable provide_any API surface seems ideal for the async context, so executors are able to provide arbitrary context without resorting to thread local state. That requires combinators which wrap the waker to keep that context somehow, though... If contexts are providers, a LocalWaker could be a third-party type that is then retrieved with request_ref.

I think one of the questions that a full RFC should answer is whether Context should be mutable, with the local waker set after creating the Context, or whether:

  1. There is a new way to build Contexts (a Builder?) which allows to set the newly added optional arguments
  2. Contexts wrap other existing Contexts, and augment them with new properties, as e.g. Go's context.WithTimeout does. This might be in addition to 1) or a later follow-up.

(NOT A CONTRIBUTION)

This is a good point. If you use &mut self methods, a poll method (which gets context by &mut) could change the context in a way that will escape that method; with RawWaker not capturing lifetimes this is a bit of a footgun: imagine something like FuturesUnordered using this to set the waker to its newly constructed waker not considering the escape aspect. Probably &mut self methods are not a good idea here.
