What shall Sync mean across an .await?

We recently tripped over the Sync requirement for the argument passed to hyper’s Body::wrap_stream constructor: normally, there’s no reason for a Stream to be Sync. A Stream describes a task to be submitted to an executor, which will then poll it until finished, possibly from different threads, but never concurrently. In contrast, the Body for an HTTP response in hyper needs to be Sync because of async/await; see “awaiting inside a match block captures borrow too eagerly” (rust-lang/rust#57017 on GitHub) and related behavior.

To make a long story short: handling a &Body in an async fn requires the reference to be Send if it is held across an .await, which requires Body to be Sync, which in turn requires the wrapped Stream to be Sync.
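For illustration, here is a minimal sketch with a stand-in Body<S> type (not hyper’s actual Body): Sync is an auto trait, so Body<S> is Sync only if S is, and &T is Send exactly when T is Sync.

struct Body<S>(S); // stand-in: Sync is auto-derived only if S: Sync

fn assert_sync<T: Sync + ?Sized>() {}
fn assert_send<T: Send + ?Sized>() {}

fn demo() {
    assert_sync::<Body<Vec<u8>>>();   // OK: the wrapped "stream" is Sync
    assert_send::<&Body<Vec<u8>>>();  // OK: &T is Send exactly when T: Sync
    // With a !Sync stream, e.g. Body<std::cell::Cell<u8>>, both lines above
    // fail, which is why holding &Body across an .await forces the wrapped
    // Stream to be Sync.
}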

From the documentation:

Send

The Send trait indicates that a value of this type is safe to send from one thread to another.

Sync

The Sync trait indicates that a value of this type is safe to share between multiple threads.

These make perfect sense and enable Rust’s fearless concurrency, a very important and beneficial feature as I see it (as you may know, I’m one of the main authors of Akka, and I wish the JVM or the languages atop it offered affine types!).

With the introduction of async/await, some usage patterns have manifested that warrant a discussion on the exact meaning of the above traits — please let me know if this discussion has already happened elsewhere and I’ll read up on it.

The issue

While Sync means unrestricted sharing of a value between threads, the “sharing” over the lifetime of an async fn body is of a very particular nature: there will be no concurrent access to the value (barring bugs in the underlying async runtime implementation).

In practice I see two ways for users to deal with this right now:

  1. make the referenced object Sync by changing its internal structure, e.g. wrapping values in mutexes (a rough sketch follows this list)
  2. unsafe impl Sync on the grounds that in the case above Stream requires a &mut self in its poll_next function, thereby requiring the caller to guarantee mutual exclusion
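
A rough sketch of option 1, using a hypothetical stream state struct (none of these names come from hyper): the !Sync field is wrapped in a std::sync::Mutex purely to satisfy the bound, even though poll_next is never called concurrently.

use std::cell::RefCell;
use std::sync::Mutex;

// hypothetical internal state of a stream implementation
struct ChunkState {
    buffer: RefCell<Vec<u8>>, // RefCell makes the stream !Sync
}

// the same state after applying workaround 1
struct SyncChunkState {
    buffer: Mutex<Vec<u8>>,   // Sync, at the price of locking every access
}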

Neither of them feels right: the mutex is not actually needed and has a performance overhead (mostly by being an optimization barrier to both the compiler and the CPU), and unsafe should not be required in end-user code; it should be purely opt-in for when you want to optimize something.

So, is there something else that we can do?

Tangential point

&mut is named after “mutation”, but its type-system function is to guarantee exclusivity. Exclusivity is also required for non-mutating access to a shared data structure in order to guarantee consistency, which is the reason for having to take the mutex even when just reading from the structure — calling it &mut in the syntax is a bit more confusing than calling it &excl would be, but I think that ship sailed very long ago :wink:

The reason I’m bringing this up is that the &mut fixes the Sync issue for Stream, at least to some degree, but not all users will understand this point without help.
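
For reference, this is the core of the Stream trait as defined in futures-core (size_hint elided): poll_next takes the stream through Pin<&mut Self>, so every caller already needs exclusive access.

use std::pin::Pin;
use std::task::{Context, Poll};

trait Stream {
    type Item;

    // exclusive access is required to make progress on the stream
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>)
        -> Poll<Option<Self::Item>>;
}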

5 Likes

I wouldn’t be too sure that it is impossible to create concurrent access with “safe” code. I’m thinking about copying a reference out of the future into thread-local storage or something, but that doesn’t really work with the lifetimes. Maybe it is safe in some sense... hmm...

This is an interesting idea. It just inspired me to write a wrapper type, about which I’m curious whether it is:

(a) safe, and

(b) already implemented in some crate.

This wrapper would, I believe, remove the need to use unsafe directly if you wrap the problematic fields inside the Stream. Or you could even have the whole stream contained inside such a wrapper (and then perhaps wrapped again for the Stream instance... or even make the wrapper #[fundamental] or something):

(playground)

#[repr(transparent)]
pub struct SyncWrapper<T: ?Sized> (T);

impl<T> SyncWrapper<T> {
    pub fn new(t: T) -> SyncWrapper<T> { SyncWrapper(t) }
    pub fn into_inner(self) -> T { self.0 }
}

impl<T: ?Sized> SyncWrapper<T> {
    pub fn get_mut(&mut self) -> &mut T { &mut self.0 }
}

// reason for safety: an immutable reference to SyncWrapper<T> is worthless,
// hence it’s safe to share such a reference between threads
unsafe impl<T: ?Sized> Sync for SyncWrapper<T> {}
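
A possible usage sketch (NotSyncStream and assert_sync are made-up names for illustration): a !Sync value becomes shareable once wrapped, because a shared reference to the wrapper cannot be used for anything, while mutation still requires &mut access as before.

use std::cell::Cell;

struct NotSyncStream(Cell<usize>); // stand-in for a Stream impl that is !Sync

fn assert_sync<T: Sync>(_: &T) {}

fn demo(stream: NotSyncStream) {
    let mut wrapped = SyncWrapper::new(stream);
    assert_sync(&wrapped);                               // OK thanks to the unsafe impl
    let _inner: &mut NotSyncStream = wrapped.get_mut();  // exclusive access as before
}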
2 Likes

It is sound: there is indeed literally nothing you can do with an &SyncWrapper.

But I fail to see how this UnusableIfShared wrapper could help resolve this .await-point-crossing problem where using a classical &mut wouldn't?

1 Like

It may be a solution to the more concrete problem @rkuhn mentioned. As far as I understand, he wants to implement a Stream to use with hyper::Body::wrap_stream, which requires the Stream to be Sync, even though the main (most important) method of Stream takes &mut self anyway.

Yes, indeed. And it may well be that this wrapper helps us in this case.

My question is more generic, though: is Sync really the right abstraction to use for references that cross .await boundaries? Sure, Sync is sufficient, but I have a strong feeling that it is not necessary — something less heavyweight might suffice. The price would probably be the addition of a new abstraction, which may well be deemed too high; this is why I asked whether this trade-off has already been weighed elsewhere.

Perhaps a concrete example of references crossing .await where Sync appears to be too restrictive might help. With such an example, one could then try to answer the question: how could the compiler know that all of the potential aliases of this reference are contained inside the same Future, so that it’s safe for the Future to be transferred to another thread?

3 Likes

More concretely, what things are you talking about that might be broken if they are accessed from another task context on the same thread? (Send and Sync are designed for access between different threads, and in an async context, they only make sense if the underlying executor is multithreaded.)

Yes, but also, only multithreaded executors require Future + Send, right?

Note that there's no guarantee that an &Body is uniquely referenced even on entry to the async fn. Even if the function itself never hands out copies of the reference, the caller could have done so before calling it. And if Body is Sync, those references could escape to other threads. Various examples:

  • The &Body could come from an Arc<Body> owned by some caller async fn, in which case anyone could have a reference.
  • It could come from synchronous code that block_on'd the future, in which case that code could also have used a scoped thread construct to hand references to another thread, even without Arc; see the sketch after this list. (You can't soundly do scoped threads within an async 'call stack', which is why I mention synchronous code.)
  • It could be a 'static reference.
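
Here is a rough sketch of the second bullet’s scenario, with a hypothetical Payload type standing in for Body and futures::executor::block_on as the synchronous entry point; it compiles precisely because Payload here happens to be Sync.

use futures::executor::block_on;

struct Payload(Vec<u8>); // hypothetical stand-in for Body

async fn consume(payload: &Payload) {
    let _len = payload.0.len();
    async {}.await; // the &Payload stays live across this point
}

fn main() {
    let payload = Payload(vec![1, 2, 3]);
    std::thread::scope(|s| {
        // the same &Payload is handed to another OS thread...
        s.spawn(|| {
            let _peek = payload.0.len();
        });
        // ...while this thread synchronously drives a future borrowing it
        block_on(consume(&payload));
    });
}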

Okay, but what if it wasn't Sync? What if we had some trait that somehow allowed a type to be sent across threads only in the specific case where a single async call stack is being migrated from one thread to another? Well, that still wouldn't be good enough, because after that migration there could be additional references from the original thread, e.g. from a reference stashed in thread-local storage.

Basically, we want to treat async call stacks like threads, but the language doesn't guarantee that they're well-behaved enough for that to be valid.

4 Likes

What you’re saying is correct: there are many cases where the compiler will pessimistically have to assume that references are sent to other threads without further (formalized) restrictions — in this whole thread I always assume multithreaded executors, because that is the main purpose of using Futures as far as I can see.

The core of the issue I would like to raise is that the current async/await mechanism is brittle, too brittle to be intuitively understood and used — I only raised this because several smart people on our team have been working with async Rust for about a year, and they routinely get held up by surprising or inscrutable compiler errors.

The one example that prompted this thread boils down to this: a user of hyper has to make their payload streams Sync — not for fundamental reasons, but because some other user of hyper may otherwise run into a corner case of the compiler (https://github.com/hyperium/hyper/issues/2159).

As far as I can see from this thread, the current approach is that an async body uses some special treatment to send references across .await boundaries without having to require them to be Send (and thus have the referenced value be Sync). I tried without success to find other cases or to understand the logic, but all my attempts compiled successfully despite deliberately holding references across .await boundaries. This is confusing in itself, because it violates the stated intent of what the Sync marker trait should mean.

Put differently, a foundational tool (like a programming language or a core library) should work predictably. This means not only that it follows a set of rules, but also that these rules are simple enough to be understood and applied by a human. A compiler that tries to be clever on behalf of the programmer is more frustrating and time-consuming than one that is stricter but simpler. Users will then build complex things from simple principles.

Maybe I am wrong, and maybe I don’t yet know the full picture, in fact I very much assume this to be the case, so please help me fill in the gaps while responding to my criticism — I very much want Rust to succeed.

2 Likes

@steffahn The hyper issue I just linked to is exactly one such example: the Body is owned by the function scope, but still the compiler demands it to be Sync.

Minimal repro:

fn example (not_sync: ::core::cell::Cell<usize>) -> impl Send
{
    async move {
        match not_sync.get() {
            | n => {
                async {}.await;
            },
        }
    }
}

fails.

This is the main strawman / flaw of the current design. There is a &not_sync temporary used to call .get(), and since that temporary is created in an expression evaluated for a match, it is only dropped at the end of the whole match, beyond the .await point.

This thus does not seem to be so much an issue with Sync as an issue with temporaries: in an ideal world, we could imagine some form of NLL-like reasoning that would say “there is a borrow, but it is not used anymore, so let's ignore it Sync-wise”. But it doesn't really seem possible to implement that for "trait properties".

macro_rules! no_temps {(
    $expr:expr
) => ({
    let it = $expr;
    it
})}

fn example (not_sync: ::core::cell::Cell<usize>) -> impl Send
{
    async move {
        match no_temps!(not_sync.get()) {
            | n => {
                async {}.await;
            },
        }
    }
}

for instance, compiles fine.
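
Equivalently, without the macro, binding the value to a variable before matching drops the temporary &not_sync before the await point (a sketch following the same shape as the examples above):

fn example (not_sync: ::core::cell::Cell<usize>) -> impl Send
{
    async move {
        let n = not_sync.get(); // the temporary &not_sync ends here
        match n {
            | _n => {
                async {}.await;
            },
        }
    }
}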

(Also, in this case the UnusableIfShared wrapper does not seem useful, hence my question, @steffahn)

4 Likes

(I for one am not disagreeing with your criticism so much as noting difficulties in fixing it.)

1 Like

I think @rkuhn is talking about the more general case where the reference is used after the await. In that case, accessing the object without it being Sync is unsound in general. But if we could somehow prove that the object in question uniquely belongs to this async 'thread' (task, chain of futures, whatever you want to call it) – in other words, that if the current async fn is sent to another thread, all other code accessing the object will also be sent to that same thread – then it would be sound.

2 Likes

Yes, that is part of what I’m asking. The bigger question is whether this “we could detect this and it would be sound” is really the right approach — it makes the compiler’s response to any given code less predictable. I’d prefer a feature that always works but is less powerful over one that works most of the time.

If someone can prove (or at least convince themselves) that it is possible to make it always work, then that’s good; the current behavior is then just a bug that needs to be fixed. But if, after fixing all bugs, there are still situations where a subtle local change to the structure of my code requires something to be Sync, then the developer experience suffers.


A related question, to demonstrate the other half of what I mean by intuitive: why does the following compile? Does it not share a reference with other threads by passing it across .await?

use futures::{Stream, StreamExt};
use std::future::Future;

fn do_stuff3(
    stream: &'static mut (impl Stream<Item = u128> + Send + Unpin),
) -> impl Future<Output = Vec<u128>> + Send {
    async move {
        let x1 = stream.next().await;
        let x2 = stream.next().await;
        [x1, x2].iter().filter_map(|x| *x).collect()
    }
}

No, it does not create a shared reference across await points. (Note that an exclusive reference &mut T is Send if T: Send)
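
For illustration (assert_send and markers are just throwaway names), the relevant marker-trait facts for Cell<usize>, which is Send but not Sync:

use std::cell::Cell;

fn assert_send<T: Send + ?Sized>() {}

fn markers() {
    assert_send::<Cell<usize>>();       // Cell<usize> itself is Send
    assert_send::<&mut Cell<usize>>();  // &mut T is Send whenever T is Send
    // assert_send::<&Cell<usize>>();   // error: &T is Send only when T: Sync
}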

I think in the near to medium term the best solution is for everyone writing async code that is expected to run on a multithreaded executor to always use Sync primitives. This is not a very restrictive limitation in my opinion: use Arc, use futures::Mutex (which you would want to use anyway); don't use Rc, RefCell, Cell, or std::Mutex.
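
A small sketch of that advice, assuming the futures crate: shared state goes behind Arc plus the async-aware futures::lock::Mutex, whose guard (unlike std::sync::MutexGuard) is Send and can therefore be held across an .await without making the future !Send.

use std::sync::Arc;
use futures::lock::Mutex;

async fn tick(counter: Arc<Mutex<u64>>) {
    let mut guard = counter.lock().await;
    *guard += 1;
    async {}.await; // holding `guard` across this point keeps the future Send
}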

Unfortunately, some libraries have (out of ignorance of this issue, or by mistake) implemented non-Sync types in their APIs. This will have to be fixed.

In the longer term, I'm not very optimistic that a more precise analysis will ever be implemented, but I'll make this note:

This problem has a fundamental analogy to the lifetime problems that led us to the current async/await system. Previously, a future containing references of lifetime 'a could not outlive that lifetime; now the lifetimes are "closed over" to create futures that are 'static, unless the references came in externally. This problem stems from a desire to similarly "close over" internal non-Sync data. Of course, the fact that lifetimes have a built-in escape analysis (fundamentally, they are a tool for escape analysis) contributed to making the solution for lifetimes much more straightforward than it would be for the Sync trait.

4 Likes

This is incorrect, so all of the philosophical comments that follow (which in the abstract I agree with) are not very relevant. It's really frustrating when our tools work in ways we don't understand, so I empathize! But there's no special treatment or any such thing going on in relation to Send and Sync with async/await. I'll try to explain:

A hint at the mistake in your understanding is when you write "send references across .await boundaries": there is no such thing as "sending something across an await boundary." An await boundary represents a point at which a future can pause and yield control; nothing is sent across the boundary.

A multithreaded executor may send that future, while it is paused, to another thread, and resume polling it on that new thread. Hence, when a spawn API requires that a future implement Send, it requires that all of its state is Send. This is true universally - if any type held across an await point is not Send, the future will not be Send. There is no magic here.

But there are also several types in std which are Send but not Sync - and if a type is not Sync, shared references to it are not Send. Therefore it is perfectly allowed to hold these types by value across await points, but not allowed to hold references to them across await points. That distinction is probably the root of what seems like inconsistent behavior.
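
A sketch of that distinction, using Cell<usize> (Send but not Sync) and made-up function names: the first function compiles, while the commented-out variant holding a reference does not.

use std::cell::Cell;

fn by_value(cell: Cell<usize>) -> impl Send {
    async move {
        let owned = cell;   // the Cell itself is part of the future's state
        async {}.await;     // fine: Cell<usize> is Send
        owned.get()
    }
}

// Holding only a *reference* across the await is rejected, because
// &Cell<usize> is not Send (Cell<usize> is not Sync):
//
// fn by_reference(cell: &'static Cell<usize>) -> impl Send {
//     async move {
//         let r = cell;
//         async {}.await;
//         r.get()
//     }
// }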

My recommendation is just not to use types that don't implement both Send and Sync in async functions, period. Then refactors can never cause these kinds of errors.

7 Likes

Thanks for clearing up the “no magic” part, @withoutboats! Concerning the mechanism of future execution and its corresponding modelling in the type checker, there was no discrepancy between what you describe and what I knew from the beginning — I was saying “send across .await” as a shorthand (.await introduces a potential thread boundary, across which values need to be transported). The missing piece was what @RustyYato wrote about exclusive references (thanks!).

Your conclusion and advice form a solid rule for making things work. Two things are still missing before the Rust ecosystem as a whole can implement this rule:

  • everybody needs to know this rule and why it exists
  • library authors need a better way to ensure that they comply with this rule

In particular, it happens quite easily that an async fn returns a Future that is neither Send nor Sync, and this will go unnoticed until someone actually tries to use that function in a multithreaded context. This is witnessed for example in this code; I have seen similar assertions in many places. Solutions could include making Future require Send and Sync, or adding syntax for async fn to declare these constraints.
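
One way a library author can already assert this today (a sketch; require_send and the function names are made up here): a helper that only type-checks if the public future is Send.

fn require_send<T: Send>(_: &T) {}

pub async fn public_api() { /* ... */ }

#[allow(dead_code)]
fn _public_api_future_is_send() {
    // fails to compile as soon as `public_api`'s future stops being Send
    require_send(&public_api());
}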


The downside of the proposed rule is that it overachieves in many cases. We have seen in this thread that Sync is not actually required; rather, all values held across an .await need to be Send. More important to me than this technical point is the semantic point about asynchronous but not concurrent computation. An async fn is a context that behaves at runtime very similarly to a single-threaded function — there is no concurrency between any of its parts — with the difference that shared references that enter as arguments may indeed be used concurrently. Demanding that all values used within this context be Sync negates much of the advantage of using the Future abstraction over just programming with threads and locks. The programmer would now need to consider concurrency even in those cases where it cannot happen (and I disagree with your statement that adding a Mutex has basically no cost — the runtime cost may be small, but the cognitive cost in the data structure is not).

Given the current constraints of the Rust compiler, perhaps the following rule gets us closer to the ideal world:

  • all async fn whose result may be run on a multithreaded executor need to return a Future that is Send

This includes all generic libraries, and I’d opt to treat all internal utility functions and the like in the same fashion. If my understanding is correct, this rule requires only those values to be Sync that are actually shared across threads, where sharing may occur extraneously due to how the compiler works. But at least the error message will appear in the lexical context that contains the offending code.
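
One way to follow this rule, sketched with a made-up function: write out the return type instead of using the async fn sugar, so the Send requirement is checked at the definition site rather than discovered by some distant caller.

use std::future::Future;

fn fetch_len(data: Vec<u8>) -> impl Future<Output = usize> + Send {
    async move {
        async {}.await;  // only Send state may be held across this point
        data.len()
    }
}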

A very heavy-handed way of achieving this would be to add the Send bound directly to the Future trait, but that excludes useful use-cases that do not require multithreading.

A less heavy-handed solution would be a lint that fires when public futures are not Send, which library authors could use in CI and while developing. I don't know whether clippy has already considered implementing a lint like this.

Of course there are people who want to multiplex futures over a single threaded executor, for whom this lint would not be useful. But there's a pretty clear ecosystem split between those use cases (which mainly target embedded hardware) and libraries which would want to use this lint.

1 Like