Surprising soundness trouble around `PollFn`

(I haven't been able to find that discussion searching all the usual places, I'm positive I saw it somewhere a couple years ago).

There are many more functions in futures than just poll_fn this affects, every function which takes a closure does not structurally pin that closure since it's pointless. As an example there was this previous discussion thread

Will std similarly change all these to inherit their Unpinness from the closure when uplifted (and promise to pretend it's a FnPinMut and can internally pin its moved enclosure)?

(NOT A CONTRIBUTION)

I have mixed feelings about this.

It was always a known fact that its perfectly valid to declare a struct as unconditionally Unpin, and then you cannot pin project through that type. Here you've passed state into a struct you didn't define, which happens to have an unconditional Unpin impl. To me it seems like Pin::new_unchecked is the API that needs clearer documentation here, because you absolutely need to confirm that the closure state has been pinned down if you're going to call Pin::new_unchecked on a variable owned by a closure. This fits with my intuition that its the unsafe APIs that need to clearly document their requirements, rather than the safe APIs documenting that they don't meet the requirements of other unsafe APIs.

That said, I don't know why PollFn has the unconditional Unpin impl and I find it surprising that it does. Was there any discussion of this when it was defined in futures? @cramertj would be the most likely to remember. Is there a pattern that it enables at the expense of being able to call Pin::new_unchecked inside the poll_fn?

EDIT: I guess @Nemo157 provides the answer:

But that's not true is it? Like, yes, they cannot rely on closure state not being moved out (because the closure has unpinned mutable access to that closure state), but they can provide the guarantee that once they are pinned the state will never be moved again except from within the closure, thereby enabling the pattern that @RalfJung posted in the first post. If its really just that it seemed pointless, this strikes me as a grave error.

I guess we have some data for now why this might be a bad idea...

Which std APIs does this affect? The topic you referenced seems to be about stream combinators which are not in std (right)?

This surprise is indeed why I think it needs documentation also on poll_fn. Yes that code is all safe and thus technically not at fault, but given that even futures::join gets this wrong, clearly even async experts sometimes expect the closure body to be pinned, or are not aware that this is a condition they need to check -- so if we can just make sure it is indeed properly pinned, then that oversight ceases to be a soundness problem.

(NOT A CONTRIBUTION)

Yea, on second thought I think it makes sense to make a note in the documentation of any struct with an unconditional Unpin impl (what else does std have, just Box?).

Grepping though the code

  • Box, Rc, Arc, references, raw pointers
  • async_iter::FromIter (still unstable)
  • PollFn
  • Ready

So the only such stable by-value blanket Unpin are Ready and PollFn, and Ready is an entirely inert piece of data so I can't see how that would cause problems.

None yet. My impression was always that a lot of futures was intended to be slowly uplifted into std, it affects pretty much the entire API surface, including things like FutureExt::map that feel likely to be uplifted.

If we had a safe function called add3 that actually adds 2, that would also need careful documentation to mitigate the surprising mismatch of API and behavior. Now that's clearly a ridiculous case, but arguably poll_fn is also violating API expectations (to wit: the future and tokio join implementations), and those are worth documenting, even if the implementation is entirely safe and even if the surprise only affects unsafe code authors. As another example, Vec also documents a bunch of implementation details that are only relevant for unsafe code interacting with Vec, and while Vec does internally use unsafe, that doesn't affect whether it needs such documentation or not.

Yeah, and futures obviously had poll_fn for quite a while -- it's just the first of these unconditional Unpin to be uplifted and then stabilized, and now we see the problems they cause.

My understanding is that the reason these were prototyped in futures and not std is precisely to not stably commit to them prematurely. So now that we found an issue in these APIs, we do have the chance to fix that issue (well, barely -- poll_fn is stable since a week or so, so we need to act quickly before too much code relies on this Unpin). The question is just, do we try to use that opportunity, or is it too late and we just try "mitigation by documentation"?

1 Like

Will PollFn asserting that the inner F is pinned still be sound with a more correct fix for unique references + generators? My understanding is that the current &mut impl !Unpin -> not-unique is an over-approximation. If we get something explicit like AliasableCell for user code to use then the generated closure type will not be using this for its enclosure and so cannot promise to allow structural pinning. (Whereas an explicit FnPinMut trait + a syntax like static move |cx| unsafe { Pin::new_unchecked(&mut future).poll(cx) } will allow the closure desugaring to generate AliasableCell fields).

You still haven't expressed any opinion why you think that linting against pin of closure captures doesn't solve the problem.

The approach of Rust was always to eliminate the unsoundness, not say "you're holding it wrong" in documentation. I don't get the knee-jerk reaction to change an old API which is misused by the caller in unsafe code.

The fact that this API and its usages have existed for five years, and the unsoundness in some usages was found only once the API was stabilized in std, is evidence that it's not a pressing problem which needs to be solved at all costs.

Yep. I just think it would be good to consider a wider context rather than just the one function that has been uplifted so far.

If we decide to change the definition, does this apply equally to all the functions? Does this cause any ergonomics issues in cases where the closures contain some !Unpin state but don't actually want to pin it?

If we decide to just document, does that mean every API that takes a closure and returns a Future needs to now document that it does not consider the closure pinned? Are there other situations where Future implementing types expose internal fields that they also need to document whether they consider them pinned? (Does having a fn(self: Pin<&mut Self>) -> &mut Field suffice to document this, or do they still need to explicitly tell users not to Pin::new_unchecked)?

Or maybe poll_fn is just special because of it being a very low-level building block for transitioning between async and Poll functions that deserves changing/extra documentation and it doesn't affect the other functions.

If deciding on this takes time then it may still be a good idea to change the bounds now and relax them later after more discussion (as long as the bound is tightened but the docs are not changed to promise structural pinning of the closure).

By the way, if you want to update documentation, then it should be the documentation of Pin, Pin::new_unchecked or pin_mut!, rather than PollFn, because it's a misuse of those unsafe objects.

This fixation on PollFn is appalling. The specific unsoundness in futures, tokio and async-std can be easily fixed without any discussion. Outside of that, there is no reason to believe this API is widely misused. The pinning in closures is much more likely to be misused simply because it affects every closure, and that is what needs a solution.

So far I don't think we should necessarily disallow pinning of closure captures. Rust typically allows people to "do what needs doing" in unsafe code, while still providing the strongest possible guardrails.

If FnMut was a regular trait that could be implemented by users, then I think we would allow people to pin the fields of that type, and I expect there would be little question of which API needs fixing -- yes, poll_fn is safe, but it is violating the usual expectation that by-value types pin their contents (at least I would have that expectation), thus setting up a trap for unsafe code. Sure we could change the docs of all unsafe code to warn against safe code setting such traps, and maybe we should given that there could be similar code on crates.io, but we also should ensure that std is free of traps. The only difference now is that we are talking about FnMut, not any old trait.

Are you saying that there is no legitimate usecase of pinning closure captures so we should just generally declare it as forbidden, lint against it, and mark FnMut: Unpin? That is a position I can understand, and then indeed a lint would be the best solution. But I don't necessarily share that position. I am not myself a user of these APIs, so my intuition might be off here, but I would expect pinning of closure captures to be fine if done correctly -- the exact same stance we also have for other kinds of pinning that involve Pin::new_unchecked.

Please keep this constructive.

If we have a proper UnsafeAlias/AliasableCell then even the current code would be less problematic (as in, there'd be no miscompilation and no Miri warnings -- the UAF is still real, of course, but that involves actually exploiting the PollFn: !Unpin), since Rust wouldn't rely on Unwind for detecting the presence of self-referential generators.

So the situation will get better when we finally have that, but it remains a footgun.

Ah yes, with AliasableCell then the lucky_number field in the generator is all that needs marking with it. The f field of PollFn itself is never actually aliased, only the fields inside, so it wouldn't matter that it's going Pin<&mut PollFn> -> &mut impl FnMut -> Pin<&mut impl Future>; it would be sound as long as the future isn't moved.

That's the basic tenet of Rust's safety: unsafe code can never trust safe code, safe code can do whatever the safe API allows. That's what "soundness" is about, and it's quite shocking to hear you, of all people, propose that we should encourage unsound code, and that the safe code must bend over backwards to uphold unwritten expectations. You may change the std, but you can't change all other crates, which reasonably expect that the unsafe code is sound. It looks like a "not our problem" approach.

There are legitimate use cases, but they admit easy sound workarounds.

If the captures are borrowed, then I can't imagine a situation where pinning them outside the closure and capturing Pin<&mut T> instead of &mut T could be restrictive. This is also the case which is required for the join! macro.

If the capture is moved, then demanding an external pin is somewhat more restrictive, since you can't just pin it to the stack during execution. You may also require more space in the closure, if you're trying to pin a value of size larger than size_of::<usize>(). I don't consider it a significant enough restriction to warrant unsound code. The size specifically is not an issue, since async wastes way more space.

The workaround is also simple. You either pin to the heap, or use the trick similar to the suggestion by @pcpthm :

let mut pinned = Box::pin(async move {
    // this is just `pin_mut!(trouble)`
    let trouble = unsafe { Pin::new_unchecked(&mut trouble) };
    std::future::poll_fn(move |cx| trouble.poll(cx)).await
});

You are no worse than with PollFn<F>: !Unpin. I also don't think that the case of pinning to the heap (or an arena) being unacceptable is a common one. Most users of async can just Box::pin(trouble) and be done with it.


A use case which isn't covered by the above is when you replace the capture within the closure depending on some condition, and then pin it. Technically, it's possible to do both mutation and pinning of the same value within the closure in a safe way, depending on the values of other captures and some analysis of runtime behaviour. I do not consider it a legitimate use case, because it is brittle. It also can't be safely expressed with FnPinMut closures, because you would have to move a pinned value. It's very unsafe no matter how you deal with closures, and it's better to mutate the value via Pin::get_unchecked_mut, which explicitly signals fishy behaviour, rather than the current status quo where you could hide the brittle unsafety behind an unassuming pin_mut!.

(NOT A CONTRIBUTION)

I think there's an interesting issue here that is sort of implied by the discussion around closures. Assuming PollFn had the conditional impl, is that a guarantee that a user can rely on to go from a mutable reference from their closure state to a pinned reference?

In general, it is not considered a breaking change to go from impl<T: Trait> Trait for Wrapper<T> to impl<T> Trait for Wrapper<T>. Let's say that Rust had stabilised PollFn with the bounded impl of Unpin instead of the unconditional impl, but someone (using the same reasoning that went into the decision now) proposed to unrestrict this impl. This would be a breaking change if users were relying on the absence of the Unpin impl in the way Ralf did in the original post in this thread. In other words: when can Pin::new_unchecked rely on the absence of an Unpin impl in a third party library?

This is actually the same issue that caused the soundness issue in Pin in 2020: the safe Pin functions relied on the fact that &T: !DerefMut and &mut T: !Clone, but due to the fundamental attribute these facts were not actually guaranteed by Rust even though references are "local types" to std. We solved this then by added explicit negative impls and I remember even saying at the time that unsafe code should not rely on the absence of impls, only on the presence of positive negative proofs in the forms of negative impls.

So Ralf's original code was wrong not only because PollFn has the blanket impl, but also because it does not explicitly have a negative blanket impl. I think the solution here that's consistent with what was decided around the pin unsoundness issue would be to say that using Pin::new_unchecked to pin project from a third party wrapper requires an explicit guarantee of not-Unpin in the form of a negative blanket impl. I think this feature isn't stable still, which is its own problem.

I don't recall how specialization and negative impls interacted (and don't know if theres been any changes in this area) but I think we intended to allow impl<T: Unpin> Unpin for Wrapper<T> and impl<T> !Unpin for Wrapper<T> to work; if not that introduces another complication here.

6 Likes

No, that's just not right. You have fundamentally misunderstood something about unsafe. See here for a related recent discussion.

Whenever unsafe code imports some data structure, it trusts the implementation of that data structure to actually correctly implement a list/queue/stack/map/whatever. You can add a sorting crate as dependency, and use its sorting function, and rely on it sorting correctly from unsafe code, as long as you control which code will be imported. It makes no difference whatsoever whether that data structure or sorting function is implemented using safe or unsafe code.

Unsafe code totally trust safe code that it explicitly imports and calls. Unsafe code can never trust unknown safe code. If you write code that is generic over a sorting function, then you must not trust it. But if you pick a concrete sorting function, then you can trust it, since there is no way for users of your library to mess that up. That is what soundness means. It would be totally impossible to write sound code if we couldn't trust any safe code.

So, it is perfectly normal for unsafe code to rely on poll_fn behaving in a certain way, as long as poll_fn is documented to behave that way. (Currently it is not documented to behave that way, but there is a fairly reasonable expectation of it behaving that way, and it should be explicitly documented one way or the other.)

I know what soundness is, and I am not encouraging unsound code.

As I have stated numerous times: the existing safe code is sound. I am not saying it "must" bend over backwards in order to ensure soundness. But soundness is not the only thing that makes or breaks an API, and I am arguing above for why poll_fn is a bad API that should be fixed despite it being currently, technically, sound: I am not a fan of breaking people's code on "technically". I am a fan of people writing sound code, and us doing everything in our power to get them to write sound code. And in this case I think it is fairly clear that people are more likely to write sound code if poll_fn pins the closure than if it does not.

I am not sure how you want to hide this with pin_mut; this does not compile.

If we land my PR, then there is an explicit guarantee of pinning. So at that point I would say yes it is a breaking change.

If we just changed the Unpin without changing the docs, then arguably indeed this is not a stable guarantee and could be changed back in the future.

13 Likes

no no no wrong wrong wrong bad bad bad

pin_mut!(trouble) is

let trouble = trouble;
let trouble = unsafe { Pin::new_unchecked(&mut trouble) };

In this case the extra rebinding is not necessary because you wrote async move which already moves trouble out of the containing scope.

But pinning is tricky enough as it is without being outright misleading about pin_mut!. That comment should instead be something along the lines of // SAFETY: trouble is pinned in the async move captures.


So yes, the workaround to an unconditionally Unpin poll_fn is fairly simple:

async { pin!(poll_fn).await }

However, a poll function wanting to pin its Unpin state is far and beyond the common case. Wanting an unconditional Unpin on its captures is the exotic case, and in that case the developer can use the same technique as AssertUnwindSafe: instead of capturing UnpinData, capture Unpinned(UnpinData) with

struct Unpinned<T>(T);
impl<T> Unpin for Unpinned<T> {}
3 Likes

I would hope not using poll_fn + unsafe, we should have safe APIs for code structures like this if they are so common.

poll_fn is a low-level block for implementing futures, which is inherently unsafe. It's basically sugar for an anonymous impl Future type. Most cases should be served by using async rather than implementing polling directly.

1 Like

Using something like pin-project allows safely implementing Future directly. My assumption was that poll_fn was most commonly used with safe Unpin implementations and join! was an outlier; if that’s wrong then it seems like there’s a missing bit of sugar that gives the convenience of poll_fn with the safety of pin-project.