Surprising soundness trouble around `PollFn`

To back this up: I just spent the last two days of work implementing some Stream combinators, and while it was confusing as all heck to figure out how to actually use Pin in anger, I at least had the guarantee that there was no unsafe in my code due to pin-project, so at least I could assume there was no way for me to screw it up too badly. I wouldn't have even tried if I had to use unsafe here: Pin is simply too unnatural to me.

Hearing there's soundness issues around here is scaring the crap out of me: can I at least get confirmation that so long as I don't write unsafe myself (or, apparently, use known broken join! methods!?) my code is sound?

2 Likes

Please keep this constructive.

It's literally a safe function returning a struct with safe methods. Keep down your hyperbole and don't try to redefine the terminology to suit your point.

It's one of the simplest APIs in Rust, which can be written by almost every person writing Rust. If you argue that something that trivial need jumps through hoops to avoid unsafety, then sorry, but Rust's value proposition doesn't hold. I, and most people writing Rust, fully expect that if we write safe code (and poll_fn is 100% safe), then we can be certain that it won't cause safety issues. poll_fn is way too trivial_ to be a bad API. If you say that it's bad, it means that the basic building blocks of async Rust are bad. Pin is a bad API? Unpin is a bad API? I could even agree with that, but it can't be changed at this point.

But of course the above point is entirely moot, because nobody is writing the kind of code you talk about. Why? Stay tuned!

It's far and beyond not the common case. It's very much a tiny minority. Of that minority, basically everyone does it right and pins the !Unpin capture right outside the closure. There are only two crates which did it wrong: futures and tokio. Specifically, tokio::{join, try_join, select} and futures::{join, try_join}. And I'm quite certain that if they implemented those primitives as functions, rather than macros, then they would do everything right, like dozens of other crates, because those unsafe calls within the closure would stick out like a sore thumb.

Don't believe me? Check it yourself. Grep for \bpoll_fn\s*\( in the crates, excluding comments and string literals.

Here are the results of running the search on a simple project with the following typical dependencies: async-executor, async-io, async-std, blocking, futures, futures-macro, h2, hyper, stdlib, tokio, tonic, tower, tower-http.

This includes the two major async runtimes, and the http protocol. There are 80 total uses of poll_fn (this is a text search, so it includes all implementations of poll_fn). Of these,

  • 54 closures don't use Pin at all!
  • 9 cases where the value is pinned before moving into the closure.
  • 2 cases where we pin inside the closure an owned value which was created inside the closure itself.
  • 11 cases where the captured value is Unpin, and so can be safely pinned with Pin::new.
  • and 5 cases with the UB: the macros in tokio and futures.

Ok, maybe that project isn't that representative? How about something more real-world? Say, tauri? Nah, it uses a subset of those crates. Ok, deno? There are only 9 poll_fn calls in the entire 120,000-line codebase, excluding dependencies. None of those calls uses pinning.

Ok, let's look at Gecko. All poll_fn calls are in third-party dependencies, most of which we already counted above. Still, the count is around

  • 141 closures don't use Pin at all!
  • 4 cases where the value is pinned before moving into the closure.
  • 6 cases where we pin inside the closure an owned value which was created inside the closure itself.
  • 12 cases where the captured value is Unpin, and so can be safely pinned with Pin::new.
  • and 5 familiar errors in tokio and futures.

I may have miscounted a couple of functions, but I'm certain that the only UB-exploiting ones are the already familiar macros. As you see, 84% of function calls don't use any pinning. This is expected if you think why you would even use a poll_fn. In most cases you have some custom polling logic which doesn't involve any pre-existing pollables at all. In a smaller but still large number of cases you want to delegate to some other pollable object (stream, channel etc), which isn't itself a future. And of course those objects are Unpin. Why would you bother dragging along !Unpin data and suffer the ergonomic and safety issues?

And again, basically all people who need to pin a capture have done so soundly. Why wouldn't you use pin! outside the closure, when it's safe and easy to do? The exception - those pesky macros, which somehow have escaped proper audit for 5 years. You want to remove error-prone APIs? Do something with the damn macro system! It's impossible to analyze!

You're saying that it's a low-level API, great. So we can expect to see it more in low-level crates. Let's take everything that lib.rs gives us for the "async runtime" request. Add all crates to an empty project, drop a couple which have irresolvable dependency conflicts. We are left with about 100 crates. Of these, only around 40 ever call poll_fn (again, regardless of where it comes from). There are total 322 poll_fn calls. However, some heavy users were already counted above (e.g. tokio and async_std).

I won't be counting here. I have just looked through all use cases in new crates, and I can say that basically all of them either don't use pinning at all (again, vast majority), or use Pin::new, which is safe. There are just two exceptions: futures_micro and monoio. Both cases have errors in the macros. Both macros are very obviously ripped off the corresponding macros in futures or tokio.

I don't have the infrastructure to analyze the entirety of crates.io, but I'm quite certain that the results will reproduce.

So we have basically one culprit (which likely was tokio, because it's the flagship of async), which was copied without much thought in a couple of places. Everyone else uses the API soundly. Nobody depends on unsound and brittle "not-really-pinning" of the closure's captures.

So who exactly do you expect to protect from themselves with this API break? And why would you encourage unsound code, when nobody depends or wants to depend on it?

4 Likes

Yes, your code is sound.

The problem only starts when you build abstractions that are (intended to be) consumed by unsafe code. This is particularly true for the standard library just because it is going to be more widely used than any other crate.

Nothing around Pin is simple. It took about a week from when Jack Wren first encountered the Miri failure until we figured out it is this Unpin instance that causes the UB. I literally had to write a patch for Miri to help me debug this. This is a super subtle issue.

I have explained this point several times so I have no reason to believe that it will help to explain it another time. Safe code does not cause safety issues. I never meant to imply anything else. But the API design of safe code can make it easier or harder for unsafe code which uses said safe code to avoid safety issues, and that is what we are talking about here. std has always gone out of its way to ensure that it is maximally unlikely that other code, including unsafe code, accidentally misuses its APIs.

You have not yet given a single reason for why that Unpin instance should exist, aside from some dogmatism about how unsafe ought to work (it can exist and it is safe, so it should exist -- I don't know if you meant to say that, but it sometimes sounds like you do) and the desire to break people's code with subtle soundness bugs ("amplify errors") because you technically have the right to do so. Maybe we should have made Unpin an unsafe trait, not because it is technically needed but because then this discussion would be less dependent on the pragmatic nuances of API design in the presence of users writing unsafe code...

(I actually think we should have made Unpin unsafe, so that pin projections could use the presence of T: Unpin to deduce that one can safely pin-project to all fields. Accidentally it probably would have also helped here.)

Rust's value proposition is to avoid UB (that's a big part of it, anyway). The soundness theorem is a key part of that, but beyond this hard dogmatic core is a pragmatic shell of language and API design that attempts to steer people towards sound API usage even in unsafe code. We don't consider everything to "work as intended" just because we can now blame someone's unsafe code for the UB -- we keep thinking how we can ensure that that kind of UB doesn't happen even when unsafe code is being written. Restricting this Unpin instance helps avoid UB. You somehow managed to contort this situation to the extent that you now claim a patch that reduces UB is against Rust's value proposition, and I really don't understand how we got there.

To be clear, I am also totally in favor of having lints that help detect this kind of situation. But given that this problematic code is generated by a proc-macro, the lint would not have fired. So it is clearly not sufficient.

11 Likes

(I don't have an opinion in this debate, but nice job with the thorough research!)

2 Likes

Ok, let me summarize my position.

  • The Unpin instance already exists. It has been stable for more than 2 weeks. You are proposing an API break, and it's a very bad precedent, which requires stronger justification than "someone may accidentally write unsound code".

  • I have demonstrated that basically nobody in the wild does that mistake. So we're weighting a real API break vs a nonexistent mistake. tokio and futures need to be fixed.

  • You and @CAD97 have claimed that pinning !Unpin captures in a closures is somehow a legitimate pattern. I have shown that's not the case.

  • I don't consider leaving pre- and post-conditions in an unsafe block a legitimate pattern. If you have preconditions, your unsafe { } must be wider. If you disagree, make a separate thread.

  • It's not about poll_fn. The same pattern of pinning Unpin captures applies to any other closure, and there are other combinators and self-referential structs which may use it. Any solution which doesn't consider it is an unacceptable hack. I do not consider this pattern valid in the general case, I don't accept it for poll_fn.

This is a surprise to me. I fully expect that at least deny and forbid lints should fire in proc macro expansions.

Indeed, and that you are misrepresenting my justification. The justification is "experts have written unsound code". That is not a hypothetical.

It's clearly not nonexistent, it caused a miscompilation in real-world code written by async experts.

It is interesting that nobody else seems to be pinning inside the poll_fn closure though, so thank you for collecting that data. Now I wonder about the other combinators provided by futures.

"shown"? You have argued that. But this is a matter of opinion, and you have certainly not shown a proof that this pattern is not legitimate, nor could there even be such a proof.

(I think that is in reply to the Zulip thread I linked.) I linked to that Zulip thread because of the discussion of 'unsafe code calling safe code and relying on it behaving in a certain way'. The question about preconditions is, I think, unrelated to this thread so we don't have to discuss it here. (But if you want to make the stdlib follow your style, you'll have to open a PR. For example this unsafe block has a precondition of the capacity being at least one larger than the length. It took me <1min to find that so I am sure it is not the only example. Clearly, what you consider legitimate does not represent community consensus, so you cannot use it as an argument for how the stdlib API should look.)

Indeed all of these combinators should respect the Unpin of their closure field. poll_fn is the only stable one in std so it serves as precedent.

3 Likes

Many lints explicitly ignore macro expansion as those tend to generate warnings at no fault of both the macro writer and the user. In addition these warnings would not be actionable by the user at all and we prefer to not have unactionable lints as those lead to warning fatigue. I don't know if this specific lint would have fired on macro expansions.

5 Likes

I'm saying it's the common case of users of poll_fn that have F: !Unpin. I'm not saying it's the common case of poll_fn, let alone the common case of futures.

Nobody is saying that poll_fn should be pinned when F: Unpin.

In this case, the API you find at fault is Pin::get_unchecked_mut. You're claiming that it's categorically wrong to go Pin<&mut T> -> &mut T -> Pin<&mut T> where the &mut T crosses a safe function boundary (where T: ?Unpin), even if you wrote the safe code.

This is, unfortunately, just not how Pin works, and not how the unsafe contract works. Having &mut T where the T is considered pinned breaks the safety invariant of &mut _. But it is perfectly sound to call upstream safe functions so long as they are documented to not rely on said safety invariant. Similarly, it's fine to call downstream code with a &mut T and guarantee that the reference has been projected from a pin and the callee gets to choose whether the pinning of this T is structural or not.

In fact, this is essentially exactly the case with Drop::drop today. If your type is !Unpin and you construct pinned references to its fields, you get a "falsely unpinned" &mut Self from Drop::drop and must be careful to still respect the pinning guarantee manually. Calling Drop::drop doesn't force the value to be unpinned, it "just" makes implementing the safe function unsafe.

What makes it extra subtle with poll_fn is that the Self type and reference of the captures object is completely hidden. It's for this case that I think poll_fn should be harder to use incorrectly and provide the option to treat captures as pinned so long as the closure captures object is pinned.

7 Likes

I don't know the ins and outs of pinning like you do, but this are my two cents. I had noticed PollFn to be Unpin before, and I had stayed away from it because I didn't want to bother trying to figure out if projecting was sound. This was even though it simplified my code greatly.

I personally believe the Unpin implementation in poll_fn is useless. The API exists for the people that implement futures, not for the people who consume them. Having PollFn be Unpin restricts exactly the people who are suppose to use it. And what for, so I can call Pin::new() on a &mut PollFn? Great, next time I use a Unpin futures only runtime I'll keep it in mind.

I feel async features are developed more erratically than other parts of the Rust project. I can't help mentioning the Sync bound on Context, which I consider to be a huge mistake. It too, limits a useful API for no gain what so ever. And the reason it wasn't fixed, was as well, the very important backwards compatibility promise for a 1 week old API. I believe this religiosity to backwards compatibility, plus the rushed shipping of features is hurting the async ecosystem. Async is obviously hard, and these features should be payed extra attention.

I would hate to see these mistakes remain in rust forever, and I think that most of the people that consume these APIs are experienced enough to appreciate the fix, instead of complaining about the breakage. No beginner rust user goes abound their day sending Contexts and pinning PollFns. But actually, many experienced rust users have mistakenly constructed unsound Contexts and PollFns.

6 Likes

The more I think about it the more I agree with @RalfJung that Unpin should have been unsafe, and we'd save ourselves all these negative implementation hacks.

1 Like

(NOT A CONTRIBUTION)

It means that you can poll a PollFn you don't own and can't guarantee won't be moved again. I don't think thats very useful either, but your sarcastic characterisation of this trade off is not productive.

Marking Unpin unsafe would have done nothing here. I guess Ralf's basic concept is that if it were unsafe it would be considered a breaking change to add an Unpin impl to an already existing generic type (since some user could be depending on the absence of the Unpin impl for correctness), but this impl was added before stabilisation.

Negative impls are not hacks; they are explicit statements of intent by the author that the absence of this impl is a contract you can rely on.


EDIT: I would also like to see the breaking change to Context made, as I've been saying for the last 2 years (privately and then publicly). You can see how much influence this has over the direction of the project, and raising it in this discussion doesn't feel helpful either.

1 Like

Making Unpin unsafe would have made it more clear that impl Unpin is a promise: it promises that this type's invariant is the same regardless of whether it is currently pinned or not. This in turn has consequences for structural pinning -- it basically means pinning is not structural for any field (except possibly Unpin fields where the question is moot).


FWIW discussion is continuing on github, and the libs-api team does not seem inclined to change anything here. :confused:

3 Likes

Moderator notes: Some of the posts here were deleted for being too inflammatory. Please try to be respectful when disagreeing about priorities.

3 Likes

The PR is now FCP-merge, so barring new information the libs team intends to merge the change to make PollFn's Unpin implementation conditional.

3 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.