Unwinding through FFI after Rust 1.33

jsgf · March 6, 2019, 8:22pm

It seems like we’re closer to “implementation defined” territory here. We can say something like “unwind over a Haskell frame on VAX 11/780 is UB, but unwinding over specific C frames on x86 and ARM is OK”.

Tom-Phinney · March 6, 2019, 9:08pm

The problem with such logic is that it might be true for specific code with specific compiler versions, but it is not guaranteed to be true for similar but different code on the same compiler versions, or for the original code on future compiler versions.

UB is not a machine concept; it is a compiler-writer concept that extends the set of optimizations that compilers are permitted to perform. Writing code that future compilers are free to break without notice, because it relies on something that the compiler says is not guaranteed to work, is simply asking for future grief, if not for you then for the unfortunate people who have to maintain your legacy code.

kornel · March 6, 2019, 9:50pm

Indeed, I’m aware I’m playing with fire here. There’s no ideal solution currently:

catch + panic unwinding through FFI is UB.
wrapping each individual call in setjmp in C is annoyingly boilerplate’y (I have to create a mirror of the entire API in C, wrapped, and make Rust side use the alternative API with alternative return values) and has runtime overhead,
calls wrapped in setjmp in Rust would be incredibly hard to use without causing more UB, and in usage typical for libjpeg, it’s pretty much guaranteed to leak memory.
wrapping each individual call in C++ try/catch is the next best thing, as it avoids the runtime overhead of setjmp, but it’s also boilerplate’y, and AFAIK from C++'s perspective it’s as much UB as Rust’s catch + panic.

I don’t think Rust can make unwinding not UB in general case. But maybe it could be demoted to “implementation-defined” for x86 and ARM, with clang-compiled C.
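For comparison, the pattern that stays clear of UB is to stop the panic at the boundary and report it as an error value. A minimal sketch, assuming a hypothetical callback named on_progress and an illustrative 0 / -1 return convention (neither is part of libjpeg's API):

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical callback handed to a C library. Any panic is caught here and
/// turned into an error code, so it never unwinds into C stack frames.
pub extern "C" fn on_progress(percent: i32) -> i32 {
    let outcome = panic::catch_unwind(AssertUnwindSafe(|| {
        if !(0..=100).contains(&percent) {
            panic!("progress out of range: {}", percent);
        }
        percent
    }));
    match outcome {
        Ok(_) => 0,   // success
        Err(_) => -1, // a panic was caught; the payload is dropped here
    }
}
```

This only helps when the C API lets a callback return an error. It does not cover libjpeg's error_exit handler, which must not return to its caller, and that is exactly the case that pushes people toward unwinding or setjmp.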

BatmanAoD · March 6, 2019, 10:59pm

GCC (like Clang) also provides -fexceptions for ensuring that C++ style unwinding through C (or other non-C++) code can be done in a well-defined way.

And in MSVC, longjmp is a stack-unwinding operation, so I believe there's no way that this could fail (without Rust implementing a completely alien stack-unwinding scheme).

I'm having a bit of a harder time figuring out whether the Intel compiler supports this, but these release notes seem to indicate that the -fp option ensures that stack frames can be unwound regardless of language.

So I believe that this could be well-defined with any of the major toolchains.

Centril · March 7, 2019, 12:12am

And also make the language designers, who now (if enough people start relying on UB not being exploited...) have no choice but to define the behavior, sad.

As for "implementation defined behavior", I'd like to distinguish between toolchains and architectures. For the latter it seems unavoidable to have some such implementation defined behavior because hardware (but you can clearly define the semantics for important architectures). For the former, I think this just leads to platform dependent code and I think we should avoid toolchain dependence whether it is wrt. gcc/clang or different Rust implementations.

If a mechanism should be provided I would prefer to provide an attribute that has clear semantics (and which would need unsafe { .. } to uphold required invariants).

BatmanAoD · March 7, 2019, 6:13am

How would the language definition make that distinction? I'm not aware of any other languages that make such a distinction.

Aren't there cases where this is unavoidable, though? And in this particular case, this is a behavior that is expected of low-level languages. Rust is supposed to be empowering; there should not be limitations in the language prohibiting the use of commonly available platform features. Additionally, there won't be multiple "interesting" versions of the behavior; either safe unwinding is supported in non-C++ code, or it isn't.

I'm not sure unsafe is necessary. On platforms that don't support this feature, the automatic panic! on crossing an ffi boundary should be implemented. On platforms that do support this feature, it's only unsafe if any non-Rust code is involved...which is already unsafe.

mcy · March 7, 2019, 2:32pm

I think you'd be hard-pressed to come up with good standardsese to express such a distinction.

I think in a world with competing rust implementations (which, I think, we actively want to have some day), you're likely going to get a gcc/clang situation where each toolchain's non-standard extensions are different enough to be a problem. (I do not believe this problem is avoidable.)

RalfJung · March 8, 2019, 6:17pm

Rust is not a low-level language. There are limitations in the language. This is a direct and necessary consequence of being a heavily optimized language. A key part to Rust being empowering is for it to actually deliver the performance people expect. That is only possible by imposing strict rules about what unsafe code can and cannot do. See this example for where reasoning based on "it is supported by the platform" leads.

It is important that UB is taken seriously. A "laissez faire" attitude to UB does not lead to more empowerment, it leads to more trouble and sadness. For us language designers to be able to do our job, for us to be able to keep developing and advancing the language, we do rely on Rust users to respect UB.

This is a case of a missing feature. Rust does not support unwinding through a C library. You think it should. That's totally fair. So what do we do about this? When a feature is missing from libstd, there's an RFC that proposes adding it. When a feature is missing from the language because of the definition of UB, there's an RFC that changes that definition.
You make it sound like you expect Rust to have all the low-level features you can imagine, without having to add them via an RFC, just because you can write some code that looks like it implements this feature -- but we do not live in a fairy tale world where features just happen like this, we have to actually work together to make them a reality. This is the price we pay for the optimizations we get. There is no shortcut.

I am aware that this is very frustrating, and that I sound very pedantic -- but I believe that this very thread shows that I am right. We have a situation where Rust made a legitimate change that actually turns UB into predictable defined behavior, so you'd think everybody is happy. But instead people are sad because their crates don't work any more, and then language designers are sad because they broke code. This is exactly the expected outcome when UB is not respected.

But of course, respect goes both ways. In cases like this, where there is a legitimate use-case that cannot be implemented with the current definition of UB, we do understand that something needs to be done. And that "something" is specifying the conditions under which panics across FFI boundaries are not UB. Once we have that specified, all Rust code can rely on this working now and for the future, and we promise we'll never add nounwind attributes to FFI functions (if such a thing exists) which might silently break code like the one being discussed here.
(EDIT: Turns out we add an unwind attribute already! So doing unwinding across FFI boundaries currently actually already relies on LLVM not realizing that the attributes are wrong.)

It seems like the spec should be changed from "unwinding across the FFI boundary is UB" to

When unwinding across the FFI boundary, you must ensure that the stack frames that get unwound are created by code that satisfies $CONDITION. Otherwise, this is UB.

where $CONDITION is something about the code being unwind-aware or so, which boils down to something platform-specific... I don't actually know enough about how unwinding works to fill out the blanks here, I am just drafting the general structure I think this could have.

I hope some of the people here will turn this into an RFC

josh · March 8, 2019, 7:07pm

I think $CONDITION here is “C code called from Rust”.

If you’re in Rust, you call C, C calls back into Rust, and that Rust code unwinds, you can unwind through the C back into Rust.

mcy · March 8, 2019, 9:38pm

I think a lot of this comes from a continued belief that C (and languages that appear to be mere sugar over C, like C++ and Rust) is just "a PDP-11 macro assembler with racing stripes". While this was true in the Before Time, for whatever reason this myth persists. I think we have some responsibility in educating users of unsafe Rust on what unsafe, and the risk of UB that comes with it, really means, especially in light of the traditional examples of seemingly innocent UB in C: (you may not assume 2's complement integers; signed overflow; dereferencing 0x0), which, in practice, most of us don't think about.
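As a small illustration of the signed-overflow item: in C, signed overflow is UB, whereas in Rust integer overflow is defined (a panic in debug builds, wrapping by default in release builds), and the intent can be spelled out with explicit methods. A minimal sketch; the helper names are made up for this example:

```rust
/// Returns None instead of overflowing.
pub fn add_checked(a: i32, b: i32) -> Option<i32> {
    a.checked_add(b)
}

/// Wraps around on overflow, as two's complement hardware would.
pub fn add_wrapping(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}
```

Calling add_checked(i32::MAX, 1) yields None, and add_wrapping(i32::MAX, 1) yields i32::MIN. Neither call depends on what the target happens to do.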

The scariest bit in the nomicon (without going into the weeds) I could find on UB is What Unsafe Can Do - The Rustonomicon, which really does little to impress upon readers, especially readers coming from managed languages where all such mis-behaviors are trapped, precisely how scary UB is, because it isn't trapped. It might be prudent to improve our introduction of UB, preferably with more than "here be dragons!" or similar. Something like this exciting BoringSSL #define comes to mind.

This is all notwithstanding that we live in a world where the program your processor executes is very different from the program your compiler emitted, which seems to be the main thing your link is sad about, from my skim.

jcranmer · March 9, 2019, 4:38am

One of my personal philosophies is that we should start looking beyond C to reason and think about ABI details that include features that are not accessible in C. And unwinding is one component of that ABI. In practice, there are 4 ABIs for unwinding: Itanium, ARM EH (which is essentially Itanium with the formats changed, IIRC), setjmp/longjmp, and Windows SEH. I understand the Itanium ABI the best, so I’ll talk in terms of that, but I think the concepts involved are broadly similar.

From the perspective of the ABI, unwinding defines an exception that’s being thrown during the unwind. This exception has a code that identifies whose exception it is, and how to free it if it’s not your exception. This means we can distinguish between Rust exceptions and foreign exceptions, and we can define UB based on the two kinds. In addition, we also have the issue of the caller and the callee source languages as an orthogonal axis for decision making, so we have the following matrix to fill out: Caller source = {Rust, FFI} × Callee source = {Rust, FFI} × Exception source = {Rust, FFI}. We also have to worry about the semantics of core::intrinsics::try, but here we just have to worry about the exception kind.

Right now, (Rust, Rust, Rust) is the only tuple that is well-defined; everything else is undefined behavior. We can define the other tuples, perhaps requiring some attributes to make the choice:

(Rust, Rust, FFI): Call destructors as appropriate (current semantics). We can leave it undefined if people think a future Rust compiler may want custom unwinding.
(Rust, FFI, Rust): Probably the best semantics are UB (i.e., mark the external function as nounwind) unless an #[unwind] attribute is added.
(Rust, FFI, FFI): This is an interesting case because if this tuple is always UB, we don’t have to worry about the other FFI exception object cases. But I think there are use cases for being able to catch and handle FFI exceptions, or at least pass them through. Obviously, FFI exceptions are going to be difficult to expose as anything more complex than an opaque-you’re-on-your-own target blob. Like the (Rust, FFI, Rust) case, they should be nounwind by default unless #[unwind]. I also suspect that core::intrinsics::try shouldn’t attempt to catch FFI exceptions; there’d have to be a new facility to catch an FFI exception. And use cases such as catching C++ exceptions should be left to extension crates.
(FFI, Rust, Rust): I don’t think there is much harm in letting Rust exceptions escape. I can see concerns about ABI stability, but we can probably say that doing anything with the Rust exception other than rethrowing it or stopping execution is UB. (Itanium ABI outright states this). If we’re concerned about future versions wanting to define their own unwinding method, we can limit this to specially-marked functions.
(FFI, Rust, FFI): The only thing that’s really different from the above is there may want to be a specific way to pass-thru FFI exceptions without catching.
(FFI, FFI, Rust): As elaborated above, UB if you do anything other than continue throwing the exception or catch it and immediately destroy it using the normal process for destroying foreign exceptions. If the exception reaches the top of the stack, UB as well.
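The matrix above can be enumerated mechanically. A minimal sketch that models only the one claim stated outright in the post (that (Rust, Rust, Rust) is the single well-defined tuple today); the type and function names are invented for illustration:

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Source {
    Rust,
    Ffi,
}

/// True only for the one combination described as well-defined today.
pub fn defined_today(caller: Source, callee: Source, exception: Source) -> bool {
    matches!(
        (caller, callee, exception),
        (Source::Rust, Source::Rust, Source::Rust)
    )
}

/// Enumerates all 2 x 2 x 2 (caller, callee, exception) combinations.
pub fn all_cases() -> Vec<(Source, Source, Source)> {
    let both = [Source::Rust, Source::Ffi];
    let mut cases = Vec::new();
    for caller in both {
        for callee in both {
            for exception in both {
                cases.push((caller, callee, exception));
            }
        }
    }
    cases
}
```

Of the eight combinations, one is defined today and six are discussed in the list above; (FFI, FFI, FFI) is not listed, presumably because no Rust frame is involved.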

So a rough sketch of how you could implement an unwinding ABI in Rust:

Add an attribute #[unwind(native)]. In the absence of this attribute, unwind semantics remain as they are today. If the target cannot guarantee the below semantics, use of this attribute is a compiler error. (Panic mode being abort is a vacuous implementation of the semantics, I think).
If an external function with no #[unwind] attribute is called, and it causes an unwind into Rust, UB.
If an external function with #[unwind] is called, it may cause an exception to propagate into Rust. If the caller is not marked #[unwind(native)], UB. If the caller is marked #[unwind(native)], and the exception was generated by panic! or std::panic::resume_unwind, the behavior is as if it were thrown by the caller function (with the observable exception of maybe having extra stack frames).
If a Rust function is called by an external function, and is not marked by #[unwind(native)], then UB.
If a Rust function is called by an external function, and is marked by #[unwind(native)], then a Rust function further up the callstack may legally catch it with catch_unwind (as mentioned above). Any non-Rust code in the middle may only catch-and-destroy or rethrow the exception without modification. If no one catches the exception, abort.
Add a std::ffi::ForeignException struct that encapsulates a foreign exception. Dropping this object causes it to be destroyed. There is also a method fn throw(self) -> !; that causes a rethrow of this object. Obviously, this is !Send and also definitely not UnwindSafe.
Add platform-specific std::os::ForeignExceptionExt that gets access to things like the unwind code or SEH details.
Add a method pub fn catch_foreign_exception<F: FnOnce() -> R + UnwindSafe, R>(f: F) -> Result<R, ForeignException>.
A foreign exception may be thrown from an #[unwind] function into an #[unwind(native)] function. This will cause a foreign exception to be thrown through Rust code. Destructors will run during this unwind process. However, std::panic::catch_unwind will not catch such an exception. Only catch_foreign_exception may catch it. If this function falls out of Rust code not via an #[unwind(native)] function, then UB.

This proposal makes cross-language unwinding strictly opt-in only. Rust -> C -> Rust and C++ -> Rust -> C++ exception flows would be handled transparently if opted into, and can only be done if the target unwind ABI allows for it. Converting between Rust and C++ or SEH exceptions would be possible but largely left to user crates to handle.
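For readers arriving later: Rust eventually stabilized an opt-in ABI string, extern "C-unwind" (in Rust 1.71), which is in the same opt-in spirit as the attribute sketched above, though it is not this exact proposal. A minimal sketch, assuming a toolchain that supports that ABI; risky_callback and run_guarded are hypothetical names, and no real C frames are involved here, only a Rust function called through the unwinding ABI:

```rust
use std::panic;

/// Hypothetical callback using the opt-in unwinding ABI: a panic may
/// propagate out of it instead of aborting the process.
pub extern "C-unwind" fn risky_callback(input: i32) -> i32 {
    if input < 0 {
        panic!("negative input");
    }
    input * 2
}

/// Calls the callback through its ABI and catches any panic on the Rust side,
/// converting the panic payload into a message.
pub fn run_guarded(
    cb: extern "C-unwind" fn(i32) -> i32,
    input: i32,
) -> Result<i32, String> {
    panic::catch_unwind(|| cb(input)).map_err(|payload| {
        if let Some(s) = payload.downcast_ref::<&str>() {
            s.to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "unknown panic payload".to_string()
        }
    })
}
```

With a plain extern "C" function, the same panic would abort at the boundary rather than reach catch_unwind.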

H2CO3 · March 9, 2019, 8:47am

[off]

Yes. As someone who has spent years educating people on various platforms (from Stack Overflow to university intro courses) about C and C++, this is absolutely true, and quite frustrating. Give programmers a language with pointer arithmetic, and they'll instantly think it's legal to do absolutely anything and everything.

The problem is often exacerbated by those who teach these languages, too, as often the existence of such a thing as a C standard isn't even mentioned, and the general attitude is "it kinda-sorta vaguely makes sense based on my simplistic mental model of the machine, therefore it must be correct."

I really do think it's harmful, and that the correct solution is education. (That said, I am not against FFI unwinding, I'm pretty supportive of it and I'm not implying this is OP's case. This is simply a tangential that I felt I needed to mention.)

[/off]

BatmanAoD · March 10, 2019, 12:31am

I'm sorry, but this seems like something of an unhelpful tangent. I'm well aware of that article and have even linked it before on this forum. But there are, of course, different things meant by "low level", and I expected my use of the phrase, in context, merely to indicate the general "class" of languages to which Rust belongs, i.e., C competitors that permit manual memory management of some sort.

That's an example of failure to include the compiler-backend in one's conception of "the platform". I have done precisely the opposite above: I am asserting that all the compiler backends we might reasonably hope to target do in fact support the feature we're discussing.

You and I just had a discussion on this same forum about UB; I would have thought from that discussion you'd understand that I do not support a "laissez faire" attitude. But more importantly, I don't think anything in my post supports the inference of such an attitude. I did not state that the current state of affairs is fine and that users should continue to rely on the UB doing "the right thing". (In fact, I suspect that it already won't work correctly with LLVM in some cases.) I stated that "this could be well-defined".

I'm not sure how you got that impression. But what I do expect is for Rust to be a viable candidate to replace C and C++ for every conceivable use of C and C++. By far, this is what I see as Rust's biggest value proposition as a language. So, yes, it does matter that C++ has this feature.

Sorry, what? I'm against RFCs and think I live in a fairy tale world? It is honestly quite frustrating to be told that I believe things totally contrary to what I actually believe (and totally contrary to the established norms of the community) on the basis of posts that in no way indicate such a thing. Please abstain from making claims about authors' beliefs unsupported by their posts.

And yes, I agree that an RFC is necessary. I've never written an RFC before, but I am willing to write one for this feature if I have time to do so before someone else does. I consider my post above about the different compiler backends to be (very) preliminary research for such an RFC.

The specific reason I've become active in this thread is that the Rust team came within a hair's breadth of defining the behavior as an abort regardless of the conditions, thus breaking existing code without providing an alternative. So, yes, we need to specify when it's safe not to abort, but I am not willing to do that until the answer is something other than "never".

If this were true, LLVM and GCC wouldn't support the -fexceptions flag. Without such a flag, you cannot unwind through C code.

scottmcm · March 10, 2019, 1:36am

I think this sentiment is the origin of the "laissez faire" comments -- code dependent on UB is already broken even if it appears to work sometimes, and changing what happens on UB is necessarily something that the implementation can do whenever.

For example, code that transmuted (u8,u16,u8) to [u16;3] was already broken before the layout changes even though it compiled and did what some people probably expected. That wasn't "breaking existing code" either, even though it made code that previously compiled stop compiling.
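To make the transmute example concrete: the layout of a Rust tuple is unspecified, so the compiler may reorder or pad its fields, while [u16; 3] is always three consecutive u16 values. A minimal sketch of a conversion that does not depend on layout; widen and sizes are hypothetical helpers:

```rust
use std::mem::size_of;

/// Converts field by field, so the result does not depend on tuple layout.
pub fn widen(t: (u8, u16, u8)) -> [u16; 3] {
    [u16::from(t.0), t.1, u16::from(t.2)]
}

/// Returns (size of the tuple, size of the array) in bytes.
/// Only the array size is guaranteed by the language.
pub fn sizes() -> (usize, usize) {
    (size_of::<(u8, u16, u8)>(), size_of::<[u16; 3]>())
}
```

The array is always 6 bytes. The tuple size is whatever the compiler picks, and once it differs from 6, std::mem::transmute between the two types is rejected at compile time.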

mcy · March 10, 2019, 3:25am

I suspect Ralf isn’t targeting you specifically with his comment, but rather trying to make general statements about definedness, which are perhaps known to you but clearly not to the wider audience; if they were, this thread would not be phrased as “Rust introduced a miscompilation”.

As @scottmcm said, if one morning Rust decides to make a formerly UB call defined, it is 100% the caller’s, never the callee’s, fault that their code broke. Similarly, breakage from changing codegen due to assumptions like “UB is never ever invoked” is also the caller’s fault. This is the central moral of this thread.

For what it’s worth, I think the changelog should not have stated that unwinding into another language would abort (much less that it was defined to do so). Changing UB to (implementation-) defined behavior should be dragged through the RFC process (which, from my read of Ralf’s comment, it was not). This change should have kept this behavior UB, and it should not have been mentioned in the changelog, since changes to the code generator’s assumptions around UB are not relevant to end users.

RalfJung · March 10, 2019, 10:31am

I find this is a dangerous and somewhat misleading way of arguing. UB is not defined by what some platform or backend happens to do. There is an important intermediate step here that you are skipping. (FWIW, I am not saying you don't understand this distinction, but you are failing to communicate it and that makes it very easy to misunderstand your statements.)

That example I brought up is an example of failure to realize that Rust programs don't run on x86, they run on an abstract machine. That machine has rules for what you are and are not allowed to do, and violating those rules (violating the "contract" of the abstract machine) is what we call UB. The reason the program is wrong has nothing directly to do with compiler backends or platforms. The program is wrong because it violates the contract of the abstract machine.

Compiler backends and platforms come in when we ask why the abstract machine is defined the way it is -- but they are not the only considerations coming in, and we should be very careful not to fit this definition too tightly to whatever happens to be the current optimizations and platforms. Being able to understand, analyze or test programs on the abstract machine are other considerations coming in here, and they are IMO just as important.

Sorry if this came across wrong, I did not mean to suggest you do. The comment was in reply to yours, but not directed only at you. This entire thread is about a crate that knowingly causes UB and got broken when the compiler's behavior in face of this UB changed. That's the kind of "laissez faire" attitude I was referring to.

(FWIW, I didn't say anybody was against RFCs, I just suggested some people think an RFC is not necessary because we can just do unwinding across FFI barriers and it just works, even though it "technically" is UB. I guess we all sometimes overinterpret other people's statements.)

That is not what I meant to say, and I am sorry for not communicating more clearly. As mentioned above, I was responding to the entire thread, not just to your post.

What I meant to say is: people (not you specifically!) clearly thought that unwinding across the FFI edge is a feature supported by Rust. However, it is not, there is no RFC specifying it, and the fact that this happens to "work" nevertheless does not magically mean that Rust has a feature that was never specified.

What you are saying here sounds a lot like "there was a working feature and the Rust team wanted to take it away" (and if that is not what you wanted to communicate, I am sorry -- but also please consider why this is the interpretation several people get when reading your posts). But that's not what happened. There was no feature that got removed. The only change is better UB linting being turned on by default for all builds, and we should never be in a situation where that upsets anyone.

Nothing the Rust team did indicated that Rust never should support unwinding across an FFI edge. In fact, there already is an unstable #[unwind] attribute. All that happened is that UB got turned into a safe abort, taking away the possibility to pretend that Rust supports FFI unwinding. This changes nothing about the possibility of actually adding FFI unwinding as a feature, having a way to tell the compiler that this is happening (to suppress the abort-on-panic) and specifying the conditions under which this is allowed.

While I agree in principle, I think it is also important to be empathetic with programmers who want to get something done when there just is no way to do it without causing UB. While the "right thing to do" is to get the definition of UB changed, that can seem like an insurmountable obstacle. I can understand why people take shortcuts. But I don't have to be happy about it, and I think it is important that we communicate clearly why we think programmers should go through the extra effort of finding or creating a UB-free way to do what they want to do. We can point at this thread as an example.

dhm · March 10, 2019, 12:39pm

Hmm, another thread derailing into meta-questions about UB. I guess it does show there should be an official Rust page spelling out what the conclusion about it is. This way, since this kind of “discussion” will keep arising from time to time, it will be possible to just link to the official Rust stance on it and be done with it, imho.

Now, back to the topic: @jcranmer has made a very detailed post about the possible direction of the needed RFC; what do other unwind-competent people think about it?

@mods since it has actually nothing to do with Rust 1.33, could there be a new Pre-RFC discussion: making some cases of FFI unwinding not UB thread?

Centril · March 10, 2019, 12:46pm

In https://github.com/rust-lang/rust/issues/58794#issuecomment-471281240, I have formally proposed that we go ahead with the change in https://github.com/rust-lang/rust/pull/55982 for 1.34 and beyond.

kornel · March 10, 2019, 1:03pm

This is probably the most relevant/interesting issue currently:

mjbshaw · March 10, 2019, 3:02pm

Is anyone working on an RFC for this? @jcranmer’s reply looks like a good starting point to me. I’m happy to start one, but I don’t want to waste my time if someone already has a head start.
