Unwinding through FFI after Rust 1.33

I’ve not started working on an RFC, although I think the proposal I made would be a good starting point. It would be a shame to let this die out without coming to any sort of resolution.

I agree your proposal is a great starting point. Since it seems no one else is working on it, I’ll start drafting it. I don’t plan on posting it until the draft is finished, which might take a while. If anyone wants to participate or review the draft before it’s finished and posted, let me know (e.g., I could set up collaborative editing somewhere).

Honestly, this problem has always had a solution that works correctly and reliably 100% of the time: write a thin wrapper over the C / C++ / FooLang language that catches exceptions or whatever the error handling mechanism of that language is, and convert that to an error code that can be used on the Rust side.

The issue here is not about allowing people to solve a problem that they can't solve today.

AFAICT, the issue is that some people say "this is too much work", "we don't want to implement a binding generator that writes the code for us", "we don't want to implement support for this in rust-bindgen", etc., and conclude that the language team and compiler team should therefore add a solution to the language.

This sounds like a really long shot, and IMO, all those approaches are better suited to solve this problem than a language feature.

Nobody has asked or answered the question of "What do we mean by unwinding here?".

Rust panics are ABI incompatible with C SJLJ (SetJmp/LongJmp) and with C++ exceptions (and C++ exceptions are incompatible with C's SJLJ).

That is, if we say that panicking and catching panics across the FFI boundary is not UB, but implementation defined, then the implementation needs to specify per target an ABI for panicking through FFI, and if the code on both sides of the FFI does not adhere to it, then the behavior is undefined.

We can translate Rust panics into FFI by catching them on every FFI call, and we can catch all panics from FFI and translate them to Rust panics.
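
The first half of this (stopping a Rust panic before it reaches a foreign caller) can be sketched with `std::panic::catch_unwind`. This is only a minimal illustration: the function `checked_div`, the `DivResult` struct, and the status values are hypothetical names, not part of any existing binding.

```rust
use std::panic;

// Hypothetical status codes; a real binding would define its own.
pub const STATUS_OK: i32 = 0;
pub const STATUS_PANIC: i32 = 1;

/// C-compatible result: a status code plus the value (0 when a panic occurred).
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DivResult {
    pub status: i32,
    pub value: i32,
}

/// Uses the C ABI. Any panic raised inside is caught here, so no unwinding
/// crosses the FFI boundary; the caller only ever sees a status code.
pub extern "C" fn checked_div(a: i32, b: i32) -> DivResult {
    // Integer division panics on a zero divisor (and on i32::MIN / -1).
    match panic::catch_unwind(|| a / b) {
        Ok(value) => DivResult { status: STATUS_OK, value },
        Err(_) => DivResult { status: STATUS_PANIC, value: 0 },
    }
}
```

Note that `catch_unwind` only catches unwinding panics; when the crate is built with `panic = "abort"` the process aborts before this function can return a status.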

Yet if we set the unwinding ABI to match C++'s, C code using SJLJ would still be UB. AFAICT, even if we set the unwinding ABI to match SJLJ, interfacing with C code that uses it would often be UB. For example, SJLJ would require calling setjmp on the Rust side and implicitly passing a jump context to C, but is this context passed as the first function argument? The last one? The ABI would fix that choice, yet every C library does this differently because they can.

And this is without mentioning the overhead this would add to FFI calls that are not #[no_unwind], which probably isn't acceptable. To remove the overhead, the Rust panic ABI would need to match that of the FFI, but then it matches either C++ or C and is incompatible with the other. We probably can't match either C++ or C, because that would be a backwards-incompatible change requiring tweaking destruction order or even allowing panics to skip destructors.

IMO even if we make it implementation-defined, the most reasonable default for an implementation would be to abort if Rust panics into FFI, to force the user to explicitly convert panics to the ABI of the language that they are interfacing with, and to remove the overhead for most FFI function calls which don't need to handle panics.
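
An "abort if Rust panics into FFI" default could look roughly like the following when written by hand today. This is a sketch under assumptions: `abort_on_panic` and `rust_callback` are hypothetical names used only for illustration, not an existing API.

```rust
use std::panic::{self, UnwindSafe};
use std::process;

/// Hypothetical helper: runs `body` and aborts the whole process if it
/// panics, so that unwinding can never reach a foreign caller's frames.
pub fn abort_on_panic<T>(body: impl FnOnce() -> T + UnwindSafe) -> T {
    match panic::catch_unwind(body) {
        Ok(value) => value,
        Err(_) => process::abort(),
    }
}

/// Example callback using the C ABI. An overflow panics inside the
/// closure, and the guard turns that panic into an abort.
pub extern "C" fn rust_callback(x: i32) -> i32 {
    abort_on_panic(|| x.checked_mul(2).expect("overflow in rust_callback"))
}
```

A function that wants to report the failure instead of aborting has to opt in explicitly by converting the panic into whatever error representation the foreign side expects.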

For unwinding from FFI into Rust, an implementation would probably say that this is undefined behavior on that implementation, because there is no way to tell from the Rust side whether an FFI function will unwind, and if so, from which programming language. For example, Rust calls into C without passing a jump context, so the compiler assumes the C function can't unwind. Yet that C code calls C++ code that throws, and the C and C++ compilers, which are the same toolchain, cooperate to make that work in case C++ code was calling the C code. So now we need to specify the C++ exception ABI for a C FFI function.

@BatmanAoD

So I believe that this could be well-defined with any of the major toolchains.

AFAIK there are multiple incompatible unwinding ABIs on Windows (SJLJ and SEH). When targeting e.g. the msvc targets, which use SEH, a C library can still use SJLJ (setjmp/longjmp) for unwinding, since those are standard C APIs. If we assume the unwinding ABI of the target to be SEH, which is what we should do on that target, what's being discussed here would still not work, because the C libraries use SJLJ instead.

@RalfJung

But instead people are sad because their crates don’t work any more, and then language designers are sad because they broke code.

A year and a half ago GCC stopped zero-initializing uninitialized variables, and people with programs relying on uninitialized memory being zeroed complained that their code, which had been working correctly on the only platform and toolchain that they cared about, was now broken.

This exact same thing is happening here. Some programs were invoking undefined behavior by unwinding across FFI boundaries and were appearing to work properly on some targets. A compiler change which catches this issue was then introduced, the migration path is clear, yet users argue that they shouldn't have to do anything.

There are many options available to users here. Working out the details about unwinding across FFI, platform unwinding ABIs, converting panics, exceptions, SJLJ across unwinding ABIs, adding controls to tweak that, etc. to the language, sounds like too much work for something that could be solved with a binding generator.

If the argument is that the binding generator needs to generate C code, then if we allow calling setjmp / longjmp from Rust, rust-bindgen could be extended to generate Rust code that handles these for the users, probably with some tuning to match the differences in ABI across the different libraries using SJLJ, and people would be able to write macros that do this for them if they don't want to use bindgen.

This is not accurate. The library in question never uses setjmp and is completely agnostic to the unwinding method. I do not propose to support setjmp. In fact, I'm strongly opposed to having any form of setjmp compatibility in Rust.

I specifically wanted to narrow the scope of this issue to only the catch_unwind + panic! pair, to exclude complications from any cross-language interoperability (beyond the bare minimum of skipping over their stack frames).

I am confused here. This sounds like the kind of stuff that would be needed to turn a C SJLJ into a Rust panic, or a Rust panic into a C++ exception, or so. That is not what is being discussed here.

What we are talking about here is Rust code calling C code calling Rust code, and the inner Rust code throwing a panic that the outer Rust code can handle. The C code just needs to let the panic "pass through". From my reading of what is described above, this is okay if the C code is compiled with the right flag. This is confirmed on StackOverflow for the similar case of having C++ (instead of Rust) on both ends:

compiling the C code with -fexceptions will ensure the run-time remains consistent when an exception is thrown through C code

There are more caveats (the C code might leak or so), but in principle this seems to be workable. I assume other compilers have similar flags.

So if Rust does not call setjmp, how can the C library unwind into Rust?

By calling a Rust function that calls panic!().

What happens if that C code calls C++ code that calls C code that calls Rust that panics? Do we unwind across C++, calling destructors?

In the case of libjpeg and libpng, that’s out of scope: it never happens.

Then the proposed attribute should be called #[libjpeg_unwind] and not #[unwind].

#[c_unwind] works for me.

I’d be strongly against that.

AFAIK this only works if we propagate Rust panics as C++ exceptions, the C code was compiled with C++ exception support, and we then catch the C++ exceptions and convert them back to Rust panics, because the Rust panic implementation does not need to match the C++ exception implementation.

(emphasis mine)

(idem)

For reference, here is a Cranelift issue hinting at such an alternative implementation for panics. Brief summary: a compiler can define the ABI of Rust functions to have two return addresses, one for normal returns (as is usual) and one for panic!s (this one is new).

A Rust compiler implementing the above would have no trouble calling or defining extern "C" functions, except for panics out of the extern fns it defines. And as far as I know, a C library is unlikely to support being unwound through such a calling convention.

That Cranelift issue is interesting. I’d seen it mentioned that Rust wants the freedom to tweak unwinding independently of other language implementations, so it’s nice seeing an example of how that freedom would actually be utilized.

You can link against a DLL compiled by any toolchain, independent of which toolchain you're using. So even if you're using msvc, the DLL you're calling into may have been produced by a mingw toolchain using DWARF exception handling. That said, all native code on Windows has SEH unwinding tables because Windows itself needs them, so SEH unwinding will always unwind correctly through any native code, but whether destructors fire is determined by the SEH handler that is used. Rust uses the same handler that VC++ does, so if unwinding were allowed across the C++/Rust boundary, C++ exceptions would trigger Rust destructors and Rust panics would trigger C++ destructors. Also, setjmp and longjmp in VC++ use the SEH exception machinery; they're not some separate unwinding mechanism.

Background: I have a program (repo), and it’s normally run by compiling the lib into a dynamic library that exports C ABI functions, and then the binary loads up said dynamic library and makes calls into it. So we have Rust code calling other Rust code through the C ABI via a dynamic library.

Question: If the dynamic lib panics back up into the binary, is it UB or not?

EDIT: After discussion with @gnzlbg on Discord it seems like the probable answer is “that’s UB don’t do that”, but anyone should feel free to add more info if they have more info.

@Lokathor I don't think that can ever work "as is" without setting the Rust panicking implementation in stone forever, which we probably don't want to do, so that we can keep improving the implementation over time.

The problem is that even if the code on both sides of the FFI is Rust, it could be Rust compiled with two different toolchain versions, using incompatible panicking implementations. (Through C FFI, we can't really check that.)

On Windows, MSVC breaks ABI compatibility on every release (similar to how Rust releases currently work), so C++ has this same problem. See "Throwing C++ exceptions across DLL boundaries" on Stack Overflow:

If you want error-handling which is portable across multiple compilers/compiler versions/compiler settings, either use return codes or OS-provided exceptions (e.g. SEH on Windows)

If you want to make this work, you need to pick the same panicking ABI on both sides of the FFI and stick to it. This is possible in Rust today: catch all panics, pass them across using a stable ABI, and then panic in Rust on the other side.
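
A minimal sketch of that catch, pass, and re-panic pattern for the Rust calling C calling Rust case follows. It assumes the foreign code stops and returns as soon as a callback reports a nonzero status. Every name here is hypothetical, and `c_like_for_each` is a stand-in written in Rust for what would really be a C library routine.

```rust
use std::any::Any;
use std::cell::RefCell;
use std::panic;

thread_local! {
    // Panic payload captured by the inner callback, if any.
    static PENDING_PANIC: RefCell<Option<Box<dyn Any + Send>>> = RefCell::new(None);
}

type Callback = extern "C" fn(i32) -> i32;

/// Stand-in for a C routine: calls `cb` for each index and stops at the
/// first nonzero status, returning that status to its caller.
extern "C" fn c_like_for_each(count: i32, cb: Callback) -> i32 {
    for i in 0..count {
        let status = cb(i);
        if status != 0 {
            return status;
        }
    }
    0
}

/// Inner Rust code: catches its own panic, stashes the payload, and
/// reports failure through the return value instead of unwinding.
extern "C" fn inner_callback(i: i32) -> i32 {
    let outcome = panic::catch_unwind(|| {
        if i == 2 {
            panic!("callback failed at index {}", i);
        }
    });
    match outcome {
        Ok(()) => 0,
        Err(payload) => {
            PENDING_PANIC.with(|slot| *slot.borrow_mut() = Some(payload));
            1
        }
    }
}

/// Outer Rust code: after the foreign call returns, resumes the stashed
/// panic so it continues unwinding through Rust frames only.
pub fn run(count: i32) {
    let status = c_like_for_each(count, inner_callback);
    if status != 0 {
        if let Some(payload) = PENDING_PANIC.with(|slot| slot.borrow_mut().take()) {
            panic::resume_unwind(payload);
        }
    }
}
```

With this shape the foreign frames always return normally, so nothing depends on how either side implements unwinding. The cost is that the foreign library must offer some way to stop early on an error status.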

The stable ABI to use is... up to you, actually. Sure, both error codes and SEH work, but you can also use a pointer to a string (if the pointer is null, there was no error), or whatever you want. You can also write a proc macro that does the dance for you, but questions like "which error code should it use" and "should all panics use the same error code, or different ones" (the same applies to raising structured exceptions) are all pretty much application dependent.
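
The "pointer to a string, null means no error" convention could be sketched as follows. The names `check_positive` and `take_error` are hypothetical, and the sketch assumes both sides use the same allocator, because the message is freed by reclaiming it as a `CString`.

```rust
use std::ffi::CString;
use std::os::raw::c_char;
use std::panic;

/// Returns null on success, or a heap-allocated NUL-terminated message
/// describing the panic. The caller must hand it back to `take_error`.
pub extern "C" fn check_positive(input: i32) -> *mut c_char {
    let outcome = panic::catch_unwind(|| {
        assert!(input > 0, "input must be positive, got {}", input);
    });
    match outcome {
        Ok(()) => std::ptr::null_mut(),
        Err(payload) => {
            // Panic payloads are usually a String or a &'static str.
            let msg = payload
                .downcast_ref::<String>()
                .map(String::as_str)
                .or_else(|| payload.downcast_ref::<&'static str>().copied())
                .unwrap_or("unknown panic");
            // An interior NUL makes CString::new fail; use a fixed text then.
            CString::new(msg)
                .unwrap_or_else(|_| CString::new("panic message contained NUL").unwrap())
                .into_raw()
        }
    }
}

/// Reclaims a pointer returned by `check_positive`, freeing the message.
///
/// # Safety
/// `ptr` must be null or a not-yet-reclaimed pointer from `check_positive`.
pub unsafe fn take_error(ptr: *mut c_char) -> Option<String> {
    if ptr.is_null() {
        return None;
    }
    // SAFETY: guaranteed by the caller, see the contract above.
    let owned = unsafe { CString::from_raw(ptr) };
    Some(owned.to_string_lossy().into_owned())
}
```

The receiving side can then decide for itself whether to re-panic with the message, map it to an error type, or log it.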

EDIT: Even if we could come up with a general way to translate panics across FFI boundaries for some situations, that won't necessarily make panicking through FFI a good idea in practice. Lots of things that can't be checked need to align for it to work (e.g. what happens if the two linked Rust crates use the same panic implementation but a different memory allocator due to a different toolchain version or linking issue, and one crate tries to free the panic payload of the other crate?).

Can SEH be manually triggered and caught in Rust using the MSVC toolchain? On other platforms, can either GNU or LLVM exceptions be thrown?

Right now, panic is implemented by doing what the native platform does (the Rust run-time calls C++ code that throws an exception), so if the question is whether this can be done, the answer is yes: that's what panic and catch_unwind do. OTOH, if the question is whether this can be done independently of how the Rust run-time is implemented, AFAICT the answer is no. Unwinding across FFI is UB, so one can't call a C++ function that does this. One could throw and catch an exception inside an inline assembly block, but letting it escape (e.g. just raising an SEH exception on Windows) would be UB as well (right now destructors would be invoked, but if the Rust panic impl changes, that wouldn't be the case anymore).