Unwinding through FFI after Rust 1.33

In https://github.com/rust-lang/rust/issues/58794#issuecomment-471281240, I have formally proposed that we go ahead with the change in https://github.com/rust-lang/rust/pull/55982 for 1.34 and beyond.


This is probably the most relevant/interesting issue currently:

Is anyone working on an RFC for this? @jcranmer’s reply looks like a good starting point to me. I’m happy to start one, but I don’t want to waste my time if someone already has a head start.


I’ve not started working on an RFC, although I think the proposal I made would be a good starting point. It would be a shame to let this die out without coming to any sort of resolution.

I agree your proposal is a great starting point. Since it seems no one else is working on it, I’ll start drafting it. I don’t plan on posting it until the draft is finished, which might take a while. If anyone wants to participate or review the draft before it’s finished and posted, let me know (e.g., I could set up collaborative editing somewhere).


Honestly, this problem has always had a solution that works correctly and reliably 100% of the time: write a thin wrapper over the C / C++ / FooLang language that catches exceptions or whatever the error handling mechanism of that language is, and convert that to an error code that can be used on the Rust side.

The issue here is not about allowing people to solve a problem that they can't solve today.

AFAICT, the issue is that some people say that "this is too much work", "we don't want to implement a binding generator that writes the code for us", "we don't want to implement support for this in rust-bindgen", etc., and that the language team and compiler team should therefore add a solution to the language.

This sounds like a really long shot, and IMO, all those approaches are better suited to solve this problem than a language feature.

Nobody has asked or answered the question of "What do we mean by unwinding here?".

Rust panics are ABI incompatible with C SJLJ (SetJmp/LongJmp) and with C++ exceptions (and C++ exceptions are incompatible with C's SJLJ).

That is, if we say that panicking and catching panics across the FFI boundary is not UB, but implementation defined, then the implementation needs to specify per target an ABI for panicking through FFI, and if the code on both sides of the FFI does not adhere to it, then the behavior is undefined.

We can translate Rust panics into FFI by catching them on every FFI call, and we can catch all panics from FFI and translate them to Rust panics.

Yet if we set the unwinding ABI to match C++'s, C code using SJLJ would still be UB, and AFAICT, even if we set the unwinding ABI to match SJLJ, interfacing with C code using it will often be UB (e.g. SJLJ would require calling setjmp on the Rust side and implicitly passing a jump context to C, but is this context passed as the first function argument? The last one? That's fixed by the ABI, yet every C library does this differently because they can).

And this is without mentioning the overhead this would add to FFI calls that are not #[no_unwind], which probably isn't acceptable. To remove the overhead, the Rust panic ABI would need to match that of the FFI, but then it matches either C++ or C, and is incompatible with the other. We probably can't match either C++ or C, because that would be a backwards incompatible change requiring tweaking destruction order or even allowing panics to skip destructors.

IMO even if we make it implementation-defined, the most reasonable default for an implementation would be to abort if Rust panics into FFI, to force the user to explicitly convert panics to the ABI of the language that they are interfacing with, and to remove the overhead for most FFI function calls which don't need to handle panics.
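To make "explicitly convert panics" concrete, here is a minimal sketch of a Rust function exposed over the C ABI that catches a panic at the boundary and reports it as an error code. The names `checked_div`, `checked_div_ffi`, `STATUS_OK` and `STATUS_PANIC` are made up for illustration, and the sketch assumes the crate is built with the default `panic=unwind` strategy (with `panic=abort` there is nothing to catch).

```rust
use std::os::raw::c_int;
use std::panic;

// Hypothetical status codes; a real C API would define its own.
pub const STATUS_OK: c_int = 0;
pub const STATUS_PANIC: c_int = -1;

// Ordinary Rust logic that may panic.
fn checked_div(a: i32, b: i32) -> i32 {
    if b == 0 {
        panic!("division by zero");
    }
    a / b
}

// Entry point with the C calling convention. A real export would also
// carry a no_mangle attribute; it is left out to keep the sketch small.
pub extern "C" fn checked_div_ffi(a: i32, b: i32, out: *mut i32) -> c_int {
    // Reject a null output pointer before doing any work.
    if out.is_null() {
        return STATUS_PANIC;
    }
    // The closure only captures two integers, so it is unwind safe.
    match panic::catch_unwind(|| checked_div(a, b)) {
        Ok(value) => {
            // SAFETY: `out` was checked for null above; the caller is
            // assumed to pass a pointer to writable memory.
            unsafe { *out = value };
            STATUS_OK
        }
        // The panic stops here and never reaches the foreign frames.
        Err(_) => STATUS_PANIC,
    }
}
```

The caller on the other side of the boundary only ever sees an integer status, so no unwinding ABI has to be agreed on between the two languages.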

For unwinding from FFI into Rust, an implementation would probably say that this is undefined behavior on that implementation, because there is no way to tell from the Rust side whether an FFI function will unwind, and if so, with which programming language's mechanism. For example, Rust calls into C without passing a jump context, so the compiler assumes the C function can't unwind. Yet that C code calls C++, which throws, and the C and C++ compilers, which are the same toolchain, cooperate to make that work in case C++ was calling the C code. So now we need to specify the C++ exception ABI for a C FFI function.

@BatmanAoD

So I believe that this could be well-defined with any of the major toolchains.

AFAIK there are multiple incompatible unwinding ABIs on Windows (SJLJ and SEH). When targeting e.g. the msvc targets, which use SEH, a C library can still use SJLJ (setjmp/longjmp) for unwinding, since those are standard C APIs. If we assume the unwinding ABI of the target to be SEH, which is what we should do on that target, the approach being discussed here would still not work, because the C libraries use SJLJ instead.

@RalfJung

But instead people are sad because their crates don’t work any more, and then language designers are sad because they broke code.

A year and a half ago GCC stopped zero-initializing uninitialized variables, and people with programs relying on uninitialized memory being zeroed complained that their code, which had been working correctly on the only platform and toolchain that they cared about, was now broken.

This exact same thing is happening here. Some programs were invoking undefined behavior by unwinding across FFI boundaries and were appearing to work properly on some targets. A compiler change which catches this issue was then introduced, the migration path is clear, yet users argue that they shouldn't have to do anything.

There are many options available to users here. Working out the details about unwinding across FFI, platform unwinding ABIs, converting panics, exceptions, SJLJ across unwinding ABIs, adding controls to tweak that, etc. to the language, sounds like too much work for something that could be solved with a binding generator.

If the argument is that the binding generator needs to generate C code, then if we allow calling setjmp / longjmp from Rust, rust-bindgen could be extended to generate Rust code that handles these for the users, probably with some tuning to match the differences in ABI across the different libraries using SJLJ, and people would be able to write macros that do this for them if they don't want to use bindgen.


This is not accurate. The library in question never uses setjmp and is completely agnostic to the unwinding method. I do not propose to support setjmp. In fact, I'm strongly opposed to having any form of setjmp compatibility in Rust.

I specifically wanted to narrow the scope of this issue to only the catch_unwind + panic! pair, to exclude complications from any cross-language interoperability (beyond the bare minimum of skipping over their stack frames).


I am confused here. This sounds like the kind of stuff that would be needed to turn a C SJLJ into a Rust panic, or a Rust panic into a C++ exception, or so. That is not what is being discussed here.

What we are talking about here is Rust code calling C code calling Rust code, and the inner Rust code throwing a panic that the outer Rust code can handle. The C code just needs to let the panic "pass through". From my reading of what is described above, this is okay if the C code is compiled with the right flag. A StackOverflow answer confirms this for the similar case of having C++ (instead of Rust) on both ends:

compiling the C code with -fexceptions will ensure the run-time remains consistent when an exception is thrown through C code

There are more caveats (the C code might leak or so), but in principle this seems to be workable. I assume other compilers have similar flags.
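For comparison, the Rust-calls-C-calls-Rust shape can also be handled without any unwinding through the C frames: the inner Rust callback catches the panic, stashes the payload, and the outer Rust code resumes it once the C call has returned. The sketch below is an illustration under assumptions, not a description of how any particular library does it. `c_library_run` is a hypothetical stand-in written in Rust so the example is self-contained (in real code it would be a C function declared in an `extern "C"` block), and `Slot`, `trampoline`, `inner_rust_work` and `run` are likewise invented names.

```rust
use std::any::Any;
use std::os::raw::{c_int, c_void};
use std::panic;

type Callback = extern "C" fn(user_data: *mut c_void) -> c_int;

// Hypothetical stand-in for a C library entry point that invokes a callback.
extern "C" fn c_library_run(cb: Callback, user_data: *mut c_void) -> c_int {
    cb(user_data)
}

// State shared between the outer Rust code and the callback.
struct Slot {
    fail: bool,
    payload: Option<Box<dyn Any + Send + 'static>>,
}

// The inner Rust logic, which may panic.
fn inner_rust_work(fail: bool) {
    if fail {
        panic!("inner failure");
    }
}

// Callback handed to the C side. It never lets a panic escape; it stores
// the payload and returns a non-zero status instead.
extern "C" fn trampoline(user_data: *mut c_void) -> c_int {
    // SAFETY: `run` below always passes a valid, exclusive pointer to a `Slot`.
    let slot = unsafe { &mut *(user_data as *mut Slot) };
    let fail = slot.fail;
    match panic::catch_unwind(move || inner_rust_work(fail)) {
        Ok(()) => 0,
        Err(payload) => {
            slot.payload = Some(payload);
            1
        }
    }
}

// Outer Rust code: calls through the "C" layer, then re-raises any panic
// that the callback captured, now that the C frames are gone.
pub fn run(fail: bool) -> c_int {
    let mut slot = Slot { fail, payload: None };
    let status = c_library_run(trampoline, &mut slot as *mut Slot as *mut c_void);
    if let Some(payload) = slot.payload.take() {
        panic::resume_unwind(payload);
    }
    status
}
```

This only helps when the C library gives the callback a way to signal failure and then returns promptly; it does not address libraries that expect the callback to leave via a non-local exit.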


So if Rust does not call setjmp, how can the C library unwind into Rust?

By calling a Rust function that calls panic!().

What happens if that C code calls C++ code that calls C code that calls Rust code that panics? Do we unwind across C++, calling destructors?

In case of libjpeg and libpng that’s out of scope — never happens.

Then the proposed attribute should be called #[libjpeg_unwind] and not #[unwind].


#[c_unwind] works for me.


I’d be strongly against that.


AFAIK this only works if we propagate Rust panics as C++ exceptions, the C code is compiled with C++ exception support, and we then catch the C++ exceptions and convert them back to Rust panics. The conversion is needed because the Rust panic implementation does not need to match the C++ exception implementation.


(emphasis mine)

(idem)

For reference, here is a CraneLift issue hinting at such an alternative implementation for panics. Brief summary: a compiler can define the ABI of Rust functions to have two return addresses, one for the normal returns (as is usual) and one for panic!s (this one is new).

A Rust compiler implementing the above would have no trouble calling or defining extern "C" functions, except for panics out of the extern fns it defines. And as far as I know, a C library is unlikely to support being unwound through such a calling convention.


That cranelift issue is interesting. I’d seen it mentioned that Rust wants the freedom to tweak unwinding independently of other language implementations, so it’s nice seeing an example of how that freedom would actually be utilized.


You can link against a DLL compiled by any toolchain, independent of which toolchain you're using. So even if you're using msvc, the DLL you're calling into may have been produced by a mingw toolchain using dwarf exception handling. That said, all native code on Windows has SEH unwinding tables because Windows itself needs them, so SEH unwinding will always unwind correctly through any native code, but whether destructors will fire is determined by the SEH handler that is used. Rust uses the same handler that VC++ does, so if unwinding were allowed across the C++/Rust boundary, C++ exceptions would trigger Rust destructors and Rust panics would trigger C++ destructors. Also, setjmp and longjmp in VC++ use the SEH exception machinery; they're not some separate unwinding mechanism.


Background: I have a program (repo), and it’s normally run by compiling the lib into a dynamic library that exports C ABI functions, and then the binary loads up said dynamic library and makes calls into it. So we have Rust code calling other Rust code through the C ABI via a dynamic library.

Question: If the dynamic lib panics back up into the binary, is it UB or not?

EDIT: After discussion with @gnzlbg on Discord it seems like the probable answer is “that’s UB, don’t do that”, but anyone with more info should feel free to add it.
