Modifying compiler (MIR?) to generate unwind cleanup routines for all blocks

TL;DR: I'm looking for ways to approximate asynchronous unwind tables by changing MIR to force the generation of cleanup routines even when no Rust-level panic is possible.

As part of a large research OS project, I have implemented custom unwinding support for our no_std environment that works correctly for all explicit language-level panics, i.e. synchronous unwinding. However, when trying to start unwinding from any other, non-panicking part of the code (for example, after a machine fault), there is no unwinding landing pad that covers those instruction pointers, which is expected. The full-fledged correct solution for this is asynchronous unwind tables, which Rust does not support because LLVM lacks support for them.

Thus, I'm looking for help with an alternate solution: changing the compiler to generate unwind cleanup routines / landing pads for all basic blocks, statements, or even instructions, as an attempt to approximate true asynchronous unwinding. This is my first foray into rustc itself, so I'm no expert, but so far I have looked into MIR because it appears to be where cleanup routines are handled (e.g., in the no-landing-pads pass).

Here's a code example illustrating the problem:

fn foo() {
  let _b = Box::new(5);
  // Here, unwinding is started via an out-of-band mechanism,
  // but the `_b` Box will not be cleaned up/dropped
  // because no panic is possible in `foo()`.
}

fn main() {
  let _a = Box::new(10);
  foo();
  // Because `foo()` is a `Call`, 
  // the `_a` box will be cleaned up as expected.
}

My first naive attempt was to modify librustc_mir::build::scope::Scopes::may_panic() to always return true, but that was insufficient because the terminator for foo did not have any need for cleanup to begin with. I have played around with mir::build and mir::scope for a while, to no avail.
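For concreteness, the naive patch looked roughly like this (the exact signature may differ across nightlies; the original body checks whether any enclosing scope needs cleanup):

// in src/librustc_mir/build/scope.rs -- naive, illustrative patch
impl<'tcx> Scopes<'tcx> {
    // Originally inspects the enclosing scopes for cleanup needs;
    // here forced to claim that every point may panic.
    fn may_panic(&self, _scope_count: usize) -> bool {
        true
    }
}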

I believe a potential solution is to change the TerminatorKind of each block such that it includes an unwind cleanup element, but I am not sure how Terminators are actually created in the first place nor how or to what kind I should change that terminator to maintain correctness. I would greatly appreciate any tips on getting started or ideas on what other approaches to try.

=====

Thanks, and apologies if this is the wrong forum; I understand this is a wild idea but was still hoping to get in contact with people who work on drops per the Rustc experts map, such as @arielb1 or anyone else knowledgeable about unwinding.


Judging by your code example, where the issue only comes up in code regions that can't panic, it sounds like requires_uwtable (as mentioned in your first link) may help. Have you tried it?

One significant issue is that unwinding at arbitrary instructions is not necessarily 'safe' from a Rust perspective.

It's certainly unsafe if arbitrary means arbitrary. For example, the layout of &[u32] consists of a pointer field and a size field. Suppose we overwrite such an object, e.g.:

pub fn modify_slice(p_slice: &mut &[u32]) {
    *p_slice = &[1, 2, 3];
}

The write to the object will likely use two instructions, one per field. Suppose it writes the pointer first, then the count. If unwinding is initiated in between those instructions, you'll end up with the new pointer and the old count. This is problematic because the slice may be observable from Drop routines, e.g.:

pub struct SliceWrapper<'a> { slice: &'a [u32] }

impl<'a> Drop for SliceWrapper<'a> {
    fn drop(&mut self) {
        println!("dropped {:?}", self.slice);
    }
}

fn go() {
    let mut wrapper = SliceWrapper { slice: &[] };
    modify_slice(&mut wrapper.slice);
}
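
Concretely, the two stores in modify_slice might compile to something like this (hypothetical codegen, purely for illustration):

	movq	%rax, (%rdi)	// write the new data pointer
	// <- unwinding initiated here observes {new pointer, old length}
	movq	$3, 8(%rdi)	// write the new length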

What specific circumstances do you expect to trigger unwinding in? If you only intend to trigger it on a thread when that thread itself experiences a page fault – I can't tell from your description whether this is the case – then you might object that the two instructions to write the fields cannot fault, unless something is badly screwed up already. However, the compiler has significant leeway to reorder instructions, potentially moving faulting instructions between the writes... depending on what exactly is allowed to fault.


I wanted to ask this for my own education. comex already asked, but I'd like to second the question:

Why are they necessary? Okay, a SIGINT can happen at any instruction at all, but what should happen in the kernel in reaction to it? Why does the kernel need to unwind the current thread's stack?


@comex thanks very much for your reply!

I have indeed tried to use requires_uwtable; I set it in my OS's target .json file, but the generated code did not change. Is there more documentation available on requires_uwtable? Because AFAICT, it doesn't really have any effect unless no-landing-pads is also specified, if I understand this rustc code properly.
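For reference, I set it like this in the target spec (assuming the usual kebab-case key spelling of custom target specs; other fields omitted):

{
    ...
    "requires-uwtable": true
}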

I totally agree that this is not safe, especially when unwinding from arbitrary instructions. Safety and correctness are something I can shoot for later; for now I'm just looking for ways to produce an unwind table with maximal coverage.

Regarding the use cases asked about by @comex and @atagunov: our system is a single address space OS that runs everything at the same privilege level (e.g., Ring 0). Thus, there is no concept of userspace processes, and everything can be considered a kernel thread/task. I am trying to experiment with using unwinding not only for language-level exceptions like Rust panics, but also for unwinding after a machine fault occurs (e.g., unhandleable page faults, invalid opcodes, etc.) as well as when a thread is killed.

In these two cases, the default behavior** is simply to abort the task, so any form of unwinding would be an improvement, even if we couldn't guarantee safety up front in all cases.

**Technically we can unwind all stack frames except the current frame in which the machine exception occurred, since those are all covered by call sites that have unwind cleanup routines (e.g., TerminatorKind::Call). There is just no unwind information for that current frame, which is what I'm trying to solve here.

This sounds like an interesting project. I would like to suggest an alternative design that might be worth evaluating, either by itself or in comparison to on-thread unwinding: "keeper" threads as used in KeyKOS (see Norman Hardy, "The KeyKOS Architecture," Operating Systems Review, 1985). The basic idea is that when a thread takes a machine fault, you suspend its execution and activate a separate thread that has debugger-like access to the faulting context. KeyKOS considered this a way of achieving the usual "no policy in ring 0" microkernel deal, but I think it might be generally applicable as a way of avoiding all the problems that arise when trying to do fault recovery within the same thread and memory-access context that took the machine fault (it may have interrupted a critical section of the runtime library, base registers may be unreliable, global state may be corrupted, ...).


I'd question that assumption. Aborting a task won't cause arbitrary and unpredictable misbehavior. As comex showed, unwinding from an arbitrary instruction can.


@zackw thanks very much for your response! I have heard of KeyKOS before but wasn't familiar with the "keeper" threads, those are quite interesting.

However, I don't think "keeper" threads would help here, because no matter which thread performs the actual unwinding cleanup (be it the same "local" thread or a remote thread or address space), there still would not be any unwinding cleanup routine generated by the compiler that covers the stack frame in which the fault occurred. Thus, even a separate keeper thread wouldn't have sufficient information to clean up the most recent stack frame of the excepted thread.

Please do correct me if I'm wrong or otherwise misunderstood your suggestion.

Agreed, but not unwinding a task will certainly leave the system in a state that is unrecoverable or unable to make progress. We have observed this many times, such as when an excepted/killed task has acquired a Mutex and needs to release it for another task to make progress, or in similar scenarios involving the incrementing/decrementing of reference-counted pointers.
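To make that concrete, here's a minimal sketch (using the spin crate's Mutex for illustration):

use spin::Mutex;

static GLOBAL_LOCK: Mutex<u32> = Mutex::new(0);

fn task_body() {
    let mut guard = GLOBAL_LOCK.lock();
    *guard += 1;
    // If the task is killed here and its stack is never unwound, `guard`
    // is never dropped, so the lock is never released and every other
    // task that calls GLOBAL_LOCK.lock() spins forever.
}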

As this is an experimental effort to push the boundaries of fault recovery, even if we couldn't unwind from any arbitrary instruction, it would at least be an improvement to increase the coverage of instructions that we could unwind from.

Sounds similar to the Mirage unikernel based on OCaml. I guess OCaml's overall safety guarantees mean there are no invalid instructions in Mirage, and thus no need to unwind from arbitrary instructions. Is your system Rust-only? So is it the unsafe Rust that introduces the need to panic from strange places?

Yes, our system is Rust only, but it's not a unikernel that can leverage an underlying hypervisor for fault tolerance.

So is it the unsafe Rust that introduces the need to panic from strange places?

It's not necessarily unsafe Rust causing it, nor "panics" since those are synchronous exceptions, but rather events beyond (below) the Rust language level, such as machine faults and task killing as mentioned above.

I looked into it a bit more.

At the LLVM IR level, there is simply no way to represent asynchronous unwinding. Synchronous unwinding works by using invoke instructions for all potentially-unwinding calls, like this:

invoke void @unky() to label %bb5 unwind label %cleanup

In this case, control flow will pass to %bb5 after the call, but if unky() unwinds, control flow will pass to %cleanup (which should begin with a landingpad instruction and end with a resume instruction to continue unwinding). There is nowhere in the IR to specify a cleanup label to be used for anything other than an invoke.
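For completeness, the cleanup block referenced above has roughly this shape (minimal sketch; the enclosing function must also declare a personality function such as rust_eh_personality):

cleanup:
  %eh = landingpad { i8*, i32 } cleanup
  ; drops for the locals in scope would run here
  resume { i8*, i32 } %eh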

At the assembly level, the unwind info format in the LSDA is based on specifying a range of instructions for a given cleanup label. But LLVM just specifies the beginning and the end of the assembly generated for each invoke IR instruction (usually a single assembly instruction). For example:

// inside the function itself
Ltmp0:
	callq	_unky
Ltmp1:
// [...snip...]

Ltmp2: // cleanup label
	movq	%rax, %rbx
	leaq	-16(%rbp), %rdi
	callq	__ZN4core3ptr18real_drop_in_place17hbf5348961906a45fE
	movq	%rbx, %rdi
	callq	__Unwind_Resume
	ud2
// [...snip...]

// inside __gcc_except_tab:
Lcst_begin0:
	.uleb128 Ltmp0-Lfunc_begin0 // beginning of instruction range
	.uleb128 Ltmp1-Ltmp0 // size of instruction range
	.uleb128 Ltmp2-Lfunc_begin0 // cleanup label

However, if multiple invoke instructions have the same cleanup label, LLVM merges their ranges, which incidentally covers all the instructions in between as well.

So in theory you could get more instruction coverage by adding dummy invoke instructions. But it's pretty much a terrible idea. Unwinding at arbitrary instructions is not only potentially unsafe from Rust's perspective, it's completely undefined behavior from LLVM's perspective. LLVM is free to, e.g., perform register allocation in a way such that jumping to the unwind handler works at both invoke instructions but not for random instructions in between.
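
(If you wanted to try it anyway, a dummy invoke could target the llvm.donothing intrinsic, which is one of the few intrinsics LLVM permits to be invoked:

invoke void @llvm.donothing() to label %bb_cont unwind label %cleanup

But again, nothing guarantees the instructions between two such invokes are actually safe to unwind from.)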

Thanks for the in-depth analysis, @comex, I really do appreciate your time!

Sadly, I arrived at the same conclusion when I initially looked at LLVM's mailing list posts to see whether LLVM could support async unwinding at all, but it's good to confirm that. Indeed, I have observed that most instruction ranges are covered by a row in the unwinding table, but the LSDA they point to is either null or a no-op routine, which is expected since there is no invoke instruction there. That is effectively the reason behind me posting this topic -- I figured that doing it properly with true async unwinding was a nonstarter, so I was hoping I could abuse some mid-level part of the Rust compiler (before the codegen phase).

I had a sneaking suspicion that I would need to insert invokes (or TerminatorKind::Call at the MIR level) or transform other MIR terminators into that, but I wasn't confident that doing that could ever be valid.

=======

Perhaps it'd be better to address specific cases incrementally. Here's a concrete example that is fairly simple and might be doable:

fn foo() {
    let _lock = GLOBAL_MUTEX.lock();
    if condition {
        loop { }
    }
}

Let's say we'd like to start unwinding when the instruction pointer is within the above loop, in order to ensure that _lock gets dropped. At the MIR level, that loop will result in a FalseUnwind terminator in the initial unsimplified MIR, before any passes run. In fact, FalseUnwind is one of several terminators besides Call that do have unwind cleanup routines, but they get removed later: the SimplifyCfg passes will transform that FalseUnwind into a Goto that just jumps directly to the real target. So, the cleanup routine was initially generated but then lost.
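Roughly (hand-written MIR for illustration; the exact dump syntax varies by compiler version), the loop's terminator goes from

bb1: {
    falseUnwind -> [real: bb2, cleanup: bb3];
}

to

bb1: {
    goto -> bb2;
}

after simplification, at which point the cleanup edge to bb3 is gone.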

I'm wondering if and how I could make transformations that would result in the unwinding cleanup routines still being present. That way we wouldn't need to do anything wonky in LLVM like peppering in invokes throughout the generated code.

Note that I have no idea if this is the right approach, if it's feasible, or if MIR is the right place for such transformations, but it's my only lead right now.

Hmm... the JVM has safepoints:

Java threads poll a ‘safepoint flag’ (global or thread level) at ‘reasonable’ intervals and transition into a safepoint state ... when they observe a ‘Go to safepoint’ flag

So can they only be caused by badly written unsafe Rust and inline asm!, or are there other kinds of machine faults necessitating the unwinding of a single thread, as opposed to shutting down the whole OS?

I think that's the critical aspect of the problem. If there are potential machine faults within a sequence of instructions that modify different parts of a self-coherent structure, such as starting address and residual length for a slice, then there is no real hope of generating safe unwinding code for such machine faults.

That's one way to force a machine fault/exception, sure. Another way is a hardware failure, memory corruption, etc.; really, anything beyond the language level. It's not that such a fault necessitates unwinding just a single thread; it's just that we'd like to do better than a scorched-earth full kernel oops without even trying to tolerate the fault first.

Yes, when killing a task, we monitor its execution and then wait for it to reach a safe point, i.e., an instruction already covered by an unwind table entry; it's also similar to pthread deferred cancellation. But in cases like the fn foo() above, such an instruction will never be reached, despite it being safe to unwind from within that particular loop.

Even though unwinding from an arbitrary instruction can never be guaranteed safe in the general case, I bet that quite a few common code patterns can be unwound from safely, and I would like to at least explore those on a case-by-case basis rather than just give up. I'm still looking for assistance in that regard, though I do appreciate the detailed analysis of unwinding safety.

Once we can actually generate cleanup routines for more instructions, even if those unwinding routines are not 100% safe for all instructions in the range covered by that cleanup routine, then we can move on to determining which instructions in that range can be safely unwound from. For this, we could leverage additional solutions, e.g., manual annotations, static analysis, or dynamic stack frame analysis.

Yes, the main benefit of doing cleanup on a separate thread (as I see it) would be that the cleanup thread's own task state is known-good, not that it makes the actual unwinding job easier. However, it's possible that a separate keeper might not need to do full stack unwinding in order to tear down a faulted context, in at least some circumstances. I don't know if this is actually true, I'd have to work out a full design to know either way. It might be less likely to be true in your system where you're not using hardware memory protection.

Inserting a sufficient number of safepoints might be more achievable. A safe thread-cancellation facility might be generally useful and could attract wider support from language designers. I'd suggest a new thread.

From a design perspective, I'd like to note that Rust's standard MutexGuard intentionally "poisons" the mutex if it's destroyed during stack unwinding, under the reasoning that whatever invariant was meant to be preserved by the Mutex might not be true if code panicked while holding it. Given that your environment is no_std, I guess you're already using your own mutex type, and perhaps you've decided that unlocking mutexes during unwinding is worth the risk. But it's not the decision I'd make.
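The shape of the idea, as a sketch (all names invented here; std's real implementation differs in detail):

use core::cell::UnsafeCell;
use core::sync::atomic::{AtomicBool, Ordering};

pub struct PoisonMutex<T> {
    locked: AtomicBool,
    poisoned: AtomicBool,
    data: UnsafeCell<T>,
}

pub struct Guard<'a, T>(&'a PoisonMutex<T>);

impl<T> Drop for Guard<'_, T> {
    fn drop(&mut self) {
        // In std this check is thread::panicking(); a no_std kernel would
        // need its own "is this task unwinding?" predicate instead.
        if std::thread::panicking() {
            // The protected invariant may be broken: taint the lock so
            // later lock() calls can observe it.
            self.0.poisoned.store(true, Ordering::Release);
        }
        self.0.locked.store(false, Ordering::Release);
    }
}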

Well, MIR is compiled to LLVM IR. I suggest you first figure out what IR you want to eventually generate, then figure out how/whether that can be expressed in MIR.

Thanks, I like the idea of trying to insert safepoints. I presume this will result in at least a moderate amount of runtime overhead, but it's worth investigating. I'll look into it and start a new topic if need be.
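As a first mental model, I'm imagining something of this shape (all names hypothetical, just a sketch):

use core::sync::atomic::{AtomicBool, Ordering};

// Hypothetical per-task (here global, for brevity) kill flag.
static KILL_REQUESTED: AtomicBool = AtomicBool::new(false);

#[inline]
fn safepoint() {
    if KILL_REQUESTED.load(Ordering::Relaxed) {
        // Panicking here starts a normal synchronous unwind from an
        // instruction that is already covered by the unwind tables.
        panic!("task killed at safepoint");
    }
}

fn foo() {
    // let _lock = GLOBAL_MUTEX.lock();
    loop {
        safepoint(); // a compiler-inserted poll would make this loop killable
    }
}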

Sure, I could poison our Mutexes; it's a (good) policy choice but partially depends on the context. In our environment, there must be some spinlock-like Mutexes that exist before the tasking subsystem is initialized, so poisoning isn't always possible. I presume other no_std environments have similar behavior.

The need to run unwinding routines remains, regardless of what a given type chooses its drop behavior to be; for example, other types like Arc/Rc need to decrement a reference count to ensure the enclosed type will eventually be dropped too.

Yes, agreed. I think the thesis of my original question can now be rewritten as "what sort of IR do I actually need to generate to allow unwinding cleanup routines to be generated or kept?" It seems that only Call (MIR) terminators (LLVM IR's invoke) can accommodate unwind cleanups, but with my limited compiler knowledge I cannot conceptualize how to use or insert such terminators. That's essentially what I need help figuring out.