Canvas unsafe code in the wild


#1

Continuing the discussion from Next steps for the unsafe code guidelines:

About two weeks ago, we had our first-ever synchronous unsafe code guidelines meeting. In that meeting, we decided to do a few things, but one of them was to try and canvas and start to get a better feeling for what unsafe code exists in the wild. I wrote up some initial assignments of things to investigate:

@nikomatsakis – rayon @RalfJung – Arc ? @gereeter – perhaps that llvm crate you were referring to? @asajeffrey – C bindings perhaps? @arielb1 – abomonation perhaps? @robbert – Rc

I wanted to start this thread for people to note down results and things so we can all read it.


Next steps for the unsafe code guidelines
Improving self-referential structs
#2

I didn’t get to the most interesting parts of rayon yet, which would be rayon-core, but I did some investigation into the parallel iterators. These are mostly safe code (yay safe abstractions!) but there is a bit of unsafe code dealing with uninitialized memory. Specifically, when constructing a vector, we will initialize it to the desired length and then fill in parts of it in parallel (with some assertions to make sure that everything gets written):

The collect() code.

The main interface boundary here is the following (safe) function:

pub fn collect_into<PAR_ITER, T>(mut pi: PAR_ITER, v: &mut Vec<T>)
    where PAR_ITER: ExactParallelIterator<Item = T>,
          T: Send

This will:

  • resize v to reserve pi.len() slots (but truncate to 0, so they are uninitialized)
  • allow the parallel iterator to drive the code:
    • parallel iterator can either split (into two consumers) or into_fold (into a reducer)
  • Consumers hold an &mut [ITEM]contents uninitialized!
    • each split uses split_at_mut
  • Reducers convert that into a IterMut<'c, ITEM>again, contents uninitialized!
    • each item that is pushed uses ptr::write
    • if, when reducing is done, the iterator is not drained, we abort
  • At the very end, we will call set_len to adjust the length of the vector
  • In general we have a “trust-but-verify” approach of the parallel iterator:
    • it tells us how many items it will write, but we count them ourselves
    • if at some point we get too many, we will panic
    • this may leak the values that were written thus far, since set_len is never called

Historically: - we used to use *mut everywhere - we rewrote it to use slices (split_at_mut) and iterators - because it was much cleaner

So, to sum up:

  • lots of &mut T and also slice::IterMut<T> where the referent is not initialized
  • potentially leaking data (if something goes awry)

Next steps for the unsafe code guidelines
#3

I spent some time reading into Rayon’s codebase, specifically rayon-core, which is. Let me summarize some of the patterns I saw:

Panic or return: you got two choices.

First, a sort of obvious one, but Rayon relies on being able to fully control control-flow. Specifically, we assume when we call a function, that function will either return, panic, or run forever. Since we can catch panics, that means that we can guarantee that if a function returns, it returns to us. This allows us to advertise pointers into our stack, confident that before we return, we can ensure that those pointers are no longer active. Something like setjmp() would ruin this.

Transmute, and plenty of it.

Second, true confessions, I kind of love transmute(). It’s such a handy interconversion tool, and Rayon’s source uses it frequently, though typically in tightly controlled ways (i.e., with the target type explicitly written). I guess it’d be interesting to see if those uses can be expressed without transmute – and, perhaps more importantly, if that is in some way safer versus just being different (I admit I am dubious).

(Ultimately I kind of think we’ll want some rules that say “types can be interconverted in these scenarios”, and those rules will apply whether you use transmute, a union, or any other sort of cast.)

Background: JobRef

The key abstraction in rayon-core is JobRef, which is a kind of home-spun trait object:

struct JobRef {
    pointer: *const (),
    data: unsafe fn(*const ())
}

We create the JobRef by taking in a *const T of some type T: Job, where Job is a trait like so:

trait Job {
    unsafe fn execute(this: *const Self);
}

The actual construction just extracts T::execute as the fn pointer and then does some transmutes to erase the type T etc:

impl JobRef {
    pub unsafe fn new<T>(data: *const T) -> JobRef
        where T: Job
    {
        let fn_ptr: unsafe fn(*const T) = <T as Job>::execute;
        JobRef {
            pointer: transmute(pointer), // erase types
            execute_fn: transmute(fn_ptr),
        }
    }
}

I think we could have used a real trait object, but those do not currently support *const self methods (for which there is no good reason). So we create it ourselves. We used to use a trait with &self, but that always felt like something of a lie. fn(self) would probably be more correct, but we don’t support that in trait objects either. =)

For one thing, the backing pointer in a JobRef can actually be of many different kinds:

  • when using join(), the pointer is an &StackJob<F>, logically – it points into the join() stack frame, which is guaranteed not to be popped until the job has executed. (Here F is the user’s closure.)
  • when using spawn(), the pointer is a Box<HeapJob<F>> – the job is allocated on the heap, but there is only ever one pointer to it. (Again, the F is a closure.)
  • when using futures, the pointer is an Arc<Future<F>> – the job is allocated on the heap, and there are multiple handles to it (e.g., it may be enqueued, but also referenced by other futures). (Here the F is a future.)

In all of these cases, the code that creates the JobRef ensures that it has the correct type of pointer, and the Job implementations then transmute the *const Self into the appropriate pointer type (e.g., Box<Self> etc). I think I might have preferred to implement the Job trait for the pointer types (e.g, implement Job for Box<HeapJob<F>> instead of HeapJob<F>) but that would have required passing the trait objects “by value” which doesn’t really work (basically, the existing struct encodes the constraint that a Job can only be implemented for something that is represented as a thin pointer).

Observations on JobRef: escaping

One thing I noticed is that Rayon frequently has cases where we “hide” lifetimes. Often this is accomplished by an (unsafe) function. One example is the one that creates a JobRef for a stack job (this is mildly simplified to remove irrelevant details):

impl<F, R> StackJob<F, R>
    where F: FnOnce() -> R
{
        pub unsafe fn as_job_ref(&self) -> JobRef {
            JobRef::new(self) // implicit: coercion from `self` to `*const Self`
        }
}

The problem here? The &self is actually escaping into the JobRef. Under some rules, this would be illegal.

I have to say that literally every time I look for this pattern in Rayon I find it. Now, I’ve generally been careful to write unsafe fn in those cases, precisely because I want to signal that something fishy is going on. But I wouldn’t be surprised to learn that this sort of erasure is a very common thing for people to do (at least if their unsafe code is trying to be clever with lifetimes).

In any case, there are numerous other examples of hiding lifetimes in this way.

Future code: types that may indeed be out of scope (but no accessible data)

In the futures code, the future ultimately has a type like ScopeFuture<'scope, F, S> where F is the future code. This has a field of type Option<Spawn<CU<F>>> – the details of the type aren’t important, except to note that this type F may have references (though F: 'scope). In general, we guarantee that all aliases of ScopeFuture expose this F type, and hence whenever they are used, the type F is in scope (and hence the references are valid).

However, there is one case where that is not true: you also get a future that represents the result of the future. It is of type RayonFuture<T, E>, where T and E are the result and error types of the future. Note that this type does not include F. This is intentional, as we want this type to escape and be usable wherever a T is usable. This is achieved by hiding the type F through a trait object (and some transmuting). The subtle point is this: through the methods on that trait object, it is possible to execute methods of the original ScopeFuture but after 'scope has ended. However, in such cases, all the fields that may have references (e.g., that Option I mentioned) have been set to None (and, in any case, are not accessed). So there are no actual references accessible, just the types may be invalid.

Panic guards

There are various places where we use “aborting” panic guards. These are in place to prevent unexpected unwinding from messing things up. I have found it tends to be impossible – and kind of subtle! – to predict all the places that user code may be invoked and try to catch unwinding. In particular, it is hard to see all the places where Drop might run – for a while I tried to be very careful about catching drop failures, but eventually I gave up and decided that we would just abort if a Drop impl should panic (as such, I’ve added drop guards here and there). A lint that indicates when a value of generic type might be dropped would, I think, be very useful – since I’ve likely missed some spots.

Scoping of unsafe

Rayon is pretty careful to mark functions as unsafe if they place any “additional constraints” on their caller (e.g., cannot allow return value to escape a certain lifetime etc). There are usually comments explaining what those constraints are, as well, though not always. =)


#4

This is a safety issue, not a definedness issue.

What sorts of transmutes?

So you are casting an fn(*const Thin) to an fn(*const ()) and calling it as such? That sounds like quite a popular pattern, especially with FFI etc. where you can’t use trait-objects.

It is interesting how far can you stretch fn-pointer-compatibility - e.g. calling an fn(&T) as an fn(&mut T), an fn(&T) as an fn(&U) for some type U, an fn(T) as an fn(Newtype<T>), an fn(Option<*const T>) as an fn(Option<*const ()>) etc.

So you are transmuting an Box<Trait + 'short> to an Box<Trait + 'long> and calling it outside of 'short?

Also a safety, rather than definedness, issue. I agree that some way to write the unwind blocks yourself would be very nice for ergonomics.


#5

If we’re talking about the safety of setjmp, I’ll note it is unsafe even without rayon:

pub struct LongjmpOnClone;

impl Clone for LongjmpOnClone {
    fn clone(&self) -> Self {
        longjmp();
    }
}

fn main() {
    let mut v = vec![];
    if !setjmp() {
        v.extend(&[LongjmpOnClone]);
    }
    // this better hold :-)
    assert_eq!(v.len(), 0);
}

#6

Actually, that’s an imperfect example because it can be bypassed using “PPYP”, and the “race” is on an &mut, but there is a version more similar to Rayon (which I expect to be used sometimes when interacting with some C libraries):

pub mod my_mod {
    use std::cell::Cell;

    pub struct EnvCtxt {
        // INVARIANT: `var` always points to something valid if not None
        var: Cell<Option<*const u32>>,
    }
    
    impl EnvCtxt {
        pub fn new() -> Self {
            // ok because `var` is None
            EnvCtxt { var: Cell::new(None) }
        }
        
        pub fn get(&self) -> Option<u32> {
            unsafe {
                if let Some(var) = self.var.get() {
                    // ok because `var` points somewhere valid
                    Some(*var)
                } else {
                    None
                }
            }
        }
        
        pub fn with<F>(&self, value: &u32, f: F) where F: FnOnce() {
            struct Guard<'a>(&'a EnvCtxt);
            impl<'a> Drop for Guard<'a> {
                fn drop(&mut self) {
                    // ok because `var` is None. (I could preserve the
                    // old value, but that would require a subtler
                    // invariant).
                    self.0.var.set(None);
                }
            }
            let _guard = Guard(self);
            // ok because `value` is currently valid, and will be
            // cleared before it is invalidated.
            self.var.set(Some(value));
            f();
        }
    }
}

pub type JmpBuf = [u64; 200/8];

extern "C" {
    fn setjmp(buf: *mut JmpBuf) -> u32;
    fn longjmp(buf: *mut JmpBuf, val: u32);
}

fn main() {
    let ctxt = my_mod::EnvCtxt::new();
    let mut jmp_buf = [0; 200/8];
    let val = Box::new(0);
    if unsafe { setjmp(&mut jmp_buf) } == 0 {
        ctxt.with(&val, || unsafe { longjmp(&mut jmp_buf, 1) });
    }
    drop(val); let _ = Box::new(0xccccccccu32);
    println!("{:?}", ctxt.get());
}

Threads are only important in a fork-like situation where ordinarily the invariant-violated library’s code will never get run.


#7

Can you clarify what you mean by a “safety issue”?


#8

Roughly speaking, yes.


#9

definedness issue = may cause UB if the UB rules are not defined finely enough. safety issue = may cause UB if called incorrectly.

For example, Vec::unchecked_get is a “pure-safety” issue - it is obvious whether a particular execution trace is defined (read: it does not access a vector beyond its length), but it might not be obvious whether there exists some unsafe trace, but e.g. aliasing issues are definedness issues - it might not be obvious whether a particular trace is defined or not.


#10

I see, I thought this is what you might have meant. However, I’d draw a distinction between unchecked_get and prohibiting closures that not return normally (i.e., which may use setjmp) – it seems important to me to highlight things that unsafe code assumes about the universe of all other code.

In particular, I think that eventually we are going to want to enumerate sets of capabilities – somewhat like I described in this blog post on observational equivalence, though probably not exactly like that – that unsafe code is allowed to make use of. Or at least enumerate things that we know it cannot do.

As a simple example, if you are writing unsafe code for some application that specifically targets a simple embedded processor which does not support multi-threading, that might allow you to make stronger assumptions in your unsafe code – but that code would not necessarily be ‘composable’ with other unsafe code.


#11

(To be clear, I don’t disagree with the distinction you’re drawing or your definitions.)


#12

I thought of a better way to draw the distinction. =)

unchecked_get is an unsafe function. So it’s perfectly valid for it to impose arbitrary constraints on its callers, as long as they are documented, and I agree it’s not a big concern for the UCG if they are violated.

join() is safe. This means that it is leaning on the Rust type system to enforce its constraints. And that is of interest. One of those constraints is that, when we invoke a closure, it will either return or panic.


#13

This transmute works around something that looks like a failure of borrow checker. Intuitively I think it’s safe and should work, but even with a significant refactoring of these methods I could not find a way to avoid the transmute:


#14

I thought I’d chime in with some comments on Abomonation; I assume it is the same one we are talking about as no one else should spell it that way (where is the “5” from? typo? lots of secret forks doing cool things? :D).

Abomonation does several horrible things, and I wouldn’t be surprised if the correct answer is not “this should be defined behavior”. But, I thought I would point out some of the things that it does, as different from the problem it thinks it has to solve. I have an interest in something with similar properties being possible, even if the pile of code that is Abomonation needs to be deleted and zeroed over several times.

Abomonation was an attempt to deal with the issue that for data intensive compute with communication (ie multiple workers) one really wants some sort of region allocation in order to efficiently bundle up data, ship them to other workers, and unpack the data on the other end. The “competition” in the HPC space just memcpys plain old data around, the competition in the JVM space takes comically large amounts of time to serialize data (or roll their own off-heap serialized representations), and the competition in the C/C++ space does a heap crawl but then has to strongly warn the recipient that they should not use methods that expect to mutate anything about the interpreted data. Rust seems well positioned to do the efficient thing, safely.

Abomonation’s implementation was the optimistic take that maybe the minimum amount of work would be enough: memcpy on each pointer you find into a new region on the encoding side, and some pointer correction and casting on the receiving side. Other options exist, like CapnProto, but there is an ergonomic hit where types in use need some macro love (this is less a problem if you are serializing your own few types, and more of a problem if you are writing middleware for other people who wouldn’t otherwise know you have to do this for them).

There are a few concrete things I know of that are presently UB with respect to Rust’s guidelines:

  1. Abomonation calls memcpy to read from structs which may have padding bytes. The last version of the docs that I read said that reads of undefined values is undefined behavior. This is not what LLVM says (such reads result in undef values, which may become UB if they are used badly, e.g. in division denominators), but I can understand why Rust would be more restrictive (perhaps it doesn’t need to be in this case, though).

  2. Abomonation totally ignores alignment, and could produce references to types that are not aligned with properties that their initial allocation had. This can be fixed, but it may mean a copy to shift everything by just a few bytes, or only reading data into known aligned locations.

  3. Abomonation totally ignores widths, with the assumption that you are using the same binary to encode and decode the data.

Abomonation also does some other weird things that I’m not even sure if they are clearly defined or undefined. The main one is to (i) make a copy of reachable memory from some typed reference &T, (ii) correct pointers in that copy to reference their corresponding locations in the copy, then (iii) cast the result to a &T and act as if it is valid. The UB spec says something about “it is UB for values to be invalid”, where clearly whether what Abomonation does being UB or not depends on what is or is not valid; I suspect the fact that the values are behind a reference is not a saving grace.

For example:

  1. I could imagine some version of this working out just fine for &[u64], which has a well-defined size and alignment and all of that.

  2. I would imagine no version should work out just fine for &Rc<T>, because the implementation of clone does some funny business with UnsafeCell that is very unlikely to be faithfully reproduced (maybe it works? maybe it just writes over the strong count location in the underlying &[u8], or maybe it does this and some NOALIAS assumptions elsewhere break).

  3. I don’t know whether and how this would work for Vec<T> or Option<T> or other composite types. Each of these have some baked in assumptions about their structure, and may not be valid just because you copied their exact bytes. At the moment it “seems to” work, in that I encode and decode a lot of data and don’t experience problems.

On the encode side of things, I have to imagine there is some version of “copy lots of bytes into a buffer” that isn’t undefined behavior, because there are no particular invariants about the resulting Vec<u8>. The decode side seems to be where things are terrifying, because the intent is to let otherwise oblivious Rust code use a &T as if it were such a thing.

So, it is all a bit scary, and a bit of a mess. I’d love to get some guidance (perhaps as a part of this process, or perhaps as the result) about how to make it UB conformant.

I want to double-extra stress that from my point of view it is important to be able to have some sort of correspondence between in-memory representations and “on-disk” representations, if you want performance from data-intensive computation. repr(C) is one way out, but it has a similar ergonomic overhead to CapnProto: either users need to use it on their custom what-would-otherwise-be- (String, usize) types, or accept that each type is wrapped by the system preventing them from using code that expects a &[(String, usize)], and having them re-write it as taking something like &[Wrapper<(String, usize)>]. Either of these approaches also increase the cost of moving data from a Vec<T> into a Vec<Wrapper<T>>, where we may not be able to just memcpy data any more.

Can I set repr(C) for an entire project? This is something I would like out of the UB spec: some way to reliably know (or insist on) something about the in-memory representation of types in the programs I run.

Edit: Just to re-iterate, it is not important that I insist on a specific in-memory representation, but rather consistency of the in-memory representation, whatever its layout. For example, Rust could hypothetically create behind the scenes a few specializations of types with different layouts, and convert between them with care. This would utterly crush Abomonation’s plan, and would stress me out greatly. Field orders that vary build-to-build (which is apparently in nightly?) are also stressful, but are apparently appealing enough to other people that they are worth doing (can I insist on Abomonation vs Serde serialization numbers in any perf benchmarks they produce?).

Edit 2: Awesome side information: Abomonation’s unsafe is literally the only unsafe code in timely dataflow (that I know of). I think that is pretty amazing for the language.


#15

Also, in case folks haven’t seen it, there is a PLDI 2017 paper on cutting down UB in LLVM. I’m not sure it has immediate relevance here (it seems to be mostly about managing undef and poison in llvm), but it might be that the ideas are helpful in some way (and knowing more is better than knowing less).


#16

Okay so I had a closer look at Rc. I know I got assigned Arc, but I am more familiar with Rc because I looked very closely at it as part of the RustBelt research project.

Mostly, in terms of what we are discussing as part of the unsafe code guidelines, Rc seems to be fairly harmless. Here’s what I found noteworthy. There’s nothing big, I just felt I couldn’t come back without anything to report :wink:

Struct field offsets

into_raw and from_raw do some pointer arithmetic based on offsets of struct fields, I think there is general agreement that this should be possible. The concrete implementation of this in the offset_of! macro uses mem::uninitialized() and then calls mem::forget() in the result. This means mem::forget can get called with a totally invalid argument, but that’s not too surprising – in fact, this is even suggested by the official documentation of mem::foget(). I was a little surprised that the offset is in fact computed at run-time; you would think that the field offset is a compile-time constant.

__from_str, __from_array

These are super-dubious, as even the comments in there say. If I read the code correctly, they assume that a fat pointer can be created by creatively casting a two-element array… However, they are private to rustc.

RcBoxPtr

Rc::drop calls RcBoxPtr::dec_weak after decrementing the strong count and droping the content of the RcBox. This is somewhat interesting because RcBoxPtr::dec_weak takes &Rc<T> as an argument and is a perfectly safe function, but it is actually called on an Rc that is in a bad state – it doesn’t satisfy the usual invariants that Rc typically satisfies. However, the Rc passed to dec_weak is still a valid instance of a "syntactic Rc", so I don’t think the compiler could exploit this in any way.


Role of UB / uninitialized memory
#17

Funny enough, and unsurprisingly, you are essentially replicating here an example from a paper on control effects. That paper has examples of programs that are equivalent when there are no “first-class control effects” (i.e. no first-class continuations), but are no longer equivalent once you add call/cc or setjmp/longjmp.

But yeah, what we are fundamentally observing here is that the closure-based “trick” to get around the fact that destructors are not guaranteed to run, does not work on languages with first-class control effects.


#18

Calling memcpy must always be valid, even if the memory is uninitialized. So I think you are good here. However, one interesting question is whether it is possible to implement memcpy in Rust by doing byte-wise copy of a [u8], or whether it will have to be a magical intrinsic. It’s actually not easy to make memcpy implementable, and uninitialized bytes (e.g. padding) are part of the reason why (the other part is pointers).

Making assumptions about their structure is one part. The other interesting part is the tricks you are playing with the borrow checker. In fact, I think that part of Abomonation can be “distilled down” by considering a function fn fake_box(&&T) -> &Box<T> implemented as a transmute. I think this is very close to what happens with Vec<T> in Abomonation, except that we are side-stepping most of the discussion about struct layout. (Please correct me if I am wrong.) The good news is that we have actually proven such a conversion function to be safe in our restricted model of Rust. :slight_smile:


#19

I think this is correct (approximating fake_box as "what I do with Vec" ;)). The lifetimes are for sure “scary” but I think I convinced myself that they are legit; if that is what you’ve proven safe, awesome. :smiley:


#20

This is necessary because in the case of Rc<TraitObject>, the offset of the inner value depends on the alignment of the actual object. For trait objects, the alignment is a field in the vtable, which the compiler uses to calculate the offset.