Proposal to add 'sandboxed' compile-time enforcement for element access

Preface

While this applies to more than just vectors, I will be focusing on them in my examples and proof of concept, as their API is relatively simple and they are the most ubiquitous and error-prone collection in my experience.

Note:

The proof of concept implementation is available here: GitHub - JamFactoryInc/vec-sandbox

See src/use_cases.rs for concrete examples and comparisons

The Problem

I frequently find myself working with vectors that have certain invariant qualities that I know, but I cannot tell the compiler without unsafe.

For example,

fn do_something(vec: Vec<_>) {
    vec.push(value);
    vec.first().expect("value should exist at first index")
}

This code will never panic as-is. Nevertheless, we have to either do unnecessary error handling for a None path that will never be returned, or we must use panicking functions like unwrap or expect.

Using panicking functions can lead to a DOS condition if the invariant relationship is broken at some point, making the panicking path accessible.

fn push_and_ref(vec: &mut Vec<String>, value: String) -> &mut String {
    // this is a minimal case, but invariant relationships like this often exist in larger contexts that are more difficult to reason about

    // if removed / reordered, the line below may panic, depending on the state of `vec`
    vec.push(value); 
    vec.last_mut().expect("value should exist at last index")
}

We can try to add error handling, but in some cases, this can be more trouble than it's worth.

How do you handle a value that should never exist?

How do you test a code path that never executes?

// we have to return an Option, even though this function will never return `None`
fn push_and_ref(vec: &mut Vec<String>, value: String) -> Option<&mut String> {
    vec.push(value);
    
    // how do we convince the compiler that `last_mut()` is always present?
    // additionally, a `&mut String` can only be returned here if it outlives our elided `&'1 mut Vec<String>` lifetime.
    // since we cannot access any reference to a vector without either panicking or somehow unwrapping an `Option`/`Result`, 
    // the only valid return value would be a `&'static mut String`, but mutable statics are unsafe.

    // all this to say, I can't think of a way to return a valid `&mut String` from this function without a panicking path or using `unsafe`, 
    // so we must defer the error handling to the caller, but now we've only multiplied & distributed our problems.
    vec.last_mut(value_index) 
}

// elsewhere

vec.push_and_ref(some_value).unwrap_or_else(|| {
    // what do we do here?
    // This code will *probably* never be executed, but this isn't guaranteed
})

What I usually see in these situations is expect or unwrap.

This is fine if properly tested and due diligence is performed. However, I can point to many, many CVEs to evidence that developers don’t always do their due diligence or test all the cases.

Having element access guarantees checked at compile time would prevent these issues; we need to defer enforcement of these invariant relationships to the compiler.

Proposal

The rust type system is robust enough to model most of these guarantees via a Typestate pattern.

This can be done by reserving a mutable reference to a vector, stored in an owned SandboxMut instance.

This would look something like this

let mut vec = vec![];
vec.sandboxed_scope(|sandbox| {
    // `sandbox` has unique mutable access to `vec`
})

We can then add a constant parameter MIN_LEN to sandbox, providing us with some guarantees about the referenced vector.

While this would default to zero, certain operations like push could change this value.

These operations would need to take ownership of sandbox in order to return a new sandbox with a different constant parameter.

let mut vec = vec![];
vec.sandboxed_scope(|sandbox: SandboxMut<0, _>| {
    let sandbox: SandboxMut<1, _> = sandbox.push(1); // the new sandbox has a minimum length of 1
})

Finally, we can separate the different vector operations into different traits, implemented only for certain values of MIN_LEN.

For example, pop(), first(), and last() all require at least one element, so they can be moved to a NonEmpty trait that is implemented for SandboxMut<N, T> where N > 0

Together, this can provide compile-time guarantees for the previous example:

// we can safely return a `&mut String` using sandboxing
// this function will not compile if the invariant relationship is broken
fn push_and_ref_sandboxed(vec: &mut Vec<String>, value: String) -> &mut String {
    // we can also just obtain the sandbox directly, without wrapping it in a closure
    // `release_get` is like `get`, but takes ownership of the sandbox, allowing its returned reference to outlive the sandbox itself
    // -1 is a negative index, indicating the last element of the vector
    vec.sandboxed() 
        .push(value)
        .release_get_mut::<-1>() 
}

This is an interesting idea, but quite a tricky new API design. I think that the best thing to do is to polish your “proof of concept” and publish it to crates.io so that people can easily try using it and see whether it provides benefit in practice. It’s unlikely that there would be interest in adding it to std without such testing of the design.

3 Likes

There’s an alternative implementation possible for the same basic idea: instead of using something like sandboxed_scope to create a reference from an existing vector, instead you allocate a MaybeUninit<Vec<…>> and then have a constructor for SandboxMut that takes the MaybeUninit as an argument, initialises it and returns a reference into it. I hadn’t seen the version that takes an existing vector before, but they complement each other well (and are probably both special cases of the same general idea: you have a “bounding type” – Vecin your case and MaybeUninit in mine – and can have references to it that treat it as having any subtype of that bounding type).

I’ve been meaning to write a lot on this topic for a while, but haven’t managed to find the time. The underlying technique, though, is a) very powerful and b) very different from the way things are normally done in Rust – it effectively lets you implement strong updates, but at the cost of objects being harder to create than normal and references acting a bit differently

Some other applications for this pattern which would be hard to implement using normal Rust techniques:

  • implementing objects that can’t be moved, like futures (Pin is an existing workaround for the issue, but you can use this sandboxing technique instead; given how confusing Pin is, it might even be more ergonomic – it also statically guarantees that you don’t get panics from polling a future after it returns);
  • implementing memory allocators in safe Rust that allocate from a buffer on the stack (if you try to do it naively, you discover that the allocator becomes a cyclic structure, because it needs a slice of “memory I haven’t used yet” but that points to inside the allocator itself – this patterns olves it because the slice gets stored in the reference to the allocator rather than the allocator);
  • using padding bytes of enum fields as though they were niches, in order to store the tags of the other variants (this doesn’t work in current Rust because if you took a reference to the field, a write through that reference might write the padding bytes in a way that changed the enum variant – but as long as the bounding type is MaybeUninit, it works in a sandbox because the reference type can fix the padding bytes when it’s dropped and there’s no way to observe the bytes of the enum while it’s mutably borrowed)

One of the biggest changes is that if you mem::forget a regular Rust reference, the value it was borrowed or reborrowed from will always be a valid value of the type of the reference, whereas if you mem::forget one of these sandbox references, the value it was borrowed from ends up as a value of the bounding type (which is often a much more general type like MaybeUninit). In general, I think this is an improvement (it makes panic-safety easier to define and understand), but it’s very different from how Rust currently works.

The other really big change is with respect to reborrowing: Rust is very reliant on reborrowing at present for working with mutable references, but sandbox mutable references hardly use it. I wasn’t previously aware that reborrowing was even possible with this sort of reference, although this post made me realise that you could implement it using nested sandboxes.

I guess the real question here is “this sort of thing is clearly useful in some cases – but to what extent do we change the Rust libraries to use this pattern rather than more normal Rust patterns?“ You could probably rewrite the entire standard library in this form, and it might even be an improvement, but doing that would break all the existing Rust code. Rewriting just part of it in this form, alongside the existing APIs, would be confusing and might cause an ecosystem split, so it would probably be best reserved for cases where it were particularly helpful.

Perhaps all this is an argument for trying to implement this sort of thing as a new language feature rather than a library: that would probably make it easier to integrate with existing code. But it’s currently unclear to me what such a feature would look like (although there are some thoughts in the strong updates post I linked above, they probably aren’t directly applicable to the Vec situation).

1 Like

The "sandbox" terminology usually implies a strong enforcement that holds up even against code trying to intentionally bypass it. Rust doesn't have that capability, and the type system is for code cooperating with it.

For collections, there are crates already:

And these kind of guarantees could be modelled with typestate.

For other things, the more common solution is to define pre- and post-conditions on functions.

1 Like

I agree that “sandbox” probably isn’t a very good name for this sort of thing. But it’s a hard pattern to name in a clear way – I’ve been thinking about the pattern a lot but don’t have a name for it, which makes it difficult to talk about.

I think of this as being more of a general pattern for implementing typestate than anything else (although it’s a bit more powerful than that, I think the most common use is to work around Rust’s lack of typestate). There are basically three ways to do typestate in Rust at the moment: a) doing it entirely with moves, b) this approach in which an object stays in place and you create lots of different types of references to it, and c) using normal Rust references but transmuting them to add or remove #[repr(transparent)] wrappers around the code containing them. The first approach isn’t very general, and the third approach currently requires unsafe and is hard to build safe abstractions around, so in practice, the approach seen here is the only practical way to do general-purpose typestate in Rust at present.

(For example, vec1 uses the first of the approaches above, but it is a lot less powerful – if you have a vector in a struct field, you have to decide whether it’s a Vec1 in which case it can never have less than one element, or a Vec in which case you don’t get the positive-element guarantee. The approach in the OP allows you to assert, over a given region of code, that a Vec currently contains at least one element, even in cases where it could be empty at other times durint the program’s execution.)

I’m wondering about your thoughts on a more pared-down version of this concept, as seen in one of my use cases:

self.bounds = self.sorted_values.as_non_empty().map(|sandbox| {
    (*sandbox.first(), *sandbox.last())
});

This accomplishes much of my sought-after functionality, but doesn’t require typestate; it could be a simple &mut Vec<T> wrapper that exposes some functionality common to non-empty collections after a runtime check. Effectively lifting multiple redundant checks out into one condition.

I’m learning about vec1 from this post, so I’ll have to look into it to see how it compares & contrasts to my concept, but so far it seems like it’s conceptually closer to a NonZeroXXX, guaranteeing that the vector will never be empty

Sorry, I did not mean to imply that I thought the idea was too complex. What I meant was: because it is complex (but potentially very good), it should be proven by providing it in a library outside of std before it is added to std, so that any problems can be discovered that might be only evident after its usage in actual practice. std is where APIs get nearly frozen forever, so new ideas that aren’t “obviously correct” should be tested outside of std when they can be.

5 Likes