[Pre-RFC] A tiny internal change to `reverse`, `copy_from_slice`, `swap_with_slice`

tl;dr: I propose to rewrite, for example, swap_with_slice from this current version (source)

// Before (minimized)
pub const fn swap_with_slice(&mut self, other: &mut [T]) {
    unsafe {
        ptr::swap_nonoverlapping(self.as_mut_ptr(), other.as_mut_ptr(), self.len());
    }
}

to lift self.len() to the front as follows:

// After (minimized)
pub const fn swap_with_slice(&mut self, other: &mut [T]) {
    let len = self.len();
    unsafe {
        ptr::swap_nonoverlapping(self.as_mut_ptr(), other.as_mut_ptr(), len);
    }
}

The same idea applies to the implementation of reverse and copy_from_slice.

Are there any downsides to such a rewrite?


Motivation

I'm working on formally verifying slice functions in core using Creusot (see also the Verify Rust std challenge).

Part of the approach is to annotate functions with "ghost code" to enable formal reasoning using our tool Creusot with no impact to run-time. The goal being to verify fragments of core, there is an automated "erasure check" that if you remove this ghost code, the remaining code must match the original code in core. This check works for most functions, except the three in the title.

In the swap_with_slice example above, the method as_mut_ptr returns a raw pointer, and we extend its result with some "ghost data" that lets us record what the pointer is pointing to. Unfortunately for us, this "ghost data" carries the same lifetime as the input borrow, whereas previously as_mut_ptr only returned a raw pointer so there were no lifetimes in the result. Because of this, self is no longer available to call self.len().

Hence our current solution is to rewrite this function to get self.len() first, and I'm wondering if it would make sense to upstream this single change to core, instead of extending our "erasure check" with an ad hoc rule for this case. (Our verified version of swap_with_slice for the curious.)

Indeed, even outside of formal verification, one could argue that self.as_mut_ptr() "morally borrows" from self, so calling self.len() while the raw pointer self.as_mut_ptr() is still in use is kinda smelly.

I'm curious to hear people's thoughts on this.

The core thing to me with this is that if you did this, I expect someone will change it back to the other way since it's "obviously" unnecessary.

What's your plan to have a test for it to keep it from regressing?


But at a broader level, note that .len() is special. We do things like rust/compiler/rustc_mir_transform/src/lower_slice_len.rs at main · rust-lang/rust · GitHub that special-case it already. It's special in the ABI since .len() is actually just looking at a function parameter, etc.

So I think you should just support this. Could probably even enable that particular mir pass if you need it.

2 Likes

What's your plan to have a test for it to keep it from regressing?

I think a comment explaining why those functions would be written that way should be enough to dissuade Rust maintainers from rewriting it without a stronger reason.

But at a broader level, note that .len() is special. (...)

As far as I understand, these special cases are motivated by performance, or are there any that affect the functional behavior of .len()?

I can recognize that what I'm proposing is a hard sell. Even if my proposal doesn't win, what I'm hoping to get out of this discussion is also a clearer understanding of what makes self.len() special in this context, if it is indeed special.

I don't know if I'd say this is a good way to think of it, but a slice reference is kind of a pointer/metadata pair (where the metadata is the length), so as_mut_ptr could be seen as exclusively borrowing the pointer part but not the metadata part, similar to what you can do with separate fields of tuples (or structs). It wouldn't have to only be an optimization thing. (The special bit would then be that you're doing it using methods instead of direct field access, but at least that's instantaneous special behavior rather than ongoing, and something at least some people would like to support for arbitrary methods some day.)

I realised a while ago that raw pointers conceptually have a lifetime (which is the lifetime of their provenance, not the lifetime of the thing they point to). Under that model, you would indeed have to copy the length out of a slice reference before convering it to a pointer, if you wanted to use it at the same time as the pointer.

With current (and all likely future) Rust operational semantics, Rust code is allowed to violate the mutable borrow rules as long as they don't access any bytes of memory that the mutable reference is also used to access. As such, the following is sound (although it can't be written using safe Rust):

fn f(s: &mut [u8;2]) -> u8 {
    let p = s.as_mut_ptr();
    s[1] = 4;
    unsafe { p.write(5); }
    s[1]
}

but if you wrote s[0] twice rather than s[1], it would be undefined behaviour.

The current Rust code for swap_with_slice is effectively doing the same thing: it's creating what are conceptually overlapping mutable references, but they're accessing disjoint memory (the length and the slice contents) and thus no actual breach of unsafe Rust's rules has occurred.

In general, I'd be in favour of rewriting it to put the len() first, because (as you say) it makes the soundness analysis vastly simpler – there's no longer a need to rely on a somewhat obscure rule that is specific to unsafe Rust, meaning that proof tools that think along safe-Rust lines will be much better at understanding what is going on. (On the other hand, code like the existing code is to many programmers "obviously correct", so it's important that unsafe Rust's operational semantics permits it: there would be too much of a chance of people writing undefined code by mistake otherwise.)

I don't make the decisions, but if I did, I would probably move the len() call first with a comment saying something like "it is easier for automated proof tools to prove this code correct if it doesn't attempt to read the length through a mutable reference while a pointer derived from it is currently in use" (but ideally with better / more succinct wording).

1 Like