Miri approved way to rejoin the slices

(backstory in this post).

I am writing a bit of unsafe code, and Miri is unhappy :slight_smile: What I am building is essentially an allocator that uses &'m mut [u8] as a backing store.

One trick I want to allow is creating a temporary sub-allocator from the suffix of the slice, which I can do using split_at_mut. Now, once the sub-allocator dies, I want to rejoin its slice back to the main one.

My current code makes Miri unhappy :sob:

Is there any way to make this work? Or is this actually unsound for some reason?

This feels pretty unsound to me. For example, inside the closure passed to with_scratch I could do *mem = Mem::new(Box::leak(Box::new([]))); and, if size was not 0, the Mem left after with_scratch returns would allow me to write to the address 0x1.


As documented under Provenance:

Shrinking provenance cannot be undone: even if you “know” there is a larger allocation, you can’t derive a pointer with a larger provenance. Similarly, you cannot “recombine” two contiguous provenances back into one (i.e. with a fn merge(&[T], &[T]) -> &[T] ).

That said, while the provenance cannot be reestablished, I don't think it's UB if you know the pointer is valid...


Maybe something like this? Rust Playground

I changed the &mut Mem<'m> into a MemView<'_, 'm>, which is essentially the same but doesn't allow swapping the Mem<'m>, only forwarding operations. Then I made Miri happy by keeping around a raw pointer to the whole allocation, as a way to keep the original provenance.
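To illustrate the "keep a raw pointer to the whole allocation" idea, here is a minimal sketch. It is not the playground code: Region and its methods are hypothetical names, and it hands plain slices to the closure instead of a Mem, which sidesteps the swapping problem entirely.

```rust
use std::marker::PhantomData;
use std::slice;

/// Hypothetical stand-in for `Mem`: it stores a raw pointer whose
/// provenance covers the whole backing slice, instead of the slice itself.
pub struct Region<'m> {
    base: *mut u8,
    len: usize,
    _marker: PhantomData<&'m mut [u8]>,
}

impl<'m> Region<'m> {
    pub fn new(raw: &'m mut [u8]) -> Self {
        Region {
            base: raw.as_mut_ptr(),
            len: raw.len(),
            _marker: PhantomData,
        }
    }

    /// Lends the last `size` bytes as scratch space next to the remaining prefix.
    pub fn with_scratch<T>(
        &mut self,
        size: usize,
        f: impl FnOnce(&mut [u8], &mut [u8]) -> T,
    ) -> T {
        assert!(size <= self.len);
        let mid = self.len - size;
        // SAFETY: the two ranges are disjoint, both lie inside the backing
        // slice, and `&mut self` guarantees no other access during the call.
        let (hd, tl) = unsafe {
            (
                slice::from_raw_parts_mut(self.base, mid),
                slice::from_raw_parts_mut(self.base.add(mid), size),
            )
        };
        // Nothing needs rejoining afterwards: `base` never lost its provenance.
        f(hd, tl)
    }
}
```

Because the closure must accept slices of any lifetime, neither half can escape it, and after the call the Region still covers the full buffer.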


Aha, that's almost it! I think it's relatively important to keep using the Mem type, so that it's possible to recursively sub-divide both 'm and 's further, but that seems achievable!

Maybe something like this could work?

Rust Playground

Given a MemView you can still keep subdividing it recursively though.

I feel like you need some pretty weird trick to keep the &mut Mem<'m> argument in the closure since it pretty much allows the user to change it how they want, including "extracting" the inner slice.

They have the same API, but different types, so the client needs to choose between Mem or MemView. This is solvable by always using MemView and allocating a dummy MemView at the start, but then MemView has an extra lifetime...

Hm, I think such a trick actually exists, if we replace the new constructor with a with function:

pub fn with<T>(raw: &mut [u8], f: impl FnOnce(&mut Mem<'_>) -> T) -> T {

It prevents constructing a Mem with a specific lifetime, and that perhaps closes the hole? playground.
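A rough sketch of what that could look like, assuming there is no public constructor at all. The alloc_bytes and remaining methods are hypothetical helpers added only to make the example self-contained; they are not the real API.

```rust
/// Simplified stand-in for the real `Mem`; the field is private, so code
/// outside this module cannot build a `Mem` on its own.
pub struct Mem<'m> {
    raw: &'m mut [u8],
}

impl<'m> Mem<'m> {
    /// The only way to obtain a `Mem`: its lifetime is chosen here, and the
    /// closure has to work for whatever lifetime it is given.
    pub fn with<T>(raw: &mut [u8], f: impl FnOnce(&mut Mem<'_>) -> T) -> T {
        let mut mem = Mem { raw };
        f(&mut mem)
    }

    /// Hypothetical helper: carves `n` bytes off the front of the store.
    pub fn alloc_bytes(&mut self, n: usize) -> Option<&'m mut [u8]> {
        if n > self.raw.len() {
            return None;
        }
        // Temporarily replace the slice with an empty one so it can be split.
        let raw = std::mem::take(&mut self.raw);
        let (hd, tl) = raw.split_at_mut(n);
        self.raw = tl;
        Some(hd)
    }

    /// Hypothetical helper: number of bytes still available.
    pub fn remaining(&self) -> usize {
        self.raw.len()
    }
}
```

With this shape, something like Mem::with(&mut buf, |mem| mem.remaining()) works, while *mem = Mem::new(...) has nothing to call from outside the module.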

The limitation is of course that now it's impossible to let borrowed data escape beyond the mem, so something like

fn alloc_int(buf: &mut [u8]) -> &mut int

becomes impossible to do via the Mem API.


Previous thread on this topic: Safe slice rejoining


Maybe a middle ground could be Mem<'m, View = ()>, and have new() yield a Mem<'m>, while alloc and the other APIs would be generic over the View parameter. Finally, with_scratch would then offer &mut Mem<'m, View> for some struct View(()); or something like that.

It's basically the MemView suggestion but hidden within a generic parameter, to avoid code duplication / keep the docs readable :slightly_smiling_face:


Regarding the OP, I think the key line / notion is about this operation:

let (hd, tl) = (
    slice::from_raw_parts_mut(ptr.add( 0 ), mid ),
    slice::from_raw_parts_mut(ptr.add(mid), size),
);
self.raw = hd;
ret = scope(self, &mut Mem::new(tl));
let new_len = self.raw.len();
let new_ptr = self.raw.as_mut_ptr();
// Imbue `new_ptr` with the original provenance of `ptr`
// (one thus spanning over the last `size` bytes, contrary to that of
// `new_ptr` ⊆ `hd`)
let new_ptr = ptr.offset(new_ptr.offset_from(ptr)); // 👈
self.raw = slice::from_raw_parts_mut(new_ptr , new_len + size);

This p.offset(q.offset_from(p)), i.e., p + (q - p) in pseudo-code, is the typical way to get a pointer whose address is that of q, and yet retains its original provenance, since offsets don't alter it. Hopefully with with_addr() and the rest of the provenance-based APIs this will be easier to write.
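Here is the same p + (q - p) trick in isolation, as a small self-contained sketch. The function rejoin_demo and the four-byte buffer are made up for illustration and are not part of the OP's code.

```rust
use std::slice;

/// Hypothetical demo: recovers a pointer with the provenance of the whole
/// buffer from a pointer that was derived from a sub-slice.
pub fn rejoin_demo() -> [u8; 4] {
    let mut buf = [1u8, 2, 3, 4];
    let len = buf.len();
    // `p` carries provenance over all four bytes.
    let p: *mut u8 = buf.as_mut_ptr();
    // SAFETY: every pointer stays inside `buf`, and `hd` is never used
    // again once `tail` has been created.
    unsafe {
        // `hd` covers only the first two bytes.
        let hd: &mut [u8] = slice::from_raw_parts_mut(p, 2);
        // `q` points at buf[1], but may only access memory inside `hd`.
        let q: *mut u8 = hd[1..].as_mut_ptr();
        // p + (q - p): the address of `q`, the provenance of `p`.
        let off: isize = q.offset_from(p);
        let r: *mut u8 = p.offset(off);
        // `r` is allowed to reach buf[3], which lies outside `hd`.
        let tail = slice::from_raw_parts_mut(r, len - off as usize);
        tail[2] = 9;
    }
    buf
}
```

Building tail directly from q with the same length would be exactly the access that Miri rejects, since q was derived from hd and may not reach past it.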

Is there any reason that this can't work? I see you're discussing something about swapping out the Mem object, but even if you take ownership of the sub-mem passed to with_scratch with some sort of swap, I don't see how it can outlive the closure.

Edit: Or even this where we give away ownership of the sub-mem outright.

An unrelated point, but the backing storage should be &mut [MaybeUninit<u8>] to prevent this from being UB:

use std::mem::MaybeUninit;

fn main() {
    let mut buf = [0u8; 4];
    let mut mem = Mem::new(&mut buf);
    let x = mem.alloc(MaybeUninit::<u8>::uninit()).unwrap();
    println!("{:?}", buf);
}

I think this is broken for the same reason the original code was broken; here's a segfault:

fn main() {
    let mut buf = [0u8; 4];
    let mut mem = Mem::new(&mut buf);
    mem.with_scratch(2, |mem, scratch| {
        *mem = Mem::new(Box::leak(Box::new([])));
    });
    let segfault = mem.alloc::<u8>(0);
}

Oh wow! Do I understand this right? What happens here is that we change the buf itself to be uninitialized. That is, the API allows allocating whatever from the underlying bytes, and we can allocate, e.g., MaybeUninit, through which we can mark the original memory as no longer initialized. So, by the time we drop mem, the original buffer contains radioactive garbage which we shouldn't be able to touch?

I wonder if MaybeUninit is even needed here? Like, what if we allocate T with some padding bytes, into &buf, and then look at those padding bytes? Would that be UB?

Ah, I misunderstood. I was looking at swapping the sub-buffer, but the issue was when swapping self. However, it seems to me that this could be fixed by checking whether self has been swapped for a different buffer. playground

Yes, padding causes the same issue.
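To make the padding point concrete, here is a small sketch. The Padded type is made up for illustration, and the count of 3 assumes a target where u32 has 4-byte alignment, which holds on the usual x86_64 and aarch64 targets.

```rust
use std::mem::size_of;

/// Hypothetical type: with `repr(C)`, `b` has to start at an offset that is
/// a multiple of its alignment, so padding follows `a`.
#[repr(C)]
pub struct Padded {
    pub a: u8,
    pub b: u32,
}

/// Number of bytes in `Padded` that belong to no field.
pub fn padding_bytes() -> usize {
    size_of::<Padded>() - (size_of::<u8>() + size_of::<u32>())
}
```

Writing a Padded into a [u8] buffer is allowed to leave those padding bytes uninitialized, so reading the buffer back as u8 afterwards runs into the same problem as the MaybeUninit example.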
