I am writing a bit of unsafe code, and Miri is unhappy. What I am building is essentially an allocator that uses &'m mut [u8] as a backing store.
One trick I want to allow is creating a temporary sub-allocator from the suffix of the slice, which I can do using split_at_mut. Now, once the sub-allocator dies, I want to rejoin its slice back to the main one.
My current code makes Miri unhappy.
Is there any way to make this work? Or is this actually unsound for some reason?
This feels pretty unsound to me. For example, inside the closure passed to with_scratch I could do *mem = Mem::new(Box::leak(Box::new([]))); and if size was not 0, then the Mem left after with_scratch returns would allow me to write to the address 0x1.
Shrinking provenance cannot be undone: even if you "know" there is a larger allocation, you can't derive a pointer with a larger provenance. Similarly, you cannot "recombine" two contiguous provenances back into one (i.e. with a fn merge(&[T], &[T]) -> &[T]).
That said, while the provenance cannot be reestablished, I don't think it's UB if you know the pointer is valid...
I changed the &mut Mem<'m> into a MemView<'_, 'm>, which is essentially the same but doesn't allow swapping the Mem<'m>, only forwarding operations. Then I made Miri happy by keeping around a raw pointer to the whole allocation as a way to keep the original provenance.
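For anyone following along, here is a rough sketch of what that shape could look like. It is not the code from the playground: the field names, alloc_bytes, remaining, and the exact with_scratch signature are my own assumptions, and it only hands out byte slices so that alignment does not come into it. The idea is that base is never narrowed, so nothing has to be rejoined afterwards, and the head is passed as a by-value MemView that cannot be swapped out.

```rust
use std::marker::PhantomData;
use std::slice;

/// Bump allocator over a borrowed buffer (hypothetical layout).
/// `base` is never narrowed, so it keeps provenance over the whole buffer.
pub struct Mem<'m> {
    base: *mut u8,
    start: usize, // offset of the first free byte
    end: usize,   // offset one past the last usable byte
    _buf: PhantomData<&'m mut [u8]>,
}

/// Forwarding handle: it exposes allocation, but the `Mem` behind it cannot be
/// replaced, because the field is private and the view is passed by value.
pub struct MemView<'a, 'm>(&'a mut Mem<'m>);

impl<'m> MemView<'_, 'm> {
    pub fn alloc_bytes(&mut self, n: usize) -> Option<&'m mut [u8]> {
        self.0.alloc_bytes(n)
    }
}

impl<'m> Mem<'m> {
    pub fn new(buf: &'m mut [u8]) -> Self {
        Mem { base: buf.as_mut_ptr(), start: 0, end: buf.len(), _buf: PhantomData }
    }

    pub fn remaining(&self) -> usize {
        self.end - self.start
    }

    pub fn alloc_bytes(&mut self, n: usize) -> Option<&'m mut [u8]> {
        if n > self.remaining() {
            return None;
        }
        // SAFETY (as assumed by this sketch): `start + n <= end <= buf.len()`,
        // and no byte range is handed out twice for overlapping lifetimes.
        let out = unsafe { slice::from_raw_parts_mut(self.base.add(self.start), n) };
        self.start += n;
        Some(out)
    }

    /// Lends the last `size` free bytes to `scope` as a temporary sub-allocator.
    /// The scratch lifetime is chosen per call, so scratch allocations cannot
    /// escape the closure.
    pub fn with_scratch<R>(
        &mut self,
        size: usize,
        scope: impl FnOnce(MemView<'_, 'm>, &mut Mem<'_>) -> R,
    ) -> Option<R> {
        if size > self.remaining() {
            return None;
        }
        let old_end = self.end;
        let mid = old_end - size;
        self.end = mid; // the head shrinks while the scratch is alive
        let mut scratch = Mem { base: self.base, start: mid, end: old_end, _buf: PhantomData };
        let ret = scope(MemView(&mut *self), &mut scratch);
        self.end = old_end; // "rejoin" is just restoring the bound
        Some(ret)
    }
}

fn main() {
    let mut buf = [0u8; 8];
    let mut mem = Mem::new(&mut buf);
    let sum = mem.with_scratch(4, |mut head, scratch| {
        let tmp = scratch.alloc_bytes(4).unwrap();
        tmp.copy_from_slice(&[1, 2, 3, 4]);
        let kept = head.alloc_bytes(1).unwrap();
        kept[0] = tmp.iter().sum::<u8>();
        kept[0]
    });
    assert_eq!(sum, Some(10));
    assert_eq!(mem.remaining(), 7);
}
```

I have not run this particular sketch under Miri, so treat it as an illustration of the idea rather than a verified implementation.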
Aha, that's almost it! I think it's relatively important to keep using the Mem type, so that it's possible to recursively sub-divide both 'm and 's further, but that seems achievable!
Given a MemView you can still keep subdividing it recursively though.
I feel like you need some pretty weird trick to keep the &mut Mem<'m> argument in the closure, since it pretty much allows the user to change it however they want, including "extracting" the inner slice.
They have the same API, but different types, so the client needs to choose between Mem or MemView. This is solvable by always using MemView and allocating a dummy MemView at the start, but then MemView has an extra lifetime...
Maybe a middle ground could be Mem<'m, View = ()>, where new() yields a Mem<'m>, but alloc and the rest of the API are generic over the parameter. Finally, with_scratch would then offer &mut Mem<'m, View> for some struct View(()); or something like that.
It's basically the MemView suggestion but hidden within a generic parameter, to avoid code duplication and keep the docs readable.
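To make the type-level part of that concrete, here is a small sketch in safe code only. The names with_view, alloc_bytes and remaining are made up for illustration, and it deliberately leaves out the scratch split and the unsafe rejoin; it only shows how a defaulted parameter lets one set of methods serve both the constructible Mem<'m> and the tagged Mem<'m, View> that client code cannot build.

```rust
mod mem {
    use std::marker::PhantomData;

    /// Tag for a `Mem` that is only lent out by `with_view`. The private field
    /// keeps code outside this module from constructing one.
    pub struct View(());

    pub struct Mem<'m, V = ()> {
        raw: &'m mut [u8],
        _view: PhantomData<V>,
    }

    impl<'m> Mem<'m> {
        /// Only the untagged `Mem<'m>` has a public constructor.
        pub fn new(raw: &'m mut [u8]) -> Self {
            Mem { raw, _view: PhantomData }
        }

        /// Hypothetical stand-in for the head argument of `with_scratch`:
        /// lends the buffer out as a tagged `Mem`, then takes it back.
        pub fn with_view<R>(&mut self, scope: impl FnOnce(&mut Mem<'m, View>) -> R) -> R {
            let mut view = Mem { raw: std::mem::take(&mut self.raw), _view: PhantomData::<View> };
            let ret = scope(&mut view);
            self.raw = view.raw;
            ret
        }
    }

    /// Shared API, written once for every tag.
    impl<'m, V> Mem<'m, V> {
        pub fn remaining(&self) -> usize {
            self.raw.len()
        }

        pub fn alloc_bytes(&mut self, n: usize) -> Option<&'m mut [u8]> {
            if n > self.raw.len() {
                return None;
            }
            let (hd, tl) = std::mem::take(&mut self.raw).split_at_mut(n);
            self.raw = tl;
            Some(hd)
        }
    }
}

fn main() {
    let mut buf = [0u8; 8];
    let mut m = mem::Mem::new(&mut buf);
    let got = m.with_view(|view| {
        // `*view = mem::Mem::new(..)` would not type-check here:
        // `Mem<'_, ()>` is a different type from `Mem<'_, View>`.
        view.alloc_bytes(3).map(|s| s.len())
    });
    assert_eq!(got, Some(3));
    assert_eq!(m.remaining(), 5);
}
```

This says nothing about whether two views with the same 'm could still be swapped with each other, so it is a sketch of the API shape, not a soundness argument.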
Regarding the OP, I think the key line / notion is about this operation:
let (hd, tl) = (
    slice::from_raw_parts_mut(ptr.add(0), mid),
    slice::from_raw_parts_mut(ptr.add(mid), size),
);
self.raw = hd;
ret = scope(self, &mut Mem::new(tl));
let new_len = self.raw.len();
let new_ptr = self.raw.as_mut_ptr();
// Imbue `new_ptr` with the original provenance of `ptr`
// (one thus spanning over the last `size` bytes, contrary to that of
// `new_ptr`, which is derived from `hd`)
let new_ptr = ptr.offset(new_ptr.offset_from(ptr)); // <- the key line
self.raw = slice::from_raw_parts_mut(new_ptr, new_len + size);
This p.offset(q.offset_from(p)), i.e., p + (q - p) in pseudo-code, is the typical way to get a pointer whose address is that of q, and yet retains its original provenance, since offsets don't alter it. Hopefully with with_addr() and the rest of the provenance-based APIs this will be easier to write.
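Here is the same trick pulled out into a tiny standalone example, in case that helps. The helper name with_provenance_of and the rejoin_demo function are made up for this post; the only std calls involved are offset, offset_from, add and slice::from_raw_parts_mut. It assumes both pointers come from the same allocation, which offset_from requires.

```rust
use std::slice;

/// Returns a pointer with the address of `q` but the provenance of `p`.
///
/// # Safety
/// Both pointers must be derived from the same allocation, and `q` must lie
/// within the range that `p` is allowed to access (or one past its end).
unsafe fn with_provenance_of(p: *mut u8, q: *mut u8) -> *mut u8 {
    // p + (q - p): offsets change the address, not the provenance.
    unsafe { p.offset(q.offset_from(p)) }
}

fn rejoin_demo() -> [u8; 8] {
    let mut buf = [0u8; 8];
    let len = buf.len();
    let mid = 5;
    // `ptr` may access all `len` bytes.
    let ptr: *mut u8 = buf.as_mut_ptr();
    unsafe {
        let hd = slice::from_raw_parts_mut(ptr, mid);
        let tl = slice::from_raw_parts_mut(ptr.add(mid), len - mid);
        hd[0] = 1;
        tl[0] = 2;
        // `narrow` comes from `hd`, so it only covers the first `mid` bytes.
        let narrow = hd.as_mut_ptr();
        // Same address as `narrow`, but derived from `ptr` again.
        let wide = with_provenance_of(ptr, narrow);
        let whole = slice::from_raw_parts_mut(wide, len);
        whole[len - 1] = 3;
    }
    buf
}

fn main() {
    assert_eq!(rejoin_demo(), [1, 0, 0, 0, 0, 2, 0, 3]);
}
```

Note that hd and tl must not be used again once whole exists, since whole covers the same bytes.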
Is there any reason that this can't work? I see you're discussing something about swapping out the Mem object, but even if you take ownership of the sub-mem passed to with_scratch with some sort of swap, I don't see how it can outlive the closure.
Edit: Or even this where we give away ownership of the sub-mem outright.
An unrelated point, but the backing storage should be &mut [MaybeUninit<u8>] to prevent this from being UB:
use std::mem::MaybeUninit;
fn main() {
    let mut buf = [0u8; 4];
    let mut mem = Mem::new(&mut buf);
    let x = mem.alloc(MaybeUninit::<u8>::uninit()).unwrap();
    println!("{:?}", buf);
}
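For comparison, a minimal sketch of what allocating from a MaybeUninit-backed store could look like. The alloc_u8 function is hypothetical and only handles a single u8, but it shows the point: the store never promises that its bytes are initialized, so handing out a slot and writing to it stays in safe code.

```rust
use std::mem::MaybeUninit;

/// Takes the first slot of `store`, initializes it with `value`, and returns
/// the initialized slot together with the rest of the store.
fn alloc_u8<'m>(
    store: &'m mut [MaybeUninit<u8>],
    value: u8,
) -> Option<(&'m mut u8, &'m mut [MaybeUninit<u8>])> {
    let (slot, rest) = store.split_first_mut()?;
    // `MaybeUninit::write` initializes the slot and returns `&mut u8`.
    Some((slot.write(value), rest))
}

fn main() {
    // The buffer starts out uninitialized, and its type says so.
    let mut buf = [MaybeUninit::<u8>::uninit(); 4];
    let (x, rest) = alloc_u8(&mut buf, 7).unwrap();
    *x += 1;
    assert_eq!(*x, 8);
    assert_eq!(rest.len(), 3);
}
```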
I think this is broken for the same reason the original code was broken; here's a segfault:
fn main() {
    let mut buf = [0u8; 4];
    let mut mem = Mem::new(&mut buf);
    mem.with_scratch(2, |mem, scratch| {
        *mem = Mem::new(Box::leak(Box::new([])));
    });
    let segfault = mem.alloc::<u8>(0);
}
Oh wow! Do I understand this right? What happens here is that we change the buf itself to be uninitialized. That is, the API allows allocating whatever from the underlying bytes, and we can allocate, e.g., MaybeUninit, through which we can mark the original memory as no longer initialized. So, by the time we drop mem, the original buffer contains radioactive garbage which we shouldn't be able to touch?
I wonder if MaybeUninit is even needed here? Like, what if we allocate T with some padding bytes, into &buf, and then look at those padding bytes? Would that be UB?
Ah, I misunderstood. I was looking at swapping the sub-buffer, but the issue was when swapping self. However, it seems to me that this could be fixed by checking whether self has been swapped for a different buffer. playground
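Since the playground link may not survive, here is a rough sketch of the kind of check I mean. It is not necessarily what the playground does: alloc_bytes, remaining, and returning None on a detected swap are choices made for this sketch. After the closure returns, the head is only rejoined with the scratch if it still ends exactly where the original head ended; otherwise the scratch bytes are simply given up.

```rust
use std::slice;

pub struct Mem<'m> {
    raw: &'m mut [u8],
}

impl<'m> Mem<'m> {
    pub fn new(raw: &'m mut [u8]) -> Self {
        Mem { raw }
    }

    pub fn remaining(&self) -> usize {
        self.raw.len()
    }

    pub fn alloc_bytes(&mut self, n: usize) -> Option<&'m mut [u8]> {
        if n > self.raw.len() {
            return None;
        }
        let (hd, tl) = std::mem::take(&mut self.raw).split_at_mut(n);
        self.raw = tl;
        Some(hd)
    }

    /// Returns `None` if `size` does not fit, or if the closure replaced
    /// `self` with a `Mem` over some other buffer.
    pub fn with_scratch<R>(
        &mut self,
        size: usize,
        scope: impl FnOnce(&mut Mem<'m>, &mut Mem<'_>) -> R,
    ) -> Option<R> {
        let mid = self.raw.len().checked_sub(size)?;
        // `ptr` keeps the provenance of the whole remaining buffer.
        let ptr = std::mem::take(&mut self.raw).as_mut_ptr();
        let (hd, tl) = unsafe {
            (
                slice::from_raw_parts_mut(ptr, mid),
                slice::from_raw_parts_mut(ptr.add(mid), size),
            )
        };
        self.raw = hd;
        let ret = scope(&mut *self, &mut Mem::new(tl));

        // Swap check: the head must still end exactly where `hd` ended.
        let base = ptr as usize;
        let start = self.raw.as_ptr() as usize;
        let end = start + self.raw.len();
        if start < base || end != base + mid {
            return None; // swapped: do not rejoin
        }
        let used = start - base;
        // Rebuild the slice from `ptr`, not from `self.raw`, so it may cover
        // the scratch bytes again.
        self.raw = unsafe { slice::from_raw_parts_mut(ptr.add(used), mid - used + size) };
        Some(ret)
    }
}

fn main() {
    let mut buf = [0u8; 8];
    let mut mem = Mem::new(&mut buf);
    let ok = mem.with_scratch(4, |head, scratch| {
        head.alloc_bytes(1).unwrap()[0] = 9;
        scratch.alloc_bytes(4).unwrap().len()
    });
    assert_eq!(ok, Some(4));
    assert_eq!(mem.remaining(), 7);

    // The swap from the counterexample above is detected and not rejoined.
    let swapped = mem.with_scratch(2, |head, _scratch| {
        *head = Mem::new(Box::leak(Box::new([0u8; 0])));
    });
    assert_eq!(swapped, None);
    assert_eq!(mem.remaining(), 0);
}
```

I have not convinced myself that an address check like this covers every way of replacing self, so take it as a starting point rather than a proof.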