Miri approved way to rejoin the slices

matklad · October 6, 2022, 3:50pm

(backstory in this post).

I am writing a bit of unsafe code, and miri is unhappy. What I am building is essentially an allocator that uses &'m mut [u8] as a backing store.

One trick I want to allow is creating a temporary sub-allocator from the suffix of the slice, which I can do using split_at_mut. Now, once the sub-allocator dies, I want to rejoin its slice back to the main one.

My current code makes miri unhappy.

Is there any way to make this work? Or is this actually unsound for some reason?
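For reference, a minimal sketch of the splitting step on its own, assuming a hypothetical helper `with_suffix` that is not part of the original code. With plain reborrows the two halves merge back automatically when the closure returns. The rejoin problem in this thread seems to arise only once the `&'m mut [u8]` is stored in a struct and moved out of it, since there is then no parent borrow left to fall back on.

```rust
/// Hypothetical helper: lends the last `size` bytes of `buf` to `f` as scratch.
fn with_suffix<T>(
    buf: &mut [u8],
    size: usize,
    f: impl FnOnce(&mut [u8], &mut [u8]) -> T,
) -> T {
    assert!(size <= buf.len(), "scratch larger than buffer");
    let mid = buf.len() - size;
    // Both halves are reborrows of `buf`, so they end when `f` returns
    // and `buf` is usable as one slice again without any unsafe code.
    let (head, tail) = buf.split_at_mut(mid);
    f(head, tail)
}
```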

SkiFire13 · October 6, 2022, 4:27pm

This feels pretty unsound to me. For example, inside the closure passed to with_scratch I could do *mem = Mem::new(Box::leak(Box::new([]))); and if size was not 0 then the Mem after with_scratch returns will allow me to write to the address 0x1.

eggyal · October 6, 2022, 4:29pm

As documented under Provenance:

Shrinking provenance cannot be undone: even if you “know” there is a larger allocation, you can’t derive a pointer with a larger provenance. Similarly, you cannot “recombine” two contiguous provenances back into one (i.e. with a fn merge(&[T], &[T]) -> &[T] ).

That said, while the provenance cannot be reestablished, I don't think it's UB if you know the pointer is valid...

SkiFire13 · October 6, 2022, 4:47pm

Maybe something like this? Rust Playground

I changed the &mut Mem<'m> into a MemView<'_, 'm>, which is essentially the same but doesn't allow swapping the Mem<'m>, only forwarding operations. Then I fixed MIRI by keeping around a raw pointer to the whole allocation as a way to keep the original provenance.
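A rough sketch of the "keep a raw pointer to the whole allocation" idea, under stated assumptions: the type name `RawMem` and its methods are hypothetical and are not the code from the playground. The struct never stores a narrowed reference, so there is nothing to rejoin afterwards; the halves are re-derived from the wide pointer each time.

```rust
use std::marker::PhantomData;
use std::slice;

/// Hypothetical stand-in for the allocator: exclusive access to `len` bytes
/// at `ptr` for the lifetime `'m`, tracked through a raw pointer.
pub struct RawMem<'m> {
    ptr: *mut u8, // retains provenance over the whole original buffer
    len: usize,
    _marker: PhantomData<&'m mut [u8]>,
}

impl<'m> RawMem<'m> {
    pub fn new(buf: &'m mut [u8]) -> Self {
        RawMem { ptr: buf.as_mut_ptr(), len: buf.len(), _marker: PhantomData }
    }

    /// Lends the first `len - size` bytes and the last `size` bytes to `f`.
    pub fn with_scratch<T>(
        &mut self,
        size: usize,
        f: impl FnOnce(&mut [u8], &mut [u8]) -> T,
    ) -> T {
        assert!(size <= self.len, "scratch larger than buffer");
        let mid = self.len - size;
        // SAFETY: the two ranges are disjoint, lie inside the original buffer,
        // and `&mut self` gives exclusive access while `f` runs.
        let (head, tail) = unsafe {
            (
                slice::from_raw_parts_mut(self.ptr, mid),
                slice::from_raw_parts_mut(self.ptr.add(mid), size),
            )
        };
        f(head, tail)
    }

    /// The whole buffer again, derived from the original wide pointer.
    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: `ptr`/`len` describe the buffer borrowed for `'m`.
        unsafe { slice::from_raw_parts_mut(self.ptr, self.len) }
    }
}
```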

matklad · October 6, 2022, 4:50pm

Aha, that's almost it! I think it's relatively important to keep using the Mem type, so that it's possible to recursively sub-divide both 'm and 's further, but that seems achievable!

cpud36 · October 6, 2022, 4:59pm

Maybe something like this could work?

Rust Playground

SkiFire13 · October 6, 2022, 5:00pm

Given a MemView you can still keep subdividing it recursively though.

I feel like you need some pretty weird trick to keep the &mut Mem<'m> argument in the closure since it pretty much allows the user to change it how they want, including "extracting" the inner slice.

matklad · October 6, 2022, 5:21pm

They have the same API, but different types, so the client needs to choose between Mem or MemView. This is solvable by always using MemView and allocating a dummy MemView at the start, but then MemView has an extra lifetime...

matklad · October 6, 2022, 8:07pm

Hm, I think such a trick actually exists, if we replace the new constructor with a with function:

pub fn with<T>(raw: &mut [u8], f: impl FnOnce(&mut Mem<'_>) -> T) -> T {

It prevents constructing a Mem with a specific lifetime, and that perhaps closes the hole? playground.
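A small sketch of how the `with` entry point might look, assuming a stripped-down `Mem` with a private constructor and a hypothetical `alloc_bytes` method (neither is taken from the playground). The closure must work for every lifetime, so it cannot name the lifetime of the `Mem` it receives, and with `new` private it has no way to build a replacement `Mem` to swap in.

```rust
pub struct Mem<'m> {
    raw: &'m mut [u8],
}

impl<'m> Mem<'m> {
    // Deliberately private: outside this module, `with` is the only way in.
    fn new(raw: &'m mut [u8]) -> Self {
        Mem { raw }
    }

    pub fn remaining(&self) -> usize {
        self.raw.len()
    }

    /// Hypothetical bump allocation of `n` bytes from the front.
    pub fn alloc_bytes(&mut self, n: usize) -> Option<&'m mut [u8]> {
        if n > self.raw.len() {
            return None;
        }
        let raw = std::mem::take(&mut self.raw);
        let (head, tail) = raw.split_at_mut(n);
        self.raw = tail;
        Some(head)
    }
}

/// The closure is generic over the lifetime of `Mem`, so it cannot pick one.
pub fn with<T>(raw: &mut [u8], f: impl FnOnce(&mut Mem<'_>) -> T) -> T {
    let mut mem = Mem::new(raw);
    f(&mut mem)
}
```

Anything allocated inside the closure is tied to that anonymous lifetime, which is why borrowed data cannot be returned out of `with`.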

The limitation is of course that now it's impossible to escape borrowed data beyond the mem, so something like

fn alloc_int(buf: &mut [u8]) -> &mut int

becomes impossible to do via the Mem API.

bascule · October 6, 2022, 8:22pm

Previous thread on this topic: Safe slice rejoining

dhm · October 9, 2022, 5:42pm

Maybe a middle-ground could be Mem<'m, View = ()>, and have the new() yield a Mem<'m>, but the alloc and so on APIs would be generic over it. Finally, with_scratch would then offer &mut Mem<'m, View> for some struct View(()); or something like that.

It's basically the MemView suggestion but hidden within a generic parameter, to avoid code duplication / keep the docs readable.

Regarding the OP, I think the key line / notion is about this operation:

let (hd, tl) = (
    slice::from_raw_parts_mut(ptr.add( 0 ), mid ),
    slice::from_raw_parts_mut(ptr.add(mid), size),
);
self.raw = hd;
ret = scope(self, &mut Mem::new(tl));
let new_len = self.raw.len();
let new_ptr = self.raw.as_mut_ptr();
// Imbue `new_ptr`, with the original provenance of `ptr` 
// (one thus spanning over the last `size` bytes, contrary to that of
// `new_ptr` ⊆ `hd` )
let new_ptr = ptr.offset(new_ptr.offset_from(ptr)); // 👈
self.raw = slice::from_raw_parts_mut(new_ptr , new_len + size);

Demo

This p.offset(q.offset_from(p)), i.e., p + (q - p) in pseudo-code, is the typical way to get a pointer whose address is that of q, and yet retains its original provenance, since offsets don't alter it. Hopefully with with_addr() and the rest of the provenance-based APIs this will be easier to write.
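To make the `p + (q - p)` idiom concrete, here is a self-contained sketch on a plain array. The helper name `with_provenance_of` and the function `rejoin_demo` are made up for illustration; only `offset`, `offset_from` and `slice::from_raw_parts_mut` are real standard library calls.

```rust
use std::slice;

/// Hypothetical helper: address of `q`, provenance of `p`.
///
/// SAFETY: `p` and `q` must point into (or one past the end of) the same
/// allocation, as required by `offset_from` and `offset`.
unsafe fn with_provenance_of(p: *mut u8, q: *mut u8) -> *mut u8 {
    unsafe { p.offset(q.offset_from(p)) }
}

fn rejoin_demo() -> [u8; 4] {
    let mut buf = [1u8, 2, 3, 4];
    // Wide pointer: may access all four bytes.
    let whole: *mut u8 = buf.as_mut_ptr();
    unsafe {
        // Narrow view: a reference covering only the first two bytes.
        let head: &mut [u8] = slice::from_raw_parts_mut(whole, 2);
        let head_ptr = head.as_mut_ptr();
        // Same address as `head_ptr`, but derived from `whole`, so it is
        // allowed to reach past the end of `head`.
        let wide = with_provenance_of(whole, head_ptr);
        let joined = slice::from_raw_parts_mut(wide, 4);
        joined[3] = 9;
    }
    buf
}
```

Building `joined` directly from `head_ptr` would be the pattern Miri complains about, since that pointer only carries permission for the two bytes of `head`. Where the strict-provenance methods are available, `whole.with_addr(head_ptr.addr())` should express the same thing.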

alice · October 10, 2022, 9:09am

Is there any reason that this can't work? I see you're discussing something about swapping out the Mem object, but even if you take ownership of the sub-mem passed to with_scratch with some sort of swap, I don't see how it can outlive the closure.

Edit: Or even this where we give away ownership of the sub-mem outright.

alice · October 10, 2022, 9:15am

An unrelated point, but the backing storage should be &mut [MaybeUninit<u8>] to avoid this from being UB:

use std::mem::MaybeUninit;

fn main() {
    let mut buf = [0u8; 4];
    let mut mem = Mem::new(&mut buf);
    let x = mem.alloc(MaybeUninit::<u8>::uninit()).unwrap();
    println!("{:?}", buf);
}
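A sketch of what a `MaybeUninit`-backed store could look like, assuming a hypothetical `UninitMem` type with a simplified `alloc` (not the API from the playground). Since the caller hands over `&mut [MaybeUninit<u8>]`, it never gets to assume the bytes are still initialized once the allocator is done with them.

```rust
use std::mem::{align_of, size_of, MaybeUninit};

/// Hypothetical allocator over possibly-uninitialized bytes.
pub struct UninitMem<'m> {
    raw: &'m mut [MaybeUninit<u8>],
}

impl<'m> UninitMem<'m> {
    pub fn new(raw: &'m mut [MaybeUninit<u8>]) -> Self {
        UninitMem { raw }
    }

    /// Moves `value` into the buffer and returns a reference to it.
    /// The value's destructor is never run.
    pub fn alloc<T>(&mut self, value: T) -> Option<&'m mut T> {
        // Bytes to skip so that the slot is aligned for `T`.
        let pad = self.raw.as_ptr().align_offset(align_of::<T>());
        let end = pad.checked_add(size_of::<T>())?;
        if end > self.raw.len() {
            return None;
        }
        let raw = std::mem::take(&mut self.raw);
        let (used, rest) = raw.split_at_mut(end);
        self.raw = rest;
        let slot = used[pad..].as_mut_ptr().cast::<T>();
        // SAFETY: `slot` is aligned, has room for a `T`, and is exclusively
        // ours for `'m` because it was split off from `self.raw`.
        unsafe {
            slot.write(value);
            Some(&mut *slot)
        }
    }
}
```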

matklad · October 10, 2022, 10:08am

I think this is broken for the same reason the original code was broken, here's a segfault:

fn main() {
    let mut buf = [0u8; 4];
    let mut mem = Mem::new(&mut buf);
    mem.with_scratch(2, |mem, scratch| {
        *mem = Mem::new(Box::leak(Box::new([])));
    });
    let segfault = mem.alloc::<u8>(0);
}

Oh wow! Do I understand this right? What happens here is that we change the buf itself to be uninitialized. That is, the API allows allocating whatever from the underlying bytes, and we can allocate, e.g., MaybeUninit, through which we can mark the original memory as no longer initialized. So, by the time we drop mem, the original buffer contains radioactive garbage which we shouldn't be able to touch?

I wonder if MaybeUninit is even needed here? Like, what if we allocate T with some padding bytes, into &buf, and then look at those padding bytes? Would that be UB?

alice · October 10, 2022, 10:30am

Ah, I misunderstood. I was looking at swapping the sub-buffer, but the issue was when swapping self. However, it seems to me that this could be fixed by checking whether self has been swapped for a different buffer. playground

Yes, padding causes the same issue.
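One way the swap check could look, as a sketch only: this is not the code from the playground, the `Mem` here has no `alloc`, and I have not verified it against the full API. After the closure returns, `with_scratch` confirms that the slice in `self` still ends exactly where the scratch region begins before rebuilding the joined slice from the original wide pointer. It uses plain address arithmetic rather than `offset_from`, because `offset_from` would itself be UB on a pointer that came from a different allocation.

```rust
use std::slice;

pub struct Mem<'m> {
    raw: &'m mut [u8],
}

impl<'m> Mem<'m> {
    pub fn new(raw: &'m mut [u8]) -> Self {
        Mem { raw }
    }

    pub fn len(&self) -> usize {
        self.raw.len()
    }

    pub fn with_scratch<T>(
        &mut self,
        size: usize,
        f: impl FnOnce(&mut Mem<'m>, &mut Mem<'_>) -> T,
    ) -> T {
        assert!(size <= self.raw.len(), "scratch larger than buffer");
        let raw = std::mem::take(&mut self.raw);
        let mid = raw.len() - size;
        let ptr = raw.as_mut_ptr(); // keeps provenance over the whole buffer
        // SAFETY: the two ranges are disjoint and inside the buffer behind `raw`.
        let (hd, tl) = unsafe {
            (
                slice::from_raw_parts_mut(ptr, mid),
                slice::from_raw_parts_mut(ptr.add(mid), size),
            )
        };
        self.raw = hd;
        let ret = f(self, &mut Mem::new(tl));

        // Assumed check: `self.raw` must still be a suffix of the head region.
        let base = ptr as usize;
        let start = self.raw.as_ptr() as usize;
        let end = start + self.raw.len();
        assert!(
            start >= base && end == base + mid,
            "Mem was swapped inside with_scratch"
        );

        let new_len = self.raw.len() + size;
        // SAFETY: the range starts inside the original buffer and ends at its
        // end; the pointer is derived from `ptr`, never from the swapped-in slice.
        self.raw = unsafe { slice::from_raw_parts_mut(ptr.add(start - base), new_len) };
        ret
    }
}
```

If the check fails, the call panics and `self` simply keeps whatever slice was swapped in, so nothing is rejoined with memory it does not own.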

system · January 8, 2023, 10:30am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
