```rust
use stalloc::SyncStalloc;

// Use a small static allocator as the global allocator; its backing
// storage is just a regular Rust array inside a static.
#[global_allocator]
static GLOBAL: SyncStalloc<1024, 8> = SyncStalloc::new();

trait Wtf<T> {
    unsafe fn get_unchecked_no_ub(&mut self, idx: usize) -> &mut T;
}

impl<T> Wtf<T> for Vec<T> {
    unsafe fn get_unchecked_no_ub(&mut self, idx: usize) -> &mut T {
        // No bounds check: happily walks past the end of the Vec's buffer.
        unsafe { &mut *self.as_mut_ptr().add(idx) }
    }
}

fn main() {
    let mut first = vec![34u64; 5];
    let second = vec![10.0f64.powi(-320); 2];
    let third = vec![true; 16];

    // 5 + 2 + 16/8 = 9 u64-sized writes: the loop runs straight off the
    // end of `first` and zeroes `second` and `third` as well.
    for idx in 0..first.len() + second.len() + third.len() / 8 {
        unsafe {
            *first.get_unchecked_no_ub(idx) = 0;
        }
    }

    // Passes: every `true` in `third` was overwritten with zero bytes.
    for elem in third {
        assert_eq!(elem, false);
    }
}
```
I was puzzled by this until I realized that the pointer owned by the Vec has provenance over the entire allocator. Since the global allocator is implemented within Rust, Miri is able to inspect its implementation and reason about it as though it were just a regular array.
In other words, under the current memory model (according to Miri):
- Use-after-free is totally fine (the Vec can call dealloc() and then secretly keep a copy of the original pointer, which retains its provenance).
- Out-of-bounds indexing is fine, as long as the target happens to contain a valid value.
- A Vec can arbitrarily overwrite another Vec, even one supposedly declared as immutable (the heap itself has interior mutability).
- It's possible to have two Vecs alias each other (as long as you take care not to create aliasing &mut references).
- Using uninitialized memory is fine, as long as that memory happens to have been initialized earlier.
- Even memory leaks aren't caught (that should be fixable, though).
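To make the first point concrete, here is a toy sketch (not using Stalloc; the array-backed "heap", the function name, and the no-op dealloc are all invented for illustration) of why a pointer kept across a deallocation stays usable when the whole heap is one Rust object:

```rust
fn alloc_free_reuse() -> u64 {
    // The entire toy "heap" is a single Rust object, like Stalloc's buffer.
    let mut heap = [0u64; 8];
    // "Allocate" the first slot by handing out a raw pointer into the array.
    let p = heap.as_mut_ptr();

    unsafe { p.write(42) };
    // Toy dealloc(p) would be a no-op here: the backing array is untouched.

    // A use-after-free in the allocator's terms, but `p` still has
    // provenance over `heap`, so this is an ordinary in-bounds read.
    unsafe { p.read() }
}

fn main() {
    println!("{}", alloc_free_reuse());
}
```

From Miri's point of view there is nothing to flag: every access is in bounds of an allocation it knows about, namely the array itself.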
Naturally, it would be better to have Miri catch these sorts of errors. I think the solution would be to have the ability to create a "subprovenance" — a pointer with a more restricted provenance than its ancestor. Calling deallocate() would then (via some magic function) nullify the subprovenance while still preserving every other allocation.
This isn't actually supposed to be accurate, though; the shims between Global and #[global_allocator] introduce magic that allows for allocation elision and launders object identity.
However, how to solve this within the memory model is a very open question, because you end up with two distinct Rust allocated objects occupying the same machine bytes. This is generally assumed to be impossible (and in fact, an address comparison between them will optimize to false despite being true at runtime, if you hide it from the optimizer), so something needs to change, but nobody is quite sure what. There are a couple of ideas, but their impact on other derived properties used for optimizations ranges from prohibitive to unclear.
I think the global allocator should participate in Miri's checks, rather than merely be checked by Miri. (That's not to say the allocator implementation doesn't need checking, just that it should be done separately, inside the corresponding allocator crate.)
Moreover, there are many use cases where people use a single Vec to hold all elements and replace references with indices into that Vec (the "arena" approach). From one point of view, this is also a simplified heap allocator, and it can likewise involve semantic use-after-free or double-free.
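The arena point can be illustrated with a small sketch (this Arena type and its free-list reuse policy are invented here, not taken from any crate): a stale index silently observes whatever value reused its slot, which is a use-after-free semantically but perfectly defined Rust, so no checker objects.

```rust
// A toy index-based arena: freed slots go on a free list and get reused.
struct Arena<T> {
    slots: Vec<Option<T>>,
    free_list: Vec<usize>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { slots: Vec::new(), free_list: Vec::new() }
    }

    fn alloc(&mut self, value: T) -> usize {
        match self.free_list.pop() {
            // Reuse a previously freed slot if one is available.
            Some(idx) => {
                self.slots[idx] = Some(value);
                idx
            }
            None => {
                self.slots.push(Some(value));
                self.slots.len() - 1
            }
        }
    }

    fn free(&mut self, idx: usize) {
        self.slots[idx] = None;
        self.free_list.push(idx);
    }

    fn get(&self, idx: usize) -> Option<&T> {
        self.slots.get(idx).and_then(|slot| slot.as_ref())
    }
}

fn main() {
    let mut arena = Arena::new();
    let a = arena.alloc("first");
    arena.free(a); // `a` is now dangling in the arena's terms

    let b = arena.alloc("second"); // reuses the freed slot
    assert_eq!(a, b);

    // Reading through the stale index `a` observes the new value: a
    // semantic use-after-free, with no language-level UB for Miri to flag.
    assert_eq!(arena.get(a), Some(&"second"));
}
```

Calling free(a) twice here would similarly be a semantic double-free (the index ends up on the free list twice), again invisible to any UB checker.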
As a result, a simple "subprovenance" may not be sufficient, since there may be multiple layers of heap allocators built on top of one another. Instead, I would prefer to let them participate in Miri's checks by providing semantic information about their custom heaps.
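One way that participation could look is sketched below. To be clear, the checker_on_alloc / checker_on_dealloc hooks are purely hypothetical; nothing like them exists in Miri today, and they are stubbed out as no-ops here so the sketch compiles:

```rust
// Hypothetical hooks: a checker such as Miri could intercept these calls
// to track logical allocations inside a custom allocator or arena.
fn checker_on_alloc(_ptr: *mut u8, _size: usize) {
    // Under the hypothetical scheme, this would narrow the pointer's
    // provenance to a fresh logical sub-allocation.
}

fn checker_on_dealloc(_ptr: *mut u8) {
    // ...and this would invalidate that sub-allocation, so any retained
    // pointer into it becomes a detectable use-after-free.
}

// A toy bump allocator that reports its logical allocations to the checker.
struct Bump {
    buf: [u8; 64],
    next: usize,
}

impl Bump {
    fn alloc(&mut self, size: usize) -> *mut u8 {
        let ptr = unsafe { self.buf.as_mut_ptr().add(self.next) };
        self.next += size;
        checker_on_alloc(ptr, size); // declare the logical allocation
        ptr
    }

    fn dealloc(&mut self, ptr: *mut u8) {
        checker_on_dealloc(ptr); // stale uses of `ptr` could now be flagged
    }
}

fn main() {
    let mut bump = Bump { buf: [0u8; 64], next: 0 };
    let p = bump.alloc(8);
    unsafe { p.write(7) };
    let v = unsafe { p.read() };
    bump.dealloc(p);
    println!("{}", v);
}
```

Because the hooks are ordinary function calls, they would compose across layered allocators: an arena built on top of a custom global allocator could declare its own sub-allocations the same way.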