Pre-RFC: core::ptr::simulate_realloc

Hi, this is a re-written version of the previously posted draft RFC#3700. Apologies for doing this slightly backwards; once there's some agreement here, I'll update the previous PR with (an updated version of) the following. The accompanying implementation is posted at PR#130886.


Summary

Add a helper for primitive pointer types that modifies the address of a pointer. This mechanism is intended to enable the use of architecture features such as AArch64 Top-Byte Ignore (TBI) for use-cases such as high-bit pointer tagging. An example application of this mechanism would be writing a tagging memory allocator.

Motivation

The term "pointer tagging" could be used to mean either high-bit tagging or low-bit tagging. Architecture extensions such as AArch64 Top-Byte Ignore make the CPU disregard the top bits of a pointer when determining the memory address, leaving them free for other uses.

This RFC is specifically concerned with creating those high-bit tagged pointers for systems which can make use of such architecture features. High-bit tagged pointers pose a somewhat tricky challenge for Rust, as the memory model still considers those high bits to be part of the address. Thus, from the memory model's perspective, changing those bits puts the pointer outside of its original allocation, despite it not being the case as far as the hardware & OS are concerned. This makes loads and stores using the pointer Undefined Behaviour, despite the fact that if such loads and stores were to be directly done in assembly they would be perfectly safe and valid.

Whenever this RFC refers to a "tagged pointer", it should be taken to mean a pointer that has had some of its top bits set to non-zero values.

Tagged pointers are pointers in which the unused top bits are set to contain some metadata - the tag. No 64-bit architecture today actually uses the full 64-bit address space; most operating systems only use the lower 48 bits, leaving the higher bits unused. Those bits are for the most part only used to distinguish userspace pointers (0x00) from kernelspace pointers (0xff), at least on Linux. Certain architectures provide extensions, such as TBI on AArch64, that make it easier for programs to put custom metadata into those unused bits without having to manually mask them out prior to every load and store. This tagging method can also be used without said architecture extensions - by masking out the bits manually - albeit the extensions make it more efficient.
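For illustration only (this is not part of the RFC), a minimal sketch of the manual-masking approach using the strict-provenance map_addr API, assuming a 64-bit target and a scheme where the top byte of an address is spare; the tag must be stripped before every access:

const TAG_SHIFT: u32 = 56;
const ADDR_MASK: usize = (1usize << TAG_SHIFT) - 1;

let mut value: u32 = 42;
let ptr: *mut u32 = &mut value;

// Store a tag in the (assumed spare) top byte. The tagged pointer keeps the
// original provenance but is out of bounds, so it must not be dereferenced.
let tagged = ptr.map_addr(|addr| addr | (0x2a << TAG_SHIFT));

// Mask the tag off again before every load or store; the address is back
// inside the original allocation, so this access is valid.
let untagged = tagged.map_addr(|addr| addr & ADDR_MASK);
assert_eq!(unsafe { untagged.read() }, 42);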

Currently, Rust does not support directly using TBI and related architecture extensions that facilitate the use of tagged pointers. This could potentially cause issues in cases such as working with TBI-enabled C/C++ components over FFI, or when writing a tagging memory allocator. While there is no explicit support for this in C/C++ either, the absence of Strict Provenance restrictions makes it straightforward to write a 'correct' pointer tagging implementation by simply doing an inttoptr cast inside the memory allocator implementation, be it a custom C malloc or a custom C++ Allocator. The goal of this effort is to create a Rust API for implementing this type of functionality that is guaranteed to be free of Undefined Behaviour.

There needs to be a low-level helper in the standard library, despite the relatively niche use case and relative simplicity, so that there is a single known location where Miri hooks can be called to update the canonical address. This will make it easier to modify pointer addresses without breaking the Rust memory model in the process.

Guide-level explanation

This RFC adds one free function to core::ptr: pub unsafe fn simulate_realloc<T>(original: *mut T, new_address: usize) -> *mut T

use core::ptr::simulate_realloc;

// Assumes a 64-bit target where the hardware ignores the top byte of an
// address (e.g. AArch64 with TBI enabled), so `new_addr` maps to the same memory.
let mut value: u64 = 0;
let ptr: *mut u64 = &mut value;
let tag: usize = 63;
let new_addr = ptr as usize | (tag << 56);
let tagged_ptr = unsafe { simulate_realloc(ptr, new_addr) };

The purpose of this function is to indicate to the compiler that an allocation that used to be pointed to by a given pointer can now only be accessed through the new pointer with the provided new address. This is meant to be semantically equivalent to a realloc from the untagged address to the tagged address, and conceptually similar to a move - it is no longer valid to access the allocation through the untagged pointer or any pointers derived from it. That being said, no actual reallocation is done - the underlying memory does not change; only its location within the Rust memory model does.
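As a sketch of the intended usage (continuing the snippet above, and assuming a target where the tagged address really does map to the same memory), every later access has to go through the returned pointer:

unsafe {
    // OK: the allocation now "lives" at the tagged address.
    tagged_ptr.write(42);
    assert_eq!(tagged_ptr.read(), 42);
    // UB: the untagged pointer, and anything derived from it (including the
    // original `&mut value`), must no longer be used to access the memory.
    // let _oops = ptr.read();
}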

Reference-level explanation

As previously explained, the memory model we currently have is not fully compatible with memory tagging and tagged pointers. Setting the high bits of a pointer must be done with great care in order to avoid introducing Undefined Behaviour, which could arise as a result of violating pointer aliasing rules - allowing two 'live' pointers which have different 64-bit addresses but point to the same chunk of memory would weaken alias analysis and related optimisations.

We can avoid this issue by simulating a realloc from the untagged address to the tagged address. To do so, we need the helper function to return a pointer that will be annotated in LLVM IR as noalias, as per the following excerpt.

On function return values, the noalias attribute indicates that the function acts like a system memory allocation function, returning a pointer to allocated storage disjoint from the storage for any other object accessible to the caller.

This will result in the new pointer getting a brand new provenance, disjoint from the provenance of the original pointer.

Every change to the high bits has to at least simulate a realloc and we must ensure the old pointers are invalidated. This is due to the aforementioned discrepancy between how Rust & LLVM see a memory address and how the OS & hardware see memory addresses. From the OS & hardware perspective, the high bits are reserved for metadata and do not actually form part of the address (in the sense of an 'address' being an index into the memory array). From the LLVM perspective, the high bits are part of the address and changing them means we are now dealing with a different address altogether. Having to reconcile those two views necessarily creates some friction and extra considerations.

Function signature, documentation and implementation:

/// Simulate a realloc to a new address
///
/// Intended for use with pointer tagging architecture features such as AArch64 TBI.
/// This function creates a new pointer with the address `new_address` and a brand new provenance,
/// simulating a realloc from the original address to the new address.
/// Note that this is only a simulated realloc - nothing actually gets moved or reallocated.
///
/// SAFETY: Users *must* ensure that `new_address` actually contains the same memory as the original.
/// The primary use-case is working with various architecture pointer tagging schemes, where two
/// different 64-bit addresses can point to the same chunk of memory due to some bits being ignored.
/// When used incorrectly, this function can be used to violate the memory model in arbitrary ways.
/// Furthermore, after using this function, users must ensure that the underlying memory is only ever
/// accessed through the newly created pointer. Any accesses through the original pointer
/// (or any pointers derived from it) would be Undefined Behaviour.
#[inline(never)]
#[unstable(feature = "ptr_simulate_realloc", issue = "none")]
#[cfg_attr(not(bootstrap), rustc_simulate_allocator)]
#[allow(fuzzy_provenance_casts)]
pub unsafe fn simulate_realloc<T>(original: *mut T, new_address: usize) -> *mut T {
    // FIXME(strict_provenance_magic): I am magic and should be a compiler intrinsic.
    // How do we get a brand-new provenance for Strict Provenance?
    let mut ptr = new_address as *mut T;
    // SAFETY: the asm block does nothing at runtime; it only acts as an
    // optimisation barrier so the compiler cannot trace `ptr` back to the
    // integer cast or to `original`.
    unsafe {
        asm!("/* simulate realloc from {original} to {ptr} */",
         original = in(reg) original, ptr = inout(reg) ptr);
    }
    // FIXME: call Miri hooks to update the address of the original allocation
    ptr
}

To ensure that the function actually simulates a realloc, we need to make sure that it is treated similarly to real allocator functions in the codegen stage. That is to say, the function return value must be annotated with noalias in LLVM, as explained earlier in this section. One way to do so would be through a rustc built-in attribute similar to rustc_allocator, e.g. rustc_simulate_allocator. This attribute would be passed down to the codegen stage so that codegen can annotate the function appropriately.

Drawbacks

Such a low-level helper is inherently highly unsafe and could be used to violate the memory model in many different ways, so it will have to be used with great care. The approach of simulating a realloc is unfortunate in that it makes the support we add to the language more restrictive than the actual hardware allows for, but it seems to be the only solution available for the time being, as modifying the entire stack to support disregarding the top bits of a pointer would be a non-trivial endeavour.

Rationale and alternatives

Without a dedicated library helper for modifying the address, users wanting to make use of high-bit tagging would have to resort to manual bitwise operations and would be at risk of inadvertently introducing Undefined Behaviour. Having a helper in the library also creates a single place where e.g. Miri hooks can be called to let Miri know that a pointer's canonical address has been updated.

It is most likely not feasible to make simulate_realloc() safe to use regardless of the context, hence the current approach is to make it an unsafe function with a safety notice about the user's responsibilities.

Prior art

TBI already works in C, though mostly by default rather than by design, and care must be taken to make sure no Undefined Behaviour is introduced: the compiler does not take special steps to preserve the tags, but it does not try to remove them either. That being said, C/C++ compilers do not take tagging schemes into account during alias analysis. With this proposal, Rust would have much better defined and safer support for TBI than C or C++.

Notably, Android already makes extensive use of TBI by tagging all heap allocations.

The idea is also not one specific to AArch64, as there are similar extensions present on other architectures that facilitate working with tagged pointers.

Unresolved questions

What is the best way to make this compatible with Strict Provenance? We want to be able to create a pointer with an arbitrary address, detached from any existing pointers and with a brand-new provenance. On the LLVM side this can be handled by generating inttoptr (which does not have the same aliasing restrictions as getelementptr) and by annotating the function return value as noalias, which can be done with the aforementioned new built-in attribute. Is this enough for it to fit within the Strict Provenance framework? If not, how can we make it fit?

What should the helper actually be called? Something like simulate_realloc or change_addr could help make it clear to the user what semantic implications using this helper has. There may be better names that I have not thought of yet.

Future possibilities

With a low-level helper for changing the address such as the one proposed here, it would be trivial to add helper functions for supporting specific tagging schemes to std::arch. All of those architecture-specific functions would internally use this helper.
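For instance (a purely hypothetical sketch - the module path, name and signature are illustrative and not part of this RFC), an AArch64 TBI helper could be a thin wrapper:

// Hypothetical wrapper; assumes TBI is enabled so the hardware ignores the top byte.
#[cfg(target_arch = "aarch64")]
pub unsafe fn with_tbi_tag<T>(ptr: *mut T, tag: u8) -> *mut T {
    const TAG_SHIFT: u32 = 56;
    let new_addr = (ptr as usize & !(0xff << TAG_SHIFT)) | ((tag as usize) << TAG_SHIFT);
    // SAFETY: with TBI enabled, `new_addr` refers to the same memory as `ptr`;
    // the caller must only access the allocation through the returned pointer.
    unsafe { core::ptr::simulate_realloc(ptr, new_addr) }
}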

Whilst the realloc-like approach is restrictive today, the restriction could be relaxed if, at some point in the future, the LLVM memory model gains an understanding that the address is only made up of the lower 56 bits. It would then be possible for both the original and the tagged pointer to be valid and aliasing at the same time.

On compatible platforms, interesting use-cases might be possible, e.g. tagging pointers when allocating memory in Rust in order to insert metadata that could be used in experiments with pointer strict provenance.


This wouldn't just affect pointers but also references, right? Since userspace addresses aren't allowed to be in the upper half of the address space, that means we could have 63 bits' worth of niches in references on platforms with such address-space splits (something I'm experimenting with). Adding support for logically "moving" things to the upper half would seem incompatible with that.

This only mentions ARM CPU extensions. To what extent is this compatible with Intel LAM and whatever the AMD one was called? It would probably be good to mention more than just one vendor's technology in the RFC to show that this isn't just a niche concern.

I believe different platforms mask off a different number of bits. Rust prides itself on being cross-platform. How will this difference be abstracted away? Can it be? And if not, should Rust even support this?

Perhaps change_tag_bits, and we assert in documentation that it's UB to use this for anything other than changing bits that are ignored by the platform for determining if two pointers point to the same physical memory location?

The idea is that you can't use this for pointer trickery, like having two independent allocations of the same size and alignment that you move between, but you can use it without UB if the platform says that certain bits in a virtual address are ignored.

And we put it in terms of "ignored by the platform" so that as well as covering things like Arm TBI, Arm MTE and Intel LAM, we also cover cases where the OS is defined as setting up aliases in your paging structures, or the legacy MIPS kseg0 and kseg1.

Isn't this just ptr::with_addr()? There is even an example for tagged pointers in the std::ptr documentation.


What should the helper actually be called?

moved()?


The existing with_addr method preserves provenance.

Why do you want brand-new provenance? How would that work on platforms like CHERI where you're not allowed to forge one?


Hm, I'm not exactly sure what you mean? On platforms which support the extensions in question you can already have references like that, e.g. if you write Rust code for Android, whatever you allocate will come with a high-bit tagged address. We are not actually moving anything; the idea here is that the allocated object is already accessible at the tagged address anyway, and all we're trying to do is make a pointer which you can use to perform that access without UB. You can simply do (addr | tag) as *const i32 and using that pointer will Just Work - but it's technically UB.

The way the helper is currently designed and described above, it would be fully compatible with Intel's LAM and AMD's UAI, and not even specific to pointer tagging. There's nothing AArch64-specific there, though I focus on that in the description because that's the one I'm actually familiar with.

The idea is that even though different platforms mask off different bits, in the end it boils down to creating a new independent pointer, which is what this method is for - and that is the actually tricky part. Once we have this helper, it will be trivial to write tagging methods for any tagging scheme just by doing simulate_realloc(ptr, tagged_address). The same goes for a cross-platform tagging solution which just applies the appropriate architecture's helper based on cfgs, but that's much higher level. This RFC is just for the low-level helper.

Could do; in my initial draft it was called with_tag, but then people pointed out that this doesn't account for low-bit tagging and that it's not even specific to tagging in general. Plus it may seem a little confusing given that the method literally just creates a new pointer.

Would be much easier if it was, but sadly it is not. with_addr internally uses wrapping_offset, which in turn uses getelementptr. Thus a pointer you get from with_addr is only valid to access as long as its address ends up within the original allocation. Because the memory model does not recognise TBI et al, according to the memory model a high-bit tagged pointer is terabytes outside of its original allocation, which makes it UB to load and store through it. This is the problem I'm trying to get around, and from previous discussions with Ralf we've concluded that a simulated realloc is the way to go.
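To make that concrete, a minimal sketch (not from the thread) of the pattern that is UB under the current rules, even on hardware that ignores the top byte:

let mut value: u32 = 0;
let ptr: *mut u32 = &mut value;
// with_addr keeps the original provenance, so this behaves like a huge
// wrapping offset: the result is far outside the original allocation.
let tagged = ptr.with_addr(ptr.addr() | (0x3f << 56));
// UB under the current memory model, regardless of what the hardware does:
// unsafe { tagged.write(1) };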

The example in the documentation is the other kind of tagging - low-bit tagging. Low-bit tagging does not have this problem because the resulting pointer is still within the original allocation. Besides, the example removes the tag prior to loading from the pointer anyway, so none of these considerations apply, as the tagged pointer is never actually used for an access.

As in the reply above, that's because it uses wrapping_offset and GEP. We can't do that for high-bit tagging because that's UB. Based on previous discussions, getting a new provenance and pretending we've reallocated to the tagged address is the only way we've found so far to make this work within the memory model.

CHERI is a good question, I am not very familiar with CHERI but it's a pretty unique platform and I'm not even sure if the use-cases for this still matter on there. This is aimed at all the "traditional" architectures which do use high-bit pointer tags in this way.


Idk about Android, but at least on vanilla Linux they shouldn't, because that would be a breaking change to the documented behaviour that userspace addresses don't have the top bits set (especially MSB=1, which belongs to the kernel), which means userspace can make that assumption.

The documentation you linked is out of date on the website. If you pull the latest master and check in the source it's updated. https://lore.kernel.org/all/20240702091349.356008-1-kevin.brodsky@arm.com/


Is this not meant to be user-facing documentation?

Otherwise it seems like it would be part of the ABI (things one can assume about pointers / what is allowed to be passed to the kernel in syscalls) and that's a breaking change and would warrant a new triple.

This documentation is not an ABI guarantee; it is a document that's meant to describe how memory works on AArch64 in the kernel. Documents describing the ABI are over here. From experience, a lot of the time when something in the kernel gets changed, it takes quite a while for someone to bother updating these documents. As per this email, TBI has been enabled for userspace since 2012.

I think naming it simulate_realloc or something similar is a much better name than with_tag and the like - it's also useful when you have a piece of memory mapped to two different locations, e.g. a faster ring buffer where you have two mappings of the same 1MB of memory in a 2MB range. That allows you to simply simulate_realloc and then write up to 1MB in one operation without needing to split the write for wrapping around the end of the ring buffer, since you can treat the range from the current write position as if it were a 1MB allocation (though that might be difficult spec-wise if it's not accessed from only one thread at a time), and then just increment by the written amount and wrap the write pointer by 1MB afterwards. This also helps motivate why the proposed operation is useful even if you don't have ARM TBI or an equivalent.

I think that kind of use-case should be part of the motivation e.g. you mmap the exact same memory to two locations and then use the realloc intrinsic to access both of them without any further mmap calls needed:
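(A rough sketch of that idea - not from the original post: two MAP_SHARED mappings of the same memfd on Linux, via the libc crate; the names and the complete lack of error handling are purely illustrative.)

// Map the same physical memory at two different virtual addresses.
unsafe fn map_twice(len: usize) -> (*mut u8, *mut u8) {
    let fd = libc::memfd_create(b"mirror\0".as_ptr().cast(), 0);
    libc::ftruncate(fd, len as libc::off_t);
    let prot = libc::PROT_READ | libc::PROT_WRITE;
    let a = libc::mmap(core::ptr::null_mut(), len, prot, libc::MAP_SHARED, fd, 0);
    let b = libc::mmap(core::ptr::null_mut(), len, prot, libc::MAP_SHARED, fd, 0);
    (a as *mut u8, b as *mut u8)
}

// Later, to continue a write that would run past the end of the first mapping,
// switch the "live" pointer over to the second mapping without any further
// mmap calls:
// let ptr_in_b = unsafe { core::ptr::simulate_realloc(ptr_in_a, addr_in_b) };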


I would prefer this to be a separate function (even if it has the same implementation in the backend), because it differs in an important detail; TBI, LAM and similar are (so far) all specified to consider the "virtual address" for the purposes of the rest of the CPU spec (including guarantees such as forward progress) as the one you get after masking out the tag bits, whereas what you're describing is two different virtual addresses that map to the same physical address.

And I want to be very clear about when I'm doing pointer crimes that rely on two virtual addresses mapping to the same physical address, versus when I'm doing pointer crimes that rely on the CPU ignoring some bits of the virtual address, precisely because the CPU is allowed to have different behaviours for the two different cases.

Then what's the user documentation that shows which address ranges are available for use with MAP_FIXED_NOREPLACE on various platforms and which are forbidden ranges? I thought that (and equivalents for other arches) was the page.

It's not a question of "want", it's a question of "need".

LLVM and by extension Rust assume that every allocation exists at exactly one place in the address space. You can't just toggle the 59th bit and use that pointer; for LLVM this looks like you did a wrapping_offset(1 << 59) and that means you are way out-of-bounds of the allocation, and a subsequent access is UB.

The only way I can think of to fit an action like this into the LLVM memory model is to say that logically, we are moving the allocation to a different place in the address space. That's why this is basically a realloc. So this is very, very different from with_addr.

Those situations are logically equivalent, so we should treat them the same. What top-byte-ignore really does, in my mind, is make it so that the lower 2^56 bytes of the address space are repeated 256 times to fill up the full 2^64 bytes of the address space. That's very much like the mmap situation of having an allocation mapped in multiple places.


They are not logically equivalent on at least some CPUs that Rust supports. The memory model for normal loads and stores (but not atomic operations) says that two accesses touch the same location if the virtual address is the same. For TBI, there are 256 ways to specify the same virtual address, and in terms of the single-thread ordering of memory visibility it doesn't matter which of those addresses you choose - they're all the same, and a store to 0x80000000_00001000 and a load from 0x1000 are considered to be accessing the same location.

In contrast, if you have the same physical address mapped to two different virtual addresses, the CPU is allowed to treat them as different locations for the purposes of single-threaded memory ordering - only the atomic operations care about physical addresses. So if you map a 4096-byte buffer to 0x1000 and 0x2000, store to 0x1000, and then load from 0x2000, the CPU is permitted to reorder within a single thread such that the load from 0x2000 does not see the previous store to 0x1000, even though it is not permitted to reorder such that a load from 0x1000 does not see the previous store to 0x1000.

Thus, TBI changes to an address are not going to result in the physical machine surprising you any more than it does without calling this function. But the addresses you get by playing virtual mapping games introduce the potential for a single thread to get surprising results, and I want it to be abundantly clear from the code that you've accepted that this can happen.


It seems that simulate_realloc is an incorrect name, given that it's not really like a realloc call which invalidates the old allocation, but rather like an mmap at a different address which still keeps the old allocation alive. Maybe something like compiler_mmap, similarly to compiler_fence?

I think this discussion is missing details about the way that existing systems handle such shenanigans. How does LLVM deal with the issue? How is it handled in C? Sure, C doesn't have the aliasing restrictions of Rust, but it still has provenance and the compiler still tries to optimize accesses based on known aliasing information. If I understand the issue correctly, this code works:

int a = 3;
int *p = &a;
int *r = (int *)(((uintptr_t)p) | TAG);
*r = 5;
assert(a == 3);

But shouldn't the compiler load the cached value of a, given that it doesn't seem to alias any known places and the pointers are entirely different?
