Raw pointer ergonomics

scottjmaddox · February 1, 2023, 6:22am

I'm currently working on a VM inspired by interaction nets (a graph-based model of computation), and I need to use a lot of cyclic data structures with manual memory management, i.e. a lot of unsafe and raw pointers. And I'm finding that Rust is extremely painful for this use case. I think a lot of the pain would be alleviated by providing an ergonomic pointer-offset mechanism, which could be as simple as a field access. For example:

use std::ptr;

#[derive(Debug, Clone, Copy)]
struct Node(*mut Node, *mut Node);

fn main() {
    let mut node = Node(ptr::null_mut(), ptr::null_mut());
    let node_ptr = &mut node as *mut Node;
    
    // let node_0_ptr = node_ptr.0; // Why not let this be pointer offset?
    
    // Instead, I have to do this:
    let node_0_ptr = unsafe { &mut (*node_ptr).0 as *mut _ };
}

When I want to get an interior pointer, I would like to be able to just do node_ptr.0. Is there a reason this can't be supported?

It would also be nice if you could directly take a pointer using *mut node rather than needing to do &mut node as *mut Node, but I'm guessing supporting *mut node would considerably complicate parsing (if not make it ambiguous). Luckily, that pain is at least alleviated by automatic coercion from &mut _ to *mut _, and can be further alleviated by a user-defined macro.

In most instances I would use Vec and indexes rather than raw pointers, but this is a case where that would make the code even more opaque and less type safe (and less performant for large graphs, due to reallocation and memcopy). I need graphs of nodes with non-uniform size, with pointers to the interiors of other nodes.

I'm writing a low level VM with strict performance considerations. I currently use inlined functions to abstract away the pointer offset boilerplate, but I end up needing a separate function for each field of each node.

Also, it's perhaps worth noting that the current situation requires sprinkling unsafe everywhere, even though the operation, i.e. pointer offset, is not actually unsafe. This makes it more difficult to minimize the use of the unsafe keyword, which will make future auditing of the unsafe code more difficult.

Note: I had originally posted this on the users forum, but 2e71828 helpfully pointed out that this would be a better location.

jhpratt · February 1, 2023, 6:28am

Is this what you're looking for? It seems like it would solve the "issue".

scottjmaddox · February 1, 2023, 6:31am

It looks like that requires a value, not a pointer? I usually don't have an lvalue that I could use addr_of_mut on, just a pointer that I need to offset.

jhpratt · February 1, 2023, 6:36am

Yes, it requires a value. From the example you provided, it appeared you had that. If you do not, please show an example of what would be improved and how.

If you already have a pointer, what is the issue with calling offset? *mut Node inherently does not have any fields, so I don't follow why node_ptr.0 should be allowed. Even if it were allowed, I don't see why it would return a pointer to the first item in the tuple.

scottjmaddox · February 1, 2023, 6:46am

Sorry, was trying to provide a minimal example. I'll see if I can come up with something better.

I would need to know the offset. If I'm not using repr(C) only the compiler knows that. I could of course define a const for each field of each node. Also I would need to use wrapping_offset to avoid sprinkling unsafe everywhere. The result wouldn't be much better than what I currently have.
Even if I am using repr(C), writing node_ptr.offset(0) and node_ptr.offset(8) (assuming I'm limiting myself to 64-bit systems) is considerably less clear than node_ptr.0 and node_ptr.1, and doesn't automatically update the offset (or provide an error) if I change the definition of Node.
The pointer type would not be updated based on the field's type.

Okay, sorry, perhaps I should not have proposed a specific solution, but rather asked for viable solutions.

jhpratt · February 1, 2023, 7:00am

Proposing a solution is fine, I'm just asking questions about the solution, as I don't follow your reasoning.

You can ask the compiler for that information using ptr::addr_of_mut!.

Ultimately if you're trying to write something that is very low level, you're going to need unsafe. While you say that the pointer offset "is not actually unsafe", it very much is. There are documented safety requirements that must be upheld for the code to be sound.

Overall, Rust very much prefers references over pointers. It is best to create abstractions so that you're not dealing with pointers constantly.

scottjmaddox · February 1, 2023, 7:17am

I'm currently trying to define a const offset, so that I can at least avoid the technical UB of &mut (*node_ptr).1 as *mut _, but I'm having trouble. It doesn't seem like ptr::addr_of_mut! helps here. Is there a way to fill in these ???'s in the following? I can make it repr(C) and just write the numbers down, but I would like to avoid having to audit the constants when making changes to the Node definitions, if possible.

#[derive(Debug, Clone, Copy)]
struct Node(*mut Node, *mut Node);

const NODE_0_OFFSET: isize = ???;
const NODE_1_OFFSET: isize = ???;

While you say that the pointer offset "is not actually unsafe", it very much is. There are documented safety requirements that must be upheld for the code to be sound.

wrapping_offset is marked safe. offset is unsafe because it is unconstrained. If you have a valid node_ptr: *mut Node, there would be no way for node_ptr.0 or node_ptr.1 (i.e. offsetting to the given field) to wrap.

Overall, Rust very much prefers references over pointers. It is best to create abstractions so that you're not dealing with pointers constantly.

Yes, I have a safe, owning TermGraph wrapper around the graph, but the graph updates have to use raw pointers, since it's a cyclic graph. I'm building the part below the abstraction.

Think doubly-linked list. To implement one in rust, you have to use raw pointers. But the graphs I'm dealing with are considerably more complex.

2e71828 · February 1, 2023, 7:28am

scottjmaddox:

Is there a way to fill these ???'s in the following? I can make it repr(C) and just write the numbers down, but I would like to avoid having to audit the constants when making changes to the Node definitions, if possible.
#[derive(Debug, Clone, Copy)]
struct Node(*mut Node, *mut Node);

const NODE_0_OFFSET: isize = ???;
const NODE_1_OFFSET: isize = ???;

It looks like this will be solved by RFC 3308, which has been accepted and is being implemented now.
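As a rough sketch of how the ???s could be filled in once that lands: the block below assumes a toolchain where std::mem::offset_of! is available (Rust 1.77 or newer). The constant names come from the question; note that the macro yields usize rather than the isize used there.

```rust
use std::mem::offset_of;

#[derive(Debug, Clone, Copy)]
struct Node(*mut Node, *mut Node);

// Byte offsets computed by the compiler, so they follow any change to the
// definition of `Node` without manual auditing.
// Assumption: `offset_of!` returns `usize`, so the constants are `usize` here.
const NODE_0_OFFSET: usize = offset_of!(Node, 0);
const NODE_1_OFFSET: usize = offset_of!(Node, 1);
```

Without repr(C) the field order is up to the compiler, so the only safe assumptions are that the two offsets differ and that both lie inside the struct.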

scottjmaddox · February 1, 2023, 7:37am

Okay, thanks, I'll use memoffset::offset_of for now.

scottjmaddox · February 1, 2023, 8:04am

Thinking more on this, I think I can define some pointer wrapper types with helper methods for doing pointer offsets. It won't be quite as minimal as field access, but it should be close.

Thanks for the suggestions!

CAD97 · February 1, 2023, 8:16am

Specifically, it would be addr_of_mut!((*node_ptr).0) to project from *mut Node to a pointer to its first field.
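A minimal sketch of that projection, using the Node type from the first post. The helper name field_0 and the demo function are made up for illustration.

```rust
use std::ptr::{self, addr_of_mut};

#[derive(Debug, Clone, Copy)]
struct Node(*mut Node, *mut Node);

/// Projects a node pointer to a pointer to its first field.
/// (`field_0` is a hypothetical helper name.)
///
/// # Safety
/// `node_ptr` must point to a live allocation large enough to hold a `Node`.
unsafe fn field_0(node_ptr: *mut Node) -> *mut *mut Node {
    // Only the field's address is computed: no `&mut Node` is created and
    // nothing is read through `node_ptr`.
    unsafe { addr_of_mut!((*node_ptr).0) }
}

/// Hypothetical demo: write through the projected pointer, then check it.
fn demo() -> bool {
    let mut node = Node(ptr::null_mut(), ptr::null_mut());
    let node_ptr: *mut Node = addr_of_mut!(node);
    unsafe {
        let slot = field_0(node_ptr);
        slot.write(node_ptr); // make the node point at itself
        (*node_ptr).0 == node_ptr
    }
}
```

The result type follows the field's type automatically, which is the property that hand-written offset constants lack.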

There certainly is a hole in the ergonomics of working with raw pointers. This is a known limitation of Rust, and one we're hoping to eventually address. If you look into the definition of addr_of_mut!, you'll see that it expands to the unstable syntax &raw mut $place. It's unlikely that pointer field projection will be done directly with .-based field access syntax, but it's something which we would like to make possible eventually. Importantly, we know it would be very beneficial to have syntax which only does place computation and is guaranteed not to do any autoderef and create implicit temporary references in code meant to be just using pointers. Such operations are still likely going to be unsafe, though, because using inbounds offsets really does have a significant beneficial effect on optimization.

The unsafety of offset actually has very little to do with whether the computation overflows the address space. When you use offset, you're specifically asserting that the source and computed pointers are both pointing to and inbounds of the same allocated object. (The one-past-the-end address counts as inbounds.)

Doing such an offset is always unsafe, because there are requirements that you as the programmer must fulfill. If the pointer is a valid dereferenceable pointer, then performing the offset is sound. (So long as you do remember to do a byte offset and not an offset in units of T.)

Absolutely, this is probably the best way of going about things with current Rust, and probably even if/when better facilities for working with raw pointers are available; declare a new type with the semantic identity that you're working with and define API based on what you actually need to do with it. You don't need to make your container abstraction directly out of the raw building blocks; you can and should encapsulate sound ways of doing unsafe things wherever doing so is useful throughout the entire implementation stack.

And even if you can't encapsulate any unsafety, giving yourself a richer API more specific to the implementation on hand to work with will rarely be a bad idea.
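One way this could look, as a sketch only: the NodeRef handle, its methods, and the Node fields below are all invented for illustration. The idea is that a single unsafe constructor carries the validity obligation, so the projection methods themselves can be safe to call.

```rust
use std::ptr::{self, addr_of_mut, NonNull};

struct Node {
    next: *mut Node,
    value: u32,
}

/// Hypothetical handle type. Invariant: always points to a live `Node`.
#[derive(Clone, Copy)]
struct NodeRef(NonNull<Node>);

impl NodeRef {
    /// # Safety
    /// `ptr` must be non-null and point to a `Node` that stays alive for
    /// as long as this handle (or any copy of it) is used.
    unsafe fn new(ptr: *mut Node) -> Self {
        NodeRef(unsafe { NonNull::new_unchecked(ptr) })
    }

    /// Pointer to the `next` field; relies on the invariant from `new`.
    fn next(self) -> *mut *mut Node {
        unsafe { addr_of_mut!((*self.0.as_ptr()).next) }
    }

    /// Pointer to the `value` field; relies on the invariant from `new`.
    fn value(self) -> *mut u32 {
        unsafe { addr_of_mut!((*self.0.as_ptr()).value) }
    }
}

/// Hypothetical demo: write and read back through projected pointers.
fn demo() -> u32 {
    let mut node = Node { next: ptr::null_mut(), value: 0 };
    let handle = unsafe { NodeRef::new(&mut node) };
    unsafe {
        handle.value().write(42);
        handle.next().write(ptr::null_mut());
        handle.value().read()
    }
}
```

Whether the projections can really be safe depends on how strictly the rest of the code upholds the constructor's contract; if that is in doubt, marking them unsafe as well is the more conservative choice.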

scottjmaddox · February 1, 2023, 8:22am

That's good to know. I'll be sure to use offset and not wrapping_offset in my helper functions/methods.

Thanks for the additional information and suggestions!

scottjmaddox · February 1, 2023, 9:49am

Switched to using offset with memoffset::offset_of, and then proceeded to spend an hour or so debugging confusing panics / segfaults. Turns out offset is in units of T, while offset_of is in units of bytes. Obvious in retrospect, but that's a pretty big footgun...
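To illustrate the mismatch, here is a small sketch. The Pair struct and b_ptr helper are hypothetical, and std::mem::offset_of! (Rust 1.77 or newer) stands in for memoffset::offset_of!; both report bytes.

```rust
use std::mem::offset_of;

#[repr(C)]
struct Pair {
    a: u64,
    b: u32,
}

/// Pointer to field `b`, computed from a byte offset.
///
/// # Safety
/// `p` must point to a live `Pair` allocation.
unsafe fn b_ptr(p: *mut Pair) -> *mut u32 {
    let bytes = offset_of!(Pair, b); // a count of bytes
    // Wrong: `p.add(bytes)` would advance by `bytes * size_of::<Pair>()` bytes,
    // because `add`/`offset` on `*mut Pair` count in units of `Pair`.
    // Casting to a byte pointer first makes the units match.
    unsafe { p.cast::<u8>().add(bytes).cast::<u32>() }
}

/// Hypothetical demo: read field `b` through the computed pointer.
fn demo() -> u32 {
    let mut pair = Pair { a: 1, b: 7 };
    let p: *mut Pair = &mut pair;
    unsafe { b_ptr(p).read() }
}
```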

afetisov · February 1, 2023, 12:32pm

Note that this operation (&mut (*node_ptr).0 as *mut _) is very likely to lead to UB, because the resulting raw pointer has the same aliasing restrictions as &mut T. The cast itself is technically fine, but it's very likely that you'd be using it in a way that violates &mut T's exclusive aliasing requirements.

You should use the already mentioned ptr::addr_of_mut! macro to get a raw pointer without the aliasing burden.

You can't do this for the same reason you can't do it in C++. _.0 is an access to a field on a place containing a struct. A pointer isn't a place containing a struct, so you need to explicitly dereference it to get one. In C/C++, you would use the postfix -> operator to get the same thing, i.e. node_ptr->_0. But the result of that expression is itself a place, and you need a pointer, so now you must take its address. Thus in C/C++ you would write it as &node_ptr->_0, while in Rust you must write ptr::addr_of!((*node_ptr).0). If raw reference syntax were stable, it would look slightly less verbose: &raw const (*node_ptr).0.

cole-miller · February 1, 2023, 2:54pm

Gankra has proposed a ~ operator for this, with an accompanying ~[] operator for ptr::offset:

RalfJung · February 1, 2023, 6:04pm

Reading through the discussion here I am a bit confused. Do you really want offset_of?

The description sounds like what you want is a nicer way to write addr_of_mut!((*node_ptr).0), which has been mentioned a few times in this thread. Is that true? Why do you consider offset_of to be nicer than that? Is it really the case that node_ptr is dangling here so that constructing the place (lvalue) *node_ptr is UB?

I agree we should have nicer syntax for addr_of_mut!((*node_ptr).0), along the lines of Gankra's blog post. We could also possibly relax the UB requirements. I very much hope someone will make this their project and push it through! I don't know if I can stomach another syntax-related RFC alongside all my other projects... so I don't think I will be the driver here.

scottjmaddox · February 1, 2023, 11:07pm

I think addr_of_mut!((*node_ptr).0) is indeed what I want. It won't actually do a read, I assume?
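One way to see that no read takes place is the field-by-field initialization pattern over uninitialized memory, where any read of the fields would be UB. The sketch below assumes a simplified Sup with u32 standing in for Tagged; init_sup is a hypothetical name.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

struct Sup {
    l: u64,
    e1: u32, // stand-in for `Tagged`
    e2: u32, // stand-in for `Tagged`
}

/// Builds a `Sup` by writing each field through a projected raw pointer.
fn init_sup() -> Sup {
    let mut slot = MaybeUninit::<Sup>::uninit();
    let p: *mut Sup = slot.as_mut_ptr();
    unsafe {
        // Each projection only computes an address. The fields are still
        // uninitialized here, so they are written and never read.
        addr_of_mut!((*p).l).write(1);
        addr_of_mut!((*p).e1).write(2);
        addr_of_mut!((*p).e2).write(3);
        // Every field has been written, so the value is fully initialized.
        slot.assume_init()
    }
}
```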

It would definitely be nice to have a syntax that made it more clear that there's no read involved on the pointer projection.

scottjmaddox · February 1, 2023, 11:55pm

An example of what I ended up with:

#[derive(Debug, Clone, Copy)]
#[repr(C, align(8))]
struct Sup {
    l: u64,
    e1: Tagged,
    e2: Tagged,
}

trait SupPtrExt {
    fn l(self) -> *mut u64;
    fn e1(self) -> *mut Tagged;
    fn e2(self) -> *mut Tagged;
}

impl SupPtrExt for *mut Sup {
    #[inline(always)]
    fn l(self) -> *mut u64 {
        unsafe { addr_of_mut!((*self).l) }
    }

    #[inline(always)]
    fn e1(self) -> *mut Tagged {
        unsafe { addr_of_mut!((*self).e1) }
    }

    #[inline(always)]
    fn e2(self) -> *mut Tagged {
        unsafe { addr_of_mut!((*self).e2) }
    }
}

This is sufficiently ergonomic for my use case. Thanks for all of the help, everyone!

CAD97 · February 2, 2023, 3:28am

This is unsound as-is, though. The offset methods must be unsafe because they are UB to use on an invalid pointer.

scottjmaddox · February 2, 2023, 4:16am

Hmm, okay, I'll make the methods unsafe. It's just for internal use, and I can't imagine how this would end up accidentally wrapping, but better safe (unsafe?) than sorry.
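A sketch of what the revised trait might look like, with u64 standing in for Tagged and only two of the three accessors shown. The demo function is hypothetical.

```rust
use std::ptr::addr_of_mut;

#[derive(Debug, Clone, Copy)]
#[repr(C, align(8))]
struct Sup {
    l: u64,
    e1: u64, // stand-in for `Tagged`
    e2: u64, // stand-in for `Tagged`
}

trait SupPtrExt {
    /// # Safety
    /// `self` must point into a live allocation that can hold a `Sup`.
    unsafe fn l(self) -> *mut u64;
    /// # Safety
    /// Same requirement as `l`.
    unsafe fn e1(self) -> *mut u64;
}

impl SupPtrExt for *mut Sup {
    #[inline(always)]
    unsafe fn l(self) -> *mut u64 {
        unsafe { addr_of_mut!((*self).l) }
    }

    #[inline(always)]
    unsafe fn e1(self) -> *mut u64 {
        unsafe { addr_of_mut!((*self).e1) }
    }
}

/// Hypothetical demo: callers now have to acknowledge the obligation.
fn demo() -> u64 {
    let mut sup = Sup { l: 0, e1: 0, e2: 0 };
    let p: *mut Sup = &mut sup;
    unsafe {
        p.l().write(5);
        p.e1().write(6);
        p.l().read() + p.e1().read()
    }
}
```

The call sites keep the short p.l() form; the only change is that each call must sit inside an unsafe block, which is where an audit would look.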
