Is it just API design work that is needed? Or is there opsem work or LLVM work as well that has to be done?
Thank you for the detailed response - I'll address each point.
You're right that observing what happens under UB is not a basis for argument, and that the examples in question involved dereferencing pointers without provenance. I've removed the section entirely - the case rests on the Need and significance section and the cost analysis below.
I understand that this is about the AM's concept of allocation, not the Allocator trait - I've removed ZeroableAllocator from the post accordingly. The deeper question you raise is that any Rust code today can rely on the invariant "no live allocation exists at address 0" for its own reasoning, e.g. using null as a sentinel to distinguish allocated from unallocated pointers. Changing this invariant would invalidate that reasoning.
I'd like to understand the scope of this: beyond &T/Option<&T> niche optimisations (which this proposal does not touch), is there existing code whose soundness proof - not convention - formally depends on the absence of a live allocation at address 0? If there are concrete examples, they would help assess whether a migration path exists. If the invariant is load-bearing in ways I haven't accounted for, I want to know.
That said - read_volatile and write_volatile on address 0 are already permitted, which means the AM already allows some operations to touch address 0 without triggering UB. The vital change in this proposal is extending that to non-volatile core::ptr operations. This is a wider scope, but not a jump from "0 is never touched" to "0 is fully valid" - it is a continuation of a direction the language has already taken.
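As a concrete illustration of the status quo (a hedged sketch, not normative): a pointer to address 0 can already be constructed with without_provenance, and the premise above is that a volatile access through it is permitted on targets where address 0 is real memory.

```rust
use core::ptr;

fn main() {
    // Construct a pointer to address 0 with no provenance (stable since Rust 1.84).
    let p: *const u32 = ptr::without_provenance(0);
    assert!(p.is_null());

    // On a bare-metal target where address 0 is real RAM/MMIO, the claim is
    // that this volatile read is already allowed by the AM:
    //     let v = unsafe { p.read_volatile() };
    // (left commented out here, since it would fault on a hosted target)
}
```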
As promised in my previous comment, I've measured the codegen impact of null_pointer_is_valid. This addresses a gap that @RalfJung rightly pointed out, and that several others in the thread have implicitly assumed without data.
I am not a compiler engineer, so I kept it to five simple C patterns on aarch64-unknown-none with Homebrew clang 21.1.8 at -O2, comparing baseline against -fno-delete-null-pointer-checks.
// bench.c
// clang -v | Homebrew clang version 21.1.8
// clang --target=aarch64-unknown-none -O2 -S bench.c -o baseline.s
// clang --target=aarch64-unknown-none -O2 -fno-delete-null-pointer-checks -S bench.c -o zeroisvalid.s
// diff baseline.s zeroisvalid.s
#include <stdint.h>
#include <stddef.h>
// C1: null check elimination after dereference
uint32_t check_after_deref(const uint32_t *p) {
    uint32_t v = *p;      // dereference
    if (p) return v + 1;  // compiler can assume p != null
    return 0;             // baseline: this branch eliminated
}

// C2: two paths merging on null knowledge
uint32_t branch_after_store(uint32_t *p, uint32_t val) {
    *p = val;             // store implies p != null
    if (p) return *p + 1; // redundant check?
    return 42;
}

// C3: loop with pointer increment
uint32_t sum_until_null(const uint32_t *const *ptrs) {
    uint32_t sum = 0;
    while (*ptrs) {       // null-terminated array of pointers
        sum += **(ptrs++);
    }
    return sum;
}

// C4: devirtualisation / inlining based on nonnull
void copy_if_valid(uint32_t *dst, const uint32_t *src, size_t n) {
    if (dst && src) {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    }
}

// C5: struct access implying nonnull
uint32_t read_two_fields(const struct { uint32_t a; uint32_t b; } *s) {
    uint32_t x = s->a;      // implies s != null
    if (s) return x + s->b; // check eliminable?
    return 0;
}
8a9,10
> cbz x0, .LBB0_2
> // %bb.1:
10a13
> .LBB0_2:
21a25,26
> mov w9, #42 // =0x2a
> cmp x0, #0
23c28
< add w0, w1, #1
---
> csinc w0, w9, w1, eq
111a117,118
> cbz x0, .LBB4_2
> // %bb.1:
113a121
> .LBB4_2:
The result: codegen differences appear only where the compiler would otherwise fold away a null check it can prove redundant (C1, C2, C5) - 2-3 additional instructions per site. Functions where the null check itself gates the access and is therefore not removable (C3, C4) produce identical output.
This doesn't prove the cost is zero for Rust, but given that one of the most performance-sensitive C codebases in existence has operated without this optimisation for over fifteen years, I think the default assumption should be that the cost is manageable unless demonstrated otherwise.
Ecosystem compatibility is not the primary goal - the zeroable reference primitive is about closing the gap where no reference-like primitive can represent a valid hardware address. Ecosystem adoption can follow incrementally.
It has been roughly two weeks since #83 and #84 were posted. To my knowledge, no justification for the AM axiom (no live allocation at 0x0) - not even an informal one - has been presented in this thread: neither the rationale for adopting it, nor the consequences of changing it, nor an example of code whose soundness depends on this axiom beyond NonNull<T>. I have also searched related discussions and was unable to find one.
I intend to submit an RFC based on the current OP - which is revised to reflect feedback on this thread - 336 hours from now. Until then, I would like to gather further feedback here on Internals. You are also welcome to comment during the formal RFC stage, so please feel free to participate at your convenience.
Arguments already addressed above will not be revisited unless accompanied by new evidence. As for the need to access 0x0 itself: the OP provides concrete counterexamples, so this point is settled.
So, aside from being an integral part of the language - which you're explicitly disregarding for unclear reasons.
That's not at all how the process works. Addressing the top-level post itself in its current state, it is still not clear why you necessitate a language change here. Everything you propose can be accomplished in a third-party library. You can't simply declare something to be true while closing your eyes and plugging your ears. Issues have been raised; you are the one that needs to address them.
The question was specifically about raw pointer semantics: whether there is existing code whose soundness depends on the AM axiom specifically at the *const T / *mut T level, beyond what NonNull<T> already explicitly guarantees.
What is 'settled' is whether the problem exists - that someone must access 0x0 (see #2-#62) - not the entire proposal. The proposal itself and its design are of course open for discussion.
There already exists a mechanism to access address 0, so "there needs to be a way to access address 0" is not a rationale for adding something new. Anything beyond that is a matter of tradeoffs (ergonomics, performance, etc) for various groups of users.
Revisiting the entire thread, there appears to be a divergence in how the issue is perceived. In most environments, workarounds like volatile access, wrapper crates, and extra abstractions are considered a sensible and sufficient solution. But in bare-metal environments where every single byte counts, the same becomes a blocking problem; and in mission-critical environments, the very need for a workaround is itself an audit failure or even a defect. This proposal originates from the latter perspective, and given that Rust treats bare-metal as a first-class target, I believe this perspective warrants consideration proportional not to the number of affected users, but to the worst-case failure cost.
We can't always have a perfect day. That's life; we must be prepared.
Abstractions can be made to be zero cost. You're making the plausibly incorrect assumption that an abstraction has overhead or that crates can't be optimized away.
They do have a cost. Volatile access is by definition not optimisable, and inline asm is even worse - it will cause an audit failure. And forming a reference at 0x0 is not a cost problem; it is simply impossible under the current AM. There is nothing 'zero-cost' here.
You might have a chance of adding new variants of a few functions in the ptr module (e.g. read, write or copy_nonoverlapping) which allow null without volatile semantics, since these couldn't easily be provided by third-party crates and wouldn't impact the language at large.
You could then implement your own abstractions on top of these in a third-party crate to make these use-cases easier and safer.
But I don't see your higher impact proposals, like allowing ordinary pointers to dereference null or new reference types in core happening.
I think adding read_memory and write_memory to core::ptr like this should be enough:
pub const unsafe fn read_memory<T>(src: *const T) -> T;
pub const unsafe fn write_memory<T>(dst: *mut T, src: T);
Their semantics would be similar to those of volatile, but without the elision/reordering guarantees. Something similar to this:
When a memory operation is used for memory inside an allocation, it behaves exactly like read/write. Memory operations, however, may also be used to access memory that is outside of any Rust allocation. In this use-case, the pointer does not have to be valid for reads/writes. Here, any address value is possible, including 0 and usize::MAX, so long as the semantics of such a read/write match those of ordinary memory access on the target hardware. The provenance of the pointer is irrelevant, and it can be created with without_provenance. The access must not trap.
(I'm sure the semantics as written above don't quite work. But they should give the right idea. Somebody with a better understanding of the rust memory model than me would need to specify this)
- Unlike *_volatile or inline asm, these could be const, though they'd still need to panic when operating outside an allocation at compile time.
- Other functions (copy, copy_nonoverlapping, replace, etc.) can easily be implemented on top of these in a third-party crate.
- Maybe copy and copy_nonoverlapping would have a performance benefit from an intrinsic, so perhaps adding them makes sense as well.
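Until such functions exist, a rough polyfill is possible in a third-party crate by falling back to volatile access, at the cost of the very optimisations under discussion. A hedged sketch (read_memory/write_memory here mirror the hypothetical API above; they are not real core::ptr functions):

```rust
/// Polyfill sketch for the hypothetical `core::ptr::read_memory`.
/// Falls back to a volatile read, which forgoes optimisation.
///
/// # Safety
/// `src` must point to memory that is readable under the target's
/// hardware semantics.
pub unsafe fn read_memory<T>(src: *const T) -> T {
    unsafe { src.read_volatile() }
}

/// Polyfill sketch for the hypothetical `core::ptr::write_memory`.
///
/// # Safety
/// `dst` must point to memory that is writable under the target's
/// hardware semantics.
pub unsafe fn write_memory<T>(dst: *mut T, src: T) {
    unsafe { dst.write_volatile(src) }
}

fn main() {
    // Demonstrated on an ordinary allocation, where the semantics
    // match plain read/write exactly.
    let mut x: u32 = 5;
    unsafe {
        write_memory(&raw mut x, 7);
        assert_eq!(read_memory(&raw const x), 7);
    }
}
```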
This direction has merit - having non-volatile, optimisable primitives for accessing memory outside the AM's allocation model would cover a real gap that read_volatile/write_volatile currently fill imperfectly, and I think it could complement the proposal well.
That said, I'm not sure it alone would be sufficient for cases like the DevTreeBlob example in the OP, where the hardware places a structure at 0x0 on a 16-bit target with no spare RAM. There, what's needed is a &mut DevTreeBlob to call methods, mutate fields, and pass to APIs that expect references - and no composition of individual read/write primitives can produce that. There's also the concern that each core::ptr function would need a corresponding variant, which could add considerable API surface.
Still, thank you for the thought-out suggestion. This could be a useful building block regardless of how the reference question is resolved.
With CodesInChaos' suggestion in #93, couldn't the ergonomics be improved by a proc macro that rewrites something like this...
#[foo]
impl DevTreeBlob {
fn blah(&mut self) -> u32 {
self.xxx = 123;
self.yyy
}
}
...into something like this...
impl DevTreeBlob {
fn blah(ptr: *mut Self) -> u32 {
core::ptr::write_memory(&raw mut ptr.xxx, 123);
core::ptr::read_memory(&raw ptr.yyy)
}
}
...and that would satisfy most of your needs?
A & or &mut at null will never happen, precisely for the reasons laid out earlier in this thread: references are never null, niche optimization depends on that, and both the compiler and the ecosystem count on both of those. The thing you wrote as "instant UB" is going to continue to be instant UB.
But given the appropriate low-level primitives, you'd be able to design a different wrapper type, e.g. AnywherePtr<T>, which can contain a raw pointer value that's only used with those specific low-level primitives. And with upcoming work on field projection, you could go from an AnywherePtr<MyStruct> to an AnywherePtr<FieldOfMyStruct>. That could look roughly like let value = my_anywhere_ptr~field.read(); or my_anywhere_ptr~field.write(value); (where the exact operator in place of ~ is still TBD).
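A library-level sketch of that idea, assuming volatile access as a stand-in for the proposed non-volatile primitives (AnywherePtr and its methods are hypothetical, not an existing API; field projection is omitted since that work is still in progress):

```rust
use core::ptr;

/// Hypothetical wrapper: a pointer usable at any address, including 0,
/// whose accesses only ever go through low-level primitives (volatile
/// here, standing in for the proposed non-volatile ones).
#[derive(Clone, Copy)]
pub struct AnywherePtr<T>(*mut T);

impl<T> AnywherePtr<T> {
    /// Build from a bare address, e.g. an MMIO location fixed by hardware.
    pub const fn from_addr(addr: usize) -> Self {
        Self(ptr::without_provenance_mut(addr))
    }

    /// Build from an existing pointer (keeps its provenance).
    pub const fn from_ptr(p: *mut T) -> Self {
        Self(p)
    }

    /// # Safety: the address must be readable per target hardware semantics.
    pub unsafe fn read(self) -> T {
        unsafe { self.0.read_volatile() }
    }

    /// # Safety: the address must be writable per target hardware semantics.
    pub unsafe fn write(self, value: T) {
        unsafe { self.0.write_volatile(value) }
    }
}

fn main() {
    // Exercised on an ordinary stack allocation so it runs on hosted targets.
    let mut cell: u32 = 1;
    let p = AnywherePtr::from_ptr(&raw mut cell);
    unsafe {
        p.write(42);
        assert_eq!(p.read(), 42);
    }
}
```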
I've already revised the OP - allowing zero for & and &mut will never happen. The zeroable reference primitive is NOT a replacement for the non-zero reference; they coexist.
Anyway, your AnywherePtr<T> + field projection seems interesting and I think it naturally points toward making it a compiler-known primitive. Once you need lifetime tracking, borrow semantics, and field projection on a pointer type, it becomes difficult to express that purely as a library type without compiler support. At that point, promoting it to a new reference primitive seems like the more natural path.
&raw ptr.yyy doesn't compile, and &raw (*ptr).yyy is UB if ptr is null or dangling. Not sure if there is currently a good work-around for that. (You can go through offset_of, but that throws type inference out of the window)
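For reference, the offset_of route looks roughly like this (DevTreeBlob and its fields are stand-ins from the OP's example); as noted, the field type has to be spelled out by hand:

```rust
use core::mem::offset_of;

#[repr(C)]
struct DevTreeBlob {
    xxx: u32,
    yyy: u32,
}

/// Compute a pointer to `yyy` from a possibly-null/dangling base without
/// ever writing `(*ptr).yyy`, which would assert validity of the place.
fn yyy_ptr(base: *mut DevTreeBlob) -> *mut u32 {
    base.wrapping_byte_add(offset_of!(DevTreeBlob, yyy)).cast::<u32>()
}

fn main() {
    // Sanity check against a real allocation: the computed pointer matches
    // the one the compiler derives from a valid place expression.
    let mut blob = DevTreeBlob { xxx: 0, yyy: 9 };
    assert_eq!(yyy_ptr(&raw mut blob), &raw mut blob.yyy);
    // It also works on a null base without UB, since no place is formed
    // and wrapping_byte_add has no validity requirements:
    let _ = yyy_ptr(core::ptr::null_mut());
}
```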
We already have smart pointers that provide lifetime tracking and borrow semantics. Field projection is being worked on. Once that's ready, I expect that's enough compiler support to do the rest as a library type.
Lifetime tracking and borrow support is already available to custom types, with the exception of re-borrowing.
There are many other types, both in the standard library and in third-party crates that would benefit from field projection and/or reborrow support. So it's better to add those features in the compiler, and make your reference type an ordinary library type.
For field projections the tracking issue is Tracking Issue for Field Projections · Issue #145383 · rust-lang/rust · GitHub
For re-borrowing the tracking issue is Tracking Issue for Reborrow trait lang experiment · Issue #145612 · rust-lang/rust · GitHub
Both of these features can effectively be used with custom types today, just with some downsides, like ugly syntax, and lack of type inference.
For example, field projection might currently look something like field!(pointer, fieldname) instead of pointer~fieldname (playground link).
And reborrowing might require something like pointer.reborrow() instead of happening automatically.
Don't smart pointers ultimately rely on &T internally? Either way, I'd like to see a concrete example to evaluate whether they can provide expressiveness equivalent to &T.