Idea / Pre-RFC: Null-free pointer and Zeroable reference

For both of these targets any existing C compiler for them (gcc, clang) would not have allowed using the 0 address either. And in both cases you can just reserve the 0 address in a linker script. It is only a single byte. If you are that tight on memory that a single byte wasted is an issue, you've got bigger problems.

3 Likes

In both those situations, the memory map is not the language's responsibility, but the programmer's responsibility. (I note that you didn't respond to my question about the memory map or how programs were run. I admit that "memory map" was ambiguous; I was talking about the memory layout that the program used within virtual memory, not about the virtual/physical memory mapping.) You can make it the language's responsibility by adding a runtime environment, but then the runtime environment would choose a method of configuring memory that's compatible with the language.

You have also not clarified how much of the system is written in Rust (everything down to bare metal? Everything but the OS? Everything but the runtime?) It matters in this situation.

My contention is that it is reasonable to require an operating system or bare-metal runtime to choose uses for memory in such a way that address 0 is something that you would never want to form a reference to. There is no performance loss from doing so (in fact there is a performance gain, because it makes null-checks faster), because there will always be some amount of memory that it is not useful to reference. (Note that the MMU is irrelevant here: whether there's an MMU or not, and whether there's an identity map or not, the same arguments apply to virtual address 0. I've been assuming that you haven't mapped in a way that makes virtual address 0 unreadable, because if you have there obviously isn't a problem. That said, if you can do so, you probably should do so for security reasons, because it reduces the consequences of a null dereference in unsafe kernel code.)

Here's a good way to think about it:

  • If you view the program (including the part written in Rust) as "everything down to bare metal" then the memory layout is the programmer's responsibility, not the language's responsibility, and thus nothing is being shifted onto the user.

  • On the other hand, if you view the program as just being the "userland" code that does useful work, and not the runtime support code, it's the responsibility of the runtime support code / kernel / ABI to choose how memory is arranged. Such code will need to use at least 1 byte of memory itself, and it may as well choose to itself use the first byte of memory, so that it never returns it to an allocation request. The code in question wouldn't necessarily need to be written in Rust – but even if it is, it's conceptually a separate program / library and thus the previous case would apply to it.

The problem here isn't "someone might want to use address 0". It's "is it ever useful to form a reference to address 0?". Someone has to design the virtual memory layout used by the program – either the programmer, or the kernel developer, or the person who writes the runtime support libraries. That person will necessarily be able to find something to put at address 0 that isn't useful to form a reference to, even if they aren't forced to do so by the hardware.

As such, the only potentially problematic scenario is "some pre-existing kernel/ABI is designed in such a way that it might load general-purpose program memory of the programs it runs at address 0". Realistically, nobody is designing kernels or ABIs like that, both because there is always a better use for the address in question (whether it's mapping it to some physical special case, using it itself, or unmapping it to catch null-dereference bugs), and because doing so would be incompatible with most programming languages (most notably C – although C allows nonzero null pointers, they're slower than all-bits-zero null pointers and thus practical C compilers don't use such representations).

4 Likes

The RFC is about the principle that the language should not assume any address is invalid. The discussion has shifted to whether specific examples justify forming a reference to 0x0 - but that's a different question. Even if no current hardware required it, the principle stands: an abstract machine that forbids a valid hardware address is making an unnecessary assumption. I'd prefer to return to discussing whether that assumption is justified, rather than auditing individual use cases.

I think that the language should assume that one address is invalid to form a reference to – doing so gives major performance gains and I don't think there are any practical scenarios in which there are meaningful disadvantages. Sometimes, the cost of what you give up to be able to optimize the code better is actually worth it.

For what it's worth, I would prefer the language to place more constraints on things like memory layout and what the allocator can do, rather than fewer, as long as doing so does not make the allocator less efficient. Alignment is a good example of that sort of thing which exists in Rust today – many processors are able to read a u16 from an odd address, but even on those processors, Rust prefers to store u16s only at even addresses in order to provide optimisation opportunities. This puts a restriction on the memory allocators, in that Box::new(0u16) has to return an even address, but this is a restriction that is usually cheap to implement into an allocator, making it a cost worth paying. (And this is despite the fact that it has a real cost if you're using a generic unknown allocator that doesn't have a way to specify alignment requirements – when using such an allocator, the runtime's wrapper around the allocator would have to allocate 3 bytes for the u16 rather than 2 in order to ensure it had an aligned location to place it, and might need to track extra metadata to allow deallocation to work.)
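The alignment guarantee described above can be observed directly. A minimal check (u16 alignment is 2 on common targets such as x86-64, though some embedded targets differ):

```rust
use std::mem::align_of;

fn main() {
    // On common targets, u16 must be stored at even addresses.
    assert_eq!(align_of::<u16>(), 2);

    // The allocator behind Box has to honour that requirement,
    // so the heap address of the u16 is always even.
    let boxed = Box::new(0u16);
    let addr = &*boxed as *const u16 as usize;
    assert_eq!(addr % 2, 0);
}
```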

There are other cases, not supported in today's Rust, where co-operation with the allocator could lead to performance gains. For example, most of today's fast allocators can be asked "where does the memory allocation containing address X start?" and "what is the size of the allocation starting at address X?" and are able to produce correct answers in only a few machine instructions. Having an API that made it possible to ask those questions would let you "compress" a Vec from three pointer-sized values down to one (you store an address within the Vec's allocation, with an offset from the start depending on its length – then you can ask the allocator for the start address to recreate its length, and ask the allocator what the capacity is). This sort of Vec would be slower to use, but save memory (and perhaps indirectly save time) when storing the Vec long-term (you would probably have methods to change between the two forms, changing between the active-use and storage forms based on what the code was doing).
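The compressed-Vec idea can be sketched against a toy allocator. Everything here (ToyAllocator, CompressedVec) is hypothetical and only illustrates the shape of the queries; a real fast allocator would answer them in a few instructions rather than via a HashMap:

```rust
use std::collections::HashMap;

// Toy stand-in for an allocator that can answer "where does the
// allocation containing X start?" and "how big is it?".
struct ToyAllocator {
    allocs: HashMap<usize, usize>, // start address -> size in elements
}

impl ToyAllocator {
    fn start_of(&self, addr: usize) -> usize {
        *self
            .allocs
            .keys()
            .find(|&&s| addr >= s && addr <= s + self.allocs[&s])
            .unwrap()
    }
    fn size_of_alloc(&self, start: usize) -> usize {
        self.allocs[&start]
    }
}

// A "compressed" Vec: one pointer-sized field (start + len) instead of
// (ptr, len, capacity); length and capacity are recomputed on demand.
struct CompressedVec {
    tagged: usize,
}

impl CompressedVec {
    fn len(&self, alloc: &ToyAllocator) -> usize {
        self.tagged - alloc.start_of(self.tagged)
    }
    fn capacity(&self, alloc: &ToyAllocator) -> usize {
        alloc.size_of_alloc(alloc.start_of(self.tagged))
    }
}

fn main() {
    let mut allocs = HashMap::new();
    allocs.insert(0x1000usize, 16usize); // one allocation: 16 elements at 0x1000
    let alloc = ToyAllocator { allocs };

    let v = CompressedVec { tagged: 0x1005 }; // start 0x1000, len 5
    assert_eq!(v.len(&alloc), 5);
    assert_eq!(v.capacity(&alloc), 16);
}
```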

1 Like

Some hardware addresses are valid, yet not safe to make arbitrary accesses to - for example, MMIO regions. You must never take a reference to such hardware addresses: the compiler is allowed to insert spurious reads of references, for example to move a loop-invariant load out of a loop (which inserts a spurious read if the loop runs 0 times). You also can't access the stack region without taking a reference to a live stack variable, in both Rust and C. If that weren't UB, any optimization affecting the stack layout would be a miscompilation.
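For MMIO-like memory, the sound pattern is a volatile access through a raw pointer rather than a reference. A minimal sketch, with an ordinary local standing in for the device register so the example can run anywhere:

```rust
use core::ptr;

// read_volatile performs exactly one read per call and may not be
// duplicated, elided, or hoisted by the compiler, unlike an access
// through a &u32 reference.
fn volatile_read_u32(addr: *const u32) -> u32 {
    unsafe { ptr::read_volatile(addr) }
}

fn main() {
    let reg: u32 = 0xDEAD_BEEF; // stand-in for an MMIO register
    assert_eq!(volatile_read_u32(&reg as *const u32), 0xDEAD_BEEF);
}
```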

You have to let go of your assumption that the C or Rust abstract machine allows you to do everything that the physical machine allows without any restrictions. That assumption is fundamentally incompatible with optimizing compilers. Every optimization is visible on the physical machine (for example, the changes to the emitted machine code are observable there), and is only allowed because only the behavior on the abstract machine is guaranteed to be preserved by the compiler.

11 Likes

The "null pointer niche optimization" is not just an optimization, it is a guarantee. Since the stdlib explicitly documents Option::<&T>::None to be size_of::<&T>() bytes, all equal to 0, and the corresponding transmutes to be sound, it can be relied upon by unsafe code for correctness/soundness and not just performance.
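That documented guarantee can be exercised directly:

```rust
use std::mem::{size_of, transmute};

fn main() {
    // The niche optimisation: Option<&T> is pointer-sized...
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());

    // ...and None is all-zero bits, i.e. transmutes to a null pointer.
    // This transmute is sound precisely because of the documented layout.
    let none: Option<&u8> = None;
    let raw: *const u8 = unsafe { transmute(none) };
    assert!(raw.is_null());
}
```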

Changing this would be a major breaking change and would definitely break unsafe code out there, even if done only for some targets or as a user opt-in.

Maybe it could be an unsafe opt-in, but at that point is it really worth the additional complexity over the existing workarounds?

This is already the case. I guess you meant to relax the conditions needed for those to be valid pointers?


I also wonder about the implications for FFI. A lot of foreign interfaces expect valid pointers to not be 0 and on the Rust side this is often guaranteed thanks to allocations not returning 0 pointers or references being guaranteed to not be 0. Changing this would mean that all those FFI calls would now become unsound, effectively making Rust incompatible with most other languages.

6 Likes

Your argument basically amounts to "because the lower level permits me to do X, the higher level must not prevent me from doing X." But that doesn't hold.

As an example, the hardware is capable of cooking itself with excess heat until it dies. But your software (even if bare metal) can't make the hardware do that; the firmware sitting between the hardware and your software sees the temperature increase and lowers the CPU clock to reduce excess heat production, preventing the hardware from destroying itself.

Any and all abstraction layers will make things that are possible at the lower level, at a minimum, more difficult to accomplish (without just dropping down to said lower level). That's the point of abstraction: providing an interface which hides away some of the particular details of the underlying layer so you aren't required to handle states not reachable purely using the abstraction layer.


The hardware also defines all of the possible outcomes of a data race. If Rust imposing the extra semantic overhead of requiring unsafe and volatile to access 0x0000 is unacceptable to you, wouldn't forcing the use of UnsafeCell for aliased mutable memory be equally problematic? If you think that it is, Rust unfortunately isn't for you; you fundamentally disagree with Rust's design philosophy.

If anything is to change, it would be to make access through a null raw pointer valid, but to still prohibit null references. (Like it is for zero-sized accesses.) But I don't expect that to ever happen; the benefits are minuscule and only for targets designed in total irreverence for all languages having the concept of a null pointer.

(Opinions contained in this post are exclusively my own and should not be used to represent those of T-opsem in any manner. I am a member of T-opsem.)

8 Likes

There is a difference between "what is possible with the language" and "what is possible with every construct of the language".

You can access memory or devices at address 0 with Rust. We provide a mechanism for that.

You can build safe abstractions over that, if you want.

You can't have a reference &T at address 0.

That doesn't mean Rust is incapable of running on hardware that puts things at address 0; it means you don't get to use the reference type to reference memory at address 0.

Similarly, that &str requires UTF-8 doesn't mean Rust is incapable of dealing with things that aren't UTF-8, it just means you can't use the &str type to do it.

10 Likes

The layout change for Option<&T>::None was an element I hadn't anticipated. Thank you for pointing this out; I'd like to emphasise that users could opt in via compiler flags or per-target approaches, and that gradual migration remains an open direction. I've reflected this in the Cost section for now.

Thank you for the clarification. I have amended it.

Regarding FFI: I am aware that the ripple effects of Option<&T> cannot be ignored, and this is reflected. However, ensuring that pointers are not null during FFI calls in userland is the programmer's responsibility under the communication protocol between two programmes. This should be distinguished from its introduction as an invariant for Rust AM.

The statement "because the lower level permits me to do X, the higher level must not prevent me from doing X" not only raises the question of whether it undermines logical soundness when breaking assumptions, but also encompasses the issue of whether errors occur in situations where they should not. As I mentioned above: forbidding unaligned access is defensive - it prevents you from doing what the hardware would punish. Forbidding 0x0 is offensive - it prevents you from doing what the hardware would happily allow. A systems language's constraints should protect the programmer from the hardware instead of attacking them to protect its optimisation.

UTF-8 has an abundance of good alternatives such as Vec<u8>, &[u8], CStr, and OsStr, but for pointers, there is absolutely none except for volatile access. This means that almost all ptr::method operations are disallowed for the 0x0 address.

For example, ptr::copy(src, dst, cnt) should be an unconditional no-op when cnt is zero, but Rust treats it as UB simply because either src or dst might be 0x0.

A case closer to my own scenario: when a physical RAM allocator treats 0x0 as a valid RAM address and happens to return Some(0) (type = Option<usize>), Rust disallows casting this to a pointer for use even if the programmer can guarantee the validity of access. There is literally ZERO workaround for this.

Wrong.

UB means "the standard imposes no requirements; an implementation may or may not define it" in C, but "this must never happen, period." in Rust. Declaration of UB MUST BE MORE PRUDENT for Rust than for C.

You're missing the point: the entire point of this RFC is that the language shouldn't force such design decisions. "Just avoid 0x0" assumes the null assumption is justified in order to defend it, which is a circular reasoning.

Imposing more allocator constraints for optimisation is a reasonable opinion for application-level languages. For a systems language with bare-metal support, it's logically unsound to deny its own purpose of existence.

Also, your alignment analogy doesn't hold. Alignment is hardware-grounded: Arm cores fault on misaligned access, RAM bus design prefers/forces it, C FFI standardises it. The null assumption has no such basis - no hardware treats 0x0 as inherently invalid. It's inherited from C convention, not from any hardware principle.

To anticipate the counterargument: "should we then allow unaligned access unconditionally?" No - and for the same reason this RFC should be accepted.

And the issue I raised earlier remains unaddressed: ptr::methods are raw pointer operations, not references, and they are UB at 0x0. There is literally ZERO way to work around. A systems language whose core library cannot perform basic pointer operations on valid hardware addresses can never be fully faithful to the hardware it aims to support - that is what this RFC aims to fix.

You can pretty easily create volatile versions of those methods by building on top of read/write_volatile. That's at least one way to work around it.
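For instance, a bulk copy can be assembled from per-element volatile accesses. A sketch (`volatile_copy` is a hypothetical helper, and it will be slower than ptr::copy since the accesses cannot be merged):

```rust
use core::ptr;

/// Copies `count` elements with individual volatile accesses, which the
/// compiler may not merge, elide, or reorder relative to each other.
/// wrapping_add is used so the pointer arithmetic itself makes no
/// in-bounds-of-an-allocation claim.
///
/// Safety: the caller must uphold the validity requirements of
/// read_volatile/write_volatile for every element.
unsafe fn volatile_copy<T>(src: *const T, dst: *mut T, count: usize) {
    for i in 0..count {
        let v = ptr::read_volatile(src.wrapping_add(i));
        ptr::write_volatile(dst.wrapping_add(i), v);
    }
}

fn main() {
    let src = [1u32, 2, 3, 4];
    let mut dst = [0u32; 4];
    unsafe { volatile_copy(src.as_ptr(), dst.as_mut_ptr(), 4) };
    assert_eq!(dst, src);
}
```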

3 Likes

This is incorrect. ptr::copy::<i32>(ptr::null(), ptr::null_mut(), 0) is perfectly valid in Rust; all aligned pointers are valid for zero-sized accesses, including null. The invalid_null_arguments lint will reject that at compile time unless you disable it, but that's technically a false positive that is tolerated, since there's no reason to actually write a ptr::copy involving null: it's either a no-op or UB.
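The zero-length case can be checked directly. The attribute silences the invalid_null_arguments lint mentioned above:

```rust
use core::ptr;

// A zero-length copy through null is documented as valid: every aligned
// pointer, null included, is valid for zero-sized accesses.
#[allow(invalid_null_arguments)]
fn zero_len_copy_is_ok() -> bool {
    unsafe { ptr::copy::<i32>(ptr::null(), ptr::null_mut(), 0) };
    true // reaching here means the call completed
}

fn main() {
    assert!(zero_len_copy_is_ok());
}
```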

This is an incorrect understanding, or at the least an imprecise one. If your program exhibits UB according to the C standard, it ceases to be a valid C program. It may be valid GNU C, as gcc refines the C standard and defines some behavior which the standard leaves undefined. But that isn't C; that's GNU C, a very similar but different language.

Rust is exactly the same way, except that the only viable compiler is not a distinct entity from the language definition, and as such, provides no support extending beyond the language guarantees. If you use specifically rustc-1.93.0, then there is some behavior declared as undefined by the documentation which the implementation does actually implement consistent "defined" behavior for, from a compilation point of view. But we provide no support for your code if you are relying on these implementation details, and they can change any time you update your compiler, just the same as switching between gcc or clang or some vendor specific C compiler means your C UB can change how it gets handled.

Pointer methods are UB on null pointers only when they require that the pointer is valid for accesses. The ones which do not require the pointer to point to a valid allocation can be valid for null pointers.
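The distinction drawn here can be illustrated with a few of the access-free pointer methods (a minimal sketch):

```rust
use core::ptr;

fn main() {
    let p: *const u8 = ptr::null();

    // These methods never touch memory, so they are valid on null:
    assert!(p.is_null());
    assert_eq!(p as usize, 0);
    assert_eq!(p.wrapping_add(8) as usize, 8);

    // An actual access, e.g. `unsafe { ptr::read(p) }` of a
    // non-zero-sized type, would be UB.
}
```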

Rust does prohibit an allocation at the null address from being a valid allocation for Rust code, but this is an entirely reasonable ask for the real benefits that the null address being a reserved invalid pointer gives us.

4 Likes

The feature being sort of opt-in is a way to make the user using it aware of it, so hopefully they write code with that in mind. The same cannot be said about all the existing crates in the ecosystem, that might use the existing guarantee for that layout.

You missed the next step: programmers currently ensure that is the case in part by (correctly!) relying on the language level invariants.

1 Like

Thank you for the correction on ZST access - I've read through core library source code and my claim was imprecise. I appreciate the citation.

However, the core problem stands for non-ZST accesses. Consider the following RISC-V scenario:

A firmware or bootloader can freely place a RAM layout map, DTB, or boot parameters at ANY address including 0x0 and report the address to the Rust programme. If it has to read the data:

use core::num::NonZeroUsize;
use core::slice::from_raw_parts as mkslice;

// map can report 0x0 - which is assured to be a valid address.
// len can never be 0.
fn ignite(map: *const RamLayout, len: NonZeroUsize) -> ! {
    // ...yet dereferencing here is UB solely because of the null assumption.
    for layout in unsafe { mkslice(map, len.get()) } {
        // ...
    }

    // ...
}

This is perfectly sound at machine level, yet Rust declares it must never happen. The available workarounds are:

  • read_volatile: works only for individual values, not for bulk copies or reference construction.
  • Inline asm: bypasses the problem but the language has an abstraction gap between its primitive API and valid hardware operations.

You describe the null reservation as "an entirely reasonable ask". I would like to ask: "reasonable" for whom? For userland on AMD64 Windows, absolutely. But for a bare-metal RISC-V target where the firmware has placed meaningful - or even critical - data at 0x0, the cost is the fundamental core::ptr operations being unable to operate on valid hardware. This is not a theoretical concern; it is a concrete limitation that forces developers to abandon Rust's core::ptr API entirely, not because they violated any safety assurance, but because the language defined that it's invalid.

The RFC does not deny that null-pointer optimisation has real value. The cost section acknowledges this explicitly. I'd like to question whether it justifies making the core::ptr API unusable to express valid operations on a real-world hardware - and whether a systems language should impose such a limitation at the abstract machine level rather than offering it as an option for programmers.

These are not two costs that can both be avoided - they are two opposite sides of the trade-off, and the choice between them is precisely what this RFC asks the community to discuss.

What should not be lost in the discussion is that the current status isn't zero-cost either:

as mentioned earlier, this question still stands - Rust cannot fully support certain hardware. That's a cost already being paid silently, by the developers who cannot use the language on their hardware.

My OP is not about "should we pay for it?"; rather, it's "which cost makes more sense?" - ecosystem-wide migration with proper guidelines, or continued unavailability on real hardware. I can never accept the latter, but I do believe the discussion must begin by acknowledging that all the options carry real costs, and should lead to "how can we make the cost cheaper?".

The motivation for avoiding 0x0 is performance – on almost all computer hardware, 0 is the most efficient number to test for (the instructions for checking zeroness are faster and shorter than the instructions for checking other specific values), so by never placing anything that might be a reference target at address 0, null checks become as cheap as possible. It's very common to want at least some values to not be valid as pointers, so that if you have a value that's either a pointer or something else, you don't need to allocate additional memory to track which. The only two efficient solutions to this are "0 isn't the address of a pointer target" (allowing efficient nullness tests using a zeroness test) and "numbers above usize::MAX/2 aren't the address of a pointer target" (allowing efficient nullness tests using a negativeness test), and the historical consensus is to use the former option because the latter wastes too much memory on 16-bit and 32-bit systems.

Because the instructions for checking to see if something is at a specific address other than 0 are longer than zeroness tests, avoiding allocating at byte 0 actually saves memory – the additional code from putting a null value elsewhere will cost more than one byte (whereas you can normally find a valid use for 0x0 even if the hardware doesn't impose one, so avoiding allocating at 0 usually doesn't even cost a byte).

Programming languages are abstractions over the hardware. They make assumptions about memory layout in order to make the abstractions possible. For example, Rust (and C, and almost every other language) assumes that the memory allocator for heap allocations doesn't allocate memory inside the stack. The hardware doesn't care about this at all – it'll let you allocate locations that are already on the stack as heap memory just fine (and there have been security bugs in the past when C programs have been tricked into doing this) – but doing so makes the program very hard to reason about and prove sound.

Part of the skill of programming on bare metal is to come up with your own set of layout assumptions that the language you're using can deal with efficiently. Yes, Rust places constraints on what memory layouts you can use – besides the 0x0 requirement, the compiler assumes a contiguous stack (even though the language specification probably doesn't), and that stack frames follow a particular pattern (which is also not enforced by the hardware). So does every other language that's higher-level than assembly language. When you're programming on bare metal, you take the constraints imposed by the compiler into account – those constraints are there to allow the compiler to generate efficient code, so any sensible programmer would cooperate with them.

Avoiding 0x0 is also hardware-grounded. As a simple example, x86-64 doesn't require accesses to be aligned but does require that you don't write arbitrary data to physical address 0x0. So it doesn't make sense to draw a distinction here. (And, of course, 0x0 being special is hardware-grounded in the sense that 0 is the fastest number to test for, regardless of whether the memory address is usable or not.)

In general, hardware platforms tend to impose requirements that are similar to the requirements imposed by programming languages. This isn't a coincidence – hardware designers design their hardware to be able to efficiently implement common software patterns, and likewise language designers design their languages to be able to efficiently run on common hardware.

2 Likes

You're envisioning a scenario in which the firmware or bootloader creates an unknown memory layout on its own, and forces the program it loads to conform to it. This is unrealistic – a sensible loader would place some constraints on the addresses it uses in order to give guarantees to the program it's loading, that it could use for optimisation.

You don't need to look at bare metal to see examples of this. When you run an executable on an operating system like Linux or Windows, the operating system and dynamic loader cooperate to set up the executable's memory layout in a suitable way – the executable format contains instructions like "I want you to put these bytes somewhere in memory, and those bytes somewhere else in memory, and give me X number of zeroed bytes to store my static data in", and the loader will come up with a memory layout appropriately. A sufficiently perverse loader could, in theory, choose to place the program's static data section starting at virtual address 0, in which case a pointer to the first static variable would look like a null pointer and break the program. But no sensible loader does that in practice, because it can just as easily choose a memory layout that allows for efficient null checks – if the loader was going to map something at address 0, it would choose something that is never going to need a pointer formed to it, so that C programs (and now Rust programs) would run correctly.

The specifications for ABIs (including program loaders), in general, are actively designed to help software make assumptions that it can use to run efficiently, rather than to force it to do inefficient things to defend against weird edge cases. A good example is that on Linux x86-64, the operating system promises to never touch memory within 128 bytes of the top of the stack (even though this is something that an operating system could do, and in fact did frequently do in practice on 32-bit x86 as part of the implementation of signal handlers) – and Rust programs compiled for x86-64 Linux are able to take advantage of that, using memory addresses slightly above the top of the stack to store temporaries in. It would be trivial for the programs to avoid making use of this allowance – allocating stack memory is quick and easy – but using the unallocated memory is slightly more efficient than even that, so the compiler does so in order to save a few cycles. In this case, the motivation for the memory layout assumption is that it doesn't cost the operating system much to make the promise, and it can save a few cycles in running programs, so the restriction makes a good tradeoff.

In general it is reasonable for software to make some requirements of the hardware/firmware/OS/loader environment in which it runs. After all, the program already needs to cooperate with the platform it's running on in order to learn things like, e.g., what memory it can use to store data and how to make calls to its dependencies – you need a standard / shared agreement between the software and the platform in order for the program to be meaningful. Such agreements should be (and typically are) designed to help both the software and hardware run efficiently and securely and to be able to take advantage of optimisation opportunities – and so it makes sense for a programming language to take advantage of them, rather than to have to do inefficient things in case the platform has been set up in a way that defeats optimisation assumptions.

(As a side note, in practice, it has historically been more common for loaders to get the alignment wrong than it has been for them to load things at address 0 – your assumptions about what a loader might or might not choose to do may need rechecking.)

2 Likes