Dereferenceable Zero

Never assume programmers are sensible - this is why Rust exists. The same applies to hardware assumptions.

And I still note that you haven't answered the question about the core::ptr API.

Period.

@H4n_uL Many people in this thread are trying to carefully and diplomatically explain ways you can accomplish what you want to do, whether by using the read_volatile and write_volatile operations, or building a safe wrapper type atop those operations. It seems like you are specifically asking for the existing reference type &T to change, to accommodate null pointers, and you're arguing that the niche optimization for Option<&T> could be sacrificed for that accommodation, as well as all the assumptions within code to the same effect.
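As a concrete illustration of the wrapper-type suggestion (a minimal sketch with invented names, not an existing API): the wrapper owns a raw pointer and routes all access through volatile operations, so no `&T` to the pointee is ever created and the non-null rule for references never comes into play.

```rust
use core::ptr;

/// Illustrative volatile cell over a caller-supplied address.
/// Safety contract (upheld at construction): `addr` must be valid for
/// volatile reads and writes of `T` for the wrapper's lifetime.
struct VolatileCell<T> {
    addr: *mut T,
}

impl<T: Copy> VolatileCell<T> {
    /// Caller promises `addr` points at readable/writable memory for a `T`.
    unsafe fn new(addr: *mut T) -> Self {
        VolatileCell { addr }
    }

    fn read(&self) -> T {
        // All access goes through the raw pointer; no `&T` is ever formed.
        unsafe { ptr::read_volatile(self.addr) }
    }

    fn write(&self, value: T) {
        unsafe { ptr::write_volatile(self.addr, value) }
    }
}
```

On a hosted target the same wrapper can be exercised against ordinary memory; on bare metal the address would come from the platform's memory map. (Whether a volatile access at address 0 itself is permitted is exactly the open question of this thread.)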

Being as explicit as possible: the niche optimization is a stable guarantee, and the assumption that &T will never be null is a stable guarantee. Changing the standard reference type &T to permit null pointers to be valid would be an extensive change with a high cost and a stability break. The latter isn't viable, and even if that weren't the case the former would be too high a price to pay for an unusual use case. That doesn't mean we don't care about the use case, or that we don't support the use case; it means that we've provided support for the use case that's proportional to that use case, to make it possible, and we aren't willing to go further in a way that would create a disproportionate impact on other people and other code.

Part of the point of a pre-RFC is to get preliminary vibes and see how something would be received. In this case, you're getting the feedback that it isn't going to happen in the form you're proposing it, and people are taking the time to attempt to diplomatically present alternatives that could work for your use case.

20 Likes

Note: I was applying revisions to the post while your comment was being written.

I understand - the niche optimization for Option<&T> is a stable guarantee, and the cost of changing it is unacceptable. I accept that.

That said, the revised Demand section now presents two concrete scenarios: one where the hardware places a structure at 0x0 on a 16-bit target with no spare RAM - making volatile workarounds physically impossible, and another where whether the code is UB depends entirely on which address the firmware happens to allocate and report - which is an external decision.

Given that the existing workarounds don't cover these cases, I'd welcome any guidance on how they should be handled.

In such a super-constrained, niche scenario: why not just write a wrapper - or, indeed, the whole thing - in assembly? (Is Rust even supported on 16-bit target architectures? :thinking:)

The language is intentionally designed to not rule out the possibility of running on a 16-bit architecture (e.g. usize is From<u16> but not From<u32>). I don't know whether there's any specific 16-bit architecture that's well-supported, though.
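For illustration, the asymmetry looks like this in code (function names are mine):

```rust
// Compiles on every target: usize is guaranteed to be at least 16 bits.
fn index_from_u16(i: u16) -> usize {
    usize::from(i)
}

// `usize: From<u32>` does NOT exist, because usize may be only 16 bits
// on some target; the conversion has to be fallible instead.
fn index_from_u32(i: u32) -> Option<usize> {
    usize::try_from(i).ok()
}
```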

4 Likes

So, there are two cases to talk about here:

Case 1: you have hardware resources like registers at address 0. This is the case where read_volatile/write_volatile generally suffice.

Case 2: you have a memory allocator that allocates arbitrary objects at address 0 and considers it as valid as any other address. I can appreciate that read_volatile and write_volatile don't fully suffice for that case, and it's more painful to deal with. However, that case is one where it's not necessary to use address 0; a memory allocator could pre-reserve one byte to prevent that. And that feels like a not-unreasonable cost to pay to have &T continue to have a reserved null value.
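A sketch of that pre-reservation, assuming a simple bump allocator (illustrative code, not from any named firmware): if the managed region starts at 0, the allocator burns the first byte so no allocation is ever handed out at the null address.

```rust
/// Illustrative bump allocator over a RAM region that may start at 0x0.
struct Bump {
    next: usize,
    end: usize,
}

impl Bump {
    fn new(start: usize, end: usize) -> Self {
        // Pre-reserve address 0: costs at most one byte of the region.
        Bump { next: if start == 0 { 1 } else { start }, end }
    }

    /// `align` must be a power of two. Returns an address that is never 0,
    /// so the allocation can safely back a `&T` later.
    fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        let aligned = self.next.checked_add(align - 1)? & !(align - 1);
        let new_next = aligned.checked_add(size)?;
        if new_next > self.end {
            return None;
        }
        self.next = new_next;
        Some(aligned)
    }
}
```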

In any case, whatever approach you used to interoperate with such an allocator would almost certainly entail more than one byte of overhead, so it would cost you less to reserve one byte in your allocator.

3 Likes

There is a very simple workaround: if !src.is_null() && !dst.is_null().

OTOH, checking this for every use will carry a perf overhead for everyone.

As mentioned at the Demand section:

There is no spare RAM to copy into - read_volatile has nowhere to write the result.

The address is decided by an allocator of external software - mostly firmware - that the Rust programme cannot constrain.

In both scenarios, programmers cannot control the external constraints - hardware rules over the programmer.

This is exactly the conclusion I have been trying so hard to avoid. If that's true, then why would C - or any programming language in existence - ever be needed over assembly?

With my lang hat on, any "but what if there was hardware that ______" doesn't work for me as motivation for a feature like this, not even for the restricted version of "well pointer reads could be allowed". (As Josh said, the reference version will just never happen.)

I appreciated the analysis in Idea / Pre-RFC: Null-free pointers - #13 by ais523 -- if this isn't really a problem on the existing hardware mentioned there, you'll need to make the rationale more concrete. Don't say "consider a"; name it specifically.

Because portability and safety matter more for large software. Nobody's running Firefox on 2 KiB of RAM. But also if you're in a bootloader that's 80% platform-specific assembly anyway -- because what it does is fundamentally not something expressible in the Rust AM -- then you don't actually need an HLL.

6 Likes

Also, because writing something in Rust is still an improvement even if it has to use more unsafe or has to use non-standard pointer types; that's still better than having to write it entirely in assembly.

1 Like

Yeah, that's why I said 80% assembly.

10% assembly and 90% rust is way better than 100% assembly. 99% assembly and 1% rust I don't know that I'd bother with the rust.

https://www.renesas.com/en/document/dst/rx62nrx621-group-datasheet-rev140

Page 50. On-chip RAM from 0x0 up to 0x18000.

Pages 5 and 15. A 16-bit address bus providing access to 65,536 contiguous bytes of RAM space; no room for a sentinel.

Named, period.

No - they run satellites on such machines. According to the White House ONCD report (2024-02), Rust is explicitly mentioned as a candidate for space systems and safety-critical embedded software. These systems are bare-metal, constrained, and run on hardware where 0x0 can be valid RAM. The Renesas RX family alone - with SRAM at 0x0, no MMU, and more than a billion units shipped - is deployed across industrial motor control, medical devices, and IoT products.

Tock OS and Hubris: complete, production-deployed operating systems in Rust - both of them no_std. Assembly is nowhere near 80%. Not every embedded system needs architecture-specific operations more than generalised ones.

And why do the same reasons that it's not a problem on the other chips not apply to this one?

Nobody's saying it's impossible to have RAM at address zero, just that at worst losing those 3 bytes (since you linked a 32-bit chip I'll assume it likes 4-byte alignment) isn't actually a problem in practice.

4-byte pointers into less than 2¹⁷ memory is going to waste far more memory than not using the zero address. But I don't think we're going to offer a "3-byte pointers" mode in rust either.

(BTW, did you know that rust doesn't let you have a &u32 to the last four bytes of memory in the address space either?)

3 Likes

I think this is a point that hasn't been explicitly responded to, so I'll go through these cases:

  1. In practice, if a processor is placing something in memory, then either a) it places it at a location that's hardcoded and part of the platform specification, or b) the location is under software control and can be chosen by reconfiguring the hardware. This means that if the hardware is placing something at 0, either it's always placing it at 0 (in which case you can access it using volatile reads and writes), or else it's your own fault that you told it to use address 0 rather than somewhere more sensible.

    You seem to have a mental model of hardware in which it acts like a black-box memory allocator for which you have no control over the addresses it chooses. I doubt such hardware exists – it would make no sense, and standard hardware design is to respond to the addresses that software chooses rather than vice versa.

    There are some processors that have a fixed memory map that is not under software control. Even so, in these processors the memory map is fixed and part of the platform specification, and thus software is able to hardcode the details.

  2. This would only be a problem if a) the firmware were making allocation choices, and b) the firmware nondeterministically chose 0 as a possible address to allocate.

    There are firmwares that do memory allocation, e.g. Coreboot, which I looked into in order to get a better idea of how firmwares work. It turns out that Coreboot loads bootloaders written in ELF format, which is the same format that Linux executables use to specify their requests for how they should be loaded into memory – so if you are using Coreboot as your firmware, the bootloader would be able to request "please don't load this section at address 0" much like Linux executables can.

    In general, I expect all firmwares that do memory allocation would pick an allocation scheme that is suitable for system programming languages to use. The reason is that if they didn't, they would be unable to load bootloaders written in C, which would be a big deficiency which would mean that nobody would use them. (Yes, C itself doesn't require that null is all-bits-zero – but practical C compilers do.)

    If the firmware doesn't do memory allocation, then it will be using a fixed memory layout instead, so the usual "if something is loaded at address 0, the software will know that" principle applies.

4 Likes

This is making the assumption that you need the whole struct (either owned or a reference). Is that really a hard constraint?

First: You did not address the W65C02S: an 8-bit chip with a 16-bit address bus whose entire 64 KiB address space is contiguous throughout 0x0..=0xffff - where there's nothing to sacrifice.

Second: It IS a HUGE problem if the data loses its meaning when split into parts, and the data is pinned in place by hardware.

Third: pointer overflow is nondeterministic, which makes sense to prohibit defensively; access to 0x0 is not.

Even if you decompose the struct into individual volatile reads, the second example - from_raw_parts(map, len) where map comes from firmware - still stands. You need a slice, and constructing a slice requires a reference. There is no field-by-field workaround for that.

And practically, giving up &T means losing access to the ecosystem built on it. Writing raw-pointer-only forks of those crates means maintaining a parallel ecosystem that can never keep up with upstream.
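For concreteness, the raw-pointer-only route being discussed looks something like this sketch (invented helper, volatile byte-by-byte access): it never materialises a `&[u8]`, so the non-null requirement of `slice::from_raw_parts` is never triggered, but by the same token nothing in the slice-based ecosystem can be reused.

```rust
/// Illustrative: fold over a memory region using only raw pointers.
/// No `&[u8]` is ever created, so in principle this tolerates `base == 0`,
/// but every API expecting a slice (CRC crates, hashers, ...) is lost.
unsafe fn sum_region(base: *const u8, len: usize) -> u64 {
    let mut total = 0u64;
    for i in 0..len {
        // wrapping_add + read_volatile: no reference is formed, and no
        // in-bounds-object requirement is imposed on the pointer arithmetic.
        total = total.wrapping_add(base.wrapping_add(i).read_volatile() as u64);
    }
    total
}
```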

This is getting to the core error behind the back-and-forth: the question does not logically follow from the premise, and it is answered straightforwardly - Rust (and C) is better than assembly at other tasks. Put pointedly: "if a hammer (C/Rust) isn't as good as a screwdriver (assembly) at turning screws, why would a hammer ever be needed over a screwdriver?" is just as absurd to me. Tools like programming languages prevail by being best at addressing (at least) one specific set of requirements. It usually doesn't pay to make software worse at its validated use case for the sake of achieving at best mediocre performance at something else, especially a niche task. Idea / Pre-RFC: Null-free pointers - #42 by josh explained in detail how the RFC is an instance of such a trade-off, and hopefully that generalization helps shape how you address the concerns you're receiving.

That rule is also a reason to appreciate Rust's lack of CHAR_BIT, lack of register-sized primitive types, and lack of "platform-defined" baggage. Those choices are not deficiencies but a distinct direction. They make it less close-to-the-metal but help with the more central and unique mission of simplifying verifiability; and there already are languages that are very good at being close to the metal.

You named a target, not a problem. That is not itself an example where a program is unable to use read_volatile or other portions of Rust as-is due to 'no spare RAM'. The only thing you can't read directly here is the first byte of the blob, which isn't consequential by itself and has plenty of alternatives with less impact than a language change. Production examples instead of toy examples are great because we can be sure some amount of engineering was applied to the search for a solution, and a good Alternatives section has an easy time conveying the attempted dead ends (and their disqualifying properties).

As also already pointed out, you can readily link to and interact with parts written in assembly, so this is not an exclusive choice. As the report itself says, there are no silver bullets, and its whole point is "moving toward" rather than absolutism (Ctrl-F both terms to check whether I'm misrepresenting its message). The task that assembly is best at can be as small as a handful of functions in your embedded program. (For a 16-bit project, there can only be so many functions anyway.) Separating a problem into disjoint pieces so that the best tool can be applied to each is a key part of good engineering. Your own examples of Tock OS and Hubris having smaller amounts of assembly than feared only fuel the argument that Rust as-is, hand-in-hand with assembly, can already solve the problem space better than one might think.

5 Likes

I accept the point - I've been pointing out targets, not the actual incident or blocking problem.

However, here is a target with a problem.

Robin Mueller, a researcher at the University of Stuttgart's Institute of Space Systems, was developing a Rust bootloader for the Vorago VA108xx and VA416xx - radiation-hardened Cortex-M4 MCUs deployed in aerospace - where programme RAM starts at 0x0. He needed to read the running application image from address 0x0 to flash it to non-volatile memory. Skipping the first few bytes was not an option - the image starts there. Standard Rust pointer operations were impossible; the only path was inline Arm® assembly.

    if FLASH_SELF {
        let mut first_four_bytes: [u8; 4] = [0; 4];
        read_four_bytes_at_addr_zero(&mut first_four_bytes);
        let bootloader_data = {
            unsafe {
                &*core::ptr::slice_from_raw_parts(
                    (BOOTLOADER_START_ADDR + 4) as *const u8,
                    (BOOTLOADER_END_ADDR - BOOTLOADER_START_ADDR - 8) as usize,
                )
            }
        };
        let mut digest = CRC_ALGO.digest();
        digest.update(&first_four_bytes);
        digest.update(bootloader_data);
        let bootloader_crc = digest.finalize();

        nvm.write_data(0x0, &first_four_bytes);
        nvm.write_data(0x4, bootloader_data);
        if let Err(e) = nvm.verify_data(0x0, &first_four_bytes) {
            if DEFMT_PRINTOUTS {
                defmt::error!("verification of self-flash to NVM failed: {:?}", e);
            }
        }
        if let Err(e) = nvm.verify_data(0x4, bootloader_data) {
            if DEFMT_PRINTOUTS {
                defmt::error!("verification of self-flash to NVM failed: {:?}", e);
            }
        }

        nvm.write_data(BOOTLOADER_CRC_ADDR, &bootloader_crc.to_be_bytes());
        if let Err(e) = nvm.verify_data(BOOTLOADER_CRC_ADDR, &bootloader_crc.to_be_bytes()) {
            if DEFMT_PRINTOUTS {
                defmt::error!(
                    "error: CRC verification for bootloader self-flash failed: {:?}",
                    e
                );
            }
        }
    }
// Reading from address 0x0 is problematic in Rust.
// See https://users.rust-lang.org/t/reading-from-physical-address-0x0/117408/5.
// This solution falls back to assembler to deal with this.
fn read_four_bytes_at_addr_zero(buf: &mut [u8; 4]) {
    unsafe {
        core::arch::asm!(
            "ldr r0, [{0}]",    // Load 4 bytes from src into the r0 register
            "str r0, [{1}]",    // Store r0 into first_four_bytes
            in(reg) BOOTLOADER_START_ADDR as *const u8, // Input: src pointer (0x0)
            in(reg) buf as *mut [u8; 4],                // Input: destination pointer
            out("r0") _, // Declare r0 clobbered so the register allocator won't place an input there
        );
    }
}
fn check_own_crc(
    sysconfig: &pac::Sysconfig,
    cp: &cortex_m::Peripherals,
    nvm: &mut NvmWrapper,
    timer: &mut CountdownTimer,
) {
    let crc_exp = unsafe { (BOOTLOADER_CRC_ADDR as *const u16).read_unaligned().to_be() };
    // I'd prefer to use [core::slice::from_raw_parts], but that is problematic
    // because the address of the bootloader is 0x0, so the NULL check fails and the function
    // panics.
    let mut first_four_bytes: [u8; 4] = [0; 4];
    read_four_bytes_at_addr_zero(&mut first_four_bytes);
    let mut digest = CRC_ALGO.digest();
    digest.update(&first_four_bytes);

    ...
}

I circumvented the issue by falling back to assembler, but this feels really hacky to me... Shouldn't Rust be low-level enough to let me deal with these issues? I found this pre-RFC: Pre-RFC: Conditionally-supported volatile access to address 0 - libs - Rust Internals.

Source: Reading from physical address 0x0

The project is published as va108xx and va416xx on crates.io and sources available at

2 Likes