Could we make the pointer niche bigger?

I had this crazy idea, and I don't know if it can work, so I'm just gonna ask.

In Rust, every reference and every non-null pointer type (Box<T>, NonNull<T>, fn pointers) has a niche: the null address, which is never a valid value for them. That niche can store an enum variant such as Option::None. But just one variant isn't much; so what if we made the niche bigger? This would mean that some memory addresses need to be reserved.

If we just reserved 256 addresses, pointers could store an entire additional byte. So a Result<Box<T>, E> would still be pointer-sized if E is a bool, a u8 or an enum with at most 256 variants.
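
For reference, here is a minimal sketch of what the null niche buys today. The helper names are made up for illustration. Only the Option<Box<T>> case is a documented guarantee; the Result size is unspecified and is simply what current compilers are expected to produce.

```rust
use std::mem::size_of;

/// Size of a plain owning pointer on this target.
pub fn box_size() -> usize {
    size_of::<Box<u32>>()
}

/// `None` is stored in the null niche, so no extra space is needed.
pub fn option_box_size() -> usize {
    size_of::<Option<Box<u32>>>()
}

/// 256 error values do not fit in a one-value niche, so current compilers
/// are expected to add a separate tag (unspecified layout, may change).
pub fn result_box_size() -> usize {
    size_of::<Result<Box<u32>, u8>>()
}
```

With a 256-value niche, result_box_size() could in principle shrink to box_size().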

The main challenge is probably preventing the allocator from using the addresses that are used for the niche. Is this possible?

2 Likes

There are two big opportunities here:

  • Zero page niches (no link, sorry, though I know it's been discussed before). This is more what you're talking about: on most hosted (i.e. you have an OS) platforms, not just address 0x00 is unused, but at least the whole lowest page. On x64, that's at least 4KiB (2^12) of space, and potentially upwards of 2MiB (2^21). (It depends on the target and potentially OS configuration...) Similarly, the kernel/userspace split potentially cordons off half of the address space that userspace isn't allowed to touch. These offer large platform-dependent niches.
  • Alignment niches. Because references are well aligned, poorly aligned pointers can also be used as niche values. Read the RFC for more, but it's likely rustc would only ever use 0..align and !align + 1..=MAX as niches, as a) that's the niche support currently in the compiler, and b) more complicated niche extraction starts to not be as clear of a win (becoming more of a size/time tradeoff).
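
To make the alignment idea concrete, here is a hand-rolled sketch of what the compiler could do automatically. RefOrCode is a hypothetical type, not something rustc generates, and it only uses the low range 0..align.

```rust
use std::marker::PhantomData;
use std::mem::align_of;

/// Either a shared reference to a `u64` or a small code, packed into one
/// pointer. Hypothetical illustration of an alignment niche done by hand.
pub struct RefOrCode<'a> {
    ptr: *const u64,
    _borrow: PhantomData<&'a u64>,
}

impl<'a> RefOrCode<'a> {
    /// Number of addresses in `0..align` that no valid `&u64` can have.
    pub const NICHE_VALUES: usize = align_of::<u64>();

    pub fn from_ref(r: &'a u64) -> Self {
        RefOrCode { ptr: r as *const u64, _borrow: PhantomData }
    }

    /// Returns `None` if the code does not fit below the alignment.
    pub fn from_code(code: usize) -> Option<Self> {
        if code < Self::NICHE_VALUES {
            // Integer-to-pointer cast; the result is never dereferenced.
            Some(RefOrCode { ptr: code as *const u64, _borrow: PhantomData })
        } else {
            None
        }
    }

    pub fn get(&self) -> Result<&'a u64, usize> {
        let addr = self.ptr as usize;
        if addr < Self::NICHE_VALUES {
            Err(addr)
        } else {
            // SAFETY: `from_code` only stores addresses below the alignment,
            // so any other address came from a live `&'a u64` in `from_ref`.
            Ok(unsafe { &*self.ptr })
        }
    }
}
```

The struct stays pointer-sized while carrying align_of::<u64>() extra values, which is the win the RFC is after; the extra compare in get() is the time side of the size/time tradeoff.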

The compiler teams have shown generally positive support for both of these possibilities, and it's mainly just a question of actually implementing the support for them. Alignment niching is likely to come first imho, as it's platform independent and already has a well-supported RFC. Target dependent zero page optimizations will require more work.

14 Likes

SmartString uses pointer alignment to tell inline strings from heap-allocated ones, which lets it store strings of up to 23 bytes inline (on 64-bit targets). I think that's pretty neat. If std String did that, I think that would be neat too.

1 Like

CompactStr can store up to the full 24 bytes on the stack! This comes at slightly more expense to extract the string. (It works off the principle that the final byte in a UTF-8 string cannot be 0b11XXXXXX, so uses that to mark non-inline strings.)
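
The UTF-8 property behind that trick can be checked directly. This is only a sketch of the principle, with made-up function names; it is not CompactStr's actual implementation.

```rust
/// True if `byte` has the form 0b11xxxxxx, i.e. it would start a
/// multi-byte UTF-8 sequence (or is not valid in UTF-8 at all).
pub fn is_lead_or_invalid(byte: u8) -> bool {
    byte & 0b1100_0000 == 0b1100_0000
}

/// Valid UTF-8 always ends in an ASCII byte (0b0xxxxxxx) or a continuation
/// byte (0b10xxxxxx), so this returns `false` for every `&str`.
pub fn last_byte_is_marker(s: &str) -> bool {
    s.as_bytes().last().map_or(false, |&b| is_lead_or_invalid(b))
}
```

Since no real string ends in such a byte, a 24-byte buffer can reserve those values in its last byte to mean "this is not an inline string".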

std String guarantees that it doesn't do a small string optimization, for better or for worse. I think for better; String really is StrBuf, and this allows you to choose your smartstring method. The one sad bit is not being able to pass around owned strings with the small string optimization, but this (owned string passing) is generally much less used than in e.g. C++ due to safe and efficient slicing (the biggest pain with C++ std::string_view ime is when you need a null terminated string for extern calls and then have to make a local buffer copy anyway).

6 Likes

Is that because of a more fundamental guarantee about the structure of a String that prevents doing fancy stuff with the pointers? Because I can think of other interesting use cases regarding String. For example it would be neat if String was like a Cow<'static, str> and there was a const way to create a String with some initial value.

String::const_from("Static str")

It's because it promises to unsafe code that the bytes of the string will be in the same place even if the String itself is moved (assuming no reallocations).

String is there to be the fundamental one, since you can build a CleverString on a String, but you can't recover the "s.as_ptr() stays the same on move" if String doesn't have it.
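
A small sketch of that promise, using a helper name invented for this example: moving the String value moves only its pointer, capacity and length, not the bytes.

```rust
/// Returns true if the heap buffer stays put while the `String` is moved.
pub fn buffer_address_survives_move() -> bool {
    let s = String::from("hello");
    let before = s.as_ptr();
    // Move the String value twice: into a new binding, then into a Box.
    let moved = s;
    let boxed = Box::new(moved);
    // Only the (pointer, capacity, length) triple moved; the bytes did not.
    before == boxed.as_str().as_ptr()
}
```

A string type that keeps short contents inline cannot offer this, because the bytes live inside the value that is being moved.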

People tried to find a way to hide string literals in Strings in a way that would work, but the problem is that unsafe code has certain expectations. Almost everything works if you just let made-from-a-literal Strings have a capacity of zero, but that can still cause problems where, for example, code looks at capacity - length to know whether something will reallocate.
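
As a hedged illustration of that last point, here is the kind of check existing code might contain, next to what the same subtraction would do for a hypothetical literal-backed String. Both function names are invented for this sketch.

```rust
/// The kind of check existing code might perform: is there enough spare
/// capacity to append `extra` bytes without reallocating?
pub fn fits_without_realloc(s: &String, extra: usize) -> bool {
    // Relies on the invariant `capacity() >= len()`.
    s.capacity() - s.len() >= extra
}

/// What the same subtraction would do for a hypothetical literal-backed
/// String that reports capacity 0 with a non-zero length.
pub fn spare_for_literal_backed(capacity: usize, len: usize) -> Option<usize> {
    // `None` means the plain `capacity - len` would underflow.
    capacity.checked_sub(len)
}
```

With capacity 0 and length 10, the plain subtraction would panic in debug builds and wrap in release builds.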

8 Likes

For the zero-page niche, there's a second issue: since it can depend on runtime configuration and not just on target, we may want some way to configure it (e.g. a codegen option), but if you configure it differently than the default for the target, you can't use Rust code compiled with a different value, including the standard library (not even no_std, you might have to use no_core).

I would love to see this niche added, so that it'd be possible to store 4096 values in the niche of a pointer on many targets. But in doing so, we'd have to have a good answer for what happens if you want the value to be different (most commonly smaller, though someone could also want it larger).

Similarly, the kernel/userspace split potentially cordons off half of the address space that userspace isn't allowed to touch. These offer large platform-dependent niches.

Using high address bits can get complex as well: Pointer tagging for x86 systems [LWN.net]

4 Likes

Can you please explain why? I have my suspicions, but I want to be absolutely certain as to what is happening.

When a crate is compiled, the memory layout of all used types must be known. But the standard library is distributed pre-compiled for each target, so the same layout used when the stdlib was compiled must be used when compiling everything else. This makes it impossible to make the layout of types used in the stdlib configurable, unless the stdlib is recompiled with -Z build-std. Am I correct?

1 Like

But wouldn't that imply that the standard library has hard coded values for objects that place them in the first page of memory? I thought that the libraries were position independent, so would be placed wherever they fit in memory.

Hardcoded addresses aren't the issue.

If the standard library is compiled to assume object layouts with niches that rely on no valid address ever being within the first 4k of memory, and if an application is compiled differently and that application creates an object within the first 4k of memory, the application and the standard library would disagree on the layout of that object, causing the standard library to misinterpret it.

For that matter, even if the application doesn't create an object within the first 4k of memory, the object layouts would still differ and cause the standard library to misinterpret objects. For instance, Result<&T, u8> could be the same size as &T if there's a niche in &T large enough for u8, but if the niche is only at 0 then Result<&T, u8> needs an extra byte for the u8. This would cause the standard library and the application to disagree on ABI.
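
A toy model of the disagreement, assuming a 256-value niche for one side and a null-only niche for the other. Both layouts and all names here are hypothetical; the point is only that the same word gets read two different ways.

```rust
/// How one side might interpret a pointer-sized word.
#[derive(Debug, PartialEq)]
pub enum Reading {
    Pointer(usize),
    Error(u8),
}

/// Hypothetical layout A: addresses 0..256 are a niche holding the `u8` error.
pub fn read_with_wide_niche(word: usize) -> Reading {
    if word < 256 {
        Reading::Error(word as u8)
    } else {
        Reading::Pointer(word)
    }
}

/// Hypothetical layout B: only address 0 is a niche, so the error lives in a
/// separate field and this word is always read as a pointer.
pub fn read_with_null_niche(word: usize) -> Reading {
    Reading::Pointer(word)
}
```

For the word 0x40 the two functions disagree, which is the misinterpretation described above; for 0x1000 they happen to agree, but the overall Result would still differ in size between the two layouts.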

2 Likes

That was what I thought.

I didn't know that...

Would it be possible to add in some new constants/functions/whatever to core that can be queried by the compiler to see what niche value assumptions it was compiled with? The compiler could then see if the shipped libraries need to be recompiled before they are linked with the crate.

EDIT

I just realized how this could be done. Assuming that the niche values (and all the other settings) for the standard library are set through command-line switches, we could just emit a constant string that contains the entire option list that was used to compile the standard library. The compiler can then look for that symbol in the compiled crate. If it's there, it can check whether the current set of options is compatible with the compiled crate's options, and if not, recompile it. The nice thing about this is that it works regardless of the output binary's format; it'll just be something like a constant string somewhere.

While that's certainly possible, it'd require the ability to recompile the standard library, and build-std is a complex feature that's been in development for a long time.

I'd love to see this happen, though.

6 Likes

Yeah, I agree that it would be complex to implement. What I was thinking though is that it could be a neat way of storing information in any precompiled library that you might want to link in. If the library doesn't have the symbol for the string in place, then you can assume that it was created before the convention was in place, which means that niche values or other tricks weren't being done (essentially create a default string for that piece of code).

The nice part about this trick is that it is FFI-friendly; I can see C/C++ code linking in a rust binary blob and using this string symbol to figure out how to link it in. That means that ABI compatibility can change on edition boundaries fairly easily.
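
A rough sketch of the check, under loud assumptions: the option names (zero-page-niche, align-niche) and the space-separated key=value format are invented for this example, and no such symbol exists in any real toolchain.

```rust
/// Hypothetical option string that a precompiled library might embed.
pub const EXAMPLE_LIB_OPTIONS: &str = "zero-page-niche=4096 align-niche=on";

/// Looks up `key` in a space-separated `key=value` list.
fn lookup<'a>(options: &'a str, key: &str) -> Option<&'a str> {
    options
        .split_whitespace()
        .filter_map(|pair| pair.split_once('='))
        .find(|(k, _)| *k == key)
        .map(|(_, v)| v)
}

/// Zero-page niche size implied by an embedded option string. A missing
/// string or key falls back to the null-only niche (a single value).
pub fn zero_page_niche(options: Option<&str>) -> usize {
    options
        .and_then(|o| lookup(o, "zero-page-niche"))
        .and_then(|v| v.parse().ok())
        .unwrap_or(1)
}

/// True if both sides assume the same zero-page niche.
pub fn layouts_agree(lib: Option<&str>, app: Option<&str>) -> bool {
    zero_page_niche(lib) == zero_page_niche(app)
}
```

A library without the string (None) is treated as built before the convention existed, matching the default described above.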

I haven't yet seen anyone mention that it is currently considered sound to create a reference to page zero memory.

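fn main() {
    // An empty Vec does not allocate; its pointer is dangling but aligned.
    let v: Vec<i32> = vec![];
    println!("{:p}", &*v);
    // A Vec of zero-sized values never allocates either.
    let u = vec![(), (), ()];
    println!("{:p}", &*u);
}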

prints

0x4
0x1

Any proposal to widen the niche at 0x0 will have to deal with the allocator's tendency to use page 0 for its own purposes, which has the potential to be highly annoying in certain cases.

2 Likes
1 Like

It's not a nice solution, but the zero page niche could "just" be limited to non-ZSTs. (I know lccc would be sad, because they want to be able to do optimizations pre-monomorphization when still dealing with for<T> &T.)

The fact that you can forge a ZST allocation at any non-null address is fairly well documented at this point, so I don't think it's possible to break &*ptr::invalid::<ZST>(mem::align_of::<ZST>()), let alone practical.

(And IIUC this makes zero page niches impossible for [T] as well, since they can be zero sized. It's probably best to disable the zero page niche for all !Sized types...)
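
For anyone who hasn't seen it, forging a ZST reference looks roughly like this. Marker and the function name are made up for the sketch, and the exact address returned by NonNull::dangling() is an implementation detail, though it is non-null and well aligned.

```rust
use std::ptr::NonNull;

/// A zero-sized type; no memory is needed to hold a value of it.
pub struct Marker;

/// Address of a reference forged without any allocation.
pub fn forged_zst_address() -> usize {
    // Dangling but non-null and well aligned, which is all a ZST reference needs.
    let dangling: NonNull<Marker> = NonNull::dangling();
    // SAFETY: `Marker` is zero-sized, so the reference covers no bytes.
    let r: &Marker = unsafe { dangling.as_ref() };
    r as *const Marker as usize
}
```

On current implementations the address is typically small, well inside the first page, which is why a zero page niche would have to exclude references like this one.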

2 Likes

This wouldn't preclude using zero-page niches (niche optimization does happen post-mono, specifically in layout computation). What precludes it is what was discussed above (the ABI differing between the stdlib and the application), and that I don't want to spec a "hosted-only" layout optimization. It should be fairly clear that a zero-page niche is very much invalid for freestanding targets, where page 0 can 100% be mapped. Heck, on w65, the Zero Page, depending on the definition of "page", can be the host of the entire stack.

TBH, as long as it doesn't become a mandatory niche like null-pointer optimization, I don't particularly care about it. rustc can exploit whatever unspecified niches it chooses. When it becomes a mandatory niche is when I start to care.

8 Likes

I assume that the idea is for the optimization to be target-specific (at least; as discussed above, this might not be enough). Are there any targets which can be both "hosted" and "freestanding"?

(FWIW: I don't think there's any interest in making #[repr(Rust)] niching any more mandated/specified/guaranteed than it currently is with null-pointer opts. The problem is that the moment you describe a stable niching algorithm you're making explicit space for someone to do it better, and #[repr(Rust)] is the unstable best-effort layout.)

(It's maybe possible that specific niches could be explicitly requested in the future, but imho it's also fairly easy to say that such code just isn't portable. I don't know if such is enough to make lccc happy, but that's a bridge to cross far in the future when such is actually being discussed in a platform-tied way.)

4 Likes