Rules for alignment and non-NULLness of references

From what I gathered in discussion with @arielb1 on IRC, it seems to be a common opinion that the following program should be UB because the reference is not properly aligned:

fn main() {
    let x = 2usize as *const u32;
    let y: &u32 = unsafe { &*x };
}

Notice that the function contains an unsafe block, so it is still an open question whether the usual aliasing rules have to apply to references here. But my impression is that alignment and non-NULLness are considered stronger guarantees than aliasing and being “not dangling”. That is somewhat surprising to me; we are now talking about some form of “two-staged validity”, where some guarantees always have to hold, even in unsafe code, while others may be temporarily violated in unsafe code.
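For contrast, none of this is controversial for raw pointers; a minimal illustration (my example, not from the thread) of values that are all fine in safe code as long as they are never dereferenced:

fn main() {
    // Raw pointers carry no validity guarantees: NULL, unaligned, and
    // integer-derived values are fine as long as they are never dereferenced.
    let null: *const u32 = std::ptr::null();
    let unaligned = 2usize as *const u32; // the same value as in the program above
    let arbitrary = 0x100usize as *const u32;
    println!("{:?} {:?} {:?}", null, unaligned, arbitrary);
}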

One piece of evidence for this (maybe?) is that somewhat recently (?), ZST pointers in Rust were changed from being 1 as *const T to being align as *const T. Was there discussion somewhere about why this is necessary? After all, such pointers will never be dereferenced, so why should alignment matter?

Furthermore, @arielb1 mentioned that there is an exception to the rule that alignment must be satisfied for the case where the reference is immediately cast to a pointer, like in &*x as *const _. This makes me wonder, what if we change the bad program above like so:

fn main() {
    let x = 2usize as *const u32;
    let y: &u32 = unsafe { &*x };
    let y = y as *const _;
}

Is this UB or not? On the one hand, if the first program above is UB, this one must be – or else, being UB would depend on actions in the future, which can’t be right. On the other hand, the program pretty much does “immediately cast the reference to a pointer”, and indeed, in MIR, these two cases cannot be distinguished.

So the aligned ZST pointers were introduced by @Gankra in https://github.com/rust-lang/rust/pull/41064. This doesn’t actually seem to be concerned with UB due to unused references being unaligned; instead, it is about Shared and Unique pointers being aligned (I am not exactly sure what the benefit is here).

However, while the discussion in the PR is all about Unique and Shared, the PR also changes exchange_malloc. As far as I can tell, that change happened just for consistency. @Gankra, could you enlighten us? :wink:

Somewhat of an aside but:

For reference, slice iterators still use &*(1 as *const T).

Also, there was recently a proposal to end this debate once and for all by making references to zero-sized types zero-sized:

For ZSTs, you mean? Good point. Also see https://github.com/rust-lang/rust/issues/42789. This can actually lead to ill-aligned references for ZSTs like [u32; 0].
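To make that concrete, a small check (assuming a platform where u32 is 4-aligned):

fn main() {
    // [u32; 0] is zero-sized but keeps the alignment of u32, so a pointer
    // like 0x1 to it is not properly aligned.
    assert_eq!(std::mem::size_of::<[u32; 0]>(), 0);
    assert_eq!(std::mem::align_of::<[u32; 0]>(), 4);
}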

Just waking up answer:

The stuff I did here was just future-proofing in case we wanted to mandate that Unique/Shared are aligned even when dangling. This would let enums use the LSBs for forbidden values. Note that HashMap is currently relying on the fact that they aren’t in order to pack some data in the LSBs.
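For context, the LSB idea would be a hypothetical second niche alongside the NULL niche that references already provide today; a small demonstration of the existing NULL case (my example, not from the thread):

use std::mem::size_of;

fn main() {
    // Because references are guaranteed non-NULL, None can be encoded as the
    // NULL value and Option<&u32> stays pointer-sized.
    assert_eq!(size_of::<Option<&u32>>(), size_of::<&u32>());
    // Raw pointers carry no such guarantee, so this Option needs a separate tag.
    assert!(size_of::<Option<*const u32>>() > size_of::<*const u32>());
}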

It’s a good point about highly aligned ZSTs. Note that mandating alignment of pointers to them would make capacity a bit wonky? I guess you could use the current code but just round the pointer to alignment when you yield the value?

Looking at what LLVM does as a guideline…

The load instruction (http://llvm.org/docs/LangRef.html#load-instruction) does specifically say that misaligned loads are UB (“Overestimating the alignment results in undefined behavior.”).

The int-to-ptr cast (http://llvm.org/docs/LangRef.html#inttoptr-to-instruction) doesn’t explicitly say anything about UB, although “This one is really dangerous!” (http://llvm.org/docs/LangRef.html#constant-expressions)

Rust is free to introduce UB if it likes, of course. Perhaps &*1 should be an undef value rather than UB?

Which capacity?

If you stride your pointer by (say) 4 bytes for the iterator, then the max capacity isn’t usize::max, but rather usize::max/4. However, we have a vague rule in practice that zero-sized types don’t really have pointer identity, so there’s nothing wrong with yielding the same address for every element.

Specifically all we need to change is that lines like this https://doc.rust-lang.org/nightly/src/alloc/vec.rs.html#2259 should be changed to cast mem::align instead of 1.

(the slice ones need to be factored out a bit more)
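A minimal sketch of that suggestion (dangling_aligned is a made-up helper name, not the actual libstd code):

use std::mem;

// Hypothetical helper: a dangling but well-aligned pointer to use instead of
// `1 as *const T` when yielding ZST elements.
fn dangling_aligned<T>() -> *const T {
    mem::align_of::<T>() as *const T
}

fn main() {
    // For [u32; 0] this yields 0x4 rather than 0x1.
    assert_eq!(dangling_aligned::<[u32; 0]>() as usize, 4);
}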


Nice change: a Box pointing to a zero-sized allocation is now aligned. :+1: (Apparently a Rust 1.19 feature.) Hah, and of course there was code that broke due to this change…

Testing snippet for Box’s pointer value: https://play.rust-lang.org/?gist=00e6d4d90fe5c84976ce9fdbacb46c5f&version=nightly&backtrace=0
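The contents of the linked gist are not reproduced here; a snippet along these lines should show the effect (the exact output depends on the compiler version):

fn main() {
    // A Box of a ZST performs no allocation; print the address it ends up with.
    let b = Box::new([0u32; 0]);
    let addr = &*b as *const [u32; 0] as usize;
    // Prints 0x4 (the alignment) on compilers after the change, 0x1 before.
    println!("0x{:x}", addr);
}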

Okay, so let me try to see whether I puzzled this together correctly... it took me a while.^^ You are talking about the iterator over slices of ZSTs. That iterator currently works internally by bumping up a pointer by 1 until it reaches a pre-computed end pointer. (That's a special case; usually it bumps up by the size, but with a ZST that would literally go nowhere.) Now if we were to bump up by the alignment (rather than by 1), we would get in trouble if a slice is so big that the addition overflows. On a slightly higher level, the trouble is that ZSTs are the only types for which we can have align > size. Is that what you are talking about?

However, notice that the slice iterator currently already yields just a stream of 0x1 [^1] (ignoring alignment). It could instead yield a stream of aligned pointers. That would of course still leave the internal pointer used by the iterator unaligned; maybe that one should be a raw pointer and not a reference. (Maybe it already is?)

[^1]: Also see https://github.com/rust-lang/rust/issues/42789.

Speaking of raw pointers, @eddyb mentioned that LLVM may rely on pointers being aligned in general, not just in some well-controlled places. It may, for example, optimize away bit operations on pointers that become NOPs for aligned pointers. Gathering some evidence around this would be very interesting, because Rust allows creating unaligned raw pointers in safe code, and it would IMHO be a severe bug if LLVM optimized things assuming that these pointers are aligned.

In fact, I just realized (by reading some Miri test cases) that Rust allows creating and using unaligned references in safe code...

#[repr(packed)]
struct Foo {
    x: u16,
    y: u32,
}

fn main() {
    let mut b = Box::new(Foo { x: 0, y: 0 });
    let y = &mut b.y; // an unaligned &mut u32: the field y sits at offset 2 in a packed struct
    println!("{}", *y);
    *y = 13;
    println!("{}", *y);
    println!("{}", (y as *const _ as usize) % 4);
}

This prints 0 13 2 on playpen.

I am... confused now. It doesn't seem like Rust can make any assumptions whatsoever about the alignment even of safe references, can it? Does it?

Taking references to fields of repr(packed) types is unsound precisely because of this issue; RFC 1240 has long called for requiring unsafe for it, but apparently that is still not implemented.
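(As an aside, copying the field out by value sidesteps the problem, since no unaligned reference is ever created; a small sketch of that pattern:)

#[repr(packed)]
struct Foo {
    x: u16,
    y: u32,
}

fn main() {
    let b = Foo { x: 0, y: 13 };
    // A by-value field access copies out of the packed struct with an
    // unaligned load; no reference to the unaligned field is ever created.
    let y = b.y;
    println!("{}", y);
}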


Ralf: your analysis seems basically correct. Some notes:

  • The linked iterator is IntoIter – we’re just reading out of 0x1 to spawn a value from the aether (no idea what the formalism is for that).

  • I believe all the slice/vec iterators contain raw pointers. This is necessary because they’re C-style “pair of pointer” iterators, and the empty iterator contains aliasing pointers, as well as potentially dangling ones for empty arrays. Ostensibly we don’t use slices because pair-of-pointers has minimal overhead (and optimizes easier).

  • I believe LLVM’s align rules are similar to its null rules – they can be unaligned as long as you don’t load/store. There’s a special intrinsic for unaligned loads/stores that rustc exposes (see the sketch after this list).

  • As rkruppe notes, packed structs are known to be unsound.
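A sketch of doing an unaligned load without ever creating an unaligned reference, using today’s stable ptr::read_unaligned together with ptr::addr_of! (the latter postdates this discussion; the Packed struct is just for illustration):

use std::ptr;

#[repr(packed)]
struct Packed {
    a: u8,
    b: u32, // stored at offset 1, so a pointer to it is not 4-aligned
}

fn main() {
    let p = Packed { a: 0, b: 42 };
    // Get a raw pointer without going through a reference, then do an
    // explicitly unaligned load; the backend must not assume u32 alignment here.
    let unaligned: *const u32 = ptr::addr_of!(p.b);
    let b = unsafe { ptr::read_unaligned(unaligned) };
    println!("{}", b);
}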

Okay. It’s somewhat irritating that this got stabilized without implementing the unsafe check, but well, hindsight is cheap.

The only way LLVM can optimize bit operations around pointers is if we emit !align metadata on loads/arguments, which acts like !nonnull by causing immediate UB. We won't do that with raw pointers of course.

I can’t find the code in LLVM anymore so perhaps it has been fixed now. What was happening was that arguments were considered always aligned to the pointee’s alignment even without any attributes.

The only remaining such logic appears to be specifically for loads (maybe stores too?), i.e. an unannotated load from a pointer which didn’t come from an alignment-specifying source is treated as aligned, which seems reasonable.

Why that? ptr::read uses copy_nonoverlapping with an alignment of 1, so it doesn't actually seem to care about alignment. Furthermore, even if it did, one could argue that a copy_nonoverlapping of size 0 is fine even for misaligned pointers because nothing actually happens.

That’s not so – ptr::read uses copy_nonoverlapping with the real raw pointer type, and the function takes the alignment from the pointer type. (The 1 is the count, not the alignment.)

I totally agree with the zero size argument though. It is very hard to do something wrong with ZST reads/writes since they don’t actually read/write anything.

Oops, you are of course right.
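For concreteness, here is a rough sketch of what ptr::read boils down to (not the actual libcore source; MaybeUninit keeps it self-contained):

use std::{mem::MaybeUninit, ptr};

// Rough sketch of ptr::read: the `1` passed to copy_nonoverlapping is the
// element count, and the alignment is implied by the *const T pointer type.
unsafe fn read_sketch<T>(src: *const T) -> T {
    let mut tmp = MaybeUninit::<T>::uninit();
    ptr::copy_nonoverlapping(src, tmp.as_mut_ptr(), 1);
    tmp.assume_init()
}

fn main() {
    let x = 7u32;
    assert_eq!(unsafe { read_sketch(&x as *const u32) }, 7);
}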

Hmm, but if the @llvm.memcpy call states an alignment (greater than one), LLVM is presumably allowed to optimize under the assumption that the alignment is correct, so even though nothing is read it might still be UB.

Phrased differently, it's possible that ptr::read(p as *const [f64;0]) is actually equivalent to intrinsics::assume((p as usize) % 8 == 0).
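Spelled out as code, the worry is about a program like this (whether it is UB is exactly the question at hand):

use std::ptr;

fn main() {
    let x = 0u8;
    let p = &x as *const u8; // not necessarily 8-aligned
    // This reads zero bytes, but the emitted @llvm.memcpy may still state an
    // alignment of 8 for `p`, derived from the pointee type [f64; 0].
    let _zst: [f64; 0] = unsafe { ptr::read(p as *const [f64; 0]) };
}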