Faster PartialEq for small arrays of non 2^n length

I thought that was only true for raw pointers, and that constructing a reference to invalid data was immediate undefined behavior? For example, (&2 as *const u8 as *const bool) would be sound, but &*(&2 as *const u8 as *const bool) would not.

(member of T-opsem, but not speaking normatively)

It is not normative yet, but both proposed memory models for Rust say that reborrowing a memory region (&*ptr_or_ref, including implicit reborrows) does not inspect the contents of memory in any way.

I previously wrote a decent bit on this at

It's still very unsafe to create references to invalid data, and it's certainly unsound as soon as such a reference escapes to code which you don't control and which doesn't document that it's fine with handling one, but the current model says that the reborrow itself is considered valid (not UB).

Doesn't that change make &'static T effectively equivalent to NonNull<T>? I guess the difference is that you can return raw pointers to invalid values from a safe function without it being unsound.

The high-level difference is that NonNull<T> doesn't place any requirements on the pointed-to memory, whereas &T does require the referenced memory to exist and be borrowed. This allows us to optimize reference arguments with the LLVM attributes dereferenceable(N) and noalias. For a more precise explanation of what that entails, the explainer for Tree Borrows is actually fairly approachable.
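To make the NonNull<T> versus &T contrast concrete, here is a minimal sketch using only standard library calls. The function names are made up for illustration. It shows that a dangling NonNull is fine to create and hold, while producing a reference from it is only justified when the memory is live and borrowed.

```rust
use std::ptr::NonNull;

/// Hypothetical helper: a dangling `NonNull` is fine to create and hold,
/// because `NonNull<T>` only promises "not null" (and well-aligned for
/// `dangling()`); it says nothing about the memory behind the pointer.
fn dangling_is_non_null() -> bool {
    let p: NonNull<u64> = NonNull::dangling();
    // We only look at the address; the pointer is never dereferenced.
    !p.as_ptr().is_null()
}

/// Hypothetical helper: going from `NonNull<T>` back to `&T` is where the
/// extra requirements appear, so it needs `unsafe` and a justification.
fn read_through_non_null(value: &u64) -> u64 {
    let p: NonNull<u64> = NonNull::from(value);
    // SAFETY: `p` was just derived from a live shared borrow of `value`,
    // so the memory exists, is initialized, and is not mutated while `r` lives.
    let r: &u64 = unsafe { p.as_ref() };
    *r
}

fn main() {
    assert!(dangling_is_non_null());
    assert_eq!(read_through_non_null(&7), 7);
}
```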

We don't make validity recursive because there's no benefit to compiler optimizations in doing so, mainly because inspecting the value already requires it to be valid. The superpower of Rust is in safety, not just validity. The compiler needs to accept code that is valid but questionable, but that doesn't mean you need to. Doing cursed things is always unsafe and ill-advised, but soundness only makes sense to discuss as a property of an API, and it's perfectly possible to encapsulate an unsafe operation in a sound interface so long as it doesn't include any invalid operations.
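As an illustration of encapsulating an unsafe operation in a sound interface, here is a small sketch built around the u8-to-bool cast from the question above. The function name bool_from_byte is hypothetical. It checks the byte before creating the reference, so no caller can ever observe a &bool to an invalid value.

```rust
/// Hypothetical helper: view a byte as a `bool` only when that is valid.
/// Safe callers cannot use this to obtain a `&bool` pointing at anything
/// other than 0 or 1, which is what makes the interface sound.
fn bool_from_byte(byte: &u8) -> Option<&bool> {
    if *byte <= 1 {
        // SAFETY: `bool` has the same size and alignment as `u8`, its only
        // valid bit patterns are 0x00 and 0x01, and we just checked that the
        // byte is one of those. The byte cannot change while the shared
        // borrow is alive.
        Some(unsafe { &*(byte as *const u8 as *const bool) })
    } else {
        // Any other value would be an invalid `bool`, so refuse.
        None
    }
}

fn main() {
    assert_eq!(bool_from_byte(&0), Some(&false));
    assert_eq!(bool_from_byte(&1), Some(&true));
    assert_eq!(bool_from_byte(&2), None);
}
```

The unsafe block is the only place where the validity argument has to be made; everything outside the function stays in safe code.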

(I'm speaking from my own understanding, and not representing a team position.)

You do it by starting from a UTF-8 str, then using https://doc.rust-lang.org/std/primitive.str.html#method.as_bytes_mut.

The reason for the str change is to make it explicitly legal to change the bytes in ways that leave the string temporarily not valid UTF-8. Realistically, overwriting a 2-byte-encoded sequence with nulls should be legal to do a byte at a time, even though after writing only one of the bytes the contents are briefly not UTF-8.
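A minimal sketch of that scenario, using str::as_bytes_mut from the link above. The helper name blank_two_byte_char is made up for illustration, and the sketch assumes the caller passes the byte offset of a 2-byte character.

```rust
/// Hypothetical helper: overwrite the 2-byte character starting at byte
/// offset `start` with two NUL bytes, one byte at a time.
fn blank_two_byte_char(s: &mut str, start: usize) {
    assert!(s.is_char_boundary(start), "start must be a char boundary");
    let ch = s[start..].chars().next().expect("start must be inside the string");
    assert_eq!(ch.len_utf8(), 2, "expected a 2-byte character at start");

    // SAFETY: both bytes of the character are overwritten with 0x00 before
    // this borrow ends, so the contents are valid UTF-8 again by the time
    // `s` is used as a `str`.
    let bytes = unsafe { s.as_bytes_mut() };
    bytes[start] = 0; // the contents are briefly not valid UTF-8 here
    bytes[start + 1] = 0; // valid UTF-8 again
}

fn main() {
    // "\u{e9}" is U+00E9, encoded in UTF-8 as the two bytes 0xC3 0xA9.
    let mut owned = String::from("a\u{e9}b");
    blank_two_byte_char(owned.as_mut_str(), 1);
    assert_eq!(owned, "a\0\0b");
}
```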
