Why even unused data needs to be valid

I think you are mixing up Rust's two kinds of invariants here. Only violating the validity invariant is UB. The validity invariant is fixed by the language spec; the user has no influence over it. What you are describing, in contrast, is a safety invariant.

The compiler doesn't care when code violates safety invariants; it doesn't even know what they are. You only get actual, Miri-detectable "language UB" once the code does something that the Rust Reference specifies as UB.
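To make the distinction concrete, here is a minimal sketch using a hypothetical Even type (not from any real library) whose "always even" rule is a safety invariant that only the module's documentation knows about. Contrast this with a validity invariant such as a bool having to be 0 or 1: producing a bool from the byte 2 is language UB regardless of what any library says.

```rust
/// Hypothetical type for illustration: a u32 that is always even.
/// "Always even" is a *safety* invariant: this module promises it,
/// but the compiler has no idea the rule exists.
#[derive(Debug, Clone, Copy)]
pub struct Even(u32);

impl Even {
    /// Checked constructor: upholds the safety invariant.
    pub fn new(n: u32) -> Option<Even> {
        if n % 2 == 0 { Some(Even(n)) } else { None }
    }

    /// # Safety
    /// `n` must be even. Passing an odd number breaks Even's contract,
    /// but the inner value is still a perfectly valid `u32`.
    pub unsafe fn new_unchecked(n: u32) -> Even {
        Even(n)
    }

    pub fn half(self) -> u32 {
        self.0 / 2
    }
}

fn main() {
    let ok = Even::new(10).unwrap();
    assert_eq!(ok.half(), 5);
    assert!(Even::new(7).is_none());

    // Violates the safety invariant, yet this is not language UB:
    // every bit pattern is a valid u32, so Miri has nothing to report.
    let odd = unsafe { Even::new_unchecked(7) };
    println!("{}", odd.half()); // prints 3: wrong by Even's contract, but well-defined
}
```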

When libraries specify assumptions they make about user code, violating those assumptions does not necessarily lead to language-level UB, but it could. We could call this "library UB": it means you are leaving the stability guarantee provided by the library and may encounter undocumented behavior. That behavior may or may not be language UB today, and this could also change with future library upgrades.
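A sketch of what "library UB" can look like, assuming a hypothetical SortedVec library type (invented for illustration) that documents its contents as sorted. Its current contains uses the safe slice::binary_search, which on unsorted input merely returns an unspecified answer; a future version could switch to unchecked indexing, and the very same misuse would then be language UB.

```rust
/// Hypothetical library type: documented to always hold sorted data.
pub struct SortedVec(Vec<i32>);

impl SortedVec {
    /// Checked constructor: verifies the "sorted" safety invariant.
    pub fn from_sorted(v: Vec<i32>) -> Option<SortedVec> {
        if v.windows(2).all(|w| w[0] <= w[1]) {
            Some(SortedVec(v))
        } else {
            None
        }
    }

    /// # Safety
    /// `v` must be sorted in ascending order.
    pub unsafe fn from_sorted_unchecked(v: Vec<i32>) -> SortedVec {
        SortedVec(v)
    }

    /// Relies on sortedness. This version only uses safe code, so unsorted
    /// data yields an unspecified (but defined) result, not language UB.
    pub fn contains(&self, x: i32) -> bool {
        self.0.binary_search(&x).is_ok()
    }
}

fn main() {
    let s = SortedVec::from_sorted(vec![1, 3, 5]).unwrap();
    assert!(s.contains(3));
    assert!(!s.contains(4));

    // Library UB: the documented assumption is violated. With this version
    // the answer is merely unreliable; no guarantee holds across upgrades.
    let bad = unsafe { SortedVec::from_sorted_unchecked(vec![5, 1, 3]) };
    println!("{}", bad.contains(5));
}
```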

For example, it is not UB to create a non-UTF-8 &str. But it could be UB to call a &str-taking method on such an ill-formed str, since (depending on what that method does) it may crucially rely on UTF-8. The ill-formed str violates the safety invariant of str but satisfies its validity invariant, which is the same as the validity invariant of &[u8].
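A minimal sketch of that point, with hypothetical helper names: because &str and &[u8] share a validity invariant, a raw pointer cast can build an ill-formed &str and turn it back into bytes without ever running a str method that might assume UTF-8. The same `as` pointer casts appear in the standard library's own str/[u8] conversions.

```rust
/// Hypothetical helper: view bytes as a &str WITHOUT checking UTF-8.
/// # Safety
/// Only the validity invariant is upheld. If `bytes` is not UTF-8, the
/// result must not be passed to code that may assume UTF-8 (e.g. str methods).
unsafe fn bytes_as_str_unchecked(bytes: &[u8]) -> &str {
    // str and [u8] share layout and metadata, so the cast reference is valid.
    unsafe { &*(bytes as *const [u8] as *const str) }
}

/// Hypothetical helper: get the bytes back via a pointer cast rather than a
/// str method, so nothing that could rely on UTF-8 ever runs.
fn str_as_bytes_raw(s: &str) -> &[u8] {
    unsafe { &*(s as *const str as *const [u8]) }
}

fn main() {
    let bytes: &[u8] = &[0xff, 0xfe, b'a']; // 0xff never occurs in UTF-8
    // Not language UB: the safety invariant is broken, the validity invariant is not.
    let ill_formed: &str = unsafe { bytes_as_str_unchecked(bytes) };
    assert_eq!(str_as_bytes_raw(ill_formed), bytes);
    // Deliberately NOT done: `ill_formed.chars()` could rely on UTF-8.
    println!("{:?}", str_as_bytes_raw(ill_formed)); // prints [255, 254, 97]
}
```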
