That’s not true, and part of the point of this post is to explain that.
(Or rather, it is true if you define Byte the way I do in the post. It is not true if you define Byte as "an element of 0..256".)
You can’t just assume there to be an in-memory representation. You have to define it. And these complex pointers do not fit into “realistic” representations that have 8 bits per byte. Hence the problem.
In other words, those complex pointers don’t really exist on the machine. They are an abstract concept introduced to define the language semantics, in something like a “virtual machine”. I wrote a blog post last year that explains this idea in more detail.
So you are saying it is not a problem that this operation is not allowed? That’s a fair position.
Unfortunately for us, it is possible to do this in safe Rust: (Box::into_raw(Box::new(0)) as usize) * 2. So we have to define what this does; we cannot make it UB.
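To make this concrete, here is a minimal sketch of such a computation. The function name `doubled_address` is made up for illustration, and I use `wrapping_mul` instead of a plain `* 2` only so the example cannot panic on overflow in a debug build. The cast and the arithmetic are safe code; the single `unsafe` block is only there to free the allocation again.

```rust
/// Hypothetical helper: turn a heap pointer into an integer and do
/// arithmetic on it. The cast and the multiplication are safe operations.
fn doubled_address() -> (usize, usize) {
    let ptr: *mut i32 = Box::into_raw(Box::new(0));
    let addr = ptr as usize; // pointer-to-integer cast, allowed in safe code
    let doubled = addr.wrapping_mul(2); // plain integer arithmetic from here on
    // Reclaim the allocation so the example does not leak memory.
    unsafe { drop(Box::from_raw(ptr)) };
    (addr, doubled)
}

fn main() {
    let (addr, doubled) = doubled_address();
    println!("address = {:#x}, doubled = {:#x}", addr, doubled);
}
```

Whatever model we pick has to say what `doubled` is here, because no `unsafe` is involved in computing it.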
Another use case is having a HashMap where the key is a pointer type. Then you need to hash pointers, which typically involves things “much worse” than multiplying by 2.
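As a rough illustration of that use case, the sketch below keys a `HashMap` by raw pointers, which implement `Hash` and `Eq` in the standard library. The helper name `lookup_by_pointer` and the map contents are made up for this example.

```rust
use std::collections::HashMap;

/// Hypothetical helper: use the addresses of two locals as map keys and
/// look one of them up again. Hashing the key inspects the pointer's address.
fn lookup_by_pointer() -> Option<&'static str> {
    let a = 1;
    let b = 2;
    let mut names: HashMap<*const i32, &'static str> = HashMap::new();
    names.insert(&a as *const i32, "a");
    names.insert(&b as *const i32, "b");
    // Look up by pointer identity, not by the pointed-to value.
    names.get(&(&b as *const i32)).copied()
}

fn main() {
    println!("{:?}", lookup_by_pointer());
}
```

All of this is safe code, so the language has to give the hashing step a defined meaning as well.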
You also didn’t say what to do instead, though.
Or are you not accepting that in a program like
use std::mem;

// Copy `n` bytes from `src` to `dest`, one byte at a time.
unsafe fn memcpy(src: *const u8, dest: *mut u8, n: usize) {
    for i in 0..n {
        let v = *src.offset(i as isize);
        *dest.offset(i as isize) = v;
    }
}

fn main() { unsafe {
    let x = Box::into_raw(Box::new(0));
    let mut y : *mut i32 = mem::uninitialized();
    // Copy the pointer stored in `x` into `y`, byte by byte.
    memcpy(&x as *const _ as *const u8, &mut y as *mut _ as *mut u8, mem::size_of::<*mut i32>());
    let z = Box::from_raw(y);
    drop(z);
} }
we have to define what the value of v is each time around the loop? This is reading a pointer byte by byte. We have to explain what happens. We have to pick the set of “values” v can have, and say which of those it is that it actually has in this execution.
I am explaining in the post why just considering integers is not enough for this set of values.
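Here is a minimal sketch of what such a set of values could look like, assuming a model roughly along the lines of the post. The type, field, and function names are chosen here for illustration and are not meant as the definitive definition. The idea is that in the memcpy loop, v would be a pointer fragment rather than a plain integer.

```rust
/// Illustrative abstract pointer: which allocation, and where inside it.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Pointer {
    alloc_id: usize,
    offset: isize,
}

/// Illustrative set of values a single byte of abstract memory can hold.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Byte {
    /// An ordinary integer byte.
    Bits(u8),
    /// The n-th byte of the given abstract pointer.
    PtrFragment(Pointer, u8),
    /// Uninitialized memory.
    Uninit,
}

/// Hypothetical helper: the bytes a pointer occupies when stored to memory.
fn pointer_to_bytes(p: Pointer) -> Vec<Byte> {
    (0..std::mem::size_of::<usize>() as u8)
        .map(|i| Byte::PtrFragment(p, i))
        .collect()
}

/// Hypothetical helper: reassemble a pointer, but only if every byte is the
/// matching fragment of one and the same pointer, in the right order.
fn bytes_to_pointer(bytes: &[Byte]) -> Option<Pointer> {
    let first = match bytes.first()? {
        Byte::PtrFragment(p, 0) => *p,
        _ => return None,
    };
    if bytes.len() != std::mem::size_of::<usize>() {
        return None;
    }
    for (i, b) in bytes.iter().enumerate() {
        if *b != Byte::PtrFragment(first, i as u8) {
            return None;
        }
    }
    Some(first)
}

fn main() {
    let p = Pointer { alloc_id: 7, offset: 4 };
    let bytes = pointer_to_bytes(p); // what a byte-wise copy would move around
    println!("{:?}", bytes_to_pointer(&bytes));
    println!("{:?}", bytes_to_pointer(&[Byte::Bits(0), Byte::Uninit]));
}
```

With a `Byte` like this, a byte-wise copy preserves which pointer the bytes came from. A plain `u8` has no room for that information.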
Those are all implementations of the language. None of them defines, abstractly, what the behavior must be for all conforming implementations. It would be a rather bad choice to say “you must use exactly what glibc’s malloc/jemalloc do” just because that’s what rustc happens to do right now.
Such a concrete model is also not suited for optimization, because it specifies way too many low-level details. For example, most of these fancy alias-analysis-based optimizations would be entirely illegal (and my post explains why) if LLVM actually let you rely on the fact that the pointers you are getting are integers computed by malloc.