My understanding is that the length of a slice is limited to at most isize::MAX; slices longer than that would be unsound. Therefore, e.g. adding up the lengths of two different slices, or adding a small constant to a length, should never overflow/wrap.
Still, looking at the compiler output for such operations, this guarantee does not seem to be picked up: [godbolt]
pub fn add_one<T>(slice: &[T]) -> usize {
    match slice.len().checked_add(1) {
        Some(n) => n,
        None => unreachable!(),
    }
}

pub fn add_one_assume<T>(slice: &[T]) -> usize {
    // Tell the optimizer explicitly what the slice invariant already implies.
    unsafe { assert_unchecked(slice.len() <= isize::MAX as usize) };
    match slice.len().checked_add(1) {
        Some(n) => n,
        None => unreachable!(),
    }
}

pub const unsafe fn assert_unchecked(expr: bool) {
    if !expr {
        std::hint::unreachable_unchecked();
    }
}
Here, the second function generates panic-free code, as expected, while the first function does not.
In practice, this has negative performance implications for e.g. array-vectors: for a function like copy_from_slice, the compiler cannot prove that self.len() + slice.len() >= self.len(), which leads to unnecessary panicking branches.
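For concreteness, here is a minimal sketch of the kind of array-vector code I mean. The ArrayVec type and method are made up for illustration; the length is stored as a u32 so that the compiler already has a bound on self.len and the only missing fact is the bound on slice.len():

pub struct ArrayVec<T, const N: usize> {
    buf: [T; N],
    len: u32,
}

impl<T: Copy, const N: usize> ArrayVec<T, N> {
    /// Appends `slice`, panicking if the remaining capacity is too small.
    pub fn extend_from_slice(&mut self, slice: &[T]) {
        let old_len = self.len as usize;
        let new_len = old_len + slice.len();
        assert!(new_len <= N, "capacity exceeded");
        // The compiler knows old_len <= u32::MAX, but without a bound on
        // slice.len() it cannot rule out that the addition above wrapped, so
        // it cannot prove old_len <= new_len, and the start <= end check of
        // this range indexing keeps a panic path that never actually fires.
        self.buf[old_len..new_len].copy_from_slice(slice);
        self.len = new_len as u32;
    }
}

With the isize::MAX bound known (whether baked into slice len or applied by hand as in add_one_assume above), old_len + slice.len() can no longer wrap on a 64-bit target, so that extra panic path could be folded away.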
A really simple solution would be to just add an assert_unchecked statement to the slice len function. Would a PR for that be accepted?
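To sketch what that would buy, here is the same hint applied by hand at the user level (this is not the actual libcore change, just an illustration; shown for &[u8] for simplicity):

#[inline]
pub fn len_with_hint(slice: &[u8]) -> usize {
    let len = slice.len();
    // SAFETY: no valid &[u8] is longer than isize::MAX bytes, so this branch
    // can never be taken; it only informs the optimizer.
    if len > isize::MAX as usize {
        unsafe { std::hint::unreachable_unchecked() };
    }
    len
}

With a hint like this inside the real len, the add_one example above would compile to the same panic-free code as add_one_assume.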
As one detail, only slices whose T is not zero-sized have the guarantee len <= isize::MAX as usize. More precisely, the guarantee is that len * size_of::<T>() <= isize::MAX as usize, which is an even stronger guarantee when size_of::<T>() > 1. (I'm not sure whether this stronger guarantee needs to be rewritten into something of a len <= calculated_max_len::<T>() form for LLVM to make full use of it; a sketch of what I mean follows.)
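Spelled out, the rewritten form would look roughly like this (calculated_max_len is just the made-up name from the sentence above, not an existing API):

pub const fn calculated_max_len<T>() -> usize {
    match std::mem::size_of::<T>() {
        // Zero-sized elements: the length is only bounded by usize::MAX.
        0 => usize::MAX,
        // Otherwise len * size_of::<T>() <= isize::MAX, i.e.
        // len <= isize::MAX / size_of::<T>().
        size => isize::MAX as usize / size,
    }
}

pub fn len_with_strong_hint<T>(slice: &[T]) -> usize {
    let len = slice.len();
    // SAFETY: every valid slice satisfies len <= calculated_max_len::<T>(),
    // so this branch is unreachable and only informs the optimizer.
    if len > calculated_max_len::<T>() {
        unsafe { std::hint::unreachable_unchecked() };
    }
    len
}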
Adding validity ranges to the &[T] metadata is on my todo list, but it requires changes to the compiler. Adding assumes in the library won't get us the niches, but it should be easier to implement.
Is it actually always true? It's true for 64-bit systems, and probably for 32-bit ones, but what about 16-bit targets, or any other pointer width? I would assume such users would not be happy to give up half of the possible address space.
They can use the full address space, but not in a single object. Pointer offsets are signed, all the way down to LLVM getelementptr, so must not exceed isize::MAX.
So long as we're using LLVM and interoperating with C, it'll be the rule regardless of pointer size. So we've just adopted it into Rust as a fundamental restriction on objects.
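As a small illustration of where the signed bound already shows up in the stable pointer APIs (just an example, not new API): offsets and in-object distances are isize, so an object bigger than isize::MAX bytes could not even describe its own extent.

fn byte_span(slice: &[u8]) -> isize {
    let first = slice.as_ptr();
    // One past the end is still part of the same allocated object.
    let end = unsafe { first.add(slice.len()) };
    // offset_from returns a signed element count; for &[u8] that is the size
    // of the object in bytes, which therefore must fit in isize.
    unsafe { end.offset_from(first) }
}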