Assume slice length is smaller equal `isize::MAX`

Currently, the slice len method just reads the pointer metadata and returns it:

pub const fn len(&self) -> usize {
    ptr::metadata(self)
}

My understanding is, that the length of slices is limited to at most isize::MAX. Slices longer than that would be unsound. Therefore, i.e. adding up the length of two different slices or adding a small number should never overflow/wrap.

Still, when looking at the compiler output for such operations, this guarantee seems to not be picked up: [godbolt]

pub fn add_one<T>(slice: &[T]) -> usize {
    match slice.len().checked_add(1) {
        Some(n) => n,
        None => unreachable!()
    }
}

pub fn add_one_assume<T>(slice: &[T]) -> usize {
    unsafe { assert_unchecked(slice.len() <= isize::MAX as usize) };
    match slice.len().checked_add(1) {
        Some(n) => n,
        None => unreachable!()
    }
}

pub const unsafe fn assert_unchecked(expr: bool) {
    if !expr {
        std::hint::unreachable_unchecked();
    }
}

Here, the second function generates panic-free code, as expected, while the first function does not. In practice, this has negative performance implications for i.e. array-vectors, since for a function like copy_from_slice the compiler cannot guarantee proof that self.len() + slice.len() >= self.len() which leads to unnecessary panicking branches.

A really simple solution would be to just add an assert_unchecked statement to the slice len function. Would a PR for that be accepted?

3 Likes

As one detail, only slices with T not zero-sized have guaranteed len <= isize::MAX as usize. The guarantee is more precisely that len * size_of::<T>() <= isize::MAX as usize, so that's an even stronger guarantee in case that size_of::<T>() > 1. (I'm not sure if this stronger guarantee needs to be re-written into something of a len <= calculated_max_len::<T>() form for LLVM to make full use of it.)

2 Likes

I have adding validity ranges to the &[T] metadata on my todo list but it requires changes to the compiler. Adding assumes in the library won't get us the niches but should be easier to implement.

1 Like

Lang conversation about what the actual validity rule for reference-to-slice should be:

Is it actually always true? That's true for 64-bit systems, probably for 32-bit ones. But what about 16-bit, or any other possible bitness? I would assume that such users would not be happy to slash half of the possible address space.

1 Like

They can use the full address space, but not in a single object. Pointer offsets are signed, all the way down to LLVM getelementptr, so must not exceed isize::MAX.

1 Like

So long as we're using LLVM and interoping with C, it'll be the rule regardless of pointer size. So we've just adopted it into Rust as a fundamental restriction on objects.