Storing A Smaller Length Of A Slice And Accessing Beyond The Stored Length (But Inside The Actual Length)

Is this code UB?

let x = [0, 1, 2, 3];
let y = &x[0..2];

unsafe {
    let z = *y.as_ptr().add(3);
}

Yes, it is UB. This is better posted on users.rust-lang.org; internals is meant for the development of Rust, not for general questions about Rust.

I thought unsafe code guidelines was supposed to discuss UB... I mean, why is this UB?

Anyway: https://users.rust-lang.org/t/storing-a-smaller-length-of-a-slice-and-accessing-beyond-the-stored-length-but-inside-the-actual-length/41630

I think it would be on-topic to discuss whether or not this should be UB, as it would be to have a discussion on writing documentation on what is and isn't UB.

However, this question comes across much more as "can I write this code?" than "should rustc count this code as valid, theoretically?", and as such, it seems much, much more suited for users.rust-lang.org. Sure, it's about unsafe code, but that doesn't change the fact that this question is about using rust, and how rust behaves. You might be asking about the internal behavior of rust, but you're still asking as someone writing rust code, not as someone developing it, so the question belongs in the users forum.

This doesn't mean the question isn't important. It's just that this forum is most frequented by people wanting to discuss the internals of rust and plans for changing them, and questions like this are simply off-topic. This isn't a matter of priority, either - you'll get much better answers somewhere where the question is actually on-topic, and users.rust-lang.org is in general much more active than internals.

I presume it's because it makes reasoning about slices simple. It makes slices only what they are on the surface: (data, len), and there's no need to track where they originated from, and what length they really have.

If access outside of a slice was allowed, it'd make slice.split_at_mut() unsafe. One half could peek into the other, causing mutable aliasing. It might be possible to define rules around that, but that would mean creating a new type of shadow-slice with invisible extra lifetimes to track how it was created.
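To make the split_at_mut point concrete, here is a minimal sketch of what safe code relies on. The helper name write_both_halves is made up for illustration; only split_at_mut and first_mut are real standard-library calls.

```rust
/// Hypothetical helper: mutates the first element of each half of a slice.
fn write_both_halves(data: &mut [i32]) {
    let mid = data.len() / 2;
    // Two mutable slices are live at the same time. This is only fine
    // because each one is assumed to cover its own half and nothing else.
    let (left, right) = data.split_at_mut(mid);
    if let (Some(l), Some(r)) = (left.first_mut(), right.first_mut()) {
        *l += 10;
        *r += 20;
    }
}
```

Called on [0, 1, 2, 3], this leaves [10, 1, 22, 3]. If a pointer obtained from left were allowed to reach into right, the two mutable borrows would no longer be disjoint.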

Beyond what @kornel said, in your particular case both arrays are constants, and the compiler is aware of this. That means that under the right circumstances (all uses of x are known at compile time, and x is never used except through y), the compiler could decide to reuse the trailing portion of the slice with the knowledge that attempts to access beyond the slice are actually illegal.

Now, before anyone gets the wrong idea, as far as I know the compiler does not do this at the current time. But it could be a useful optimization in the future for small embedded systems.

I don’t think these comments about “what if the slice came from split_at_mut” are helping the discussion at all. On the users.rust-lang.org thread there’s a similar answer. The question is not about what-ifs regarding different scenarios, and it is also not about soundness if this kind of code was provided in some API. The question is just: is this concrete piece of code UB or not?

I think it's relevant, because roughly speaking this code is UB, because of split_at_mut existing. Rust has one set of rules for all code. Rules that require making things like split_at_mut safe also declare OP's benign code as UB, just because what is UB is a general definition without exceptions for innocent cases.

I disagree.

Yes it is, because y is a reference to just the first 2 elements of the array. When a reference is cast to a raw pointer (such as with as_ptr here), that raw pointer may only be used for the memory the reference "points to", as determined by size_of_val(y). The code violates this by using the raw pointer outside of the memory the reference is valid for.
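As a rough illustration of that rule (a sketch, not an official statement of the aliasing model): the pointer from y.as_ptr() is only good for size_of_val(y) bytes, while a pointer derived from the whole array is good for all four elements. The function name read_fourth is hypothetical.

```rust
use std::mem::size_of_val;

/// Hypothetical helper: reads element 3 through a pointer derived from
/// the whole array instead of from the shorter slice.
fn read_fourth(x: &[i32; 4]) -> i32 {
    let y = &x[0..2];
    // y only covers the first two elements, so a raw pointer derived
    // from y may only be used for these 8 bytes.
    assert_eq!(size_of_val(y), 2 * std::mem::size_of::<i32>());

    // A pointer derived from x covers the full array, so offset 3 is
    // inside the memory this pointer may be used for.
    let p = x.as_ptr();
    unsafe { *p.add(3) }
}
```

The original snippet does the same read, but starts from y.as_ptr(), which is what makes it UB.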

Whether or not this rule should be weakened, and whether it even can be weakened without sacrificing many optimizations, is being discussed at

Miri currently fails to detect this particular code as UB because Miri does not fully precisely track raw pointers. Doing so requires figuring out better what we want to do with integer-pointer casts, and also getting &raw used more throughout the ecosystem.

While split_at_mut is not directly relevant to the "is it UB?" question, it is extremely relevant for answering the "should it be UB?" or "could it ever not be UB?" follow-up questions that people almost always end up asking (and often kinda already intended in their original question) on "is it UB?" threads.

I can't really imagine us defining this behavior only for const raw pointers but leaving it UB for mut raw pointers, and IIUC aliasing mut raws is (in Stacked Borrows) already categorically UB unless there's an UnsafeCell involved. So "things like split_at_mut need to be implementable" seems like a knockdown argument here, unless I'm missing something.

FWIW I honestly do not see the connection to split_at_mut here. split_at_mut could be sound even if this code was allowed, as the code does not even call split_at_mut. Sure, doing both this and split_at_mut would be UB, but that in no way implies that doing just this has to be UB.

Another question: Aren’t pointers to slice entries allowed to point one past the end so that you can have a pointer to compare to if you finished iteration? Then, this whole code would possibly not be UB...

Edit: To elaborate, the documentation for pointer add says

Both the starting and resulting pointer must be either in bounds or one byte past the end of the same allocated object.

where the explicit mention of “byte” confuses me a bit, but in essence the reason for this specification is stuff like iterating over a slice, isn’t it?

Edit2: I miscalculated.. the smaller slice only goes to 2 not inclusive :smiley:

Edit3: This however proves my previous point (of this having nothing to do with split_at_mut) if this means that .add(2) would not have been UB.

Edit4: Damn I’m blind, I didn’t spot the * dereferencing the thing all the time. I thought this was about if the pointer add is legal.

@RalfJung Would you say that the pointer addition y.as_ptr().add(3) itself is already UB in the code?

Ah, I think I see the confusion. I was imagining this code snippet being exposed directly to safe code as a safe API which then could be combined with split_at_mut by additional safe client code to cause problems. But if this pointer and anything derived from it stay within the module, then you're right that there's tons of wiggle room here.

No it is not UB. The documentation for add says that only the limits of the allocation itself are relevant for pointer addition.

The concerns I raised (about where the pointer "comes from") only enter the picture once you actually use the pointer.
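The usual pattern that relies on this distinction is iterating with an end pointer, roughly like the sketch below. The function name sum_with_end_ptr is made up for illustration; the one-past-the-end pointer is computed and compared, but never dereferenced.

```rust
/// Hypothetical helper: sums a slice by walking a raw pointer up to a
/// one-past-the-end pointer.
fn sum_with_end_ptr(s: &[i32]) -> i32 {
    let mut p = s.as_ptr();
    // Computing this pointer is allowed: it is one past the end of the
    // slice and still within (or at the end of) the allocation.
    let end = unsafe { p.add(s.len()) };
    let mut total = 0;
    while p != end {
        unsafe {
            // Every dereference stays inside the slice p was derived from.
            total += *p;
            p = p.add(1);
        }
    }
    total
}
```

Only the `*p` reads are subject to the "where does the pointer come from" rule; the `add` calls just have to stay within the allocation.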

Ah I see. That would be the question of whether this code is sound, and to ask that question we'd have to see at which types the code is exposed to safe code.

But the OP was (I think) asking about whether the code has UB, not whether it can be soundly exposed to arbitrary safe code.

Interesting analysis. My next question: Would that mean it’s not a valid optimization for a compiler when given code like

let x = [0; 10000];
let y = &x[0..10];
// ... rest of code only uses y, not x

to argue, “only the first 10 entries of x are ever used, let’s save loads of stack space and turn this into

let x = [0; 10];
let y = &x[0..10];
// ... rest of code only uses y, not x

”?

As a source-to-source transformation, that would indeed be incorrect. In a lower-level IR, after removing some requirements about inbounds pointer arithmetic, such transformations could still be possible.

I'm surprised you think it's not relevant. So does this mean that in this example reach_beyond may or may not be UB depending on which codepath is taken?

fn split_shared(x: &mut [i32; 1000]) {
    reach_beyond(x.split_at(500).0)
}

fn split_mut(x: &mut [i32; 1000]) {
    reach_beyond(x.split_at_mut(500).0)
}

fn reach_beyond(x: &[i32]) {
    unsafe {
        *x.as_ptr().add(x.len()+1);
    }
}

Functions aren't UB, programs are. It makes no sense to ask if a function is UB, just like it makes no sense to ask if a function terminates. You have to say for which inputs.

For functions, we can just ask if they are sound, which means "cannot cause UB when invoked from safe code" (or, equivalently, "cannot cause UB when invoked with inputs that satisfy the safety invariant of the respective types").
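A small sketch of what "sound" means in practice, using a made-up function get_or_zero: it contains unsafe code, but no input that safe code can construct makes it read outside the slice.

```rust
/// Hypothetical example of a sound function: the bounds check ensures
/// the raw pointer is only used inside the slice it was derived from.
fn get_or_zero(s: &[i32], i: usize) -> i32 {
    if i < s.len() {
        // i is in bounds, so this read stays within size_of_val(s) bytes.
        unsafe { *s.as_ptr().add(i) }
    } else {
        // Out-of-range indices never reach the unsafe block.
        0
    }
}
```

Dropping the `i < s.len()` check would make the function unsound, even though some individual calls (those with a small enough `i`) would still not cause UB.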

For your example, both split_mut and split_shared are unsound for the reasons I explained above (using a raw pointer outside its valid range). I don't see how split_at_mut would make a difference here.