Storing A Smaller Length Of A Slice And Accessing Beyond The Stored Length (But Inside The Actual Length)

brunoczim · April 27, 2020, 3:32am

Is this code UB?

let x = [0,1,2,3];
let y = &x[0..2];
unsafe {
   let z = *y.as_ptr().add(3);
}

RustyYato · April 27, 2020, 3:48am

Yes, it is UB. This is better posted on users.rust-lang.org, internals is meant for the development of Rust, not general questions about Rust.

brunoczim · April 27, 2020, 4:08am

I thought unsafe code guidelines was supposed to discuss UB... I mean, why is this UB?

Anyway: https://users.rust-lang.org/t/storing-a-smaller-length-of-a-slice-and-accessing-beyond-the-stored-length-but-inside-the-actual-length/41630

daboross · April 27, 2020, 5:40am

I think it would be on-topic to discuss whether or not this should be UB, as it would be to have a discussion on writing documentation on what is and isn't UB.

However, this question comes across much more as "can I write this code?" than "should rustc count this code as valid, theoretically?", and as such, it seems much, much more suited for users.rust-lang.org. Sure, it's about unsafe code, but that doesn't change the fact that this question is about using rust, and how rust behaves. You might be asking about the internal behavior of rust, but you're still asking as someone writing rust code, not as someone developing it, so the question belongs in the users forum.

This doesn't mean the question isn't important. It's just that this forum is most frequented by people wanting to discuss the internals of rust and plans for changing them, and questions like this are simply off-topic. This isn't a matter of priority, either - you'll get much better answers somewhere where the question is actually on-topic, and users.rust-lang.org is in general much more active than internals.

kornel · April 27, 2020, 10:24am

I presume it's because it makes reasoning about slices simple. It makes slices only what they are on the surface: (data, len), and there's no need to track where they originated from, and what length they really have.

If access outside of a slice were allowed, it'd make slice.split_at_mut() unsafe. One half could peek into the other, causing mutable aliasing. It might be possible to define rules around that, but that would mean creating a new type of shadow-slice with invisible extra lifetimes to track how it was created.
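To make the split_at_mut point concrete, here is a minimal sketch of what safe code gets from it: two mutable halves that sit next to each other in memory and never overlap. The function name `touch_both_halves` is made up for illustration.

```rust
/// Hypothetical helper: writes through both halves at once and reports
/// whether the left half ends exactly where the right half begins.
fn touch_both_halves(data: &mut [i32], mid: usize) -> bool {
    // Two `&mut` slices into the same buffer, guaranteed not to overlap.
    let (left, right) = data.split_at_mut(mid);

    // Both halves are usable at the same time.
    if let Some(first) = left.first_mut() {
        *first += 1;
    }
    if let Some(first) = right.first_mut() {
        *first += 1;
    }

    // One past the end of `left` is the start of `right`.
    left.as_ptr_range().end == right.as_ptr()
}
```

Calling `touch_both_halves(&mut [1, 2, 3, 4], 2)` modifies one element in each half. If a slice were allowed to reach past its own length, `left` could read or write the element that `right` has exclusive access to.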

ckaran · April 27, 2020, 12:53pm

Beyond what @kornel said, in your particular case both arrays are constants, and the compiler is aware of this. That means that under the right circumstances (all uses of x are known at compile time, and x is never used except through y), the compiler could decide to reuse the trailing portion of the slice with the knowledge that attempts to access beyond the slice are actually illegal.

Now, before anyone gets the wrong idea, as far as I know the compiler does not do this at the current time. But it could be a useful optimization in the future for small embedded systems.

steffahn · April 27, 2020, 12:55pm

I don’t think these comments about “what if the slice came from split_at_mut” are helping the discussion at all. On the users.rust-lang.org thread there’s a similar answer. The question is not about what-ifs regarding different scenarios, and it is also not about soundness if this kind of code were provided in some API. The question is just: is this concrete piece of code UB or not?

kornel · April 27, 2020, 12:59pm

I think it's relevant, because roughly speaking this code is UB, because of split_at_mut existing. Rust has one set of rules for all code. Rules that require making things like split_at_mut safe also declare OP's benign code as UB, just because what is UB is a general definition without exceptions for innocent cases.

steffahn · April 27, 2020, 12:59pm

I disagree.

RalfJung · April 27, 2020, 1:01pm

Yes it is, because y is a reference to just the first 2 elements of the array. When a reference is cast to a raw pointer (such as with as_ptr here), that raw pointer may only be used for the memory the reference "points to", as determined by size_of_val(y). The code violates this by using the raw pointer outside of the memory the reference is valid for.
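As a rough sketch of the rule described above: `size_of_val(y)` covers only the two elements of the sub-slice, so a read of element 3 has to go through a pointer derived from a reference to the whole array. The helper names `bytes_covered` and `read_fourth` are made up for illustration.

```rust
use std::mem::size_of_val;

/// Hypothetical helper: number of bytes a slice reference "points to".
fn bytes_covered(y: &[i32]) -> usize {
    size_of_val(y)
}

/// Hypothetical helper: reads element 3 through a pointer derived from a
/// reference to all four elements, so the access stays inside that range.
fn read_fourth(x: &[i32; 4]) -> i32 {
    // SAFETY: index 3 is in bounds of the four-element array behind `x`.
    unsafe { *x.as_ptr().add(3) }
}
```

With `let x = [0, 1, 2, 3];`, `bytes_covered(&x[0..2])` is the size of two `i32`s, while `read_fourth(&x)` returns 3 without leaving the memory its reference is valid for.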

Whether or not this rule should be weakened, and whether it even can be weakened without sacrificing many optimizations, is being discussed at

github.com/rust-lang/unsafe-code-guidelines

Stacked Borrows: raw pointer usable only for `T` too strict?

opened 08:19AM - 28 May 19 UTC

RalfJung

C-open-question A-stacked-borrows S-pending-design A-SB-vs-TB

Currently, the following is illegal according to Stacked Borrows:

```rust
let val = [1u8, 2];
let ptr = &val[0] as *const u8;
let _val = unsafe { *ptr.add(1) };
```

The problem is that the cast to `*const u8` creates a raw pointer that may only be used for the `u8` it points to, not anything else. The most common case is to do `&slice[0] as *const _` instead of `slice.as_ptr()`. This has led to problems:

* [rand did the `&slice[0]` thing](https://github.com/rust-random/rand/issues/779).
* [Same for hashbrown](https://github.com/rust-lang/hashbrown/pull/80).
* [`Rc::into_raw`+`Rc::from_raw` don't work well together because of this](https://github.com/rust-lang/unsafe-code-guidelines/issues/134#issuecomment-496469397).
* [capnproto also used the `&slice[0]` pattern](https://github.com/capnproto/capnproto-rust/commit/72480efb3514d32278bd2502a7b90b22a34d12b8)

Maybe this is too restrictive and raw pointers should be allowed to access their "surroundings"? I am not sure what exactly that would look like though. I'll use this issue to collect such cases.

Miri currently fails to detect this particular code as UB because Miri does not fully precisely track raw pointers. Doing so requires figuring out better what we want to do with integer-pointer casts, and also getting &raw used more throughout the ecosystem.
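For comparison with the snippet in the quoted issue, a minimal sketch of the `as_ptr()` form that the issue mentions as the alternative to `&slice[0] as *const _`. The function name `second_byte` is made up for illustration.

```rust
/// Hypothetical helper: reads the second byte through a pointer obtained
/// from the whole array rather than from a reference to its first element.
fn second_byte(val: &[u8; 2]) -> u8 {
    // `as_ptr()` is derived from a reference covering both elements.
    let ptr = val.as_ptr();
    // SAFETY: offset 1 is in bounds of the two-element array.
    unsafe { *ptr.add(1) }
}
```

`second_byte(&[1u8, 2])` returns 2. The only difference from the issue's example is where the raw pointer comes from.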

Ixrec · April 27, 2020, 1:11pm

While split_at_mut is not directly relevant to the "is it UB?" question, it is extremely relevant for answering the "should it be UB?" or "could it ever not be UB?" follow-up questions that people almost always end up asking (and often kinda already intended in their original question) on "is it UB?" threads.

I can't really imagine us defining this behavior only for const raw pointers but leaving it UB for mut raw pointers, and IIUC aliasing mut raws is (in Stacked Borrows) already categorically UB unless there's an UnsafeCell involved. So "things like split_at_mut need to be implementable" seems like a knockdown argument here, unless I'm missing something.

RalfJung · April 27, 2020, 1:14pm

FWIW I honestly do not see the connection to split_at_mut here. split_at_mut could be sound even if this code was allowed, as the code does not even call split_at_mut. Sure, doing both this and split_at_mut would be UB, but that in no way implies that doing just this has to be UB.

steffahn · April 27, 2020, 1:15pm

Another question: Aren’t pointers to slice entries allowed to point one past the end so that you can have a pointer to compare to if you finished iteration? ~~Then, this whole code would possibly not be UB...~~

Edit: To elaborate, the documentation for pointer add says

Both the starting and resulting pointer must be either in bounds or one byte past the end of the same allocated object.

where the explicit mention of “byte” confuses me a bit, but in essence the reason for this specification is stuff like iterating over a slice, isn’t it?

Edit2: I miscalculated.. the smaller slice only goes to 2 not inclusive

Edit3: This however proves my previous point (of this having nothing to do with split_at_mut) if this means that .add(2) would not have been UB.

Edit4: Damn I’m blind, I didn’t spot the * dereferencing the thing all the time. I thought this was about if the pointer add is legal.

@RalfJung Would you say that the pointer addition y.as_ptr().add(3) itself is already UB in the code?
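The one-past-the-end allowance mentioned above is what makes pointer-based iteration work. A minimal sketch, with the made-up name `sum_by_pointers`:

```rust
/// Hypothetical helper: sums a slice by walking a raw pointer from the start
/// to the one-past-the-end pointer, which is compared but never dereferenced.
fn sum_by_pointers(slice: &[i32]) -> i32 {
    let range = slice.as_ptr_range();
    let mut cursor = range.start;
    let mut total = 0;
    while cursor != range.end {
        // SAFETY: `cursor` lies in `start..end`, so it points at an element
        // of `slice`, and advancing by one reaches at most `end`.
        unsafe {
            total += *cursor;
            cursor = cursor.add(1);
        }
    }
    total
}
```

`sum_by_pointers(&[1, 2, 3])` returns 6, and an empty slice returns 0 because the start and end pointers are already equal.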

Ixrec · April 27, 2020, 1:20pm

Ah, I think I see the confusion. I was imagining this code snippet being exposed directly to safe code as a safe API which then could be combined with split_at_mut by additional safe client code to cause problems. But if this pointer and anything derived from it stay within the module, then you're right that there's tons of wiggle room here.

RalfJung · April 27, 2020, 3:12pm

No it is not UB. The documentation for add says that only the limits of the allocation itself are relevant for pointer addition.

The concerns I raised (about where the pointer "comes from") only enter the picture once you actually use the pointer.
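A small sketch of that distinction, assuming the rule as stated in this thread: the offset is computed from the sub-slice but the resulting pointer is only compared, never dereferenced. The name `offsets_agree` is made up for illustration.

```rust
/// Hypothetical helper: computes the same address two ways and compares them.
/// Neither pointer is dereferenced.
fn offsets_agree(x: &[i32; 4]) -> bool {
    let y = &x[0..2];
    // SAFETY: both offsets stay within the four-element allocation behind `x`,
    // which is what `add` requires according to the post above.
    let from_slice = unsafe { y.as_ptr().add(3) };
    let from_array = unsafe { x.as_ptr().add(3) };
    // Comparing addresses is not a use of the memory behind them.
    from_slice == from_array
}
```

`offsets_agree(&[0, 1, 2, 3])` returns true. Reading through `from_slice` would be the step the earlier answer identifies as UB.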

Ah I see. That would be the question of whether this code is sound, and to ask that question we'd have to see at which types the code is exposed to safe code.

But the OP was (I think) asking about whether the code has UB, not whether it can be soundly exposed to arbitrary safe code.

steffahn · April 27, 2020, 3:27pm

Interesting analysis. My next question: Would that mean it’s not a valid optimization for a compiler when given code like

let x = [0; 10000];
let y = &x[0..10];
// ... rest of code only uses y, not x

to argue, “only the first 10 entries of x are ever used, let’s save loads of stack space and turn this into

let x = [0; 10];
let y = &x[0..10];
// ... rest of code only uses y, not x

”?

RalfJung · April 27, 2020, 3:40pm

As a source-to-source transformation, that would indeed be incorrect. In a lower-level IR, after removing some requirements about inbounds pointer arithmetic, such transformations could still be possible.

kornel · April 27, 2020, 5:15pm

I'm surprised you think it's not relevant. So does this mean that in this example reach_beyond may or may not be UB depending on which codepath is taken?

fn split_shared(x: &mut [i32; 1000]) {
    reach_beyond(x.split_at(500).0)
}

fn split_mut(x: &mut [i32; 1000]) {
    reach_beyond(x.split_at_mut(500).0)
}

fn reach_beyond(x: &[i32]) {
    unsafe {
        // Reads past the end of `x`, but still inside the 1000-element array.
        *x.as_ptr().add(x.len()+1);
    }
}

RalfJung · April 27, 2020, 10:55pm

Functions aren't UB, programs are. It makes no sense to ask if a function is UB, just like it makes no sense to ask if a function terminates. You have to say for which inputs.

For functions, we can just ask if they are sound, which means "cannot cause UB when invoked from safe code" (or, equivalently, "cannot cause UB when invoked with inputs that satisfy the safety invariant of the respective types").

For your example, both split_mut and split_shared are unsound for the reasons I explained above (using a raw pointer outside its valid range). I don't see how split_at_mut would make a difference here.
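By contrast, a minimal sketch of a function that is sound in the sense defined above: it still uses a raw pointer internally, but no safe caller can make it read outside the slice it was given. The name `read_at` is made up for illustration.

```rust
/// Hypothetical helper: raw-pointer read guarded by a bounds check, so every
/// input that safe code can pass stays inside the slice.
fn read_at(x: &[i32], index: usize) -> Option<i32> {
    if index < x.len() {
        // SAFETY: `index` was just checked against `x.len()`, so the pointer
        // stays within the memory that `x` refers to.
        Some(unsafe { *x.as_ptr().add(index) })
    } else {
        None
    }
}
```

`read_at(&[10, 20], 1)` returns `Some(20)` and `read_at(&[10, 20], 2)` returns `None`, whereas `reach_beyond` leaves its slice for every input.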
