Proposal: Get Range of sub-slice

Swatinem · May 1, 2022, 12:15pm

I would like to have a slice method that returns the Range of a sub-slice, essentially mirroring .get(Range<usize>). If the sub-slice is fully contained within self, it returns the Range<usize> that would guarantee a roundtrip.

So we would have:

// &[T].get(Range<usize>) -> Option<&[T]>
// &[T].range_of(&[T]) -> Option<Range<usize>>

if let Some(subslice) = slice.get(1..3) {
  if let range = slice.range_of(subslice) {
    assert_eq!(range.clone(), 1..3);
    if let Some(roundtrip) = slice.get(range) {
      assert!(std::ptr::eq(subslice, roundtrip));
    }
  }
}

Context: While writing Self-referential structs and alternatives, I was a bit surprised this does not exist yet, and doing it right now even requires unsafe code.

bluss · May 1, 2022, 12:31pm

I think you can implement it using raw pointers, with no unsafe code. With ZST element types there will be a problem - the offset information is lost because the "pointer offset" per element is zero. This becomes a problem for generic code, where code that looks like it will have the same meaning for any element type, will have different results for T ZST or not.

CAD97 · May 1, 2022, 12:32pm

Not true: while we don't have a wrapping_sub_ptr, it's "just" doing math on the addresses, and can be polyfilled trivially: [playground]

fn wrapping_sub_ptr<T>(lhs: *const T, rhs: *const T) -> usize {
    let pointee_size = std::mem::size_of::<T>();
    (lhs as usize - rhs as usize) / pointee_size
}

pub fn range_of<T>(outer: &[T], inner: &[T]) -> Option<Range<usize>> {
    let outer = outer.as_ptr_range();
    let inner = inner.as_ptr_range();
    if outer.start <= inner.start && inner.end <= outer.end {
        Some(wrapping_sub_ptr(inner.start, outer.start)..wrapping_sub_ptr(inner.end, outer.start))
    } else {
        None
    }
}

That said, this could use the unsafe sub_ptr, so might be worth having in std, but it will need to do something special for ZSTs. (In fact, it's a meaningless question to ask if one ZST slice is "inside" another, and especially "where" inside it is.)

Swatinem · May 1, 2022, 12:47pm

Indeed dropping down to usize makes this a lot easier (and also "safe"). In my experiments I was trying to stay in ptr-land as much as possible, and tried using ptr::offset_from, which was unsafe and I guess might not have been the best choice.

So would you say this proposal is worth having in std? In which case I can play around and draft up a PR.

The question of ZST is an inderesting one indeed. Not sure what get does there currently.

With the recent talk about provenance, I’m also wondering if there is anything special to consider here, or if relational operators (the outer.start <= inner.start && inner.end <= outer.end you proposed) just work in that case?

CAD97 · May 1, 2022, 12:53pm

Pointer comparisons always work in Rust. In C++ land, comparisons have to be within the same allocated object, but in Rust, they're defined to give consistent but meaningless results when comparing different allocated objects. (This is a fancy way of specifying that they just compare the integer address, and do not care about provenance.)

get on a slice of ZST just operates on the indices. So &[(); 5][..].get(1..3) compiles down to basically just if 3 < 5 { Some((ptr, 3 - 1)) } else { None } since offsetting ZST pointers by stride is a no-op.

Swatinem · May 1, 2022, 1:14pm

Interesting, this indeed means that a range->range roundtrip is impossible to guarantee with ZSTs.

I wonder if just taking the easy was out and panic for ZST as a couple of other APIs do would be appropriate here.

Swatinem · May 1, 2022, 1:18pm

Hm, maybe I’m overthinking this a bit, but wouldn’t subslice checking need to take provenance into account? As in: the subslice has to be inside the allocation of the parent slice. I remember @Gankra mentioned hypothetical segmented architectures in this context, on which going purely by the integer address would be the wrong thing?

kpreid · May 1, 2022, 2:31pm

I think that, assuming this is sound, it would be worth including this operation in the standard library.

This is because the question has come up several times, and whenever it does, there's always a discussion of soundness and how best to implement it. A documented function would clarify the matter.

(It might also help steer people away from the variation that can't (?) be done soundly w.r.t. provenance, where instead of finding a slice in a larger slice, one attempts to "concatenate" two slices if they are determined to be adjacent in memory. Or, maybe there's even a way to do that with compiler support.)

CAD97 · May 1, 2022, 3:16pm

No, that's not possible, period, as the two slices could be in two adjacent allocations, and there is no way to detect that. If the two slices overlap at all, it is theoretically possible to merge them into a single slice, but that would require compiler magic (assuming references do subobject slicing of provenance, which current signs point to probably (with caveats)).

kpreid · May 1, 2022, 3:22pm

Thanks for confirming. So, I would propose that the "find slice in slice" function's documentation should include a note that you can't have that version.

elidupree · May 1, 2022, 4:06pm

… Although, thinking about it in this context, if you already possess the parent slice AND both subslices, that would allow you to have a convenience function to concatenate them.

chrefr · May 1, 2022, 9:38pm

Are there any benefits to using offset_from() or sub_ptr(), since we anyway check the preconditions? If not, I don't think we should provide such function in std when everybody can do that easily.

quinedot · May 1, 2022, 10:30pm

This is (part of) the (open) RFC 2796.

scottmcm · May 2, 2022, 4:02am

The main things those do compared to implementing them yourself is that they pass extra flags to LLVM to let it know about overflow or about the exactness of the division. It might be that LLVM can derive those given the other checks you're doing, however.

chrefr · May 2, 2022, 5:34am

Yes but we derive them from aligned pointers (so LLVM should, even though I don't know if it does, know the division is exact), and we already check the pointers are in range so overflow won't occur and LLVM should (ditto).be able to prove that. If that's true, I see no reason to provide this function in std unless it's very useful.

CAD97 · May 2, 2022, 5:46am

Another benefit (other than the function being nontrivial to write correctly, especially due to the ZST issue) is that sub_ptr, due to being known by the compiler, can be implemented in const, whereas ptr as usize can never be const.

And as I understand it, that's the main benefit of using the intrinsic form for fn(ptr, ptr) -> usize intrinsics; asserting that the pointers are to the same allocation allows the (const eval | optimizer) to actually know that the pointers have a consistent relationship which can be constant folded.

Given C++ requires pointer comparisons to be within the same allocation anyway, I wouldn't be surprised if LLVM actually isn't able to derive from potentially-allocation-crossing pointer comparison that the inner span is actually within the outer span's allocation, since that's just assumable in C++ once you're doing the comparisons on pointers. But at the same time, any optimization potential is fairly second-order "the optimizer knows more" anyway if it's not just been outright inlined with a fully constant foldable answer anyway.

TL;DR using sub_ptr allows it to be Rust const and gives LLVM information that'd be given in the normal C++ way to write it.

Swatinem · May 2, 2022, 8:57am

FYI if anyone has missed it, I opened this as Add `range_of` to slice/str return a `Range`, opposite to `get` by Swatinem · Pull Request #96612 · rust-lang/rust · GitHub (and corresponding tracking issue). So far just dropping down to integer math, but with a NOTE to switch to sub_ptr once that is available.

I was trying to make this const, but the compiler didn’t like the pointer comparisons.

bluss · May 2, 2022, 6:26pm

I'll leave PR & RFC discussions to those that have the presence of mind and skills to participate, but the ZST problem is rather severe for this API in my opinion, since it's a "non-parametric" restriction to the usage of it, i.e that it can show up in most generic code without warning. I would avoid using panic for that case.

chrefr · May 2, 2022, 9:05pm

Do you have a better alternative than a panic, and e.g. offset_from() does the same. It would be better to have an error (maybe even a post-monomorphization error) but there is no easy way to get that as of today.

bluss · May 3, 2022, 5:24pm

offset_from is low-level and for raw pointers unfortunately a lot needs to be done differently for ZST. So the panic is more understandable there. With subslice range I'm just fearful the panic will show up layers removed from the original code, when someone unexpectedly inserts a ZST type into a generic algorithm. Result/error would be strict but inconvenient.

Topic		Replies	Views
Assume slice length is smaller equal `isize::MAX` libs	7	450	June 18, 2024
Slice.take function libs	17	903	June 24, 2019
`slice` lacks `get_array_ref` functionality	13	1020	April 18, 2022
Get a subslice centered around a value: `slice::window` libs	8	608	February 23, 2024
Ergonomics of raw-pointer slices language design	7	1548	December 10, 2022

Proposal: Get Range of sub-slice

Related topics