&str.is_substr(&str) -> Option<usize>

Writing a parser, I thought of this: why keep track of position twice if the language already does it for you? The only problem is that there's currently no way to get that position back. For the most part, "position", as an offset from the start of the string, is uninteresting, while "address" is useful. The exception is error messages and other user-facing locations, where "position" is more useful. But you can easily get a position if you have two addresses and you know that one of them lies within the range of the other. So it'd be nice to have something like that for &str and for &[T].

One thing to note is that this has some interesting edge-cases around &s[0..0] etc. Not all empty strings are created the same. But it's not really a problem as that's already exposed with std::ptr::eq.
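To make the empty-string edge case concrete, here is a minimal sketch. The helper names `same_slice` and `compare_empty_slices` are made up for illustration; the first only wraps `std::ptr::eq`, which for `&str` compares both the start address and the length.

```rust
/// Hypothetical helper: true only if `a` and `b` are the same slice of
/// memory (same start address and same length), regardless of content.
fn same_slice(a: &str, b: &str) -> bool {
    std::ptr::eq(a, b)
}

/// Returns (equal by content, same slice) for the two empty slices at the
/// very start and the very end of `s`.
fn compare_empty_slices(s: &str) -> (bool, bool) {
    let head = &s[0..0];
    let tail = &s[s.len()..];
    (head == tail, same_slice(head, tail))
}
```

For any non-empty `s`, the two empty slices compare equal as strings but are not the same slice, which is the distinction a position-recovering function would have to respect.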

This already exists

str::find

// or do you mean the arguments the other way around?
// or something entirely different?
fn is_substr(this: &str, pattern: &str) -> Option<usize> {
    this.find(pattern)
}

let x = "aa";
let y = &x[0..1];
let z = &x[1..2];

str::find will not return the correct position for these substrings, which makes it completely useless for outputting the position in error messages.
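A small sketch of why this goes wrong: `str::find` searches by content, so it reports the first place where the text matches, not where the given slice actually lives. The wrapper name `content_position` is hypothetical and exists only for this illustration.

```rust
/// Content-based lookup, i.e. exactly what `str::find` does: the byte
/// offset of the first occurrence of `needle`'s text in `haystack`.
fn content_position(haystack: &str, needle: &str) -> Option<usize> {
    haystack.find(needle)
}
```

With `x`, `y` and `z` as defined above, both `content_position(x, y)` and `content_position(x, z)` give `Some(0)`, even though `z` starts at byte 1 of `x`.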

Ah, got you. You mean to detect some actual subslices, i.e. when it’s the same piece of memory.

So something like this?
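The code linked from this post is not reproduced here. As a rough sketch of the idea only (not the original playground code), an address-based check could look like the following. The name `substr_offset` and the exact bounds logic are assumptions made for illustration.

```rust
/// Hypothetical helper: if `inner` lies within the memory range of `outer`,
/// return its byte offset from the start of `outer`; otherwise `None`.
fn substr_offset(outer: &str, inner: &str) -> Option<usize> {
    let outer_start = outer.as_ptr() as usize;
    let inner_start = inner.as_ptr() as usize;

    // `inner` must not start before `outer`.
    let offset = inner_start.checked_sub(outer_start)?;

    // `inner` must end at or before the end of `outer`.
    let inner_end = offset.checked_add(inner.len())?;
    if inner_end <= outer.len() {
        Some(offset)
    } else {
        None
    }
}
```

For `x = "aa"`, this gives `Some(0)` for `&x[0..1]` and `Some(1)` for `&x[1..2]`. One caveat of this sketch: an empty `inner` whose address happens to equal the end of `outer` is reported as `Some(outer.len())`, even if it actually came from an adjacent allocation.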

something like that, yeah.

And also for slices ;-)

interesting use of the end/length thing. but hm maybe panic on empty strings/slices?

Maybe also panic for zero-sized types... I mean, I guess it already panics when the modulo by size is hit, since for a ZST that is a modulo by zero.

can you make this a crate? (like, it is your code and stuff.)

I thought about it. I don’t feel like I need to have this as a crate of mine. I don’t feel like the current naming is optimal and I’m also too lazy to create the missing assertions and add proper documentation. If you think this is something you need, feel free to use and modify the code from my playground however you like.

This is RFC 2796 (still open). There's some discussion around ZSTs there too.

Damn, the implementation proposed in that RFC is full of UB ... I mean, I guess the “empty slice right at the start of the next allocation” issue was already called out over there, but it’s also possible in sound code to obtain &[[T; 2]] sub-slices that don’t start on an element boundary of the original slice, e.g. a subslice [[2,3]] of [[1,2],[3,4]]. This is for example possible with the bytemuck crate.

And offset_from needs

  • The distance between the pointers, in bytes, must be an exact multiple of the size of T.

for safety.

Also @Soni, in case you were wondering, this kind of setting is why there’s an offset % size == 0 check in my playground.
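For readers without access to the playground, here is a hedged sketch of how such a check might look for slices. It is not the original code: the name `subslice_offset`, the explicit ZST assertion, and the choice to work on addresses as `usize` (which avoids `offset_from` and its multiple-of-size requirement) are all assumptions for illustration.

```rust
use std::mem::size_of;

/// Hypothetical helper: index of the first element of `inner` within
/// `outer`, if `inner` occupies a range of `outer`'s memory that starts
/// on an element boundary. Panics for zero-sized `T`.
fn subslice_offset<T>(outer: &[T], inner: &[T]) -> Option<usize> {
    let size = size_of::<T>();
    assert!(size != 0, "subslice_offset is not meaningful for zero-sized types");

    let outer_start = outer.as_ptr() as usize;
    let inner_start = inner.as_ptr() as usize;

    // `inner` must not start before `outer`.
    let byte_offset = inner_start.checked_sub(outer_start)?;

    // Reject slices that sit between two elements of `outer`,
    // e.g. [[2, 3]] inside [[1, 2], [3, 4]].
    if byte_offset % size != 0 {
        return None;
    }

    let offset = byte_offset / size;
    let end = offset.checked_add(inner.len())?;
    if end <= outer.len() {
        Some(offset)
    } else {
        None
    }
}
```

With `data = [10u32, 20, 30, 40]`, calling `subslice_offset(&data, &data[1..3])` gives `Some(1)`, while a `&[[u8; 2]]` view that starts one byte into `[[1, 2], [3, 4]]` is rejected by the modulo check.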

thoughts on range_start_of as a name?
