Index by Regex?

DanielFath · March 14, 2015, 11:42pm

Well, regexes often try to capture some combination of graphemes rather than codepoints. My experience is that people don’t think in terms of codepoints; they think in terms of patterns like XX.XX.XXXX that match the visual appearance.

burntsushi · March 15, 2015, 4:09am

I’m honestly not a huge fan. It’s a nice trick, but it encourages task failure whenever the regex fails to match.

If you really want the indexing, I see no reason why this has to be in the regex crate. A wrapper type is not a burden here, and it could even be less typing. (Just create your own constructor that calls Regex::new.)

Also, I think we should be moving toward stabilizing the regex crate. It’s rather popular, so we’ll risk making it de facto stable if we wait too long.

kstep · March 16, 2015, 10:36am

Yes, I get your POV. You are basically right about panics. In Ruby, when regex “indexing” fails, it just returns nil, so it’s OK to have such sugar, but in Rust it’s not that simple. Now I think this feature doesn’t fit Rust ideology after all. And I don’t think a crate with such a feature would be used by any significant number of users, so it just isn’t worth the effort to create it.

ArtemGr · March 16, 2015, 11:40am

In Rust that would be Option<something>.

kstep: "Now I think this feature doesn't fit Rust ideology after all."

Right now Rust is in its "be minimalistic before 1.0" stage, and everyone's holding their breath waiting for Rust to become stable, but I've heard voices that Rust will start to grow some extra weight later.

Writing

if let Some (match) = a_string[regex!(r"...")] {}

seems useful to me: it's easier to remember than the names of the methods the regex crate uses (I keep hitting the docs to check them).

I'd vouch for making a crate even if there were only a few users. At least the gist will be easy to find that way.

kstep · March 16, 2015, 12:23pm

You can’t return Option<something> from index. See the signature (simplified a little):

trait Index<Idx> {
  type Output;
  // The returned reference borrows from `self` for the lifetime 'a.
  fn index<'a>(&'a self, index: &Idx) -> &'a Self::Output;
}

You have to return a reference to something; no owned data can be returned from index(). And the reference inherently has to point to something in self (due to lifetime restrictions).
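To make the restriction concrete, here is a minimal sketch against the trait as it exists in today's standard library, where the signature is `fn index(&self, index: Idx) -> &Self::Output` (the index is taken by value). The `Words` type and its `get` method are hypothetical and exist only for illustration. A plain method is free to return an `Option`, while `index` can only hand back a reference or panic.

```rust
use std::ops::Index;

/// Hypothetical container, used only for illustration.
struct Words {
    text: String,
}

impl Words {
    /// Fallible lookup: an ordinary method may return `Option`.
    fn get(&self, n: usize) -> Option<&str> {
        self.text.split_whitespace().nth(n)
    }
}

impl Index<usize> for Words {
    type Output = str;

    /// `index` must return `&Self::Output` borrowed from `self`,
    /// so a missing word can only be reported by panicking.
    fn index(&self, n: usize) -> &str {
        self.get(n).expect("word index out of range")
    }
}
```

This mirrors how slices offer both `v[i]` (panics when out of bounds) and `v.get(i)` (returns an `Option`).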

burntsushi · March 16, 2015, 12:23pm

Right, but when you're indexing, the regex match has to be unwrapped which will cause task failure if the match fails. I suppose an alternative to this is to return a zero length slice if the regex doesn't match. (This will conflate zero-length matches with non-matches, but that's probably OK to gloss over for a convenience such as indexing.)

I agree it's a neat trick, I'm just trying to say that this can be done entirely outside of the regex crate with almost no downsides. (Wrapper types are usually inconvenient because you have to re-implement all of the behavior of the underlying type, but in this case, all you really need it for is construction.)

Also, more importantly, I'm pretty sure that your if let Some(match) = a_string[regex!(r"...")] {} construct is not possible. (Whoops, @kstep beat me to it.)
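The zero-length-slice fallback described above could look roughly like the sketch below. To keep it dependency-free, a literal substring stands in for a compiled regex; the `Needle` wrapper is a hypothetical name and not part of the regex crate. Because `Needle` is a local type, the impl for `str` is accepted by current Rust.

```rust
use std::ops::Index;

/// Hypothetical index type. A literal substring stands in for a compiled
/// regex so that the sketch needs no external crate.
struct Needle<'p>(&'p str);

impl<'p> Index<Needle<'p>> for str {
    type Output = str;

    fn index(&self, needle: Needle<'p>) -> &str {
        match self.find(needle.0) {
            // Match: slice out the first occurrence.
            Some(start) => &self[start..start + needle.0.len()],
            // No match: return an empty slice instead of panicking.
            None => &self[..0],
        }
    }
}
```

With this in place, `&"hello"[Needle("el")]` yields `"el"`, while both `Needle("zz")` (no match) and `Needle("")` (a zero-length match) yield `""`, which is exactly the conflation mentioned above.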

kstep · March 16, 2015, 12:29pm

Hmm, that's not that elegant, but it may work. I'm not sure about the ergonomics of such an implementation, but I think it's worth making a PoC crate and trying it in the real world.

ArtemGr · March 16, 2015, 12:38pm

Ahh. Too bad one can’t get an Option from an index. That makes it less useful than I thought.

kstep · March 20, 2015, 10:13am

I just published a simple PoC crate, regindex (repository).

ArtemGr · March 20, 2015, 10:43am

Apparently one can’t make it work as transparently as in Ruby (i.e. without the ReIdx wrapper) unless it is part of the regex crate?

burntsushi · March 20, 2015, 10:58am

My suggestions to make this more ergonomic outside of the regex crate seem to have gotten lost in the noise. I filed an issue with suggestions: https://github.com/kstep/regindex/issues/1

ArtemGr · March 20, 2015, 11:21am

If we’re going so far as to define a new macro, rei!, then why not just define matches!("hello", r"el"), returning an Option to boot? So far I feel this attempt falls short of the Ruby version.
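A rough sketch of such an Option-returning macro, under two assumptions: the name `find_match!` is hypothetical (today's standard library already has an unrelated `matches!` macro), and a literal substring stands in for the regex so the example needs no external crate.

```rust
/// Hypothetical macro: evaluates to `Some(matched_slice)` or `None`.
/// A literal substring stands in for a regex pattern here.
macro_rules! find_match {
    ($haystack:expr, $needle:expr) => {{
        let haystack: &str = $haystack;
        let needle: &str = $needle;
        match haystack.find(needle) {
            // Borrow the matched part of the original string.
            Some(start) => Some(&haystack[start..start + needle.len()]),
            None => None,
        }
    }};
}
```

Unlike indexing, this fits `if let Some(m) = find_match!("hello", "el") { ... }` and keeps non-matches distinct from empty matches.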

burntsushi · March 20, 2015, 11:24am

I’m here to help you make the best of the tools that we have. I’m not a fan of indexing at all (I can’t imagine when I would use it in real code, because it encourages task failure when the regex doesn’t match), but I was trying to work within the parameters of the OP. Certainly, some other macro that doesn’t use indexing at all might be more convenient, although I’m not convinced that it would belong in the regex crate proper.

ArtemGr · March 20, 2015, 11:35am

Sure. And thank you very much. I’m just venting some of my concerns, like not being able to properly implement the indexing operator outside of the regex crate. @kstep, it’s due to the rule discussed at http://www.reddit.com/r/rust/comments/2sg60s/only_traits_defined_in_the_current_crate_can_be/, right?

kstep · March 20, 2015, 11:53am

Yes, that’s the whole point of a wrapper newtype in Rust: to work around the “trait or type must be in current crate” rule.
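Both halves of that rule can be sketched with standard-library types only. The names `Lines`, `FindExt` and `find_match` are hypothetical. A foreign trait on a foreign type is rejected, but wrapping the type in a local newtype, or defining a local trait, makes the impl legal.

```rust
use std::fmt;

// Rejected by the orphan rule: `fmt::Display` and `Vec<String>` are both
// defined outside this crate.
// impl fmt::Display for Vec<String> { /* ... */ }

/// Local newtype around the foreign type: the type is now "ours".
struct Lines(Vec<String>);

impl fmt::Display for Lines {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0.join(", "))
    }
}

/// Local trait: implementing it for the foreign type `str` is allowed.
trait FindExt {
    fn find_match(&self, needle: &str) -> Option<&str>;
}

impl FindExt for str {
    fn find_match(&self, needle: &str) -> Option<&str> {
        // A literal substring stands in for a regex in this sketch.
        let start = self.find(needle)?;
        Some(&self[start..start + needle.len()])
    }
}
```

The newtype costs a constructor call at each use site; the local trait costs an import wherever the method is used.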

kstep · March 21, 2015, 12:59pm

I have more or less finished the experimental version of regindex (github). Let’s see what happens next.

ArtemGr · March 21, 2015, 8:43pm

I have to point out that it doesn't use task failure; it just returns an empty slice.

Also, after extending the syntax a bit, this feels quite useful to me:

let foo = &uri[ri! (regex! (r"^/path/(\w+)/$$"), 1)];

Arguably it's intuitive, because taking a slice from a &str is obvious here. One doesn't need to guess at the types, the errors and how to handle them; we just get a slice, either an empty one or one with the matched group.

If the regex! dependency weren't a problem, I'd even make it into

let foo = &uri[ri! (r"^/path/(\w+)/$$", 1)];

I really wish Rust had chosen the other way 'round on orphans, allowing one to extend types freely as in Google Go and Scala (and Haskell?).

burntsushi · March 21, 2015, 9:22pm

Go uses structural subtyping, which is a completely different approach to polymorphism.

If you have a chance to talk to an experienced Haskell programmer, ask them about orphan instances and they’ll be eager to share a war story. More seriously though, the orphan rules have been the subject of a lot of attention. @nikomatsakis has a great write-up on it: http://smallcultfollowing.com/babysteps/blog/2015/01/14/little-orphan-impls/

ArtemGr · March 21, 2015, 9:40pm

I did some Haskell myself, a long time ago, and speaking of war stories, orphan rules might be the least of one’s worries. The write-up you mention only explores one side of the coin, and we see that it’s a troublesome side. There might be more truth on the other side; at least it hasn’t been experimentally explored that much in Rust, I take it? Scala chose the other side (the Dark Side, woo!) and it nailed the extensibility problem squarely with its implicit classes, IMHO, although the Rust story might be different.

kstep · March 21, 2015, 9:46pm

I write production Scala code now, and what I want to say about implicit bounds is that, while they are very versatile and often useful, they are also often really confusing. I don’t like the idea of an IDE being a must-have tool for programming in a language, because without its type hints it’s often very difficult to say what really happens in your program. It often takes a lot of time and effort just to work out where your implicit instances come from.

Upd. I actually like both Scala and Rust, but I think Rust’s approach to the orphan instance rules (while not as versatile as Scala’s) is more right, in the sense that it’s more obvious to a programmer, more explicit, and thus less error-prone and easier to comprehend.
