String multimatching?


#1

Note: I tried to search for something similar, but did not find anything. There is a string matching for a single pattern.

I was thinking of writing a string multi-matching based Aho-Corasick’s string matching algorithm. While Knuth-Morris-Pratt matches a single pattern against a text, Aho-Corasick is able to match a set of patterns against a text.

The interface could look something like this:

// some trait impl for &str
fn match_patterns(&self, patterns: &[&str]) -> Matches;

Where Matches would be an iterator over all positions each pattern match, if any.

Would this be something desirable? Is this suitable to be inside a library, if not std then some utility lib?


#2

We already have a regex crate, no? So couldn’t you achieve this effect by building a regex state machine for a single regex pattern constructed from the input string patterns, at runtime?

According to at least one google result: http://patrakov.blogspot.fr/2009/09/matching-multiple-strings.html it sounds like a decent regex library should be able to compete with Aho-Corasick, at least w.r.t. the time it takes to do the search.

(I do not know if there is a significant difference in the time it takes to construct the state machine; if there is, then I imagine the regex library could be revised to treat a series of (aa|bb|cc|...) as a special case.)


#3

Thanks, I didn’t realize we could do that, and it seems to work.