I had this idea in my head for the last few months, in the last weeks I actually got around to actually implement it in a preliminary version, and now I'd like to get some feedback and questions about it. 
Current situation
Right now, string slices define a couple of methods to look for some string pattern in it - usually a char or a &str. To be exact, these are all methods in question:
pub trait StrSlice<'a> {
fn contains<'a>(&self, needle: &'a str) -> bool;
fn contains_char(&self, needle: char) -> bool;
fn split<Sep: CharEq>(&self, sep: Sep) -> CharSplits<'a, Sep>;
fn splitn<Sep: CharEq>(&self, sep: Sep, count: uint) -> CharSplitsN<'a, Sep>;
fn rsplitn<Sep: CharEq>(&self, sep: Sep, count: uint) -> CharSplitsN<'a, Sep>;
fn split_terminator<Sep: CharEq>(&self, sep: Sep) -> CharSplits<'a, Sep>;
fn split_str(&self, &'a str) -> StrSplits<'a>;
fn match_indices(&self, sep: &'a str) -> MatchIndices<'a>;
fn starts_with(&self, needle: &str) -> bool;
fn ends_with(&self, needle: &str) -> bool;
fn trim_chars<C: CharEq>(&self, to_trim: C) -> &'a str;
fn trim_left_chars<C: CharEq>(&self, to_trim: C) -> &'a str;
fn trim_right_chars<C: CharEq>(&self, to_trim: C) -> &'a str;
fn find<C: CharEq>(&self, search: C) -> Option<uint>;
fn rfind<C: CharEq>(&self, search: C) -> Option<uint>;
fn find_str(&self, &str) -> Option<uint>;
// ...
}
In this set of methods, there are quite some inconsistencies: Some are defined to work with char, some with CharEq, and some with &str. Furthermore, some operations are duplicated to provide different implementations for char/CharEq and &str.
In my opinion, this is a perfect opportunity to generically abstract over the concept of a "string pattern" to provide a unified API:
The new &str API
The new API is based on the idea of a string Pattern that can be used to obtain a double-ended-iterator-like matcher to find all occurrences of it in a string:
pub trait StrSlice<'a> {
fn contains<M, P: Pattern<'a, M>>(self, pat: P) -> bool;
fn split<M, P: Pattern<'a, M>>(self, pat: P) -> Splits<M>;
fn rsplit<M, P: Pattern<'a, M>>(self, pat: P) -> RSplits<M>;
fn splitn<M, P: Pattern<'a, M>>(self, pat: P, n: uint) -> NSplits<M>;
fn rsplitn<M, P: Pattern<'a, M>>(self, pat: P, n: uint) -> RNSplits<M>;
fn split_terminator<M, P: Pattern<'a, M>>(self, pat: P) -> TermSplits<M>;
fn rsplit_terminator<M, P: Pattern<'a, M>>(self, pat: P) -> RTermSplits<M>;
fn matches<M, P: Pattern<'a, M>>(self, pat: P) -> Matches<M>;
fn rmatches<M, P: Pattern<'a, M>>(self, pat: P) -> RMatches<M>;
fn match_indices<M, P: Pattern<'a, M>>(self, pat: P) -> MatchIndices<M>;
fn rmatch_indices<M, P: Pattern<'a, M>>(self, pat: P) -> RMatchIndices<M>;
fn starts_with<M: LeftMatcher<'a>, P: Pattern<'a, M>>(self, pat: P) -> bool;
fn ends_with<M: RightMatcher<'a>, P: Pattern<'a, M>>(self, pat: P) -> bool;
fn trim_matches<M: DoubleEndedMatcher<'a>, P: Pattern<'a, M>>(self, pat: P) -> &'a str;
fn trim_left_matches<M: LeftMatcher<'a>, P: Pattern<'a, M>>(self, pat: P) -> &'a str;
fn trim_right_matches<M: RightMatcher<'a>, P: Pattern<'a, M>>(self, pat: P) -> &'a str;
fn find<M: LeftMatcher<'a>, P: Pattern<'a, M>>(self, pat: P) -> Option<uint>;
fn rfind<M: RightMatcher<'a>, P: Pattern<'a, M>>(self, pat: P) -> Option<uint>;
// ...
}
Design details
Separate methods for front and back matching
The API is based around finding all non-overlapping occurrences of an arbitrary pattern in a string. This has one issue though:
If a pattern is able to match overlapping substrings, then finding non-overlapping matches from the front becomes a different operation than finding non-overlapping matches from the back. As an example, take the pattern "aa" and the haystack "aaa". Matching from the front gives you "[aa]a", while matching from the back gives "a[aa]".
For this reason, all methods on StrSlice that return an iterator based on matchers exist in two single-ended versions, rather than in one double-ended one.
Traits
- A
Pattern<'a, M>acts as a builder for creating a string matcherMfromSelf. Types likechar,&strorRegeximplement this in order to be directly usable with the methods ofStrSlice. -
Matcher,LeftMatcher,RightMatcher,DoubleEndedMatcher: Those provide a iterator-like interface for finding all non-overlapping occurrences of a pattern in a string, and are used as a building block for higher level operations like splitting or trimming.-
Matcheris the base trait, used for getting the original haystack&strback out of a matcher. -
LeftMatcheris for advancing the matcher state by finding the next match from the front. -
RightMatcheris for advancing the matcher state by finding the next match from the back. -
DoubleEndedMatcheris a marker trait for expressing that the matches ofLeftMatcherare equal to those ofRightMatcherin reverse order. This allows iterators build on matchers to still provide a double-ended iterator API forPatternimplementations where overlapping matches are impossible. (This feature is optional, but provided to be compatible with the current API)
-
Advantages
- One method name per operation; A single common abstraction rather than special cases for a few types
- Allows extending the string API with custom string patterns (Eg
Ascii,Regex, predicates...) - Allows extending the string API with custom string search algorithms (Eg
BoyerMoore(&str)) - Allows to specialize the implementation per pattern and matcher impl
Disadvantages
- More complexity up front due to generics
WIP Implementation
Issues, open questions
- Performance is untested
- Generics/abstraction layers might cause size bloat
- Generics/abstraction layers might hinder optimizations
- Some type inference bugs around closures get hit by the impl (
.split(|c: char| ... )fails to inferr) - Should
DoubleEndedIteratorimpls be supported where possible?
Future extensions
- A similar abstraction could be implemented for API's that modify a
String, so thatstring.push("foo"),string.push('f'),string.push('f'.to_ascii())all work. - This would allow stuff like
s.replace(®ex!(...), "foo"), which would be generic over both the pattern matched and the string fragment it gets replaced with.
New API with associated items and where clauses
pub trait StrSlice<'a> {
fn contains<P>(self, pat: P) -> bool where P: Pattern<'a>;
fn split<P>(self, pat: P) -> Splits<P::Matcher> where P: Pattern<'a>;
fn rsplit<P>(self, pat: P) -> RSplits<P::Matcher> where P: Pattern<'a>;
fn splitn<P>(self, pat: P, n: uint) -> NSplits<P::Matcher> where P: Pattern<'a>;
fn rsplitn<P>(self, pat: P, n: uint) -> RNSplits<P::Matcher> where P: Pattern<'a>;
fn split_terminator<P>(self, pat: P) -> TermSplits<P::Matcher> where P: Pattern<'a>;
fn rsplit_terminator<P>(self, pat: P) -> RTermSplits<P::Matcher> where P: Pattern<'a>;
fn matches<P>(self, pat: P) -> Matches<P::Matcher> where P: Pattern<'a>;
fn rmatches<P>(self, pat: P) -> RMatches<P::Matcher> where P: Pattern<'a>;
fn match_indices<P>(self, pat: P) -> MatchIndices<P::Matcher> where P: Pattern<'a>;
fn rmatch_indices<P>(self, pat: P) -> RMatchIndices<P::Matcher> where P: Pattern<'a>;
fn starts_with<P>(self, pat: P) -> bool where P: Pattern<'a>,
P::Matcher: LeftMatcher<'a>;
fn ends_with<P>(self, pat: P) -> bool where P: Pattern<'a>,
P::Matcher: RightMatcher<'a>;
fn trim_matches<P>(self, pat: P) -> &'a str where P: Pattern<'a>,
P::Matcher: DoubleEndedMatcher<'a>;
fn trim_left_matches<P>(self, pat: P) -> &'a str where P: Pattern<'a>,
P::Matcher: LeftMatcher<'a>;
fn trim_right_matches<P>(self, pat: P) -> &'a str where P: Pattern<'a>,
P::Matcher: RightMatcher<'a>;
fn find<P>(self, pat: P) -> Option<uint> where P: Pattern<'a>,
P::Matcher: LeftMatcher<'a>;
fn rfind<P>(self, pat: P) -> Option<uint> where P: Pattern<'a>,
P::Matcher: RightMatcher<'a>;
// ...
}