I could not find the following proposal discussed anywhere, which is surprising, since it looks like a very natural addition.
The proposal is to add substring pattern syntax for pattern-matching, which would serve the same purpose as the slice patterns for slices and arrays. The syntax is prone to bikeshedding, but imho it would be most natural to generalize concat!
and concat_bytes!
built-in macros to allow using them in pattern position.
An example which would be easier with this proposal:
// before
fn frobnicate_str(s: &str) -> Frob {
if s.starts_with("foo") || s.starts_with("moo") {
frobnicate(&s["foo".len()..])
} else if s.starts_with("bozoom") {
frobnicate(&s["bozoom".len()..])
} else if s.starts_with("ba") {
barnicate(&s["ba".len()..])
} else if s.ends_with("quux") {
quuxnicate(&s[..s.len() - "quux".len()])
} else {
panic!("not a frobnicatable string")
}
}
// after
fn frobnicate_str(s: &str) -> Frob {
match s {
concat!("foo" | "moo" | "bozoom", rest@..) => frobnicate(rest),
concat!("ba", rest@..) => barnicate(rest),
concat!(start@.., "quux") => quuxnicate(&start),
_ => panic!("not a frobnicatable string"),
}
}
In my opinion, a match is easier to read that a chain of ifs: the conditions are very explicit, the pattern matching allows to encode complex conditions in a simple form, and the compiler guarantees exhaustive checks.
A similar construct should be added to allow matching on byte slices, i.e.
fn frobnicate_slice(s: &[u8]) -> Frob {
match s {
concat_bytes!(b"foo" | b"moo" | b"bozoom", rest@..) => frobnicate(rest),
concat_bytes!(b"ba", rest@..) => barnicate(rest),
concat_bytes!(start@.., b"quux") => quuxnicate(&start),
_ => panic!("not a frobnicatable string"),
}
}
Note that I do not propose to expand the pattern syntax in any way beyond concat!
and concat_bytes!
. In particular, allowing non-literal slices in the pattern, or patterns for containers such as String or Vec, or custom pattern matching, or allowing more than one variable-length pattern are beyond the scope of proposal.
Note that this is more of a syntax sugar than any real extension of the language's expressive power. For byte strings in particular, a pattern match
if let concat_bytes!(b"foo" | b"quux", rest@..) = b { ... }
can be equivalently rewritten to
if let [b'f', b'o', b'o', rest@..] | [b'q', b'u', b'u', b'x', rest@..] = b { ... }
However, the first pattern is much easier to read and write.
For strings, there is currently no direct equivalent of slice patterns. One could match the string slice as a byte slice and manually transcribe a string pattern into the corresponding byte pattern:
if let [b'f', b'o', b'o', rest@..] = s.as_bytes() { ...}
However, this is even less practical than for byte slices in cases where the string pattern goes beyond the ASCII range of characters. Also, rest
will be a byte slice within the match arm, so if we need to work with actual strings, then we must convert it back into a string slice, either performing redundant utf8 checks or using error-prone unsafe code.
Syntax
In the current proposal the obvious, although somewhat clunky, pattern syntax concat(_bytes)!(p1, ..., pn)
is used. Since concat(_bytes)!
is a macro, if a better syntax is ever found, the migration to new syntax can be made seamless by changing the expansion of the macros to the new pattern syntax. Also, since concat(_bytes)!
is a built-in macro, there is nothing preventing it from supporting the new patterns even if there is currently no non-macro way to express them in the language (just as there is currently no other way to concatenate compile-time slices).
The expansion of the macro doesn't need to know the surrounding context (i.e. whether it is expanded in pattern position), since adding the pattern syntax is backwards-compatible. Currently the only valid syntax is a concatenation of slice literals, and the macro expand to their concatenation. With this proposal the concatenation of slices is still a slice, just as before. However, it would be now possible to use more general slice patterns within the macro, in which case it would expand to some (for the moment unstable) slice pattern.
If Rust had a slice concatenation operator (say ++
), then the most natural syntax for string patterns would use ++
instead of concat!
. I find it unlikely that such operator would ever be added, though.
Semantics
The byte string patterns with concat_bytes!
can be directly desugared to the stable slice patterns as above, assuming that there are no or-patterns within the concat_bytes!
call. This means that, in particular, arrays can also be matched with the byte string patterns just like with the usual slice patterns.
Nested or-patterns present the same challenges as any other or-patterns. In principle, they can be directly desugared into sequences of ordinary patterns. In practice, this can lead to an exponential code blowup for deeply nested or-patterns, so some smarter algorithm may be desirable.
String slice patterns with concat!
do not currently have a stable analogue. One option is to introduce special unstable string slice patterns, just like core::ptr::addr_of!
expands to an unstable &raw
reference. Another option is to desugar string slice patterns into byte slice patterns, together with conversions to and from a byte slice.
Alternatives
For byte slices, the currently stable slice patterns allow to do express the same pattern matching as proposed above. However, this involves manually converting byte slices into explicit arrays of individual bytes, which is tedious and confusing when the byte slices actually represent ASCII strings.
For string slices, the stable alternative is to use the string splitting, pattern matching and subslice functions to manually implement pattern matching. This involves more boilerplate code, redundant definitions of the matched strings (possibly extracted to local variables), or manually computed and hardcoded offsets based on the lengths of fixed suffixes and prefixes. For ad-hoc parsing this can result in quite complex code, viz. the example at the start.