[EDIT]
Everything I said previously is irrelevant. @huon made an excellent point which makes this all irrelevant.
Note: I’m not prepared to turn this into an RFC PR. At most I might make this into an RFC issue. I tried to make this clear. I hope it is.
This is a take on what a more flexible for
loop syntax might look to support some more complex cases a regex parser would require. This is also difficult to setup for a regex because the syntax must be extremely flexible. I think more than this would be required. I don’t know if the concept would fit into Rust though. I was mainly just trying to find a reasonable syntax.
cc @P1start (I think you were looking at iterators once)
In a regex parser you might want to parse something like r"(dave)"
or r"([abc]+)"
.
Each different character set such as ()
, []
, and +
all will require differently parsing,
and thus will likely be separated into separate functions. This makes using iterators
difficult because more specialized parsing cannot be delegated to separate functions
// Doesn't work but even if it did, it's repetitive
// repeating the `for` for every new expansion.
fn parse_more(v: Iterator) {
for j in v {
println!("{}, {}", i, j);
}
}
fn main() {
let v = 0..10;
for i in v {
// Cannot send `v` into new function to continue parsing.
parse_more(v);
}
}
This leaves you to manually iterate over the sets via indexing a Vector with loop
and such.
This is problematic because every time you need to specialize parsing a group of characters you
must redo indexing again which is repetitive and ignores iterators.
struct S {
index: usize,
vec: Vec<char>,
}
impl S {
fn curr(&self) -> char { self.vec[self.index] }
fn new(v: &'static str) -> S {
S { index: 0, vec: v.graphemes() }
}
fn next(&self) -> bool {
self.index += 1;
if self.index < self.vec.len()
}
fn parse_paren(&self) {}
fn parse_everything_else(&self) {}
}
fn main() {
let s = S::new(r"(dave)");
while s.next() {
match s.curr() {
'(' => s.parse_paren(),
_ => s.parse_everything_else(),
}
}
}
This proposes multiple things which I think are all required to handle this:
1: for match
: combine for
and match
The problem is by calling parse_*
you are required to pass the interators
current state outside of the for
to handle iterating which cannot be done. Since
you aren’t modifying the iterator, it would be nice to continue using the iterator
while allowing logic separation.
fn paren() {}
fn remains() {}
for c in r"(dave)".graphemes() {
match c {
// Won't work because requires passing
// the iterator state outside the `for` loop.
'(' => paren(),
_ => remains(),
}
}
What about supplying functionality connected to the match
body. This is problematic
because the functionality depends on which iteration. The first (
match state that
all the following iterations will go to that function branch until recalled but there’s
no reason a match
have any foreknowledge of which iteration it’s on.
let mut m = vec![];
for c in r"(dave)ted".graphemes() {
match c {
// If `(`, call `paren` after `match` close. Otherwise, push char
// and move to `remains` *function branch* next iteration.
'(' => to.paren(),
_ => m.push(c),
} paren {
')' => to.normal(), // Recall to normal `match` for remaining iterations.
_ => m.push(c),
}
}
By combining for
and match
, the iterator count could be known. For an AST which could be recursive, each function branch would need to have it’s own information so they could eventually be returned to be appended at the end.
use AST::*;
enum AST {
Char(char),
Group(Vec<AST>),
}
// Trying to parse into: `Vec(Char('a'), Group(Char('b'), Char('c')), Char('d'))`
let chars = r"(a(b,c)d)";
let mut m: Vec<AST> = vec![];
for match c in chars.graphemes() {
// Pass to `paren` functionality next iteration. Initialize `inner` so that
// `paren` can immediately start pushing to it. When `paren` calls `leave`,
// return to normal `match` branching. Values can be returned via `leave`.
'(' => { let inner = vec![];
paren() },
} paren {
// If inside `paren` and finds another `paren`, go recursive. `inner` must
// be kept track of separately.
'(' => { let inner = vec![];
paren() }, // Recursive
_ => {
// Check if already has been run recursively. If so,
// push result onto vector.
if child.exists() { inner.push(child) }
// Finds `)` then return. Variable is saved to `child`.
if v == ')' { leave(Group(inner)) }
inner.push(Char(c));
},
// Not sure the best way to return something here though.
} m.push(leave); // push completed AST onto `m` when complete.
2: peek()
and next()
available inside a loop
peek()
must be available inside a for loop. Maybe it should take a range and yield a slice. Not sure. next()
would take an integer. Names could be changed. Maybe related to continue
.
use AST::*;
enum AST {
Char(char),
Dot,
Ellipsis,
}
// If `.` is anycharacter and `...` is something else then they must
// be considered separately.
let chars = r"...";
let mut m: Vec<AST>;
for match c in chars.graphemes() {
'.' => match peek(1..3) { ".." => { m.push(Ellipsis);
next(2); },
_ => m.push(Dot), },
'_' => m.push(c),
}