Summary
A few API additions and changes to make the string API cleaner and string handling less painful.
Motivation
Currently, when I want to take, say, the 10th character out of a string, I need to write something like this:
s.chars().skip(9).next().unwrap()
I think we can make it simpler.
Detailed design
What if we could do something like this:
s[Byte(0)] // first byte
s[Char(1)] // second character
s[Graph(2)] // third grapheme
s[Word(-1)] // last word
s[Line(-2)] // second to last line
s[Word(1)..] // everything except the first word
s[Char(1)..Word(1)] // first word without the first char
s[Line(0)][Word(-2)..] // last 2 words of the first line
s[Byte(n)..][Char(0)] // char at byte offset n, old indexing behavior
s.iter::<Graph>() // iterator over graphemes
s.iter::<Word>() // iterator over words
// get number of words in string:
let word_num = s.len::<Word>();
// erase everything between lines containing "begin" and "end"
let (a, b): (Line, Line) = (s.find("begin"), s.find("end"));
s2 = s[...a] + s[b..];
Basically, we could use a few predefined newtypes to specify whether we're interested in bytes, characters, graphemes, words or lines. I haven't thought about how to implement this yet, but I'm sure it wouldn't be too hard.
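To give a feel for the newtype idea, here is a minimal sketch of how two of the index types might be prototyped on top of `std::ops::Index`. It is only an illustration under stated assumptions: the definitions of `Char` and `Word` below are hypothetical, words are taken to be whitespace-separated, a negative `Word` index counts from the end, and out-of-range indices panic. `Graph` is left out because grapheme segmentation is not part of std.

```rust
use std::ops::Index;

/// Hypothetical newtype: position counted in `char`s (Unicode scalar values).
struct Char(usize);

/// Hypothetical newtype: position counted in whitespace-separated words.
/// Negative values count from the end, so `Word(-1)` is the last word.
struct Word(isize);

impl Index<Char> for str {
    type Output = str;

    fn index(&self, idx: Char) -> &str {
        // Walks the string from the start to find the char: O(n), unlike byte indexing.
        let (start, ch) = self
            .char_indices()
            .nth(idx.0)
            .expect("char index out of bounds");
        &self[start..start + ch.len_utf8()]
    }
}

impl Index<Word> for str {
    type Output = str;

    fn index(&self, idx: Word) -> &str {
        let mut words = self.split_whitespace();
        let word = if idx.0 >= 0 {
            words.nth(idx.0 as usize)
        } else {
            // -1 maps to the first word from the back, -2 to the second, and so on.
            words.rev().nth((-(idx.0 + 1)) as usize)
        };
        word.expect("word index out of bounds")
    }
}

fn main() {
    let s = "héllo wörld again";
    assert_eq!(&s[Char(1)], "é"); // second character
    assert_eq!(&s[Word(1)], "wörld"); // second word
    assert_eq!(&s[Word(-1)], "again"); // last word
}
```

Note that `Index::index` has to return a reference, so in this sketch every indexing operation yields a `&str` slice, even for a single character.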
Drawbacks
- Adds some complexity to the std API
- Encourages use of potentially more expensive operations
Alternatives
Do nothing and force the programmer to use iterators. This makes it more explicit how much work the processor must do to get the nth item from the string.
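For comparison, this is roughly what the iterator-only alternative looks like with methods that already exist on `str`. The helper function names are made up for this example; words are again assumed to be whitespace-separated, and graphemes are omitted because std has no grapheme iterator.

```rust
/// The n-th `char` (zero-based), if any. Walks the string from the start.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

/// The last whitespace-separated word, if any.
fn last_word(s: &str) -> Option<&str> {
    s.split_whitespace().next_back()
}

/// The second to last line, if the string has at least two lines.
fn second_to_last_line(s: &str) -> Option<&str> {
    s.lines().rev().nth(1)
}

/// Number of whitespace-separated words.
fn word_count(s: &str) -> usize {
    s.split_whitespace().count()
}

fn main() {
    let s = "first line here\nsecond line\nthird line";
    assert_eq!(nth_char(s, 9), Some('e')); // 10th character
    assert_eq!(last_word(s), Some("line"));
    assert_eq!(second_to_last_line(s), Some("second line"));
    assert_eq!(word_count(s), 7);
}
```

Each helper returns an `Option` (or a count), so the out-of-range case is visible in the type rather than being a panic.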
Unresolved questions
- Is [Char(a)..Char(b)] even possible? Should it be limited to [Char(a..b)]?
  - It seems that Char(a)..Char(b) would be possible, but not Char(a)..Graph(b)
- What should the return types of the indexing operations be?
  - Should the result of [Char(n)] be char, or &str like the others?
  - The result of [Word(a..b)] could be a Vec<&str>
  - Everything must be a str slice
- Isn't the syntax too verbose?
- How can we make it clearer when an operation is significantly more expensive (nth byte vs. nth anything else)?
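As one possible reading of the [Char(a..b)] form, the sketch below makes the newtype generic over what it wraps, so the same name covers both a single position and a range. This is an assumption for illustration, not a settled design: `Char<T>` and the `byte_offset` helper are hypothetical, ranges are half-open, and out-of-range positions panic.

```rust
use std::ops::{Index, Range};

/// Hypothetical newtype wrapping either a single char position or a range of them.
struct Char<T>(T);

/// Byte offset of the n-th char, or the string length when n equals the char count.
/// Has to scan the string, so it is O(n) in the length of the string.
fn byte_offset(s: &str, n: usize) -> usize {
    s.char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(s.len()))
        .nth(n)
        .expect("char index out of bounds")
}

impl Index<Char<usize>> for str {
    type Output = str;

    fn index(&self, idx: Char<usize>) -> &str {
        let start = byte_offset(self, idx.0);
        let end = byte_offset(self, idx.0 + 1);
        &self[start..end]
    }
}

impl Index<Char<Range<usize>>> for str {
    type Output = str;

    fn index(&self, idx: Char<Range<usize>>) -> &str {
        let Range { start, end } = idx.0;
        &self[byte_offset(self, start)..byte_offset(self, end)]
    }
}

fn main() {
    let s = "naïve café";
    let (a, b): (usize, usize) = (1, 5);
    assert_eq!(&s[Char(2usize)], "ï"); // a single char, still returned as a &str slice
    assert_eq!(&s[Char(a..b)], "aïve"); // chars 1 to 4
}
```

In this form both results are `&str` slices into the original string, which fits the "everything must be a str slice" option above. A mixed range such as Char(a)..Graph(b) would need a separate range type and is not covered here.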