Iterating over Range<char>?

flundstrom2 · December 7, 2018, 2:30pm

@birkenfeld: Today I found myself, for advent-related reasons, wanting to iterate over 'a'..='z' . Turns out this isn’t possible since char doesn’t implement Step

@CAD97: I’d be happy to see Range<char> “just work”, though.

@withoutboats: I think iterating over a range of chars should be possible, and chars should just skip the gap if a range would iterate past it. chars have a defined order and a defined set of valid values, so iterating through a set of consecutive values should be possible (and it shouldn’t fail).
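For reference, the code-point view that @withoutboats describes could look roughly like this. It is only a sketch: `char_range` is a made-up helper name, not a standard library function, and it steps through the underlying `u32` values and lets `char::from_u32` drop the surrogate gap.

```rust
/// Hypothetical helper: every valid `char` from `start` to `end` inclusive,
/// in code-point order.
fn char_range(start: char, end: char) -> impl Iterator<Item = char> {
    // `char::from_u32` returns `None` for the surrogates U+D800..=U+DFFF,
    // so `filter_map` silently skips the gap instead of failing.
    (start as u32..=end as u32).filter_map(char::from_u32)
}
```

With this, `char_range('a', 'z')` yields the 26 ASCII lowercase letters, and a range that straddles the gap, such as `char_range('\u{D7FF}', '\u{E000}')`, yields just its two endpoints.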

Iterating over chars as letters is completely different from iterating over chars as Unicode code points. Say you want to iterate over 'a'..='z': how many chars will that be?

The naïve answer is 26, but that only holds if you iterate over the English alphabet. If the program is written on an English computer but run on a non-English one, the result would be different. For example, the Icelandic alphabet doesn't even contain z anymore, so the range would have no well-defined end. And even if the alphabet did contain z (which it once did), the Icelandic alphabet has 32 letters: 'a'..='ö'.

The letter ö, by the way, has code point U+00F6, quite far from z (U+007A). Not to mention that the letters between a and d are á (code point U+00E1) and b, but no c. Another interesting thing is that prior to 2006, w was not a letter in its own right in the Swedish alphabet. Instead, it was sorted together with v.
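To make the mismatch concrete, here is a small sketch that walks the raw code points between two chars and counts how many of them are not letters at all. The helper names `code_points` and `non_letters` are invented for this illustration.

```rust
/// Every valid `char` from `start` to `end` inclusive, in code-point order.
fn code_points(start: char, end: char) -> Vec<char> {
    (start as u32..=end as u32).filter_map(char::from_u32).collect()
}

/// How many chars in that span are not alphabetic
/// (punctuation, control characters, symbols, ...).
fn non_letters(start: char, end: char) -> usize {
    code_points(start, end)
        .iter()
        .filter(|c| !c.is_alphabetic())
        .count()
}
```

`code_points('a', 'ö')` has 150 elements (0xF6 - 0x61 + 1), not 32. It picks up things like `{` and `|` along the way, yet it misses þ (U+00FE), which comes before ö in the Icelandic alphabet but after it in code-point order.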

All those examples are trivial; it gets even worse when we leave the Unicode Latin-1 range and get to code points above U+00FF ...

So, for a properly working char iterator that treats chars as letters rather than code points, it must take the locale into account, and also handle the case where the alphabet on the user's computer doesn't even contain the letters at the start and end of the range.

So, all in all, a char iterator is really tied more to the concept of locales than to Unicode.
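As a rough sketch of what such a letter-based range might involve: the `alphabet` and `letter_range` functions below are hypothetical, and the two hard-coded tables stand in for real locale data, which a real program would have to get from somewhere else.

```rust
/// Hypothetical lookup: the letters of a locale's alphabet, in that
/// alphabet's own order. Only two hard-coded tables, for illustration.
fn alphabet(locale: &str) -> Option<&'static [char]> {
    static EN: [char; 26] = [
        'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
        'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
    ];
    static IS: [char; 32] = [
        'a', 'á', 'b', 'd', 'ð', 'e', 'é', 'f', 'g', 'h', 'i', 'í', 'j',
        'k', 'l', 'm', 'n', 'o', 'ó', 'p', 'r', 's', 't', 'u', 'ú', 'v',
        'x', 'y', 'ý', 'þ', 'æ', 'ö',
    ];
    match locale {
        "en" => Some(&EN[..]),
        "is" => Some(&IS[..]),
        _ => None,
    }
}

/// Letters from `start` to `end` inclusive, in the locale's alphabetical order.
/// Returns `None` if the locale is unknown, if either endpoint is not a letter
/// of that alphabet, or if `start` sorts after `end`.
fn letter_range(locale: &str, start: char, end: char) -> Option<&'static [char]> {
    let letters = alphabet(locale)?;
    let from = letters.iter().position(|&c| c == start)?;
    let to = letters.iter().position(|&c| c == end)?;
    if from > to {
        return None;
    }
    letters.get(from..=to)
}
```

Here `letter_range("en", 'a', 'z')` gives 26 letters, `letter_range("is", 'a', 'ö')` gives 32, and `letter_range("is", 'a', 'z')` is `None` because z is not in the Icelandic table. Returning `None` is just one possible choice for the missing-endpoint case.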
