Iterating over Range<char>?

@birkenfeld: Today I found myself, for advent-related reasons, wanting to iterate over 'a'..='z'. Turns out this isn’t possible since char doesn’t implement Step.

@CAD97: I’d be happy to see Range<char> “just work”, though.

@withoutboats: I think iterating over a range of chars should be possible, and chars should just skip the gap if a range would iterate past it. chars have a defined order and a defined set of valid values, so iterating through a set of consecutive values should be possible (and it shouldn’t fail).
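To make the "skip the gap" idea concrete, here is a minimal sketch that needs no Step impl. The name `char_range` is made up for illustration and is not a standard library function; it walks the underlying u32 values and drops anything that is not a valid char, which in practice means the surrogate range U+D800..=U+DFFF.

```rust
/// Hypothetical helper (not part of std): yields every valid `char` from
/// `start` to `end` inclusive, silently skipping the surrogate gap.
fn char_range(start: char, end: char) -> impl Iterator<Item = char> {
    // `char::from_u32` returns `None` for surrogate code points,
    // so `filter_map` simply drops them instead of failing.
    (start as u32..=end as u32).filter_map(char::from_u32)
}
```

With this, `char_range('a', 'z')` yields the 26 code points from U+0061 to U+007A, and a range that straddles the gap jumps straight from U+D7FF to U+E000.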

Iterating over chars in terms of letters is completely different from iterating over chars in terms of Unicode code points. Say you want to iterate over 'a'..='z': how many chars will that be?

The naïve answer is 26, but that holds only if you iterate over the English alphabet. If the program is written on a computer with an English locale but run under a non-English one, the result would be different. For example, the Icelandic alphabet no longer contains z at all, so the range would have no well-defined end point. And even if the alphabet did contain z (which it once did), the Icelandic alphabet contains 32 letters: 'a'..='ö'

The letter ö, by the way, has code point U+00F6, quite far from z (U+007A). Not to mention that the letters between a and d are á (code point U+00E1) and b, but not c. Another interesting detail is that before 2006, w was not a letter in its own right in the Swedish alphabet. Instead, it was sorted together with v.
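The numbers are easy to check. The sketch below uses a made-up helper, `code_point_span`, to count how many code points lie between two chars; it assumes `start <= end` and says nothing about which of those code points are letters.

```rust
/// Hypothetical helper: number of code points from `start` to `end`
/// inclusive. Assumes `start <= end`; otherwise the subtraction underflows.
fn code_point_span(start: char, end: char) -> u32 {
    end as u32 - start as u32 + 1
}
```

`code_point_span('a', 'z')` is 26, which happens to match the English alphabet. `code_point_span('a', 'ö')` is 150, nowhere near the 32 letters of the Icelandic alphabet, and that span includes punctuation, control characters and c, while á (U+00E1) comes long after b and d.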

All those examples are trivial; it gets even worse when we leave the Latin-1 range of Unicode and reach code points above U+00FF ...

So a properly working char iterator that treats chars as letters, rather than code points, must take the locale into account. It must also handle the case where the user's locale doesn't even contain the letters marking the start and end of the set.
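As a rough sketch of what that would mean, the code below looks letters up in a per-locale alphabet table instead of walking code points. Everything in it is an assumption for illustration: the function names `alphabet` and `letter_range`, the locale tags "en" and "is", and the choice to return `None` when an endpoint is missing. A real implementation would need actual locale data rather than a hard-coded table.

```rust
/// Hypothetical lookup table; only two locales are filled in here.
fn alphabet(locale: &str) -> Option<&'static [char]> {
    const EN: &[char] = &[
        'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
        'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
    ];
    // The 32 letters of the modern Icelandic alphabet (no c, q, w or z).
    const IS: &[char] = &[
        'a', 'á', 'b', 'd', 'ð', 'e', 'é', 'f', 'g', 'h', 'i', 'í', 'j',
        'k', 'l', 'm', 'n', 'o', 'ó', 'p', 'r', 's', 't', 'u', 'ú', 'v',
        'x', 'y', 'ý', 'þ', 'æ', 'ö',
    ];
    match locale {
        "en" => Some(EN),
        "is" => Some(IS),
        _ => None,
    }
}

/// Letters from `start` to `end` inclusive, in the locale's own order.
/// Returns `None` if the locale is unknown or if either endpoint is not
/// a letter of that alphabet.
fn letter_range(locale: &str, start: char, end: char) -> Option<&'static [char]> {
    let letters = alphabet(locale)?;
    let from = letters.iter().position(|&c| c == start)?;
    let to = letters.iter().position(|&c| c == end)?;
    letters.get(from..=to)
}
```

Under these assumptions, `letter_range("en", 'a', 'z')` gives 26 letters, `letter_range("is", 'a', 'ö')` gives 32, `letter_range("is", 'a', 'd')` gives a, á, b, d, and `letter_range("is", 'a', 'z')` gives `None` because z is not in the table.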

So, all in all, a char iterator is really tied more to the concept of locales than to Unicode.
