Range syntax is confusing

Asru · July 15, 2019, 10:20am

Regarding 0..256 as a range of numbers from 0 to 255.

That is an unfortunate choice of syntax. In casual discussion it will always read as slightly wrong unless it is qualified. It also adds to the jargon / specialised use: "oh, it's just how Rust defines ranges". Ouch. Not so fun times ahead if I start teaching Rust.

This forum software is painful: it is not clear which replies go with which comment. Another "ouch" moment. I feel like I'm walking across a grassy field covered in rakes left on the ground ready to slap me in the face any direction I move.

RustyYato · July 15, 2019, 10:25am

Do note that Python’s range generator is the same: range(0, 256) is from 0 inclusive to 256 exclusive. The same goes for C#'s ranges. So there is precedent in other languages. Rust also provides the 0..=255 syntax, which includes both bounds.
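To make the two spellings concrete, here is a small sketch in Rust. The `summarize` helper is hypothetical, written only for this illustration, and it assumes a non-empty range.

```rust
// Hypothetical helper: returns (first, last, count) of a non-empty range.
fn summarize(range: impl Iterator<Item = u32>) -> (u32, u32, usize) {
    let values: Vec<u32> = range.collect();
    (values[0], values[values.len() - 1], values.len())
}

fn main() {
    // Half-open: the upper bound 256 is excluded.
    println!("{:?}", summarize(0..256)); // (0, 255, 256)
    // Closed: the upper bound 255 is included.
    println!("{:?}", summarize(0..=255)); // (0, 255, 256)
}
```

Both spellings describe the same 256 numbers; only the way the upper bound is written differs.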

H2CO3 · July 16, 2019, 4:14am

It's not "casual" though. The ellipsis would be, maybe. But the actual Rust syntax, as well as the syntax used throughout Ralf Jung's blog post, is two dots.

Apart from this fact, basically every mainstream language defaults to half-open ranges, because their superior mathematical properties make them much more convenient and less error-prone to use than closed ranges. E.g. the length of begin..end is exactly end - begin as opposed to end - begin + 1 for a closed interval; splitting a half-open interval into two disjoint intervals in the middle is as easy as begin..mid and mid..end as opposed to having to manually add or subtract 1s at the appropriate places. This removes several opportunities for an off-by-one error to sneak in.
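The length and splitting properties can be checked with a short sketch. The `split_at` function here is a hypothetical helper for illustration, not a standard library API, and it assumes `mid` lies within the range.

```rust
use std::ops::Range;

// Hypothetical helper: splits begin..end into begin..mid and mid..end.
// No +1 or -1 adjustments are needed at the seam.
fn split_at(range: Range<usize>, mid: usize) -> (Range<usize>, Range<usize>) {
    (range.start..mid, mid..range.end)
}

fn main() {
    let whole = 10..20;
    // The length of begin..end is exactly end - begin.
    assert_eq!(whole.len(), 20 - 10);

    let (left, right) = split_at(whole.clone(), 15);
    // The two halves are disjoint and together cover the original range.
    assert_eq!(left.len() + right.len(), whole.len());
    let rejoined: Vec<usize> = left.chain(right).collect();
    assert_eq!(rejoined, whole.collect::<Vec<_>>());
}
```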

Please, let's not repaint the bike shed every few weeks.

Tom-Phinney · July 16, 2019, 4:19am

The motivation, of course, comes from the fact that traditional mathematical ranges are usually [1..N], but C and similar languages use zero-origin indexing. Rather than having to always write [0..N-1], Rust and other languages extend the .. syntax to permit explicit specification of both the inclusive lower bound and an exclusive upper bound. Rust takes this further by permitting elision of either or both bounds if they are the end of the range. Thus a vector slice with subscript range [0..vec.len() - 1] inclusive can be expressed in Rust as vec[0..vec.len()] or vec[..vec.len()] or vec[0..] or simply vec[..].
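As a rough check of the slice spellings listed above, the following sketch compares them on a small vector. The function name `all_spellings_agree` is made up for this example.

```rust
// Hypothetical helper: checks that every spelling of "the whole slice" agrees.
fn all_spellings_agree(v: &[i32]) -> bool {
    let full = &v[0..v.len()];
    full == &v[..v.len()] && full == &v[0..] && full == &v[..]
}

fn main() {
    let v = vec![10, 20, 30];
    assert!(all_spellings_agree(&v));

    // The inclusive spelling also works here, but `v.len() - 1`
    // would fail for an empty vector.
    assert_eq!(&v[0..=v.len() - 1], &v[..]);

    // The half-open spellings handle the empty case without special treatment.
    assert!(all_spellings_agree(&[]));
}
```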

matklad · July 16, 2019, 7:31am

Anecdata: in Kotlin, 0 .. 10 actually means inclusive range, and that still confuses me from time to time. Additionally, the much more common exclusive range is spelled in a more verbose way, 0 until 10, and that is also annoying.

Note, however, that inclusive ranges are more general than exclusive ones: 0u8..=255u8 exists, while 0u8..256u8 doesn’t.
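A minimal sketch of that limitation, assuming nothing beyond the standard library (the function name `count_all_u8` is made up for this example):

```rust
// Counts every value of u8 using an inclusive range.
fn count_all_u8() -> usize {
    (0u8..=255u8).count()
}

fn main() {
    // The inclusive range covers all 256 values of u8.
    assert_eq!(count_all_u8(), 256);
    assert_eq!((0u8..=u8::MAX).last(), Some(255));

    // `0u8..256u8` is rejected at compile time, because 256 does not fit in u8.
    // The widest half-open range over u8 therefore stops one short:
    assert_eq!((0u8..u8::MAX).last(), Some(254));
}
```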

gnzlbg · July 16, 2019, 11:21am

This is how Rust defines range syntax, and when I see it used in a Rust forum or a blog post about Rust I know what it does.

When someone uses this syntax in casual discussion and I don't know the context I ask. Chances are they are not talking about Rust, and then the whole point here is moot.

Ouch. Not so fun times ahead if I start teaching Rust.

How is teaching this hard? I've taught this, and I never had to explain anything beyond "a..b is a half-open range, and a..=b is a closed range". There is nothing to understand or debate here - this is just syntax, doesn't even deserve its own slide, at most a footnote the first time you use it. If someone wants to know why, you can point them at the RFCs, but that has never happened to me when teaching this. People just learn it, and move on to more important stuff.

CAD97 · July 16, 2019, 12:18pm

Minor note to those here: this potential confusion point is worsened by a lot of software, including Discourse, as a literal low..high in the markdown post source renders as

low…high

with a full ellipsis. Three dots were also used for inclusive ranges in Rust patterns, and the Swift syntax is ... for closed, ..< for half open.

Context is important for interpreting range syntax. It doesn't help that English is ambiguous: working 9-5 is half open (you're done at five, not still working through all of five), a budget of $200-$300 is usually closed; I think the pattern is for discrete versus continuous measurement?

There's a reason that half-open range syntax and off-by-one errors are such a huge deal in introductory coding classes. You have to choose one way or another, and it's a toss-up which one people will expect.

Even mathematical syntax of [0, 10) (or is it [0..10[, or...) requires introduction to a reader that hasn't seen it before. If the context is clear, e.g. when talking about a programming language, use the clear range syntax and semantics. In any other case, and maybe even for introductory articles where it might be clear, introduce the range syntax with a footnote or sidenote or such to clarify what it means. There's no perfect solution because everyone brings their own expectations.

RalfJung · July 16, 2019, 8:09pm

They are [1, N] actually. No .. . At least that's what I have seen.

OMG I had no idea. Why would the forum software (or markdown renderer in general) insert a dot?!?

notriddle · July 16, 2019, 9:29pm

It's one of many typographical tweaks that Discourse tries to do:

josh · July 16, 2019, 9:46pm

Please consider turning that off. It’s painful in a forum that regularly discusses code.

(Yes, people can avoid it by writing code blocks in backquotes, but sometimes people forget to do that in running text.)

notriddle · July 16, 2019, 10:58pm

I can’t, but @carols10cents or @erlend_sh can?

josh · July 17, 2019, 2:51am

Range syntax has multiple uses, including slicing and pattern matching.

RalfJung · July 17, 2019, 6:22am

I have to admit I do like the emdash part---as someone who regularly uses those dashes in text.

But turning two dots (..) into a three-dot ellipsis seems like a bad idea indeed in a forum where code is frequently the subject of discussion.

Ranges can also be used as match patterns, which is not (currently) possible with functions.
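For illustration, a small sketch of ranges used as match patterns. The `classify` function and its category names are invented for this example. It uses the inclusive `..=` pattern form.

```rust
// Hypothetical example: classify a byte using range patterns.
fn classify(byte: u8) -> &'static str {
    match byte {
        0..=31 | 127 => "control",
        32..=126 => "printable",
        128..=255 => "non-ascii",
    }
}

fn main() {
    println!("{}", classify(b'a')); // printable
    println!("{}", classify(0));    // control
    println!("{}", classify(200));  // non-ascii
}
```

The compiler checks that the three arms together cover every `u8` value, so no catch-all arm is needed.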

Tom-Phinney · July 17, 2019, 6:31am

I would have no problem with Markdown replacing three adjacent periods ... with the unicode horizontal ellipsis character … (U+2026), but I have a big problem with Markdown rewriting two adjacent periods .. into either form of horizontal ellipsis.

notriddle · July 17, 2019, 7:49am

You know, we might be the first group who’s actually complained about the .. -> … rewrite. Most people wouldn’t care, even actual programmers since .. isn’t valid syntax in a lot of languages. Not to mention, I’ve complained about --help being turned into –help.

ekuber · July 17, 2019, 4:31pm

FWIW, the most common mistake you could make is caught and receives a suggestion:

error: range endpoint is out of range for `u8`
 --> src/main.rs:2:13
  |
2 |     let x = 0..256u8;
  |             ^^^^^^^^ help: use an inclusive range instead: `0..=255u8`
  |
  = note: #[deny(overflowing_literals)] on by default

CAD97 · July 17, 2019, 11:05pm

Not strictly true. If the type you're matching over has a nontrivial destructor (i.e. isn't Copy) you can't use match guards as that would require moving/consuming the value to test the guard. Example.

dhm · July 18, 2019, 11:20am

This issue will soon be 40 years old

E.W. Dijkstra: Why numbering should start at zero (1982):

To denote the subsequence of natural numbers 2, 3, ..., 12 without the pernicious three dots, four conventions are open to us

a) 2 ≤ i < 13

b) 1 < i ≤ 12

c) 2 ≤ i ≤ 12

d) 1 < i < 13

Are there reasons to prefer one convention to the other? Yes, there are. The observation that conventions a) and b) have the advantage that the difference between the bounds as mentioned equals the length of the subsequence is valid. So is the observation that, as a consequence, in either convention two subsequences are adjacent means that the upper bound of the one equals the lower bound of the other. Valid as these observations are, they don't enable us to choose between a) and b); so let us start afresh.

There is a smallest natural number. Exclusion of the lower bound —as in b) and d)— forces for a subsequence starting at the smallest natural number the lower bound as mentioned into the realm of the unnatural numbers. That is ugly, so for the lower bound we prefer the ≤ as in a) and c). Consider now the subsequences starting at the smallest natural number: inclusion of the upper bound would then force the latter to be unnatural by the time the sequence has shrunk to the empty one. That is ugly, so for the upper bound we prefer < as in a) and d). We conclude that convention a) is to be preferred.

Remark The programming language Mesa, developed at Xerox PARC, has special notations for intervals of integers in all four conventions. Extensive experience with Mesa has shown that the use of the other three conventions has been a constant source of clumsiness and mistakes, and on account of that experience Mesa programmers are now strongly advised not to use the latter three available features. I mention this experimental evidence —for what it is worth— because some people feel uncomfortable with conclusions that have not been confirmed in practice. (End of Remark.)

When dealing with a sequence of length N , the elements of which we wish to distinguish by subscript, the next vexing question is what subscript value to assign to its starting element. Adhering to convention a) yields, when starting with subscript 1, the subscript range 1 ≤ i < N +1; starting with 0, however, gives the nicer range 0 ≤ i < N . So let us let our ordinals start at zero: an element's ordinal (subscript) equals the number of elements preceding it in the sequence. And the moral of the story is that we had better regard —after all those centuries!— zero as a most natural number.

Remark Many programming languages have been designed without due attention to this detail. In FORTRAN subscripts always start at 1; in ALGOL 60 and in PASCAL, convention c) has been adopted; the more recent SASL has fallen back on the FORTRAN convention: a sequence in SASL is at the same time a function on the positive integers. Pity! (End of Remark.)

The above has been triggered by a recent incident, when, in an emotional outburst, one of my mathematical colleagues at the University —not a computing scientist— accused a number of younger computing scientists of "pedantry" because —as they do by habit— they started numbering at zero. He took consciously adopting the most sensible convention as a provocation. (Also the "End of ..." convention is viewed of as provocative; but the convention is useful: I know of a student who almost failed at an examination by the tacit assumption that the questions ended at the bottom of the first page.) I think Antony Jay is right when he states: "In corporate religions as in others, the heretic must be cast out not because of the probability that he is wrong but because of the possibility that he is right."

Plataanstraat 5 5671 AL NUENEN The Netherlands 11 August 1982 prof.dr. Edsger W. Dijkstra Burroughs Research Fellow

Transcriber: Kevin Hely. Last revised on Fri, 2 May 2008.
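Dijkstra's conclusion, convention a) with subscripts starting at zero, maps directly onto Rust's half-open ranges. A minimal sketch follows; the `subscripts` function is a hypothetical helper written for this illustration.

```rust
use std::ops::Range;

// Hypothetical helper: the subscript range 0 <= i < n for a sequence of length n.
fn subscripts(n: usize) -> Range<usize> {
    0..n
}

fn main() {
    // The difference between the bounds equals the length of the sequence.
    assert_eq!(subscripts(5).len(), 5);

    // Each subscript equals the number of elements preceding it.
    assert_eq!(subscripts(3).collect::<Vec<_>>(), vec![0, 1, 2]);

    // The empty sequence is 0..0: both bounds remain natural numbers.
    // A closed range would need an upper bound of -1, which usize cannot hold.
    assert!(subscripts(0).is_empty());
}
```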

carols10cents · July 18, 2019, 11:57pm

blue button smashing meme

(done, here and on URLO. I really really dislike those kinds of corrections!!!)

josh · July 19, 2019, 12:31am

Thanks, Carol!
