Fix inconsistent naming before 1.0

For motivation, watch Youtube: Scott Meyers – The Most Important Design Guideline

We don’t want to annoy millions of people for the next 30 years, so let us go through the standard library and eradicate any naming inconsistencies before 1.0.

“The only thing better than two naming conventions is one naming convention.”

I’m not too familiar with the standard library, but here are my two cents:

1. "The number of items in some set or sequence of items" Length, abbreviated len, is widely used in the meaning of “number of items in some collection of items”. This is not the case with std::iter::Iterator though. Iterator trait has a method size_hint which hints at the possible “number of items in the collection of items which the iterator iterates over”. The trait method size_hint should be renamed to either length_hint or len_hint (I don’t understand the logic in Rust abbreviations, so I can’t argue whether “length” should be abbreviated in this case or not).

2. "The first or last item in some sequence of items" The std::slice::SliceExt trait has methods named first and last which refer to the first and last items respectively. Whereas std::collections::LinkedList and std::collections::VecDeque (and possibly others?) have methods front and back to refer to the first and last items in those collections. These two competing conventions are both fine, but we should pick just one of them. Somewhat unrelated, std::iter::IteratorExt trait also has a method named last but it has O(n) complexity. I think that’s pretty dangerous for performance, because the common assumption is that methods named like last or back are O(1). Thus, std::iter::IteratorExt::last should probably be renamed to something like skip_to_last.

1 Like

Honestly i never really thought about size_hint; changing it to len_hint makes sense to me.

front/back vs first/last is trickier.

front/back is industry standard terminology for deques. push_first/push_last looks pretty weird to me. Not insane, just weird. However the front/back metaphor was considered a bit more tenuous in other cases like arrays and stacks, which have more of a numbered or top/bottom metaphor. We do model double-ended-iterators as deques (next_back), but it's asymmetric with normal iterators (next) because almost no one actually uses next_back -- it's largely just for rev. So talking about the front or back of an iterator can be a bit weird (just like talking about the front or back of an array).

We used to have some more exotic terminology-unification around collections (you pushed and poped a Map), but that was dropped.

Somewhat unrelated, std::iter::IteratorExt trait also has a method named last but it has O(n) complexity. I think that's pretty dangerous for performance, because the common assumption is that methods named like last or back are O(1).

Disagree. Anyone coming from a functional language knows all-too-well that getting the last element of a sequence can be very expensive. Anyone otherwise familiar with singly linked lists should as well. first and last also strike me as quite niche pieces of iterator functionality, and therefore lower priority.

Regardless with specialization (eventually) we can make these operations legit O(1) for almost all the iterators you'll work with in practice.

Oof, hour long video for motivation. I haven't watched this particular Scott Meyers video, but from what I've seen from him before he's more concerned with serious pitfalls and a swamp of special cases. I'm not sure different naming conventions for different APIs is necessarily the kind of thing he's all that concerned about.

3 Likes

The front and back methods are provided by all C++ standard library containers which can implement them at constant time, like std::array or std::list. Thus, they don't look weird to C++ programmers at all.

Anyone coming from a functional language also knows that indexing into a sequence or a container may be a linear-time operation. My impression is that many C++ minded people consider it to be an api design flaw to provide linear-time indexing into a container (because it makes it harder to reason about the time complexity of the code you read). C++ std library considers container front and back methods in the same vein as indexing, in that they should happen at constant time (EDIT: although some indexing is logarithmic time). For example, the C++ singly-linked list std::forward_list doesn't provide back method at all, because it would have to be O(n).

Like Scott Meyers says in the video I posted, people get used to mistakes in interfaces and stop noticing them (just like with shutting down your computer by first clicking the Start-button). I started this thread to encourage people to pay attention to naming (which is a big part of api's) in the standard library.

1 Like

3. "Give me the next item in some sequence of items and remove the said item from the sequence" What is the first intuition you get from seeing a method named next which returns some value? My intuition is that the method returns the next item in some ordered list or sequence of items. The name of the method next doesn’t give me the slightest hint of any kind of modification or mutation. I wouldn’t expect a method named next to remove the returned next item from the underlying sequence of items. Therefore, std::iter::Iterator::next and std::iter::DoubleEndedIterator::next_back methods should be renamed to better convey what those methods actually do.

Here are some names better suited for the next method: pop_next pop_front pop_first

Here are some names better suited for the next_back method: pop_next_back pop_back pop_last

My vote goes for the method names pop_front and pop_back. Also, if it didn’t become clear from the previous post, I also vote for the std::slice::SliceExt::first and std::slice::SliceExt::last methods to be renamed to front and back respectively.

That one’s definitely a non-starter. next and next_back do not remove elements from storage. They yield the next thing in a sequence. It might be a reference, it might be a value. Neither requires mutation. In the case of into_iter and drain, creating the iterator has semantically removed all the elements from the collection already (and there’s no way to observe otherwise without unsafe code). It is just lazily deferred until you actually request the element.

If you call into_iter or drain that clearly signals mutation. If you hand that off to someone who doesn’t know about where it came from why should they care?

1 Like

I never mentioned the storage of the elements/items in the sequence. You misunderstood me. Here's what I'm talking about:

Iterator represents something that refers to a sequence of items. The iterator may or may not own the storage of those items. In any case, the iterator has some internal representation of the sequence of items which it refers to. When you call the iterator's next method, you mutate the iterator's internal representation of that sequence of items which the iterator refers to. That's the mutation I'm talking about which is not indicated in any way by the name of the method next.

I admit that given that "pop" is a word commonly used in names of methods which mutate containers, like pop_front and pop_back, maybe it should be avoided with iterators. But the name should be something similar given the similarity of those operations. Perhaps exhaust_front and exhaust_back for Iterator and DoubleEndedIterator respectively.

To make myself more understandable, I'll mention that I use the words "set" and "sequence" in the mathematical sense of those words. Set is a collection of items whose order in the collection is not specified. Sequence is a collection of items whose order in the collection is specified.

To a newcomer the following buggy function looks totally reasonable because the word next does not at all hint at the fact that the method actually mutates the iterator (and the newcomer is going to be surprised to find out that the function doesn’t print the first value of the sequence). The argument that “the newcomer is going to learn that the next method mutates the iterator and then it won’t matter anymore” is not an excuse. Just because we get used to mistakes in interfaces quite quickly, it doesn’t mean that those mistakes don’t matter. Potentially a huge number of newcomers are going to get (not pleasantly) surprised by this during the next 30 years.

fn print_all_if_first_item_is_ten<I>(mut items: I)
    where I: Iterator<Item = i32>
{
    if let Some(value) = items.next() {
        if value == 10 {
            for n in items {
                print!("{} ", n);
            }
        }
    }
}

// ... and then this prints: 11 12 13
print_all_if_first_item_is_ten([10, 11, 12, 13].iter().map(|a| *a));

It's worth noting that this is a convention in literally all iterator implementations in all languages I am aware of. Each call to next mutates the iterator (or returns a new iterator).

See:

A particularly enlightening quote from the Scala docs:

Note that even hasNext may cause mutation

The principle of least surprise does not apply in a vacuum; breaking from conventions here would need to be supremely well-motivated. Mixing next and looping is also something I would consider niche. Either you need full control, or you just want to pass the thing to something that knows how to deal with it.

2 Likes

No, they don't.

They are objects that give you elements of the iterand, one at a time. For that, next is a sensible name. It also is precedent in many languages as Gankro already noted.

You can also consider them as pointers to the sequence like C++. Then you'd expect separate operation to produce next item and not move, but next as a name for operation that produces the pointed item and moves the pointer still makes sense.

I think that multi-pass iterators should additionally provide method get() that returns the current element but does not move the iterator. Single-pass iterators (input iterators, function-based generators etc.), shouldn't, because it would require them to cache the value and that complicates things even though most of the time it's not needed (C++ does it, it does complicate things).

I don't know why you listed the C++ free function std::next, because it's the one in your language list which doesn't support your case since std::next specifically does not advance the iterator which you pass to it as the argument. Another exception to your list would be the D programming language, which uses iterators that are quite similar to Rust's. Except that D's iterators (which are called range, and are just a concept, not a specific type nor an interface) they have a method front which returns the next element without advancing the iterator.

I wonder if the designers of all those other languages in your list actually thought that next is a good name which perfectly conveys the method's functionality. Or did they simply follow the convention that had been laid before them by others (I wonder which language was the first to use the name next for that meaning).

I don't think that the quote means that hasNext may actually advance the iterator. That would be silly (reminds me of the South Park skit: "you have $100 in your account... and now it's gone").

The people who won't be surprised by a next method are those that already know one of those languages you listed (excluding C++). People who don't know one of those languages are quite likely going to be surprised by it. It's not clear which one of those groups represents the majority when it comes to newcomers to Rust.

The motivation to break from the convention, which some languages (not all) have set, is that the convention is not good enough. Why would there need to be an extremely strong motivation to break this convention? People coming from those languages you listed are not going to be surprised to find a method named exhaust_front or similar which does the same thing as the next method in their native language. And people who don't know those languages are not going to be surprised either.

Just because it's rarely used doesn't mean that its design doesn't matter.

I don't understand why you disagreed with my definition of Iterator, but I agree with your's (although I object the word "give" since iterators may either give you the ownership of the element or provide you only mutable/shared access to the element). In my definition I used the wording "refers to" as in the meaning of "references" meaning points to the sequence of elements.

Don't you think that the name next or front would be better suited for this hypothetical get method, and that exhaust_next or exhaust_front would be better suited for the current next method in Rust?

You’re totally right; I misremembered the C++ iterator API. Although now I’m remembering why the C++ api is utterly awful to use.

The D (and C++14) range metaphor is unfortunately very incompatible with Rust’s ownership and aliasing rules. Calling next must irreversible “consume” an element because iterators can work with mutable references. If you could do the following:

let it = arr.iter_mut();
let x = it.next();
let y = it.next(); // same as x

Then you’ve violated one of the major rules of safe Rust.

Regardless, Rust already hints to you that next changes things. Calling next requires the iterator be made mutable while passing it to a function or loop doesn’t, it moves it. If you made the iterator binding mutable when it doesn’t need to be you will in fact get a warning. So most people will be used to immutable iterators. If they decide to use next manually they get alerted that it’s a mutating operation.

1 Like

No, I don't. Because next implies “provide me the next element after the one you gave me previously”. Because it's not a sequence. It's a source of elements and I want a new, fresh element from the source. A next element.

Oops, this is not a precedent. This is the odd man out. In two ways. It has a method MoveNext() -> bool and a property Current. At the beginning, MoveNext() must be called to find whether there is an element and then Current returns it. But it is odd man out. It's unusual definition.

And nor is this. C++ iterator has operator++ and operator*. It is tested whether it has a value by comparing it with end iterator and then * gives the value and ++ moves to the next value. No methods here at all, under any names.

The fact they don't know their end is a disadvantage, but I like how they are modelled after pointers (in C++ a dumb pointer is a valid iterator to a dumb array) as it makes them easy to use as indices. The definition does complicate single-pass iterators though.

This example is disingenuous. Any user attempting to construct an iterator manually as per the typical let foo = bar.iter(); will be instantly stopped in their tracks by the compiler telling them that the variable foo must be made mutable. In this example, you’ve merely done the same by putting the mut up in the function signature. The fact that iterators are inherently mutable is not something that will go idly unnoticed.

As to next, none of the proposed “pop”-based names do any better at describing this method’s behavior. Given the very high bar for breaking changes at this point and given the fact that precedence in other languages exists, I see no chance that this will change.

I could get behind renaming size_hint and first/last, if Gankro deems it worth it.

Yes I know this, and I don't know why I even brought up D. Somehow I thought it was an example of a language that had broken the convention of using the name next for the method which advances the iterator (D's ranges have a popFront method for that and it doesn't return the popped element). But since D's ranges are not isomorphic to the kind of ranges/iterators we're talking about here, it's was not a good point to even try to make.

By the way, I don't think C++14 has the new range metaphor yet (the old C++ standard's range metaphor means simply that two Iterators represent a range) nor do I believe that the new range metaphor for C++ (if it ever gets one) is going to be anything like the D's range concept.

This is all good. But it still doesn't change the fact that the method next could have a better name.

4. "Remove that item from that container and give it to me" To pluck an item from a container’s front or back, you use methods named using the word “pop”. To pluck an item from any location of the container you use methods named using the word “remove”. This is not consistent naming. For example: VecDeque has methods pop_front and pop_back for plucking the element from either end, whereas to pluck an item from anywhere in the container, you use methods remove, swap_front_remove and swap_back_remove. There’s similar kind of naming in other containers as well. To make the naming consistent, I suggest we replace the use of “pop” in these cases with “remove”. So VecDeque would have remove_front and remove_back for example.

Another thing, kind of related, kind of isn’t: Vec has a method called pop for removing from the back. I think it would make the code easier to read and understand if it was named pop_back instead (or rather remove_back given the earlier changes).

How about exhaust_front and exhaust_back (for exchange of next and next_back), do you think those names would better describe those methods' behaviour? (Regardless of whether you think any name change for next is going to happen or not)

I disagree completely. As Gankro said, iterators in almost every language (not C++) work the way they do in Rust, and many languages use the precise same names that Rust uses. Iterators in C++ on the other hand are (imo) more like cursors -- they are the generalization of pointers into an array, basically. So if you come with the C++ expectation of what an Iterator is, then the semantics of next might be surprising. But it's really C++ that is the odd one out here.

6 Likes

@Gankra @nikomatsakis,

Would renaming size_hint to len_hint be an acceptable change now that 1,0 beta is out in the wild?

From the 1.0 beta release anouncement:

We don’t plan on making functional changes to stable content, though naturally we may make minor corrections or additions to the library APIs if shortcomings or problems are uncovered (but the bar for such changes is relatively high).

IMHO this is not a "functional change" but a minor naming correction, but I am not sure if the bar is too high for this change or not.

As an alternative, I think we may deprecate but retain size_hint during the 1.x cycle. It would be weird to have a deprecated method in 1.0 stable though.

Filed the size_hint -> len_hint change as RFC 1032, though I personally prefer removing size_hint in 1.0 final, the “deprecate but retain” alternative is the main candidate in the RFC.

EDIT: There was some issue with git or GitHub that my new commits updating RFC 1032 were not showing up in the PR, so I closed RFC 1032 and restarted with RFC 1034. Note the “Rendered View” link in RFC 1032 still works correctly.