What's the status of std::LinkedList? (maybe deprecate in Rust 2018?)

I don't think one even needs that. A simple vector of pointers will work, e.g. Vec<Box<T>> won't move the Ts on growth.

The second is not generally true: to insert into the middle, you’ll need to traverse the list until the middle, which takes linear time, see this talk by Bjarn Stroustrup for some benchmarks.

This is correct. Iff you need to find the list node in which you want to insert/erase/splice, then these operations are O(N) and not O(1) because of the find.

you are using a cursor API, edit and traverse list in changing directions at the same time then yes, it could be faster. But this is definitely an extremely niche use-case.

Iterators to linked-list nodes are not invalidated on insertion/splicing. They are only invalidated on erasure, and only those iterators that point to the erased nodes. Every reasonable doubly-linked list API exposes these invariants, and every reasonable program using linked lists relies on these invariants by keeping iterators to relevant nodes around. I cannot honestly believe that someone that knows enough to dump a vector in favor of a linked list will perform many insertions into the linked list middle by traversing the linked list from the front on every insertion [*].

The appropriate way of doing that is finding the middle once, which is O(N), and then inserting elements while updating the pointer, which is O(1) per element (or even better, O(1) independently of the number of elements being inserted if you are splicing from another list). Both operations are much faster than inserting into the middle of a non-tiny vector, and ridiculously faster as the number of elements in the sequence becomes large (one memory allocation and a couple of pointer swaps vs a memcpy at best, and N/2 individual calls to clone at worst).

So I disagree with your statement, that is not an extremely niche use-case, that's the only reasonable way of working with linked lists.


[*] FWIW Bjarne agrees. There was a Q&A session afterwards where he admitted that the benchmark intent was to prove the point that memcpying half of the vector was faster than traversing a linked list up to its middle or somewhere else (the elements where randomly chosen I think?), and that in no way was this the proper way of working with a doubly-linked list, which was also kind of its point.

If you want to insert n elements into a middle of the vector, you can do it in linear time as well, by, for example, using and iterator API, roughly like xs.take(n).chain(new_elements).chain(xs.drop(n)).collect(). Each operation that processes elements left to write and amortizes to O(1) with linked list will also amortize to O(1) with Vec. Not sure which one will be more performant, but at least the number of allocations in Vec’s case will be smaller.

To be asymptotically faster then vec, you’ll need a sequence of operations like “move ± n elements from the current position and do and op”, otherwise, if you process list in a single direction, you can do the same with vec by constructing a new one alongside.

EDIT: to clarify, I am not trying to say that using cursors with linked list is niche, my statement is that the problems which have better big O complexity when solved with linked lists are rare. In particular, just traversing a list in a single direction adding and removing elements is not such a problem :slight_smile:

Maybe it is subtle but AFAICT this operation is O(N + M) where N is the number of elements in the sequence and M the number of elements that are inserted.

Let's make the problem more concrete: Given a sequence with O(N) elements, insert M elements from a Stream [*] into the sequence in O(M).

If you use vectors, and insert the elements one by one, you get O(N * M) complexity, because for each element from the Stream that you insert, you have to shift N vector elements. Here it is much better to do what your iterator code is doing: insert M elements into a separate vector which is amortized O(1) per element, and then you insert that vector into the original one once the Stream has been exhausted. This results in O(N + M) complexity and requires O(M) extra storage.

With a doubly-linked list, you have to find the middle once, which is O(N), but then all insertions are O(1). So you get O(N+M) complexity. However, depending on how you see things, finding the middle is amortized over the M insertions, so you could also argue that inserting is amortized O(1), and this takes O(M) time for a list. Also, if you already have a pointer to the middle stored somewhere, this is unarguably an O(M) operation. So for a list, it makes a bit of a difference whether you have a pointer to the insertion point lying around or not.

Using the Stream here is not necessary. If you have the elements in a vector already, inserting into a vector is still O(N + M), but again inserting into a list can be O(N+M) or O(M) depending on whether you have a pointer to the middle around or not. What the Stream shows is that with a list you don't need extra storage. You also don't need multiple passes to implement this operation, which can be important, for example, if concurrency is involved (although in that case a "normal" list won't work).

Also, things change even further if instead of having the elements to insert in a vector, you already have them in a list, because then inserting into a vector is still O(N + M) but inserting them into another list is O(N), or if you have a pointer to the insertion point lying around: O(1).

Or in other words, if you can a priori have a pointer to the insertion point and reuse it, inserting into a vector is always O(N + M), but inserting into a list is either O(M) or O(1). This can and is much faster than inserting into a vector, particularly if N is large and M is small. OTOH if you don't have a pointer to the insertion point, then this becomes O(N + M) and O(N) for the list, in which case the fastest solution will depend on the "list traversal cost" vs the "vector memcpy cost".

The take away here isn't that list can be faster than vector, or that vector is often/generally faster than list, but that for lists having a pointer to the insertion/splicing/erasing point makes a huge difference performance wise. If a list API does not support this, then you are better off just using a vector because this is the main advantage of lists.

[*] That is, the number of elements to be inserted is a priori unknown, one only knows M once the Stream has been exhausted.

Yep, I should have stated explicitly that linear applies to both input parameters.

With a doubly-linked list, you have to find the middle once, which is O(N), but then all insertions are O(1) (or amortized O(1) if you count the find + insertions together). So you get O(M) complexity with O(1) storage.

Hm, that's still O(N + M), N for finding, M for insertions.

The situation for allocation is tricky to analyze precisely. If you are willing to give up on memcpy you can append M elements to the end and then rearrange vectors in place in linear time (but given that vectors often have handy spare capacity, you might be able to place some nice trick with it).

So I still maintain that complexity wise both data structures are the same for this particular problem :slight_smile: I won't make any claims regarding the actual wall-clock time performance :slight_smile:

Oh sorry, I posted too early and was still editing the post.

Hm, that’s still O(N + M), N for finding, M for insertions.

Yeah, I think this depends on how you want to see it. Strictly speaking this is O(N + M), but I also do find reasonable the argument that inserting into the list is amortized O(1) if you have to find the insertion point (although maybe this is wrong? I can't recall exactly when we are allowed to say that an operation amortizes over another).

So I still maintain that complexity wise both data structures are the same for this particular problem

They are the same as long as you don't have a pointer to the middle available. If I give you a pointer to the middle, the vector solution is still O(N + M) but the one using a list is O(M) or O(1), depending on the form in which I provide the elements to insert. That can make a big difference if N is much larger than M (these are all orders of magnitude anyways).

Complexity aside, a nice intrusive linked list implementation provides more functionality than something like Vec gives you out of the box. Instead of needing a separate iterator type, you can remove an item from a list with just a pointer to it, a pattern that fits naturally with an object-oriented style:

impl PendingRequest {
    fn remove_from_queue(&mut self) {
        self.link.remove();
    }
}

Similarly, you can use a pointer as an insertion point, although in my experience that’s less often needed.

It’s not actually difficult to implement this on top of Vec or Vec-of-Vec: you just need to have each element store its own index, and adjust the indices whenever you move elements around. (This might add overhead but won’t change asymptotic complexity.) But I’ve never seen a library that does this automatically, and doing it manually is a pain. In higher level languages I’ve often just used a hash set for this purpose, which is a bit wasteful but similarly allows constructs like set.remove(object).

Intrusive linked lists also never allocate memory; if you’re in an environment where allocation can fail, having one less error case to think about is often much more important than any small difference in runtime performance (one way or the other). They also work if you have no allocator at all.

Finally, intrusive lists are less complex to implement by hand than something like Vec-of-Vec, which can matter if you’re worried about code size, or if (like me) you just feel like writing everything from scratch :slight_smile:

As for performance:

  • If you are doing anything O(N) with linked lists (other than iterating over the list), you’re doing it wrong.
  • If N is large, an intrusive linked list will probably be slower than a Vec-based approach (but not asymptotically slower) due to cache effects. But if N is small, it’s probably faster – for example, accessing the first element is a single indirection (list.first) rather than two, and you save the cost of allocating/deallocating the elements array.

All in all, from a language-agnostic perspective, I think there are compelling use cases for intrusive linked lists. They’re less compelling in Rust where they interact poorly with the borrow checker (for now), but they still have their uses.

However, I’ve been referring exclusively to intrusive lists. Non-intrusive lists have almost none of the above benefits. They don’t support removing by pointer; they rely on an allocator; and if the element type needs to be a pointer (e.g. because you’re using Rc), then they have more indirections and more allocations than you’d get with Vec. (If you don’t have experience with intrusive lists, I can understand why linked lists as a whole seem ‘obviously’ useless!)

So, in my opinion, by all means deprecate std::LinkedList. But… maybe hold out hope that someday we’ll find a way to make the borrow checker expressive enough to handle more kinds of data structures safely, including intrusive linked lists.

4 Likes

In hindsight, do you feel this worked better than a BTreeSet<Box<_>> would've?

The simple implementation would be a Vec<Box<_>>, since the items are stored in LRU order, rather than sorted order.

For performance reasons, uluru stores all its data inline in an array, without any heap allocations or pointer indirection. (It’s used in very hot code in Firefox and Servo.) This has the side benefit that uluru does not depend on std.

Here is a query which shows 186 uses of LinkedList (it has some false positives, but most of the hits use std::collection::LinkedList):

https://sourcegraph.com/search?q=repogroup:crates+case:yes+max:10000+LinkedList

I think it makes sense to either deprecate it or replace current implementation with O(1) implementation (if it’s possible), even if for most of the current uses it will have worser performance.

Rayon implements its traits for LinkedList, along with all of the other std collections, so I wouldn’t really count that as a “use”. But we do use it for real in the generic collect and extend implementations, where it’s helpful to have very fast concatenation of two lists for parallel reduction. We have benchmarks of different approaches in rayon-demo, and you can see the last time I measured it here:

4 Likes

Late to the party. Most of the cases in my career that used linked lists also had some other data structure attached to the objects. Would this be a case for pinned references then?

I’ve wanted linked lists in Rust but never std::collections::LinkedList. Instead, I’ve wanted a linked list trait so that I could roll my own index-based linked list inside some arrangement of other data structures, so vaguely a linked list in custom arenas.

1 Like

As many pointed out already, a linked list like std::LinkedList or std::list is almost never the right choice. This is why either of them is very rarely used. What is often a good choice is an intrusive linked list. There are many good points above on when an intrusive linked list is a good choice. There is a very important one missing: when an ordered collection of trait objects is needed (or more generically, a polymorphic ordered collection), an intrusive doubly linked list is quite often the right choice. And since neither std::LinkedList or std::list support polymorphic T’s we get to roll our own.

1 Like

+1 to deprecating it. With fire.

6 Likes

To me, it shouldn’t be deprecated without first creating a List trait and/or impl that is better/actually meets the needs mentioned (like polymorphic T’s). It is already recommended in the docs that it shouldn’t be used as there are better options.

1 Like

Currently, none of the existing implementations in crates.io is good enough for all usecases. It is unclear to me how they could be, since trade-offs like intrusive/non-intrusive, skip lists vs singly vs doubly linked, arena backed, w/o cursors, etc. appear irreconcilable. I am also extremely skeptical that coming up with a List trait that abstracts over all of those while remaining useful is even possible.

Basically, when you want a linked list, you want the linked list that's good for your problem, which is different from the linked lists that are good for all other problems. That's just the opposite of "genericity".

If the current std::LinkedList is good enough for you, you can still use it after deprecation just fine, or use any of the crates in crates.io, or use VecDeque if its good enough, etc.

1 Like

Right, I understand that. I’m just worried about the perception of the non-Rust community and those considering adopting Rust. Does seeing things deprecated from the stdlib without a replacement create a perception of instability? I realize that the justification for deprecating it is right, but, it would be good to have “something” (if that is possible) before deprecating. If the consensus that there can no useful “something” in its place, then, that is fine.

Rust cannot remove anything from the stdlib, not even across editions. That is, you will be able to use std::LinkedList as is forever and ever.

What deprecated means is that nobody has to maintain it, update it with new std APIs, curate the docs, review PRs doing all that, etc. as long as it keeps compiling and the tests keep passing.

It also means that if somebody someday decides to put in the work and make the collection useful we can un-deprecate it.

If you want a better linked list crate, targetting the same types of use cases as std::LinkedList, then the one in the contain-rs organization: linked_list - Rust is probably it.

3 Likes

I am developing a general purpose tree library and seeking ideas for it. Would you please describe something about what a useful cursor API could look like?

Have a look at the cursor API in intrusive-collections.