std library has a
LinkedList data-structure implementing a doubly-linked list.
Is anybody using this for anything?
AFAICT, it does not support O(1) splice, O(1) node erasure, O(1) node insertion, … It appears to only support O(1) append/erase on the lists ends, which kind of makes it a collection with all the disadvantages of a doubly-linked list, but none of its advantages.
The current API actually does confuse beginners about what is possible in Rust (e.g.
split_off being O(N) but not having any O(1) split method). Is anybody interested into bumping its API up to make it useful?
There appear to be better implementations on crates.io that provide all/most of the missing algorithms on doubly-linked lists (e.g http://contain-rs.github.io/linked-list/linked_list/) So maybe maybe we should consider deprecating this for Rust2018, and point those who want to use a doubly-linked list to crates.io . In its current state, its probably not something that should be used anyways. I don’t know if there is a policy about un-deprecating something, but if somebody in the future is interested on picking up the torch and making
LinkedList useful, I guess we could un-deprecated it.
Current implementation aside, even as a proponent of a small
std, not having a linked list implementation seems a little minimalist by any standards. I’d much rather see an implementation like
linked-list moved into
rust-lang and then replace the current implementation.
I’d much rather see an implementation like linked-list moved into rust-lang and then replace the current implementation.
We just need to find a hero that cares enough about
LinkedList to actually submit a PR with the implementation, write an RFC, and push it through the process.
Given that there is already an alternative on crates.io, this does look like very low priority to me. But still, we should somehow at least hint newcomers/beginners that if they need a
LinkedList implementation, they should very probably take a look in crates.io instead of trying to use the one in standard, which is far from being in a finished/polished state.
Could you give a rationale as to why it is useful to have LinkedList in std? I personally find the presence of linked lists in standard libraries actively harmful, due to two factors:
- Often people use it as their default container type, presumably because linked lists are the primary classroom data structure.
- For almost any practical purpose, a
VecDeque is a vastly superior choice of a data structure.
If you really need a linked list, you are definitely out of the common case, served by the standard library.
Keep in mind that the documentation does state:
Almost always it is better to use Vec or VecDeque instead of LinkedList. In general, array-based containers are faster, more memory efficient and make better use of CPU cache.
So your point 1. is at least addressed in documentation.
I agree with @matklad that doubly-linked lists are most of the time not what you want.
This premise begs the questions: When do you actually want a doubly linked list? And does that happen enough for them to be in
The only case in which I would personally consider a doubly-linked list over similar data-structures (singly-linked lists, double-ended queues, …) is when I need O(1) splice as an algorithmic building block (
VecDeque and singly-linked lists have O(n+m) splice). Are there other cases you can think off?
This is an extremely niche requirement, and I don’t personally think it is enough for putting a collection in standard. Even if I would have this requirement for a problem, a
VecDeque is probably much better for small
Copy types and not-unreasonably large lengths because
memcpy is way too fast`.
Also, the current
std does not help me here, because it can only do
O(N) splice. I don’t even know if the doubly-linked lists in crates.io would help me here, because the ones based on
Cursors do not let you have a reference to multiple nodes of a list, while you are splicing into it (even though splicing does not invalidate references to other nodes). So depending on what the algorithm requirements are, one might well be completely out-of-luck here.
I think that solving all these problems is really interesting, but it appears to be also in a very research-y state. Whether
LinkedList can be upgraded in a backwards compatible way to solve these issues remains to be shown, but if that can be done we can always un-deprecate it later. We can’t ever remove
LinkedList from the
std library anyways, but in its current state, it appears to be a completely useless collection to me.
I think that we all concur that linked lists are almost never the right data structure for anything. So, if anyone really wants to use one, they have a uncommon problem and know what they are doing, so going to crates.io is not really a burden.
I also would argue that most use cases that require the performance characteristics of a linked list would be better served by a slab that holds the nodes plus more memory efficient u32 handles for the links or similar solution.
So, I’m all for deprecating the std linked list. Otherwise we are in danger that someone tries to use it…
It might be a good idea to gather some data here. E.g. send a PR to rust-lang/rust deprecating it, do a crater run, and see who is using it. I actually would be very interested in seeing how others are using it in the wild and for what purposes.
Is there any value in replacing the current implementation (which iirc just is a tangle of
Boxes) with some kind of arena-based thing? Mostly just to mitigate the whole “RIP cache locality” thing that makes vectors so much better. I’ve definitely seen examples of other graph structures whose pointers are pointers into an internal arena.
I think such that is often the better solution, but also that such a data structure doesn't belongs into std. It's just to specialised... if i ever need one, i probably have other constrains that makes it unlikely that the provided std arena list is the right fit anyway.
In this case i'd argue for more building blocks for arena based or GCed data structures in std, but i don't know how those would look like...
++ I think this is the right answer. There's been rumblings of adding
Gc<T> back to Rust since forever... maybe it's worth making an RFC for a general
Gc<T, Col: GarbageCollector> interface.
It's funny that with all my distaste for linked lists, I had to implement one not so long ago I've used it in a parse tree implementation, where cursor API makes a lot of sense. But it was a hand-rolled intrusive arena-based implementation, not something std could had helped with.
I also implemented a doubly-linked list recently, and like @matklad’s it uses an arena and couldn’t use the libstd type. In this case, the main requirement was an ordered sequence that can be re-ordered without moving the elements themselves (because they can be very large).
I think @pcwalton had an intrusive linked list crate somewhere, but I don’t know if it is arena based.
I think @Manishearth did most of the work in that space. Maybe he has some insights by what is was blocked.
While I think LinkedLists are a fundamental data structure worthy of inclusion in a good standard library, a decent implementation provides too many footguns, and the Rust stdlib shies away from such. The C++ stdlib by contrast is happy to provide structures that can easily be misused. Unless the Rust team is okay with a structure that has multiple different paths unsound behavior, there’s no way to improve
LinkedLists in general require more trust than can reasonably be assumed. The reason
std::collections::LinkedList is mediocre is because it doesn’t actually expose the internal node representation. Thus, you don’t have a way to do O(1) insert-adjacent or extract-from operations. However, exposing that internal representation is also very problematic. Any modifications must be made through the LinkedList itself, because they may alter the front/back ptrs. Thus LinkedList may have methods like
LinkedList<T>::insert_after(&LinkedNode, T) but now you have to assume the given ptr actually corresponds to this particular linked list. You could
debug_assert!() that it is actually contained in this list in O(n) time or also maintain a hashset of node ptrs or store back pointers in each node, but the best API would just resort to undefined behavior here.
Again, a LinkedList is a worthy data structure in its own right. Not moving contents when growing and constant time mid insertion/deletion are useful. Even if
std::collections::LinkedList does not provide the latter, the former is still desirable. I disagree that it’s actively harmful, as the documentation is pretty clear and there basically zero example snippets of rust code using them. I don’t really believe anyone is using a LinkedList by accident (in Rust; in C++ std::list is more easily confused, but Rust very heavily favors Vec). I’d say just leave it be.
Thus LinkedList may have methods like LinkedList::insert_after(&LinkedNode, T) but now you have to assume the given ptr actually corresponds to this particular linked list.
That method just takes a node and a
T, it does not take a list. So I don't see how it has to assume that. Also, an operation that can result in unexpected behavior under pre-condition violation can still be made memory safe.
a decent implementation provides too many footguns,
I hope I am not misenterpreting the argument that you are trying to make. But your post reads like it is impossible to provide a doubly-linked list implementation in Rust that offers useful operations while being safe and convenient to use.
This argument might appear to hold if your only point of reference is
std::LinkedList, but given that there are both intrusive and non-intrusive crates implementing doubly-linked list that offer these operations and are safe and convenient to use, I fail to see how your argument holds.
If you want to convince me of the opposite you would need to show how these examples actually apply to, for example, the doubly-linked list in contains-rs (GitHub - contain-rs/linked-list: An alternative implementation of std::collections::LinkedList). It doesn't let you take a reference to a node, but you can insert another list after a node using a cursor, which kind of solves the problems you mention.
The first is better served by a vector of vectors (example is typed_arena) or by a linked list of chunks (C++'s std::deque works this way I think?)
The second is not generally true: to insert into the middle, you'll need to traverse the list until the middle, which takes linear time, see this talk by Bjarn Stroustrup for some benchmarks. If you are using a cursor API, edit and traverse list in changing directions at the same time then yes, it could be faster. But this is definitely an extremely niche use-case.