Lock-free performance is generally superior to mutex/rwlock-based synchronization.
The code also becomes cleaner: there is no need to manually invoke mutexes or locks; one simply uses std::collections::{LockfreeVec, LockfreeMap} and Arc::new(LockfreeVec::new()).
The user is safe from deadlock "footguns."
Reason for Inclusion in std:
To ensure ecosystem interoperability. If this is left to third-party libraries, Library A might use lock-free crate A's types while Library B uses lock-free crate B's types. If I use both libraries and need to pass the output of Library A (returning crate A's lock-free type) to Library B (expecting crate B's lock-free type), they are incompatible.
Benefits: It offers the same interoperability benefits as the current Vec implementation, where all users are compatible with one another. The usage is also significantly simpler than Arc<Mutex<Vec>> because it eliminates the need for locking: locks can cause deadlocks, while this approach is deadlock-free.
For example:

    let a = Arc::new(LockfreeVec::with_capacity(10));
    let b = a.clone();
    thread::spawn(move || {
        b.push("input");
    }).join().unwrap();
For something like this, I would recommend making a crate and publishing it. Once it shows that there's enough interest to use it, then std adoption can be considered.
The existing collections are lock-free. Vec does not involve any locks.
I think what you mean is that they should be linearizable (i.e. thread-safe) and lock-free. But the thread-safety is the main difference from the existing collections.
To add to this, such data structures typically come with a lot of tweakables and trade-offs. Even if numerous high-quality implementations did exist in various crates[1], std might not be the best place to rehome them.
Lock-free implies these collections are supposed to be mutably shared across threads, but mutable sharing itself can be a source of performance problems. No matter how you do locking (or not; see false sharing), you can still pay a performance penalty for writing to memory that is cached across cores.
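To illustrate false sharing, here is a minimal sketch of the usual mitigation: padding each shared counter to its own cache line. The 64-byte figure is an assumption; real line sizes are platform-specific (crossbeam's CachePadded picks it per target):

```rust
use std::sync::atomic::AtomicU64;

// Two counters in adjacent memory share a cache line: writes from
// different cores invalidate each other's cached copies even though
// the data is logically independent (false sharing).
struct Unpadded {
    a: AtomicU64,
    b: AtomicU64,
}

// Forcing each counter onto its own (assumed 64-byte) cache line keeps
// the writes independent, at the cost of extra memory.
#[repr(align(64))]
struct Padded(AtomicU64);

struct PaddedPair {
    a: Padded,
    b: Padded,
}

fn main() {
    // The unpadded pair packs both counters into 16 bytes (one line).
    assert_eq!(std::mem::size_of::<Unpadded>(), 16);
    // The padded pair places them at least a full line apart.
    assert!(std::mem::size_of::<PaddedPair>() >= 128);
}
```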
There are many possible trade-offs for mutably-shared collections, and I wouldn't say they're faster, but rather you can choose which workloads are least slowed down by the sharing.
You can make collections that tolerate higher concurrency when writing or appending, but this usually comes at a cost of slower reads and pointer chasing.
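The Treiber stack is the classic example of this trade-off: a push is a single CAS and tolerates high write concurrency, but every access chases pointers through heap nodes, and freeing popped nodes safely is exactly the reclamation problem raised later in this thread. This sketch sidesteps that by deliberately leaking popped nodes:

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

struct Node<T> {
    value: T,
    next: *mut Node<T>,
}

// Minimal Treiber stack: lock-free push/pop via compare-exchange loops.
struct Stack<T> {
    head: AtomicPtr<Node<T>>,
}

impl<T: Copy> Stack<T> {
    fn new() -> Self {
        Stack { head: AtomicPtr::new(ptr::null_mut()) }
    }

    fn push(&self, value: T) {
        let node = Box::into_raw(Box::new(Node { value, next: ptr::null_mut() }));
        loop {
            let head = self.head.load(Ordering::Relaxed);
            unsafe { (*node).next = head };
            // Publish the node; retry if another thread won the race.
            if self
                .head
                .compare_exchange_weak(head, node, Ordering::Release, Ordering::Relaxed)
                .is_ok()
            {
                return;
            }
        }
    }

    fn pop(&self) -> Option<T> {
        loop {
            let head = self.head.load(Ordering::Acquire);
            if head.is_null() {
                return None;
            }
            let next = unsafe { (*head).next };
            if self
                .head
                .compare_exchange_weak(head, next, Ordering::Release, Ordering::Relaxed)
                .is_ok()
            {
                // Deliberately leak `head` instead of freeing it: another
                // thread may still be reading it inside its own CAS loop.
                // Fixing this leak requires epochs or hazard pointers.
                return Some(unsafe { (*head).value });
            }
        }
    }
}

fn main() {
    let s = Stack::new();
    s.push(1);
    s.push(2);
    assert_eq!(s.pop(), Some(2));
    assert_eq!(s.pop(), Some(1));
    assert_eq!(s.pop(), None);
}
```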
Or you can design for mostly-immutable collections that only need an occasional write, by using optimistic locking (and still call it lock-free because it uses spinlocks or swaps pointers to copies).
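A sketch of that mostly-read pattern: readers grab an Arc snapshot and then work on an immutable value with no synchronization at all, while writers copy, modify, and swap the pointer. In production one would use a crate like arc-swap to make the swap itself lock-free; this std-only version holds a Mutex just long enough to clone or replace the Arc:

```rust
use std::sync::{Arc, Mutex};

// Mostly-immutable data with occasional writes, via copy-and-swap.
struct Snapshot<T> {
    current: Mutex<Arc<T>>,
}

impl<T: Clone> Snapshot<T> {
    fn new(value: T) -> Self {
        Snapshot { current: Mutex::new(Arc::new(value)) }
    }

    // Readers pay one brief lock to take a snapshot, then nothing.
    fn load(&self) -> Arc<T> {
        Arc::clone(&self.current.lock().unwrap())
    }

    // Writers clone the whole value, mutate the copy, and swap it in.
    // Concurrent readers keep their old snapshot until they reload.
    fn update(&self, f: impl FnOnce(&mut T)) {
        let mut guard = self.current.lock().unwrap();
        let mut copy = (**guard).clone();
        f(&mut copy);
        *guard = Arc::new(copy);
    }
}

fn main() {
    let s = Snapshot::new(vec![1, 2, 3]);
    let reader_view = s.load();
    s.update(|v| v.push(4));
    assert_eq!(*reader_view, vec![1, 2, 3]); // old snapshot is unchanged
    assert_eq!(*s.load(), vec![1, 2, 3, 4]); // a reload sees the write
}
```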
And there's the special case of wait-free (for guaranteed latency, usually at the cost of everything else), and a subset of that which is signal-safe where you literally can't use locks.
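The progress-guarantee distinction can be illustrated with two counter increments. fetch_add is wait-free on common hardware (a single atomic instruction, so every thread finishes in a bounded number of steps), while a hand-written CAS loop is only lock-free: some thread always succeeds, but a particular thread can lose the race indefinitely:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Wait-free: completes in a bounded number of steps for every caller
// (on common platforms this compiles to one atomic instruction).
fn incr_wait_free(c: &AtomicU64) {
    c.fetch_add(1, Ordering::Relaxed);
}

// Lock-free: at least one thread always makes progress, but any given
// thread can in principle lose the compare-exchange race forever.
fn incr_lock_free(c: &AtomicU64) {
    loop {
        let cur = c.load(Ordering::Relaxed);
        if c.compare_exchange_weak(cur, cur + 1, Ordering::Relaxed, Ordering::Relaxed)
            .is_ok()
        {
            return;
        }
    }
}

fn main() {
    let c = AtomicU64::new(0);
    incr_wait_free(&c);
    incr_lock_free(&c);
    assert_eq!(c.load(Ordering::Relaxed), 2);
}
```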
It's not a "generally superior" upgrade. It's a bunch of tricky problems with different costs and downsides, highly dependent on the workload. If you make some middle-of-the-road universal collection, it may end up being inferior in every workload compared to more specialized implementations.
What's the use-case? What do you need interoperability with, and why by sharing a specific collection type and not a channel, callback, or an implementation hidden behind a trait?
Compared to e.g. Java, std lacks just about any shared-mutability collection in the first place. Curiously, though, and without quantification, I think that observation holds across the ecosystem too. Anecdotally, for bloom filter, count-min-sketch, and hyperloglog implementations, despite parallel use being a rather prominent application and there being specifically distributed variants addressing cache hierarchies and false and true sharing, many crates implementing them provide an interface based on &mut self or are not Sync.
The interoperability argument seems weak without a concrete example. Moving these into the standard library would not magically make them better, and: which interface would pass a concurrent data structure across crates rather than offering an opaque wrapper for it? The comparison to Vec is a rather large leap of logic: Vec is the buffer behind String and Read in std, and has specialization to make into_iter() (and conversely extend) efficient. All of those motivate it being an actual vocabulary type of sorts. Arc in comparison is already less motivated, but it gives us stable unsizing, and some std-os modules can build on it. What's the tie-in for lock-free data structures?
Maybe consider whether, instead, a specific lack of deeper core routines / types contributes to the disconnect between the two worlds, and should be addressed first. Rust makes it easier to work on exclusively owned data; in general that is by design and good. However, for another, more specialized class of algorithms, are we missing something? Apart from the obviously hard atomic unordered / memcpy, is there some safe core abstraction that we can already provide to facilitate this?
This is wrong: if it uses a spinlock then it is not even non-blocking, since the progress of other threads depends on the first thread releasing the (spin) lock.
The problem with lock-free data structures is that they typically require lock-free memory reclamation, unless they have some very specific access constraints (cf. tokio::mpsc). Java does not have this problem because of GC. There is generally no single memory reclamation scheme that is "optimal" for all data structures, either. Also, using a single global memory reclaimer (like crossbeam-epoch) leads to simpler data structure APIs, but is not ideal for performance, especially if used by different lock-free data structure instances that are logically unrelated.
So std would not only have to provide the lock-free data structures but also either a single memory reclamation mechanism implementation used by all or design something like a generic trait-based reclamation API, which is not really something that has been seriously attempted thus far. Given that Rust does not even have a stable Allocator API (which is basically a solved issue) I seriously doubt anyone would want to commit to something like this.
I was alluding to the fact that the term "lock-free" is overused: its meaning has been diluted to "clever stuff with atomics" (it's not called a spinlock if you write the CAS loop manually!). The dilution went so far that "wait-free" is now used for the stricter, original meaning of lock-free.
Wait-free is a strict subset of lock-free. Wait-free guarantees forward progress for all threads; lock-free guarantees forward progress for at least one thread. These properties have to hold even if an arbitrary strict subset of the participating threads is suspended.
I don't see these terms misused much, at least not in hard realtime where I work. Using wait-free as a synonym for lock-free is just wrong.