Matches-index (July 2025) -- reviving the left-pad index

6 years ago we had this thread:

I was thinking about it recently (in connection to contemplating an ACP for StableDeref inclusion in core) and decided to update the analysis. See here:

I'm now calling it the matches-index, since matches!() was stabilized in std in Rust 1.42, released in March of 2020 -- about 5 months after the previous thread was started.

In addition to stable_deref_trait, perhaps at some point equivalent and/or all of indexmap could also make for good additions to core/alloc?

(Click the little calculator-like widget to do custom filters on the data without duplicating the sheet.)

Sorting by dependencies/size, a fair number of them have made it into the std or even the language in recent years. E.g.:

  • lazy_static/once_cell
  • atty
  • num_cpus
  • static_assertions
  • home (now that home_dir is undeprecated)

And the replacement for cfg_if is close to stabilisation.
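As a rough illustration of the list above, here is a minimal sketch of how some of those crates can be approximated with std alone. The names `GREETING` and `worker_count` are made up for the example, and the std calls are close to, not identical to, the crates they stand in for (for example, `available_parallelism` may report a different number than `num_cpus` in some environments).

```rust
use std::io::{self, IsTerminal};
use std::sync::LazyLock;
use std::thread;

// Stands in for lazy_static! / once_cell::sync::Lazy (hypothetical static).
static GREETING: LazyLock<String> = LazyLock::new(|| format!("hello, {}", "world"));

// Stands in for a simple static_assertions::const_assert!.
const _: () = assert!(std::mem::size_of::<u64>() == 8);

// Stands in for num_cpus::get(); falls back to 1 if the count is unavailable.
fn worker_count() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

fn main() {
    // Stands in for atty: is stdout attached to a terminal?
    let interactive = io::stdout().is_terminal();
    println!(
        "{} (tty: {}, workers: {})",
        GREETING.as_str(),
        interactive,
        worker_count()
    );
}
```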

From this list, arc-swap, bitflags and bitvec / bit-vec are obvious candidates. (Another crate from this list, bit-set, is built on bit vectors and can actually build a bitset from a bit vector passed as a parameter. It depends on the bit-vec crate, which is unfortunate if you use bitvec. Having bit vectors in the stdlib would improve interoperability.)

Another thing I would like to see in the stdlib is ordered-float, and in particular the NotNan type. Rust has things like NonNull and NonZero, and I think that NonNan would be a great addition.
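To make the idea concrete, here is a minimal sketch of what such a wrapper buys you. The `NonNan` type below is hypothetical and written for this post; it is not the ordered-float API and not a proposed std API.

```rust
use std::cmp::Ordering;

/// Hypothetical wrapper: an f64 that is guaranteed not to be NaN.
#[derive(Debug, Clone, Copy, PartialEq)]
struct NonNan(f64);

impl NonNan {
    /// Returns None for NaN, so the invariant holds for every NonNan.
    fn new(value: f64) -> Option<Self> {
        if value.is_nan() { None } else { Some(NonNan(value)) }
    }

    fn get(self) -> f64 {
        self.0
    }
}

// Reflexivity holds because NaN is the only f64 with x != x.
impl Eq for NonNan {}

impl Ord for NonNan {
    fn cmp(&self, other: &Self) -> Ordering {
        // partial_cmp on f64 only returns None when a NaN is involved.
        self.0.partial_cmp(&other.0).expect("NonNan never holds NaN")
    }
}

impl PartialOrd for NonNan {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

fn main() {
    let mut values: Vec<NonNan> = vec![2.5, -1.0, f64::NAN, 0.0]
        .into_iter()
        .filter_map(NonNan::new) // the NaN is dropped here
        .collect();
    values.sort(); // plain sort() works because NonNan is Ord
    let raw: Vec<f64> = values.iter().map(|v| v.get()).collect();
    println!("{:?}", raw);
}
```

The point is only that `Eq` and `Ord` become available, so the type can be sorted or used as a `BTreeMap` key without `partial_cmp(..).unwrap()` at every call site.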

And I wanted to say bytemuck too, but the safe transmute thing is supposed to be even better, right? If/when it happens.

I don't think arc-swap is a given. I saw an interesting alternative design: ArcShift (on Lib.rs).

Not as popular, but fairly new, and makes a good argument for the design difference.

That is in general a thing to watch out for: is the design space fully explored?

Or zerocopy. Multiple points in the design space again.

I think having a survey of the bitfield / bitvec / bitset / bitflag space would be useful. There are a lot of more or less popular crates there, and there is usually some overlap between them.

For example, I was recommended modular-bitfield (on Lib.rs; I haven't had the opportunity to try it yet) as being the best design for bitfields. But there are at least four other similarly popular crates for this.

Thanks for the link. I was reading the docs and I don't understand why this approach would be better than arc-swap. If there are many writes, won't it traverse a potentially very large linked list each time it is read?

What I like about the arc-swap design is that it actually wraps the Arc<T> with something like ArcSwapAny<Arc<T>>, so the same wrapper enables updating Option<Arc<T>> atomically too (an important use case) but also works for custom reference-counted pointers implemented elsewhere.

Anyway, you raise a good point: library ecosystems have network effects, and the oldest libraries have more mindshare even if the newer libraries are better designed. Even when you compare only recent downloads rather than total downloads, the fact that some library is older and more established will give it an advantage. So this list can say "people often need bitflags" but not "the stdlib should adopt the API of the bitflags crate".

Indeed, this situation has already happened once: the lazy_static crate was in use since before Rust 1.0, but it wasn't until mitochondria, lazycell and finally once_cell had explored the design that the functionality was included in the stdlib (I think the stdlib version even made some improvements over once_cell).

I believe it says right in the readme it is optimised for rare writes. To me the arcshift design looks inspired by RCU, and the advantage would be lockless reads even when writes happen.

A typical example where RCU (and RCU like designs) are good is where reads happen all the time, but writes happen on a human timescale. Consider something like the routing table in a kernel. Switching between networks happens at most every few seconds, but usually far less often than that. Or the table of connected USB devices.

Arcswap on the other hand seems to be optimised for a mix of reads and writes, but with the downside that reading won't scale as well.
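For comparison, the read-mostly usage pattern can be approximated with std alone, at the cost of taking a lock on every read. This is only a baseline sketch to show the shape of the API being discussed (load a snapshot, store a replacement); it is not how arc-swap or arcshift work internally, and the `Shared` type is made up for the example.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical baseline: a shared value that is replaced as a whole.
struct Shared<T>(RwLock<Arc<T>>);

impl<T> Shared<T> {
    fn new(value: T) -> Self {
        Shared(RwLock::new(Arc::new(value)))
    }

    /// Readers hold the lock only long enough to clone the Arc.
    fn load(&self) -> Arc<T> {
        let guard = self.0.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Writers publish a new value; readers keep whatever snapshot they hold.
    fn store(&self, value: T) {
        *self.0.write().unwrap() = Arc::new(value);
    }
}

fn main() {
    // A routing table that changes on a human timescale.
    let routes = Shared::new(vec!["10.0.0.0/8"]);
    let snapshot = routes.load();
    routes.store(vec!["10.0.0.0/8", "192.168.0.0/16"]);
    println!("old: {:?}, new: {:?}", snapshot, routes.load());
}
```

The crates under discussion differ mainly in how they avoid that read-side lock, which is where the design space opens up.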

I also found the Reddit thread where the author announced arcshift to be quite interesting (if very long): "Arcshift - an Arc that can be updated" on r/rust.

Lazy/Once Cell/Lock is indeed an interesting case study in how to figure out which API is best. In that case we got both, as there are reasons to use either. The Once API is more fundamental though, and I believe it is used internally by the lazy variant.
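A small sketch of why both flavours are useful, using the std types. The static names and the `configured_name` helper are invented for the example.

```rust
use std::sync::{LazyLock, OnceLock};

// Lazy flavour: the initializer is fixed where the static is declared.
static DEFAULT_NAME: LazyLock<String> = LazyLock::new(|| "anonymous".to_string());

// Once flavour: the value is supplied later, by whoever initializes it first.
static CONFIGURED_NAME: OnceLock<String> = OnceLock::new();

/// Returns the configured name, initializing it with `fallback` on first use.
fn configured_name(fallback: &str) -> &'static str {
    CONFIGURED_NAME.get_or_init(|| fallback.to_string()).as_str()
}

fn main() {
    println!("{}", configured_name("alice"));
    // The second call does not overwrite the value set by the first one.
    println!("{}", configured_name("bob"));
    println!("{}", DEFAULT_NAME.as_str());
}
```

The Once flavour can take its value from runtime input, which the Lazy flavour cannot do without capturing global state in its initializer.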

There is ongoing work on an API equivalent to the itoa crate: "Tracking Issue for integer formatting into a fixed-size buffer" (rust-lang/rust issue #138215).

Note that NotNan technically correctly implements Eq but does not get rid of all surprises in the set of floating-point values; in particular, -0.0 and 0.0 compare equal but print differently, and can produce very different results when you compute with them. This is, IMO, something that needs further exploration (e.g. many applications would be happy with replacing -0 with +0, but what is the performance cost?) before any “blessed” solution gets into std; Eq impls that ignore some distinctions in value are valid but not something std has had so far (unless you count HashMap/HashSet equality despite iteration order differences).
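The signed-zero surprise is easy to reproduce with plain f64, no wrapper needed:

```rust
fn main() {
    let pos = 0.0_f64;
    let neg = -0.0_f64;

    // Equal under ==, so an Eq impl built on it cannot tell them apart.
    assert!(pos == neg);

    // Yet the two values print differently...
    println!("{} vs {}", pos, neg);

    // ...and diverge as soon as you divide by them.
    println!("{} vs {}", 1.0 / pos, 1.0 / neg);
}
```

So two keys that compare equal in a map can still lead to different downstream results, which is the kind of distinction-ignoring Eq mentioned above.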

One rather interesting aspect of moving NotNan to the standard library is that std could improve on the library by marking a large number of bit patterns as invalid. Where many dynamic languages use tagged NaNs for internal types, Rust could use those bit patterns for all kinds of variant tags, including Option<_>, which would also make the ABI of the necessary checked_* methods carried over from f?? surprisingly clean. On that note, a CanonicalNan and a Finite type would be just as interesting from this point of view.
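The effect being described already exists for the NonZero integer types, so a quick size check shows what a float type with invalid bit patterns could gain. This is only an illustration with existing std types; as far as I know, a third-party crate cannot declare such invalid bit patterns for its own float wrapper on stable Rust.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

fn main() {
    // NonZeroU32 has one forbidden bit pattern (0), which Option reuses for None.
    assert_eq!(size_of::<Option<NonZeroU32>>(), size_of::<u32>());

    // Every f32 bit pattern is a valid value today, so Option<f32> needs a separate tag.
    assert!(size_of::<Option<f32>>() > size_of::<f32>());

    println!(
        "Option<NonZeroU32>: {} bytes, Option<f32>: {} bytes",
        size_of::<Option<NonZeroU32>>(),
        size_of::<Option<f32>>()
    );
}
```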

Maybe there should be another floating point wrapper called UnsurprisingFloat<f32> or something which doesn't just disallow NaN, but also other things.

NonNan is supposed to disallow just NaN, so I expect it to still allow -0, subnormal numbers (if supported by the platform), and ±infinity, which are three other gotchas I can think of. One use case of NonNan is representing in the type system the invariant expected by NaN-boxing, as in the boxing crate, though this particular crate doesn't use exactly NonNan but something that closely resembles it: a floating-point type that admits just one NaN.

Maybe this proliferation of wrappers points to the need for pattern types as a general solution, but I think pattern types shouldn't block NonNan since it's so useful as a building block. Also, Option<NonNan<f32>> should store the None variant as the "canonical NaN" (for platforms where there is a canonical NaN), which is something that only the compiler can do AFAIK, and removes the need for a separate type that has a single NaN.

Just skimming the top of the list as sorted by Dependencies / squared size, I think these would be good candidates:

| Crate | Notes |
|---|---|
| hex | convert bytes to and from hexadecimal. Could be an extension to #138215 maybe |
| urlencoding & percent-encoding | encode and decode url strings |
| byteorder | convenience methods for encoding and decoding big and little endian |
| fastrand | when you just want some "random" looking numbers and don't care about making them unpredictable (or where being predictable may be wanted) |
| walkdir | recursive iteration of directories; super useful but maybe std would want to pick a subset of this |
| tempfile | create anonymous or named temporary files and clean them up on drop |

Except for the last two, these could be in core. Or at least some subset of them could be.
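To give a feel for how small such a subset could be, here is a minimal sketch of hex encoding and decoding, plus the byte-order conversions that std's integer types already provide. The `hex_encode` and `hex_decode` helpers are hypothetical and written for this post; they are not the hex crate's implementation and not a proposed std signature.

```rust
use std::fmt::Write;

/// Hypothetical helper: lower-case hex encoding of a byte slice.
fn hex_encode(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        // Writing into a String cannot fail.
        write!(out, "{:02x}", b).unwrap();
    }
    out
}

/// Hypothetical helper: returns None on odd length or non-hex characters.
fn hex_decode(s: &str) -> Option<Vec<u8>> {
    let digits = s.as_bytes();
    if digits.len() % 2 != 0 {
        return None;
    }
    digits
        .chunks(2)
        .map(|pair| {
            let hi = (pair[0] as char).to_digit(16)?;
            let lo = (pair[1] as char).to_digit(16)?;
            Some((hi * 16 + lo) as u8)
        })
        .collect()
}

fn main() {
    let bytes = [0xde, 0xad, 0xbe, 0xef];
    let text = hex_encode(&bytes);
    println!("{} -> {:?}", text, hex_decode(&text));

    // Part of what byteorder offers is already on the integer types.
    let n = u32::from_be_bytes(bytes);
    println!("{:#x} as little-endian bytes: {:?}", n, n.to_le_bytes());
}
```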

The fact that there are so many plausible variations is exactly why this shouldn’t be in std yet. Note that you do not just need to decide what values are prohibited, but what the operations that would produce them will do instead! For example, even NotNan just a few months ago changed some operations from returning NotNan<T> but panicking on NaN, to returning T. That’s the sort of API design iteration that needs to be thoroughly settled and tested in practice before anything enters std.
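A sketch of that design question, using a stripped-down hypothetical `NonNan` wrapper (not the ordered-float API): subtracting two non-NaN values can still produce NaN, and the three method names below are invented to show three different answers to what the operation should return.

```rust
/// Hypothetical wrapper: an f64 that is guaranteed not to be NaN.
#[derive(Debug, Clone, Copy, PartialEq)]
struct NonNan(f64);

impl NonNan {
    fn new(v: f64) -> Option<Self> {
        if v.is_nan() { None } else { Some(NonNan(v)) }
    }

    /// Option A: hand back a plain float and let the caller re-check it.
    fn sub_raw(self, rhs: Self) -> f64 {
        self.0 - rhs.0
    }

    /// Option B: make the failure explicit in the return type.
    fn checked_sub(self, rhs: Self) -> Option<Self> {
        Self::new(self.0 - rhs.0)
    }

    /// Option C: keep the type and panic when the invariant would break.
    fn sub_or_panic(self, rhs: Self) -> Self {
        self.checked_sub(rhs).expect("subtraction produced NaN")
    }
}

fn main() {
    let inf = NonNan::new(f64::INFINITY).unwrap();
    let one = NonNan::new(1.0).unwrap();

    // inf - inf is NaN even though neither operand is.
    println!("raw: {}", inf.sub_raw(inf));
    println!("checked: {:?}", inf.checked_sub(inf));
    println!("panicking variant on safe input: {:?}", one.sub_or_panic(one));
}
```

Each choice pushes the cost somewhere else (ergonomics, runtime checks, or panics), which is why it needs to be settled in practice first.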

Many people have many complaints with how IEEE floating point works, but it's not just a few “bugs” that you can “fix” and get a well-behaved type out of. You have to look at whether your new design is internally consistent, and whether it works well in practice.

These don't seem to be very far up your list so I am curious how you reached that conclusion?

This has been discussed previously in the context of -ffast-math.

The problem is that different users want different things from non-IEEE floats. It's not just lack of NaN, but only finite, no subnormals, associative, etc. There are also different possible approaches to dealing with violations — extra checks that prevent invalid values, or allowing invalid values and allowing invalid results, but not UB, or maximizing performance at cost of unsafety.

This ends up requiring lots of configuration knobs, and Rust's type system is not suited for it.

How is the size measured, btw? Does it take comments and doc comments into account? And unit tests?