Calculating which 3rd party crates are good candidates for "std" inclusion via "left-pad index"

Patrick Walton proposed the idea of a "left-pad index":

I went ahead and crunched the numbers for crates.io, surveying the top 500 crates by recent downloads, dividing that number by the crate size, and coming up with the following results:

Per Patrick, here are some good candidates for potential inclusion in std:

...the idea being that it would be safer to rely on the Rust distribution itself, rather than individual maintainers of third-party crates, for this functionality.
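The scoring described above (recent downloads divided by crate size) can be sketched roughly as follows. The struct, its field names, and the sample numbers are assumptions made up for illustration; they are not the actual survey data or the script that was used.

```rust
/// One row of the survey. Field names are illustrative assumptions.
#[derive(Debug, Clone)]
struct CrateStats {
    name: String,
    recent_downloads: u64,
    size_bytes: u64,
}

/// "Left-pad index": recent downloads divided by crate size.
/// The size is clamped to 1 to avoid dividing by zero.
fn left_pad_index(c: &CrateStats) -> f64 {
    c.recent_downloads as f64 / c.size_bytes.max(1) as f64
}

/// Sort crates so that the highest index comes first.
fn rank_by_left_pad_index(mut crates: Vec<CrateStats>) -> Vec<CrateStats> {
    crates.sort_by(|a, b| left_pad_index(b).total_cmp(&left_pad_index(a)));
    crates
}

fn main() {
    // Invented numbers and names, not real crates.io data.
    let crates = vec![
        CrateStats { name: "big_framework".to_string(), recent_downloads: 1_000_000, size_bytes: 500_000 },
        CrateStats { name: "tiny_helper".to_string(), recent_downloads: 400_000, size_bytes: 2_000 },
    ];
    for c in rank_by_left_pad_index(crates) {
        println!("{:<14} {:.2}", c.name, left_pad_index(&c));
    }
}
```

A small, heavily downloaded crate floats to the top even when a larger crate has more downloads in absolute terms, which is the effect the index is meant to surface.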

The world will hopefully soon be nodrop crate free, now that we have MaybeUninit which fully replaced it in arrayvec 0.5 (Rust 1.36+), and some crates have also dropped arrayvec 0.4 altogether, losing nodrop that way (which is also cool in my book.) We only need to wait for the version upgrades to trickle in. (Would it be possible to have the maintenance status in a column? :slight_smile: )

My personal favourite microcrate is matches.
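For readers who have not used it, here is a minimal sketch of the kind of macro such a microcrate provides. The macro name `is_match` and the `Token` enum are hypothetical and written only for this example; this is not the crate's actual source.

```rust
/// Hypothetical stand-in for a `matches!`-style macro:
/// evaluates to `true` if the expression matches the pattern.
macro_rules! is_match {
    ($expr:expr, $pat:pat) => {
        match $expr {
            $pat => true,
            _ => false,
        }
    };
}

/// Example enum, invented for this sketch.
#[derive(Debug)]
enum Token {
    Number(i64),
    Ident(String),
    Eof,
}

/// Without the macro this would need a full `match` or `if let ... else`.
fn is_number(t: &Token) -> bool {
    is_match!(t, Token::Number(_))
}

fn main() {
    let tokens = [Token::Number(3), Token::Ident("x".to_string()), Token::Eof];
    for t in &tokens {
        println!("{:?} is a number: {}", t, is_number(t));
    }
}
```

The whole thing is a few lines of `macro_rules!`, which is what makes it such a clear example of a microcrate.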

I believe crossbeam-epoch has a dependency on arrayvec 0.4 and that's used in quite a few places.

Is being downloaded a lot a good indicator for inclusion into std?

For example, the table above contains lazy_static. It has a lot of reverse dependencies, because it solves a specific and quite common problem. However, I find once_cell more elegant and ergonomic. It has fewer reverse dependencies, mostly because it is younger (I think).
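To make the problem both crates solve concrete, here is a std-only sketch of a lazily initialized global. It uses `std::sync::OnceLock`, which is available in Rust 1.70 and later; it is not the API of either crate, and the `keyword_table` function and its contents are invented for illustration.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Returns a global table that is built on first access and reused afterwards.
/// The table contents are placeholders for this sketch.
fn keyword_table() -> &'static HashMap<&'static str, u32> {
    static TABLE: OnceLock<HashMap<&'static str, u32>> = OnceLock::new();
    TABLE.get_or_init(|| {
        // This closure runs at most once, even with concurrent callers.
        let mut m = HashMap::new();
        m.insert("fn", 1);
        m.insert("let", 2);
        m
    })
}

fn main() {
    println!("let -> {:?}", keyword_table().get("let"));
    // A second call returns the same instance without rebuilding the map.
    println!("same instance: {}", std::ptr::eq(keyword_table(), keyword_table()));
}
```

The cell-based style keeps the static an ordinary value with an explicit initialization call, rather than hiding it behind a macro.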

Though I guess having this list is nice as a first filter into which to look.

It's gone in git, I think.

If I look closely, rdep count seems like a much better indicator. Some of these have ridiculous download numbers for very few rdeps?

This scoring seems like a very incomplete story of importance in the ecosystem, and is misleading with regard to reusability, which should be considered for inclusion in the standard library. After all, adding more baggage to everyone's download should at least come with an overall useful payload.

Take memoffset, for example, whose download number comes overwhelmingly from a single crate, crossbeam. Or nodrop, whose last release marked it as deprecated after it was dropped from its main user, arrayvec, by the same author (see above, @bluss beat me to the punch). Can a single major use be taken as an incentive for inclusion in std?

In the case of phf_shared/phf_generator, they are very minimal indeed. However, how useful are they on their own? The extremely costly dependency phf_macros, pretty much required to actually make use of its powers compared to standard hash functions, is not captured by the metrics.

And fuchsia-cprng is also curious, since the description already disputes any cross-platform usability: "Type-safe bindings for the Zircon kernel's CPRNG." While that confirms that at least some party connected to Google Fuchsia is a prominent user of Rust & crates.io (Are other crates internally cached, or why don't all fuchsia crates show up that high?), it does not qualify the crate for inclusion in std in any way. And since the metrics rank it that high, that makes me question the metrics.

That's not to say the list doesn't have any useful entries; both matches and scopeguard complement existing features quite well imho (though the interfaces deserve RFCs anyway if someone were to propose them).
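As an illustration of how a scope guard complements `Drop`, here is a minimal hand-written sketch. The `Guard` type and `run_with_cleanup` function are hypothetical names for this example and are not the scopeguard crate's API.

```rust
use std::cell::RefCell;

/// Runs a closure when the guard goes out of scope, including on early return.
struct Guard<F: FnOnce()> {
    on_drop: Option<F>,
}

impl<F: FnOnce()> Guard<F> {
    fn new(f: F) -> Self {
        Guard { on_drop: Some(f) }
    }
}

impl<F: FnOnce()> Drop for Guard<F> {
    fn drop(&mut self) {
        // `take` lets us call the FnOnce closure exactly once.
        if let Some(f) = self.on_drop.take() {
            f();
        }
    }
}

/// Records the order of events in `log`; cleanup is recorded on every exit path.
fn run_with_cleanup(log: &RefCell<Vec<&'static str>>, fail_early: bool) -> Result<(), &'static str> {
    let _guard = Guard::new(|| log.borrow_mut().push("cleanup"));
    log.borrow_mut().push("start");
    if fail_early {
        return Err("early exit");
    }
    log.borrow_mut().push("finish");
    Ok(())
}

fn main() {
    let log = RefCell::new(Vec::new());
    let result = run_with_cleanup(&log, true);
    println!("{:?} {:?}", result, log.borrow());
}
```

The guard saves writing a dedicated type with a `Drop` impl for every one-off cleanup action.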

...which I don't think should be added to the standard library, at least not in its current form, due to improvements to the language itself (see let chains, the possibility of making let $pat = $expr itself a bool-typed expression, ...).

And moreover, memoffset is unsound.

I agree that any metric which includes transitive clients is going to have its usefulness quickly demolished by the "oh, crossbeam used it" problem.

The reverse_deps field is just direct reverse dependencies, right? Because using that metric instead of recent downloads produces the following top 30 list:

  • lazy_static
  • log
  • serde
  • serde_derive
  • byteorder
  • futures
  • failure_derive
  • serde_json
  • matches
  • quote
  • failure
  • env_logger
  • mime
  • hex
  • bitflags
  • num_cpus
  • atty
  • dirs
  • cfg-if
  • time
  • bincode
  • rand
  • strsim
  • error-chain
  • pkg-config
  • proc-macro2
  • tempfile
  • sha2
  • dotenv
  • pretty_env_logger

Which seems much more plausible to me. Although... serde is clearly not a "microcrate". Let's try reverse_deps / crate_size ^ 2:

  • matches
  • lazy_static
  • failure_derive
  • atty
  • phf_codegen
  • hex
  • phf
  • strum
  • cfg-if
  • log
  • doc-comment
  • byteorder
  • num_cpus
  • mime
  • dotenv
  • strsim
  • futures
  • crunchy
  • bitflags
  • dirs
  • quote
  • pretty_env_logger
  • hex-literal
  • try_from
  • maplit
  • crossbeam
  • md5
  • bincode
  • phf_shared
  • digest

Yeah, I think that's a slight improvement. That's probably as good a list of "Rust leftpads" as we're gonna get from number crunching alone.
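For anyone who wants to reproduce the re-sorting outside a spreadsheet, the two orderings above (plain reverse_deps versus reverse_deps / crate_size ^ 2) can be sketched like this. The `Row` struct, the units, and the sample rows are assumptions with invented numbers; they are not taken from the actual table.

```rust
/// One row of the table; field names and units are assumptions for this sketch.
struct Row {
    name: &'static str,
    reverse_deps: u32,
    size_kb: f64,
}

/// The adjusted score: reverse_deps / crate_size ^ 2.
fn microcrate_score(row: &Row) -> f64 {
    f64::from(row.reverse_deps) / (row.size_kb * row.size_kb)
}

/// Crate names ordered by a scoring function, highest score first.
fn ranked_names(rows: &[Row], score: impl Fn(&Row) -> f64) -> Vec<&'static str> {
    let mut sorted: Vec<&Row> = rows.iter().collect();
    sorted.sort_by(|a, b| score(*b).total_cmp(&score(*a)));
    sorted.iter().map(|r| r.name).collect()
}

/// Placeholder rows with invented numbers, not real crates.io data.
fn sample_rows() -> Vec<Row> {
    vec![
        Row { name: "big_framework", reverse_deps: 9000, size_kb: 300.0 },
        Row { name: "tiny_macro", reverse_deps: 1500, size_kb: 4.0 },
        Row { name: "mid_utility", reverse_deps: 3000, size_kb: 40.0 },
    ]
}

fn main() {
    let rows = sample_rows();
    let by_rdeps = ranked_names(&rows, |r| f64::from(r.reverse_deps));
    let by_score = ranked_names(&rows, microcrate_score);
    println!("by reverse_deps:        {:?}", by_rdeps);
    println!("by reverse_deps/size^2: {:?}", by_score);
}
```

Squaring the size is a blunt way to push large crates down the list; a different exponent would shift the cutoff between "microcrate" and "library".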

(I am bad at google docs, but on the off chance it helps somehow, here's the link to my copy of the top 500 table changed to do rdeps/size^2 sorting: https://docs.google.com/spreadsheets/d/1TIU7IQKhwfIWPAWhfX0f-PHfcwsVXsg43MxDUg2AO4I/edit?usp=sharing )

It doesn't seem like a good metric. Of the crates in the screenshot, only matches and cfg-if are std-worthy, imho. And both of them should be implemented as a language feature anyway.

Personally, log is my main request. Also arrayvec/smallvec. byteorder is also very popular.

UPD: I almost forgot about language-level bitflags. The current implementation isn't user-friendly (IDEs cannot expand macros yet, so it breaks autocompletion).

Agreed, to the point that we keep having conversations about it being a feature with real syntax (one proposal was x is Some(_), for example.)

Would still be worth contemplating putting assert_matches! in std, though, the same way we have assert_eq! even with == syntax...

Good to see we've at least made progress on some of them :slight_smile:

Having some experience with multiple 'mainstream' languages, I was pretty surprised that the following crates are not in std:

  • log
  • rand
  • lazy_static or something similar

Other good candidates:

  • syn, quote, proc-macro2 - must-haves for creating any proc macro,
  • regex,
  • itertools - it provides a lot of goodies, but sometimes I decide to write code in an uglier way just because I'm too lazy to add the dependency or I want to reduce build time,
  • derive_more

Also, I don't understand why there is both std::time and a time crate. And chrono, or at least parts of chrono, seems to be a good candidate.

I would agree just copying and pasting lazy_static into std as-is would be a bad idea. That said, you're missing the forest for the trees: it isn't so much that we should just outright copy and paste these crates into std, but rather these crates provide features which might be good candidates for first-class std features.

In the case of lazy_static, improving Rust's core support for static data, e.g. static initializers, first-class heap-allocated statics, and associated statics, are common topics on these forums. The problem of static data, in particular, is a problem that can be solved much more elegantly and powerfully at the language level rather than at the library level, and could potentially interact with things like the trait system or program startup.

This would amount to freezing the syntax of the language itself.

I'm personally in favor of ArrayVec<...> using const generics on nightly because they are sort-of a vocabulary type at least for a language like Rust.

Imo assert!(let A = expr && ...); seems strictly more flexible.

We have imported some stuff from itertools over time. Would be worth going over again to see if there are some more things we could add.

Could definitely see this; adding more built-in derives for standard library traits seems sensible if obvious structural implementations can be given.

I think it would take some convincing for me to be comfortable with baking in support for (and thereby encouraging) what are essentially global singletons into the language itself since that is often a code-smell and hacks around better architectures. Most of the times I've used lazy_static! I've come to regret it later.

There are plans :slight_smile: (Me and Oliver should probably write an RFC at some point...)

...are generally wanted but these would allow generic statics and those do generally not mix with dylibs (which many want to ditch eventually...).

assert!(expr == expected) is strictly more flexible than assert_eq!(expr, expected) as well, yet we still have assert_eq! because it can give more useful errors than assert!(==). I think the same applies to assert_matches! and assert!(let =).
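To illustrate the point about error quality, here is a minimal sketch of what a dedicated macro can report. The macro name `assert_matches_sketch` and the `parse_port` helper are hypothetical and exist only for this example; this is not a proposed std API.

```rust
/// Hypothetical assert-style macro: on failure, the panic message includes
/// both the actual value and the pattern it was expected to match.
macro_rules! assert_matches_sketch {
    ($expr:expr, $pat:pat) => {
        match $expr {
            $pat => {}
            ref other => panic!(
                "assertion failed: `{:?}` does not match `{}`",
                other,
                stringify!($pat)
            ),
        }
    };
}

/// Example function under test, invented for this sketch.
fn parse_port(s: &str) -> Result<u16, std::num::ParseIntError> {
    s.parse()
}

fn main() {
    // Passes silently.
    assert_matches_sketch!(parse_port("8080"), Ok(8080));

    // A failing assertion panics with the debug form of the actual value
    // next to the expected pattern, which a plain boolean assert cannot show.
    let outcome = std::panic::catch_unwind(|| {
        assert_matches_sketch!(parse_port("http"), Ok(_));
    });
    println!("second assertion panicked: {}", outcome.is_err());
}
```

A plain `assert!` only sees a `bool`, so it can print the source text of the condition but not the value that failed to match.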

Neat! Looking forward to seeing it.

fuchsia-cprng is a target-conditional dependency of rand ( or something in rand's dep stack, I forget ).

That might have something to do with its prevalence in this graph, despite the apparent lack of reverse-deps ( in that, cargo users using the rand crate may have it downloaded for them at some point, even though it gets elided from compilation )

Quickly about log: I don't believe that the style of logging we see now, as exemplified by log, will be the dominant form of how we emit instrumentation from applications and libraries in the decently-near future. I think it'll probably be far closer to OpenTelemetry or tracing.

Introducing a library into std means that a breaking change cannot ever be introduced into that API, not even through editions. As much as I rely on those libraries on a day-to-day, the thought of never introducing a breaking change gives me pause.

I think it's a shame if we can't include basic things in the standard library because of lack of consensus, even when we have empirical evidence that they're widely used.

I strongly disagree with this. lazy_static! is extremely important for all sorts of use cases. It's not Rust's place to be overly opinionated like that.

For example, I recently used lazy_static! several times in a GPU plumbing crate I'm working on to load OpenGL extension functions. Essentially, I was replicating dlopen and dlsym (you can't use dlopen and dlsym directly for various platform-specific reasons not worth getting into here). They're global functions: there would be zero benefit to storing them in a struct and forcing callers to pass them around everywhere. We don't force callers of basic dynamically-linked functions like OpenFile() on Windows to manually load them into a struct and thread that struct through all their code.

Similar reasons crop up in most of the crates I write to implement lazy_static, since I frequently write low-level plumbing crates. Rust is a systems language, and as such must be flexible. Just because you don't use lazy_static doesn't mean a lot of others don't have legitimate reasons to use it.

For servers, maybe. But for those of us who are, say, writing low-level graphics code, those crates are total overkill. I don't want to learn a fancy logging system which has no benefit to the code that I'm actually writing right now; I just want to use log so that I can get printfs that actually work on Android. If log were to go away, then I'd probably migrate to just open-coding calls to __android_log_print instead of using a heavyweight logging framework, which would obviously be of no benefit.

I agree with @bascule that nitpicking the details of individual crates misses the point. The purpose of the "left-pad index" is that small crates that are widely used are one of the sincerest forms of feature requests. If the functionality provided by the crates can be provided by a language feature, or by the standard library in a different way from the popular crates.io crates, then great! What I don't think we should do is ignore the signal entirely. People are using the functionality provided by these crates, period—our choice is whether we want to make Rust users' lives easier or not.
