Brainstorm request: How to get benefits of small and large crates

There’s been some discussion of the trade-offs between smaller and larger crates. I think we should look for a middle option with a different mix of pros and cons.

Motivations:

My summary of the issue:

There is a trend towards smaller crates in parts of the Rust ecosystem (Tokio is a good example).

Pros:

  • Easier to make breaking changes
  • Faster compilation time

Cons:

  • Coordination/discovery can be hard
  • Per-dependency overhead:
    • Is it maintained?
    • Is it license compatible?
    • Does it have a minimum version check?
    • Filing an upstream bug: new repo/author
    • Probably more

Potential middle ground:

Perhaps we could allow a parent crate to be defined that groups child crates together and reports on all of them as one unit. The child crates would have to be cut from the same workspace, though, with much of the metadata coming from the top: authors, minimum version check, license, etc. Cargo could then collapse all of those dependencies into one when listing dependencies. At that point, being separate crates is mostly an implementation detail; the child crates exist mainly as smaller code-gen units.
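
To make the "collapse when listing" part concrete, here is a rough Rust sketch of what a dependency listing could do. Everything in it is hypothetical: Cargo has no `parent` field today, and the `Package` struct and `collapse` function are names made up for illustration. The crate names are only sample data.

```rust
use std::collections::BTreeMap;

/// A resolved package. `parent` is a hypothetical field naming the parent
/// crate that a child crate belongs to; Cargo has no such field today.
struct Package {
    name: &'static str,
    parent: Option<&'static str>,
}

/// Collapse child crates into their parent for display purposes.
/// Returns each top-level name with the number of child crates folded into it.
fn collapse(packages: &[Package]) -> BTreeMap<&'static str, usize> {
    let mut listing = BTreeMap::new();
    for pkg in packages {
        match pkg.parent {
            // A child crate is counted under its parent instead of being listed.
            Some(parent) => *listing.entry(parent).or_insert(0) += 1,
            // A parent or standalone crate gets its own entry.
            None => {
                listing.entry(pkg.name).or_insert(0);
            }
        }
    }
    listing
}
```

With a parent crate and two of its children plus one unrelated crate, the listing would show two entries instead of four, which is the per-dependency overhead reduction the idea is after.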

As the person who probably provoked this (but certainly not the only one who thinks this way), I’d like to start this discussion by strenuously making clear that this isn’t a black or white issue. I am just as guilty of adding dependencies to crates over time as anyone else. It is extremely hard to resist doing it, precisely because there are so many good reasons to motivate code reuse, which is obviously something that I (and I think many others) value highly.

To be more concrete, here are some things I have done or am considering doing in the near future:

  • bstr avoids dependencies on Unicode crates or pre-existing substring search crates, and instead rolls its own. In part to provide cohesion, in part to improve on the pre-existing things and in part because some of the pre-existing things weren’t a good fit for non-UTF-8 byte strings. (I elaborate more on this here, but I guess to be clear, this wasn’t just about minimizing dependencies, although it was a factor.) Moreover, bstr has features that can be turned off to slim it down to just a single small dependency (memchr).
  • I’d like to move utf8-ranges into regex-syntax, primarily because nobody uses utf8-ranges without regex-syntax, so it’s a good opportunity to reduce dependency count.
  • I was writing a new crate for packed multi-string searches, but I think it would fit better as a sub-module of aho-corasick.
  • Recently, csv has grown a few dependencies, and they aren’t always needed. I’m considering making serde (and associated deps for serde-only code) optional.
  • It is likely that regex will grow a few dependencies in the future (bstr, regex-automata, and possibly another one I haven’t conceived of yet) and I’m not sure how to mitigate that.

These aren’t all unambiguously good ideas, but this is kind of where my head’s at, and I thought it would be good to give some concrete examples. I also think @lambda’s comment on the Tokio RFC GitHub issue is really good, and I agree with a lot of it: https://github.com/tokio-rs/tokio/issues/1318#issuecomment-512127908

The unic project is another interesting data point. It’s currently divided up into a huge tree of crates (most of which don’t exist yet tbh) to achieve code separation and allow use of just what’s necessary. At the same time, however, we’ve been versioning the whole tree synchronously for simplicity’s sake.

I’ve entertained the idea of flipping this: users would only ever import unic itself, which would have a large feature tree controlling which implementation crates are pulled in “transparently” behind the facade. That would also resolve the unic::ucd::case versus unic_ucd_case difference.

I think “package local dependencies” (e.g. a workspace with one API crate but internal crate boundaries not exposed on crates.io) is one potential way towards having our cake and eating it too.

A “build your own facade”, if you will.

One thing I hadn’t considered until I read your comment was this: are the child crates exposed through crates.io? If this “build your own facade” thing needs special support (which I would think it would), I think it’d be easier to introduce in a backwards-compatible way by also exposing the child crates on crates.io (but marking on the crate page that it’s a child crate of x).

It might also make other things more discoverable, because child crates might have more descriptive crate names. On the other hand, I’d expect that most of the child crates wouldn’t be very useful on their own. :man_shrugging:

You know, there's one huge problem with feature flags. Say I have an app that depends on the url crate:

# my-app/Cargo.toml
[dependencies]
url = "2"
reqwest = "4.0"

And reqwest's Cargo.toml flips on the serde feature:

# reqwest-4.0/Cargo.toml
[dependencies]
url = { version = "2", features = ["serde"] }

Because features are unified across the whole dependency graph, my app can now accidentally depend on url's serde feature without ever explicitly asking for it.
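
A rough way to see the gap is to model it. The Rust sketch below is a simplified model, not Cargo's actual resolver: it ignores default features, versions, and optional dependencies, and the `Request`, `unify`, and `requested_by` names are made up for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One edge in the dependency graph: `from` depends on `to` with `features`.
struct Request {
    from: &'static str,
    to: &'static str,
    features: &'static [&'static str],
}

/// Union the features requested for each dependency across the whole graph.
/// This is what every crate in the build ends up compiling against.
fn unify(requests: &[Request]) -> BTreeMap<&'static str, BTreeSet<&'static str>> {
    let mut resolved: BTreeMap<&'static str, BTreeSet<&'static str>> = BTreeMap::new();
    for req in requests {
        resolved
            .entry(req.to)
            .or_default()
            .extend(req.features.iter().copied());
    }
    resolved
}

/// The features that `from` itself asked for on `to`, ignoring everyone else.
fn requested_by(requests: &[Request], from: &str, to: &str) -> BTreeSet<&'static str> {
    requests
        .iter()
        .filter(|r| r.from == from && r.to == to)
        .flat_map(|r| r.features.iter().copied())
        .collect()
}
```

For the two manifests above, `unify` reports that url is built with serde, while `requested_by(&graph, "my-app", "url")` is empty. The accidental dependency is exactly the difference between those two sets.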

This seems lintable (although non-trivially). If the metadata for url recorded which feature each public item requires, and cargo passed the dependency features requested by my-app to rustc when building it (e.g. rustc --crate-name my-app --dep-features url=default), then rustc could warn when you directly reference an item that requires a feature you don't explicitly activate.
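
As a sketch of the check itself, assuming that per-item feature metadata were available (it is not today, and `Item`, `lint`, and the item paths below are purely illustrative):

```rust
use std::collections::BTreeSet;

/// A public item of a dependency plus the feature that gates it, if any.
struct Item {
    path: &'static str,
    requires: Option<&'static str>,
}

/// Warn for every referenced item whose gating feature the current crate
/// did not enable itself, even if another crate in the graph enabled it.
fn lint(items: &[Item], referenced: &[&str], enabled_directly: &BTreeSet<&str>) -> Vec<String> {
    let mut warnings = Vec::new();
    for path in referenced {
        // Look up the metadata for the item being referenced.
        if let Some(item) = items.iter().find(|i| i.path == *path) {
            if let Some(feature) = item.requires {
                if !enabled_directly.contains(feature) {
                    warnings.push(format!(
                        "`{}` requires feature `{}`, which this crate does not enable directly",
                        path, feature
                    ));
                }
            }
        }
    }
    warnings
}
```

The important design point is that the lint compares against the features my-app requested itself, not against the unified set, so the build still succeeds but the implicit reliance gets flagged.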

You are describing a situation where your app does in fact transitively depend on serde + url, because your dependency reqwest needs it. I'm not sure what else you're expecting?

If you removed all of the cargo features so all of the code is available all of the time, and deleted the serde-related code in url, your app wouldn't compile, because reqwest depends on it.

The problem is that if the usage of url/serde is private to reqwest, then removing that feature from reqwest's dependencies is a non-breaking change. If my-app had been accidentally relying on that feature being active (e.g. by serializing a url somewhere), then a patch version update of reqwest could cause my-app to fail to build.

Implicit dependencies are an old problem, like using something from an indirect C header that you really should #include yourself.
