Enabling Clean Architecture and avoiding exponential feature blowup

I have some rough ideas about how to improve the crates ecosystem to enable Clean Architecture. I think this is best explained using an example. Let's say developer A creates a fast_serialize crate that defines serialization traits. Developer B creates a crate foo_types that contains a bunch of basic types in domain foo.

Developer A doesn't know/care about foo_types. Developer B doesn't know/care about fast_serialize.

Developer C wants to use both crates and integrate them. The current answer is for C to use a newtype.

The issue is newtypes are annoying to use because C has to write boilerplate to delegate the trait methods as well as perform conversions. Even worse, converting things like Vec<NewtypeOverT> to Vec<T> could cause reallocation (unless unsafe is involved).
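To make the boilerplate concrete, here is a minimal sketch of what developer C ends up writing today. The modules stand in for the two crates, and the trait method, the `Id` type, and its `is_reserved` method are all made up for illustration; only the crate names come from the scenario above.

```rust
// Stand-ins for the two independent crates (hypothetical contents).
mod fast_serialize {
    pub trait Serialize {
        fn serialize(&self, out: &mut Vec<u8>);
    }
}

mod foo_types {
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct Id(pub u32);

    impl Id {
        pub fn is_reserved(&self) -> bool {
            self.0 == 0
        }
    }
}

// Developer C's code: neither the trait nor the type is local,
// so a newtype is needed to connect them.
use fast_serialize::Serialize;
use foo_types::Id;

pub struct SerializableId(pub Id);

impl Serialize for SerializableId {
    fn serialize(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&(self.0).0.to_le_bytes());
    }
}

// Boilerplate: conversions in both directions.
impl From<Id> for SerializableId {
    fn from(id: Id) -> Self {
        SerializableId(id)
    }
}

impl From<SerializableId> for Id {
    fn from(wrapped: SerializableId) -> Self {
        wrapped.0
    }
}

// Boilerplate: every method of the wrapped type is forwarded by hand.
impl SerializableId {
    pub fn is_reserved(&self) -> bool {
        self.0.is_reserved()
    }
}

// Safe code converts a vector element by element. Whether the allocation
// is reused is up to the standard library and is not guaranteed.
pub fn wrap_all(ids: Vec<Id>) -> Vec<SerializableId> {
    ids.into_iter().map(SerializableId).collect()
}

fn main() {
    let wrapped = wrap_all(vec![Id(1), Id(2)]);
    let mut out = Vec::new();
    for id in &wrapped {
        id.serialize(&mut out);
    }
    println!("{:?}", out); // [1, 0, 0, 0, 2, 0, 0, 0]
}
```

Everything below the `Serialize` impl is glue that exists only because of the wrapper.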

Now let's imagine a slightly different scenario.

There are multiple crates with interesting traits (serde, postgres-types, slog, ...). A crate defining some base types may want to support them (like developer B above, except that B now cares about A's crate). So the developer adds each of those crates as an optional dependency behind a feature. However, the features could accidentally interact, so a proper test would cover all possible combinations of features, which is 2^feature_count test runs. That may be a lot of tests.
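As a rough illustration of how quickly that matrix grows, the sketch below enumerates every feature subset and prints the matching cargo invocation. The helper name `all_feature_combinations` is made up, and the feature names are simply assumed to match the crate names mentioned above.

```rust
/// Enumerate every subset of `features`, i.e. the full test matrix.
fn all_feature_combinations<'a>(features: &[&'a str]) -> Vec<Vec<&'a str>> {
    let n = features.len();
    (0u32..(1u32 << n))
        .map(|mask| {
            features
                .iter()
                .enumerate()
                // Keep feature `i` when bit `i` of the mask is set.
                .filter(|&(i, _)| mask & (1u32 << i) != 0)
                .map(|(_, f)| *f)
                .collect()
        })
        .collect()
}

fn main() {
    let features = ["serde", "postgres-types", "slog"];
    let combos = all_feature_combinations(&features);
    println!("{} combinations", combos.len()); // 8 combinations
    for combo in &combos {
        println!(
            "cargo test --no-default-features --features \"{}\"",
            combo.join(",")
        );
    }
}
```

With three features that is already 8 runs; ten features would need 1024.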

I was thinking it'd be helpful to have some mechanism that allows binary crates to implement external traits for external types. A trait-defining crate could declare that it won't implement the trait for any type outside of the crates it already uses, and vice versa for a type-defining crate. There is then guaranteed to be only one impl, and it is guaranteed to live in the binary crate.

There could also be impl crates that provide these impls but can only be used as a direct dependency of a binary crate, never as a dependency of a library crate. This would be useful if developer C in the first scenario wants to make the impl reusable.

I know this is just a rough idea; I would like to know what people think.

What if one of those libraries adds an overlapping implementation of the trait in the future? This feature would make adding any trait implementation a semver breaking change. The coherence rules exist among other reasons to allow adding trait implementations without causing a semver breaking change.

So you propose to allow implementing foreign traits on foreign types by instructing rustc/chalk via some user-provided code?

At first, it seems like a way of introducing clashes of the impls: what if there are two binaries that define the same impl?

So there is a clear need for separation of these impls.

Let's suppose we have four crates: A with Trait, B with Type, and C,D with impls Trait for Type:

In a vacuum, C and D are fine, but if we have any crate that depends in any way on both of them, we have a clash that can't really be resolved.
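The four-crate layout can be sketched in a single file, with modules standing in for crates (so this is only an approximation; the method `describe` is made up). As real, separate crates, the impls in C and D would each be rejected by today's orphan rule before any clash could happen; the sketch shows what the overlap check would see if both were allowed.

```rust
// Modules stand in for crates; in a real workspace each would be its own crate.
mod a {
    pub trait Trait {
        fn describe(&self) -> String;
    }
}

mod b {
    pub struct Type;
}

mod c {
    // C's glue impl.
    impl crate::a::Trait for crate::b::Type {
        fn describe(&self) -> String {
            "impl from C".to_string()
        }
    }
}

mod d {
    // D's competing glue impl. Uncommenting it makes rustc reject the whole
    // program with E0119 (conflicting implementations of a trait).
    //
    // impl crate::a::Trait for crate::b::Type {
    //     fn describe(&self) -> String {
    //         "impl from D".to_string()
    //     }
    // }
}

fn main() {
    use a::Trait;
    println!("{}", b::Type.describe()); // impl from C
}
```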

If we try to impose restrictions on the visibility of these impls, we can have:

  • just some crate-local impls => today that's done via newtypes; defeats the purpose.
  • a way to have nameable impls along with visibility modifiers on them => then we end up in a situation where we have to import an impl in addition to the type...

The last option doesn't seem bad...

Yes, it'd be semver-breaking. The entire point of this is to opt into a promise that new trait impls won't be added.

That is roughly #[fundamental]. It doesn't allow you to implement foreign traits for foreign types, though. It only allows, for example, an impl for Box<LocalType>: Box is a foreign type, but it is marked as #[fundamental].
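A small sketch of what #[fundamental] already permits on stable Rust, assuming a made-up local type `Meters`: both `Add` and `Box` come from the standard library, yet the impl is accepted because `Box<Meters>` counts as local for the orphan rule.

```rust
use std::ops::Add;

// A type local to this crate (hypothetical).
#[derive(Debug, PartialEq)]
struct Meters(f64);

// `Add` and `Box` are both defined outside this crate. The impl is still
// accepted because `Box` is `#[fundamental]`, so `Box<Meters>` is treated
// as a local type by the orphan rule.
impl Add for Box<Meters> {
    type Output = Box<Meters>;

    fn add(self, rhs: Self) -> Self::Output {
        Box::new(Meters(self.0 + rhs.0))
    }
}

// By contrast, `Vec` is not fundamental, so the equivalent impl for
// `Vec<Meters>` would be rejected with E0117:
//
// impl Add for Vec<Meters> { /* ... */ }

fn main() {
    let total = Box::new(Meters(1.5)) + Box::new(Meters(2.0));
    println!("{:?}", total); // Meters(3.5)
}
```

The attribute itself is unstable, so crates outside the standard library cannot currently mark their own types or traits this way.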

Why would this be any bigger a problem than me implementing fn foo() -> u32 { 42 } in my project and you implementing fn foo() -> u32 { 47 } in yours?

We don't have that clash, because even attempting to depend on both is a hard error. Only binary crates can depend on them, and the binary crate is responsible for picking exactly one.

Not entirely, you still avoid boilerplate.

Agreed. I'm not sure they need to be explicitly named. You could write something like use impl_crate::impl trait_crate::Trait for type_crate::Type;

Yes, something like that.

One more thing I should note: not being able to impl a trait later is not too bad. The crate author can just impl it in a separate crate that consumers can import. Sure, it can get a bit annoying sometimes, but at least it's not impossible.

Alternatively the crate author could write those traits in the original crate with some marker and then the consumer must opt into importing them. (Resembles #[macro_export]/#[macro_use])

How can there be two binaries that are compiled at the same time?

It is bad due to the hashtable problem (or whatever it's called, I don't remember). Imagine crates A and B implemented Hash for Foo in two different ways, each with its own named impl. Then a HashSet<Foo> created in A would be incompatible (at runtime!) with crate B, because the two crates hash the elements by different rules.
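The runtime failure can be simulated without any language extension. The sketch below uses a deliberately tiny hand-rolled bucket table (not the real HashSet) and passes the hash function per call to mimic two crates seeing two different Hash impls for the same type; `hash_a` and `hash_b` are made-up stand-ins for those impls.

```rust
// A tiny bucket table. The hash function is supplied per call to mimic
// two crates that each see a different `Hash` impl for the same type.
struct Buckets {
    slots: Vec<Vec<u32>>,
}

impl Buckets {
    fn new() -> Self {
        Buckets { slots: vec![Vec::new(); 8] }
    }

    fn insert(&mut self, hash: fn(u32) -> usize, value: u32) {
        let idx = hash(value) % self.slots.len();
        self.slots[idx].push(value);
    }

    fn contains(&self, hash: fn(u32) -> usize, value: u32) -> bool {
        let idx = hash(value) % self.slots.len();
        self.slots[idx].contains(&value)
    }
}

// Hypothetical "impl selected in crate A".
fn hash_a(v: u32) -> usize {
    v as usize
}

// Hypothetical "impl selected in crate B".
fn hash_b(v: u32) -> usize {
    (v as usize).wrapping_mul(31).wrapping_add(7)
}

fn main() {
    let mut set = Buckets::new();
    set.insert(hash_a, 3); // crate A stores the value in bucket 3

    println!("{}", set.contains(hash_a, 3)); // true
    println!("{}", set.contains(hash_b, 3)); // false: B looks in bucket 4
}
```

The value is in the table, but a lookup that hashes differently searches the wrong bucket and reports it missing, with no compile-time error at all.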

It would only be possible to use such an impl in the binary crate, and only once, though.

But there's still the possibility of a library crate implementing the trait itself, which could create the same situation, no?

The library could only create it if it duplicates functionality like this:

pub fn hash(val: &ForeignType) -> u32 {
    ...
}

impl Hash for ForeignType {
    ...
}

Then, if the binary crate uses a different Hash impl together with the library's hash() fn, it's the binary crate that's at fault.

Is it actually fewer tests if they're separate crates instead of separate features? Or would you just not test all the combinations if they were crates?

If they are separate crates, then the compiler enforces more separation between them, so that many tests are not needed.

I just don't see these as being particularly likely to cause issues. Sure, there's pub(crate) and #[non_exhaustive] and some other stuff where it could hypothetically matter, but for the most part if you write the different features in different modules you have the vast majority of the protection that crates would offer.

So yes, you'd need to test with each feature on its own and with all the features enabled, but that's only one more test than if they were all separate crates.

That'll catch #[cfg(feature = "bar")] when you meant #[cfg(feature = "foo")], things that are unused without a particular feature, and even breakage from things like #[cfg(all(feature = "bar", not(feature = "foo")))].

Yes, it won't catch a sufficiently-complicated guard like #[cfg(all(feature = "foo", not(feature = "bar"), feature = "baz"))], but it's also just not that hard to avoid writing that in the first place.
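To check that reasoning, the sketch below models cfg guards as plain predicates over a set of enabled features and asks whether the "each feature alone, plus all together" plan ever compiles the guarded code. All names here (`linear_test_plan`, `is_exercised`, the guard functions) are made up for the illustration.

```rust
use std::collections::HashSet;

type FeatureSet = HashSet<&'static str>;

/// The test plan described above: each feature alone, plus all together.
fn linear_test_plan(features: &[&'static str]) -> Vec<FeatureSet> {
    let mut plan: Vec<FeatureSet> = features
        .iter()
        .map(|f| std::iter::once(*f).collect())
        .collect();
    plan.push(features.iter().copied().collect());
    plan
}

/// True if at least one configuration in the plan enables the guarded code.
fn is_exercised(plan: &[FeatureSet], guard: fn(&FeatureSet) -> bool) -> bool {
    plan.iter().any(|set| guard(set))
}

// Models #[cfg(all(feature = "bar", not(feature = "foo")))]
fn guard_simple(s: &FeatureSet) -> bool {
    s.contains("bar") && !s.contains("foo")
}

// Models #[cfg(all(feature = "foo", not(feature = "bar"), feature = "baz"))]
fn guard_complicated(s: &FeatureSet) -> bool {
    s.contains("foo") && !s.contains("bar") && s.contains("baz")
}

fn main() {
    let plan = linear_test_plan(&["foo", "bar", "baz"]);
    println!("{} configurations", plan.len()); // 4 configurations
    println!("{}", is_exercised(&plan, guard_simple)); // true
    println!("{}", is_exercised(&plan, guard_complicated)); // false
}
```

The second guard is only active with exactly foo and baz enabled, which is not one of the four configurations, so that code is never compiled under this plan.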

And separate crates don't necessarily work together either, since the different crates could try to link different versions of native code that fail if put in the same binary.

Then maybe the proposal should be: allow libraries other than the stdlib to define new #[fundamental] items (both types and traits). That way, if a library does add a new impl, it will be a breaking change (and must be accompanied by a semver bump).

I do think this is a substantial scaling problem in the crate ecosystem.

To give a concrete example I've run into: suppose you want to store a git2::Oid in a database. It definitely doesn't make sense to make git2 add a feature flag to depend on the database crate just to provide that impl. That tends to be true of most types. So in practice, database crates need a pile of features to pull in dependencies just to impl traits for other crates' types.

A substantial part of software development is gluing things together in interesting ways not anticipated by the original developers. This limitation makes it hard to do that in the Rust ecosystem; you need to get one or the other original developer to add support for the combination.

@scottmcm those are some interesting points. The usefulness of separating things into crates does seem low.

@dlight maybe, I'd have to better understand what it does to say for sure. Sadly, being an unstable feature, it's not well documented.

@josh So far I've approached this by picking popular crates and crates I use myself. I think that's even reasonable to do for super-popular/important things like serde. But it sucks for the consumers of less popular crates.

So, I do agree that it would be nice to have a clean way around the ever-growing dependencies problem.

Interestingly, this is in a sense more flexibility than I'm used to. In C#, if I wanted Git2.Oid to have a [JsonConverter(typeof(OidJsonConverter))] attribute on it or to implement IXmlSerializable, then that's only possible by having Git2 do it. So the impl being able to be in either of the two crates, rather than only one option, is already an improvement.

And if you can't do that, then you need to attribute the field in a type of your own (rather like #[serde(serialize_with)]), or there's a collection of converters looked up using dynamic types (which maybe one could do in Rust with something like https://lib.rs/crates/anymap ).
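The "collection of converters looked up using dynamic types" idea can be sketched with only the standard library, using TypeId as the key instead of the anymap crate. The `Oid` struct is a made-up stand-in for a foreign type such as git2::Oid, and `Converters` and `oid_to_hex` are hypothetical names.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

// Hypothetical stand-in for a foreign type such as `git2::Oid`.
struct Oid([u8; 4]);

type Converter = Box<dyn Fn(&dyn Any) -> Option<String>>;

/// A registry of converters keyed by the runtime type they handle.
#[derive(Default)]
struct Converters {
    by_type: HashMap<TypeId, Converter>,
}

impl Converters {
    fn register<T: Any>(&mut self, convert: fn(&T) -> String) {
        self.by_type.insert(
            TypeId::of::<T>(),
            // Downcast back to the concrete type before converting.
            Box::new(move |value: &dyn Any| value.downcast_ref::<T>().map(convert)),
        );
    }

    /// Returns `None` when no converter is registered for `T`.
    fn convert<T: Any>(&self, value: &T) -> Option<String> {
        let converter = self.by_type.get(&TypeId::of::<T>())?;
        converter(value as &dyn Any)
    }
}

fn oid_to_hex(oid: &Oid) -> String {
    oid.0.iter().map(|b| format!("{:02x}", b)).collect()
}

fn main() {
    let mut converters = Converters::default();
    converters.register::<Oid>(oid_to_hex);

    let oid = Oid([0xde, 0xad, 0xbe, 0xef]);
    println!("{:?}", converters.convert(&oid)); // Some("deadbeef")
    println!("{:?}", converters.convert(&42u32)); // None
}
```

The trade-off is that a missing converter only shows up at runtime as `None`, whereas a missing trait impl is a compile error.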

What other solutions are in use in other languages? The only ones I can think of are ones with potential coherence problems, like template specialization or monkey-patching.
