Vague proposal: Extending coherence with workspaces

So after talking to @sgrif recently, I’ve been thinking about trait coherence. I still believe pretty strongly that we do not want to sacrifice coherence in general, but there are definitely some scenarios that are not easily resolved today. One thing that @sgrif raised is the following. Imagine I am defining a crate, let’s call it Ethanol, and it defines various types. I would like those types to be compatible with Serde, but I don’t want to force my users to depend on Serde. Now I am stuck, as I can’t implement the serde traits unless I depend on serde.

What people do in practice today is to use cargo features to conditionally define those traits. That’s workable, but it’s kind of annoying, and it’s stressing the preprocessor, which imo creates problems of its own. Moreover, cargo features don’t necessarily compose all that well: if I have for example one crate that needs Ethanol+Serde and one that wants Ethanol by itself, cargo won’t necessarily figure out that it should take the union of the feature flags (it’s also not obvious that this is the correct thing to do in the first place). All in all, doesn’t feel (to me) like the right solution.

What would be nice is if I could define another crate, let’s call it EthanolSerde, that took both Ethanol and Serde as dependencies and then implemented the Serde traits for the Ethanol types. Now people who want Serde support can rely on that crate instead. Moreover, since i am in control of both Ethanol and EthanolSerde, I can publish new versions in lockstep, and so we don’t really have to fear incoherence. Basically, as long as I myself don’t publish two incompatible definitions of the serde traits for my types, there will always be at most one impl to use. The problem of course is that the compiler doesn’t know that Ethanol and EthanolSerde share the same author.

The recent RFC on cargo workspaces got me thinking. One could imagine tweaking the coherence rules to require that one can only write an impl Trait for Type if either Trait or Type is in your current workspace, as opposed to the current crate. Prior to publication, cargo could then even run a “whole workspace” consistency check to ensure that you don’t impl Trait for Type in two distinct crates within the same workspace.

(Of course, the current RFC doesn’t attempt anything this ambitious. This would basically be adding workspaces as something the compiler understands more deeply, rather than a convenient shorthand for managing multiple crates.)

Thoughts?

(cc @aturon @wycats, with whom I’ve occasionally discussed this)

2 Likes

I seem to recall another thread somewhere recently that said that that is, in fact, exactly what cargo does, and thus it is important that features only ever add to an api, and change definitions. Was that incorrect?

This proposal makes me feel concerned about layering.

Though you ddn’t say how this would be implemented, the most obvious implementation to me is this: rustc would gain a notion of a ‘workspace’, and would be passed a set of crates which are ‘in the same workspace’ as the target crate, and coherence rules would be modified to treat all types and traits in those crates as local. If this is significantly different from what you had in mind, please let me know.

In such a case, although cargo may provide enforcement that rustc’s workspaces map to cargo’s workspaces, and may limit how you can define workspacess such that only ‘your’ crates are within your workspace, rustc makes no such provision. Using rustc directly I could trivially add any dependency to ‘my workspace’, making orphan rules unenforceable outside of the cargo ecosystem.

Obviously I can also just copypaste the body of any crate into my own to get around the orphan rules right now, but that’s a much steeper obligation than changing how I pass dependency args.

If this use case for features is a very common use case (and I imagine it is), it seems like the right course is to make it more usable.

For example, there could be a new kind of feature which is semantically associated with a dependency (probably this association would be best applied in cargo). Cargo could get the dependency graph without any of these features enabled, and then for each crate automatically look for the dependencies associated with features and enable them. Authors would just need to add one of these ‘dependency-features’ (a better name is needed) to their toml, apply an attribute to a module containing all uses and imports of that dependency, and users would have to do nothing at all. If this proposal is viable, it seems even more usable than the workspaces proposal.

Does that mean that type safety (for associated types) would depend on checking done outside rustc, in cargo? I find that just a little bit scary.

One could imagine tweaking the coherence rules to require that one can only write an impl Trait for Type if either Trait or Type is in your current workspace, as opposed to the current crate. Prior to publication, cargo could then even run a "whole workspace" consistency check to ensure that you don't impl Trait for Type in two distinct crates within the same workspace.

I think if you add a feature like that, there should be at least a special syntax on the impl side, maybe requiring a #[workspace] attribute.

In my experience, I’ve come across use-cases where it would be nice if I could just permit my traits to be implemented for any type by any user. For example, in multipart, for certain traits it doesn’t really matter to me who implements them. In fact, letting them be implemented arbitrarily improves compatibility immensely, because I don’t have to find crates that I should be compatible with and implement the traits for the crate’s types myself, or somehow convince the crate’s author to pull in multipart to implement said traits for their types.

Perhaps this could be a little more restrictive by being able to mark traits or types as bypassing coherence only for a specific crate. Using the example from the OP:

In ethanol:

#[permit_impls("ethanol-serde")]
pub struct EthanolStruct;

In ethanol-serde:

extern crate ethanol;
extern crate serde;

use ethanol::EthanolStruct;
use serde::SerdeTrait;


// Coherence checking sees `#[permit_impls("ethanol-serde")]` on the `EthanolStruct` metadata,
// and asserts that this crate's name == "ethanol-serde"
impl SerdeTrait for EthanolStruct {
    // ...
}

Sure, it can be bypassed just by changing the crate’s name, but what would that really let you do? You can’t publish it on crates, AFAIK, because the crate name is already taken.

2 Likes

This is all very manual work, requiring users of crates to opt into using both serde and ethanol (meaning there are 5 ways to use those crates).

  1. neither crate
  2. just serde
  3. just ethanol
  4. ethanol and serde
  5. ethanol and serde together

Wouldn’t it be better to improve cargo features to allow this case specifically? Basically allow a crate (ethanol) to state that if another crate (serde) is required by any dependency in the entire dependency tree, then activate feature ethanol-serde.

The other way around, serde could state that if ethanol is anywhere, enable the feature ethanol. This will of course collide when both have an impl, but that already happens now, because you’d need a cyclic dependency.

This way there’s no magic feature union, and we don’t even end up building ethanol twice (once with and once without ethanol-serde).

Yes it’s magic feature activation, probably ending up requiring multiple passes until no crate requires new dependencies, but it would be so much more comfortable to use than the other options.

4 Likes

I looked at a feature-bracketed piece of code yesterday and thought it fits perfectly in its own crate, then this hit me (and it was related to serde support, too).

Can this be sound if the central crate can somehow explicitly delegate to the “glue crate”?

I think having a modifier on the trait that makes it implementable for any type outside the current crate (strawman proposal: extension trait Too { .. }) could work in all situations mentioned in this thread.

This is not the case actually, because crates.io keeps package names unique by now, but the actual rust crate in the package can be named independently from that.

Could this be used to solve the fact that some inherent methods of primitives (e.g. <[T]>::to_vec) have to be implemented in crates with access to allocation, but the rest can be in libcore?

Right now we have an extension trait in libcore and it feels very hacky.
@bluss suggested orphan inherent impls, but I wouldn’t be comfortable with unregulated orphan impls.

If anyone could implement your traits for any type, two of my dependencies could implement them for the same type, and now I have a coherence violation I can't resolve except by modifying or abandoning one of the dependencies. The guarantee that adding a dependency can't break your build is very important.

4 Likes

I didn't know that people were already doing this... this does feel like the outline of the right solution to me, albeit not the specifics of manually hacking it together with feature flags. Cargo could have a first-class notion of "weak dependencies", which aren't required for a package to build, but certain parts of it (e.g. impl fors) are only compiled in if the dependency is present. If Cargo has explicit knowledge of weak deps and they aren't expressed abusing feature flags, then the scenario you mention isn't a problem because the right way to handle it is obvious? This approach is appealing because it doesn't require modifying the coherence rules and doesn't require a combinatorial proliferation of "compatibility crates".

4 Likes

Thinking on it a bit more, and also @ker’s comment, I am also wondering if some more automatic or targeted form of today’s features flags may be a better way forward.

Something like this?

Cargo.toml

[dependency.serde]
weak = true
version = "..."

lib.rs

#[has(serde)]
extern crate serde;

#[has(serde)]
impl serde::Serialize for ... {}

cargo doc and cargo test would make all weak dependencies strong dependencies.

Catching up on the whole discussion, but in response to just the OP – This is definitely an improvement, but ultimately has the same problem as where we’re at now. The author of Ethanol (or Serde, or whatever else) still have to care about every other crate that users might want to use in conjunction with it, and handle it in the same repo.

Ideally I’d like to see a world where an ethanol_serde crate can just exist, independent of either one. I agree that we don’t want to sacrifice coherence in general, but I think this can get solved by a different feature being added to Cargo, which is specifying external vs internal dependencies.

I think that the main issues that could come from losing the orphan rule would come from transitive dependencies conflicting with each other. However, if we were to simply “not leak” orphans that come from an internal dependency, the surface area for conflict becomes your direct dependencies, and their external dependencies (which I think in practice will generally be very small, and mostly limited to things like libc).

I might be in the minority here, but I think it’s fine for two crates to not be useable together as long as that can only occur when you directly try to depend on both. If there’s an ethenol_serde and ethenol_serde2 crate, I think it’s fine for those two to conflict if you put them both in the same Cargo.toml. If we restrict the leakage of internal dependencies, this would mean that both can exist as transitive dependencies and be fine together.

4 Likes

I think some sort of feature-dependency addition to cargo is a good resolution for the more narrowly scoped problem - I want to implement these traits if they are available, but I don’t want to require them to be available. But there is a broader problem, which @sgrif brings out: I want my dependencies to compose as well as possible, even if their authors aren’t aware of one another.

I think a less costly change that could enable this than weakening orphan rules is adding mutual exclusion to the trait system so that a set of mutually exclusive ‘pivot traits’ like Sequence and Map could be provided in std, and libraries like serde could provide blanket impls of their functionality for all types which implement one of those traits. Obviously this still would require some foresight on the part of the library authors, but less than right now.

In C# interfaces such as IDictionary<K,V> ("Map") extend ICollection<KeyValuePair<K,V>> and IEnumerable<KeyValuePair<K,V>> ("Sequence"). Rusts HashMap<K,V> ("Map") implements IntoIterator<Item=(K,V)> ("Sequence"). I doubt one could come up with a system where it really makes sense for Sequence and Map to be mutually exclusive, nor do I believe one could make a blanket impl on both Sequence and Map because the compiler must still be convinced that you won’t sneakily implement both Sequence and Map for a type (even if doing so would not make sense).

This seems contradictory to me. If it would not make sense to implement both Map and Sequence for a type, then the Map and Sequence traits should be declared mutually exclusive (in a system with explicit mutual exclusion).

IntoIterator is not the trait for a sequential collection of items, its the trait for a type which can produce an iterator. A sequential collection is one in which items are associated with an integer, and if there is an item at integer n, there is item at every integer from 0..n as well.

Well first of all we do not have such a system of explicit mutual exclusion, but more importantly, even if we did, I do not believe that your suggested division makes much sense, because the line between Sequence and Map is rather vague.

And what is an iterator if not a sequence of items? IntoIterator represents the capability to convert a type into a sequence of items, so surely types such as HashMap (or SortedMap / OrderedMap, if we ever get such a thing in libstd) can be argued to be a Sequence.

Exactly, Sequence is simply a special case of Map, from indices to items. Or a Map is simply a Sequence of key-value tuples. Depending on how you look at it, they're really the same thing.

I get the feeling that your division between Sequence and Map is based on formats such as JSON, where there is such a distinction, but other formats, such as XML, do not have this distinction, and I don't believe the Rust ecosystem would benefit from it.