Limited Opt Out of Orphan Rule

If I understand the orphan rule, it’s only about protecting users of my crate from chaos? If so, and if I’m willing to cope with any (including any future) collisions, I would want the ability to opt out just for myself!

The following syntax would limit the visibility of the impl, just like pub can be limited. I suggest that, within this visibility limitation, Rust allow implementation to be free from the orphan rule:

impl(self) Bar for Foo {}
impl(super) Bar for Foo {}
impl(in path) Bar for Foo {}
impl(crate) Bar for Foo {}

Making the impl more public than the type would be an error.

The trait methods would only be available within the declared perimeter. So, for special traits like Drop, only objects created within the scope of such an impl would run that drop() method. Can that be compatible with moving the object out of Drop’s scope? Maybe that would be a compiler error unless Drop is explicitly propagated, e.g. as pub fn f() -> Foo + Drop.

Is this overall a logical model? Is it something that the compiler can implement? At least impl(self) should be easy, right?

The above makes it easy to marry any trait to any type. So integration crates are possibly not needed at all, unless they actually implement specific code. In that case impl(in path) – unlike pub(in path) – could allow foreign paths as well.

I'm not sure whether this should instead, or additionally, have been filed as an issue on "Experiment with relaxing the Orphan Rule".

What happens when two different crates impl Hash for the same type MyType (defined in a third crate) and each one builds a HashSet<MyType> with their own impls? Somehow the type system must prevent smuggling each hash table into the other crate, because they would actually break (you can't insert items with one Hash impl and then search for them with another impl).

The same may happen with traits like PartialEq or PartialOrd and many other traits.

That's a kind of bug that coherence tries to prevent.
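To make the failure mode concrete, here is a minimal sketch. Stable Rust will not accept two Hash impls for one type, so the toy table below takes the hash function as an explicit argument to stand in for "whichever impl the calling crate sees". All names (ToyTable, hash_a, hash_b, smuggle) are made up for illustration.

```rust
/// Toy hash table with a fixed number of buckets. The hash function is passed
/// explicitly to imitate "the `Hash` impl visible in the calling crate".
struct ToyTable {
    buckets: Vec<Vec<u32>>,
}

impl ToyTable {
    fn new() -> Self {
        ToyTable { buckets: vec![Vec::new(); 8] }
    }

    /// Picks the bucket for `key` according to the supplied hash function.
    fn bucket(&self, key: u32, hash: fn(u32) -> u64) -> usize {
        (hash(key) % self.buckets.len() as u64) as usize
    }

    fn insert(&mut self, key: u32, hash: fn(u32) -> u64) {
        let i = self.bucket(key, hash);
        if !self.buckets[i].contains(&key) {
            self.buckets[i].push(key);
        }
    }

    fn contains(&self, key: u32, hash: fn(u32) -> u64) -> bool {
        self.buckets[self.bucket(key, hash)].contains(&key)
    }
}

// Two incompatible "impls" of hashing for the same key type.
fn hash_a(key: u32) -> u64 {
    u64::from(key)
}

fn hash_b(key: u32) -> u64 {
    u64::from(key) + 1
}

/// Inserts with one impl, then looks up with both.
/// Returns (found_with_a, found_with_b).
fn smuggle() -> (bool, bool) {
    let mut table = ToyTable::new();
    table.insert(3, hash_a);
    (table.contains(3, hash_a), table.contains(3, hash_b))
}
```

The key is in the table, but the second lookup searches a different bucket and reports it missing. A real HashSet handed across a crate boundary would misbehave in the same way if the two crates disagreed about Hash.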

We could restrict the opt-out to bin crates, which can't be used as a dependency. And there's always a valid option called "compile error": if there are overlapping impls, the compiler just errors out.

It might be a version compat nightmare when upgrading dependencies, but that's my own choice.

Gosh, you made me realize my huge shortcoming. I thought returning HashSet<MyType + Hash> would likewise solve it. But no, I can also move into a 3rd party function, which would not be prepared to consider a different Drop or Hash that it doesn’t have in scope. And your example even extends to references.

This would require some kind of dyn, but then every parameter to every function would need to be dyn, just in case. :sob:

We could move the orphan rule to the point of use: your HashSet<MyType> without my Hash would not be the same type as my HashSet<MyType>. What new can of worms would that open?

Changing the Hash to have a type argument like Hash<whose implementation is this> works, and if it was a rust-native feature I think it could have been made syntactically hidden most of the time.

You can actually implement workarounds for the orphan rule like that in stable Rust (see UserImplementationType arg here).

It does mean that a hashmap will be specific to what context it has been created in, and you will get a type error when you mix implementations (except where you manage to make that dyn/generic).
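A rough sketch of that idea, not taken from the linked code: HashAs, Upstream, Mine, Name and TaggedSet are hypothetical names, and everything sits in one file here, whereas in practice the trait, the type and the markers would live in different crates. As far as I know, stable Rust accepts an impl of a foreign trait for a foreign type when a local type appears as a trait parameter, which is what makes this kind of workaround possible.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::marker::PhantomData;

/// Hypothetical replacement for `Hash`: `Who` names whose implementation this is.
trait HashAs<Who> {
    fn hash_as(&self) -> u64;
}

/// Marker types; each would normally live in the crate that owns the impl.
struct Upstream;
struct Mine;

/// Stand-in for a type defined in some third crate.
#[derive(PartialEq)]
struct Name(String);

impl HashAs<Upstream> for Name {
    fn hash_as(&self) -> u64 {
        let mut h = DefaultHasher::new();
        self.0.hash(&mut h);
        h.finish()
    }
}

impl HashAs<Mine> for Name {
    // Deliberately different from the upstream impl: hash by length only.
    fn hash_as(&self) -> u64 {
        self.0.len() as u64
    }
}

/// A set whose type records which impl built it.
struct TaggedSet<T, Who> {
    buckets: Vec<Vec<T>>,
    _who: PhantomData<Who>,
}

impl<T: HashAs<Who> + PartialEq, Who> TaggedSet<T, Who> {
    fn new() -> Self {
        TaggedSet { buckets: (0..8).map(|_| Vec::new()).collect(), _who: PhantomData }
    }

    fn slot(&self, value: &T) -> usize {
        (<T as HashAs<Who>>::hash_as(value) % self.buckets.len() as u64) as usize
    }

    fn insert(&mut self, value: T) {
        let i = self.slot(&value);
        if !self.buckets[i].contains(&value) {
            self.buckets[i].push(value);
        }
    }

    fn contains(&self, value: &T) -> bool {
        self.buckets[self.slot(value)].contains(value)
    }
}

/// Builds a set with "my" impl and queries it with the same impl.
fn mine_contains(name: &str) -> bool {
    let mut set: TaggedSet<Name, Mine> = TaggedSet::new();
    set.insert(Name("Ferris".to_string()));
    // Passing `set` where a `TaggedSet<Name, Upstream>` is expected is a type error.
    set.contains(&Name(name.to_string()))
}
```

Because the marker is part of the set's type, inserting with one impl and searching with another cannot happen silently. The cost is that the marker shows up in every signature that mentions the collection.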

Then the MyType in your crate is just a syntax sugar for struct MyTypeWithHash(MyType) (the newtype pattern) plus implementing Hash yourself. Newtypes are inconvenient and boilerplate-y, but they at least allow you to convert from one newtype into another (well if you can access the inner value, anyway)

I can't find it, but I remember a proposal here in IRLO that was basically this: a bare MyType, a MyType with a given Hash impl and a MyType with another Hash impl would all be different types. (this sounds like the typestate pattern, and really is just sugar for newtypes too)
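For reference, a small sketch of the newtype pattern as it works today. String stands in for the foreign MyType, and CaseInsensitive, ByLength and demo are illustrative names only.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Newtype with a case-insensitive `Hash`/`Eq`.
struct CaseInsensitive(String);

impl PartialEq for CaseInsensitive {
    fn eq(&self, other: &Self) -> bool {
        self.0.to_lowercase() == other.0.to_lowercase()
    }
}
impl Eq for CaseInsensitive {}

impl Hash for CaseInsensitive {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Must agree with `eq`: values that compare equal hash equally.
        self.0.to_lowercase().hash(state);
    }
}

/// A second newtype over the same inner type, comparing and hashing by length.
struct ByLength(String);

impl PartialEq for ByLength {
    fn eq(&self, other: &Self) -> bool {
        self.0.len() == other.0.len()
    }
}
impl Eq for ByLength {}

impl Hash for ByLength {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.0.len().hash(state);
    }
}

/// Converting between newtypes only needs access to the inner value.
impl From<CaseInsensitive> for ByLength {
    fn from(value: CaseInsensitive) -> Self {
        ByLength(value.0)
    }
}

/// Returns (found in the case-insensitive set, found in the by-length set).
fn demo() -> (bool, bool) {
    let mut set = HashSet::new();
    set.insert(CaseInsensitive("Hello".into()));
    let found_ci = set.contains(&CaseInsensitive("HELLO".into()));

    // Moving to the other impl rebuilds the set, so every item is re-hashed.
    let by_len: HashSet<ByLength> = set.into_iter().map(ByLength::from).collect();
    let found_len = by_len.contains(&ByLength("world".into()));
    (found_ci, found_len)
}
```

The conversion is explicit and rebuilds the collection, which is exactly the boilerplate that a sugared version would have to hide.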

Newtype wrappers are convenient, but syntactic cruft. If they can be hidden in syntactic sugar, that would go a long way to transparently solving the problem.

Allowing bin crates to opt out and make impls for themselves without regard for the orphan rule is a reasonable experiment that we've talked about a few times. It'd be a worthwhile first step to try.

I guess that would already solve many end users’ big pain point with Rust. Huge pity the linked todo isn’t getting tackled this semester as scheduled!

Since a bin defines a limit on visibility anyway, the syntax I proposed can be added later, if and when this is also done for libs.

This just reminds me of passing an IEqualityComparer to a Dictionary constructor in C#. We would probably use a self-less trait rather than an instance, so that implementations can assume that the same type always has the same Eq (otherwise things like HashMap: Eq would not make sense), but it's the same idea.

If we just had a defaulted comparer type on HashMap, then you could implement your own comparer for other types. That would also allow things like BTreeSet<i32, ReverseComparer> instead of BTreeSet<Reverse<i32>>.

As you say, I think any loosening of coherence basically ends up needing to do that under the hood anyway, so I think just exposing it as a normal thing might be better than making it type system magic.
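A sketch of what such a defaulted comparer parameter could look like. This is not std's API: SortedSet, Comparer, NaturalOrder and sorted_with are hypothetical names, ReverseComparer is borrowed from the post above, and a sorted Vec stands in for a real tree.

```rust
use std::cmp::Ordering;
use std::marker::PhantomData;

/// Self-less comparer: the ordering belongs to the comparer *type*, not to a value,
/// so two sets with the same comparer type are guaranteed to agree on ordering.
trait Comparer<T> {
    fn compare(a: &T, b: &T) -> Ordering;
}

/// Default comparer: defers to `Ord`.
struct NaturalOrder;
impl<T: Ord> Comparer<T> for NaturalOrder {
    fn compare(a: &T, b: &T) -> Ordering {
        a.cmp(b)
    }
}

/// Reversed ordering, replacing a `Reverse<T>` wrapper around every element.
struct ReverseComparer;
impl<T: Ord> Comparer<T> for ReverseComparer {
    fn compare(a: &T, b: &T) -> Ordering {
        b.cmp(a)
    }
}

/// Ordered set backed by a sorted `Vec`, with a defaulted comparer parameter.
struct SortedSet<T, C = NaturalOrder> {
    items: Vec<T>,
    _comparer: PhantomData<C>,
}

impl<T, C: Comparer<T>> SortedSet<T, C> {
    fn new() -> Self {
        SortedSet { items: Vec::new(), _comparer: PhantomData }
    }

    /// Inserts `value` at its sorted position; returns false if already present.
    fn insert(&mut self, value: T) -> bool {
        match self.items.binary_search_by(|probe| C::compare(probe, &value)) {
            Ok(_) => false,
            Err(pos) => {
                self.items.insert(pos, value);
                true
            }
        }
    }

    fn as_slice(&self) -> &[T] {
        &self.items
    }
}

/// Collects `values` into a set ordered by comparer `C` and returns them in order.
fn sorted_with<C: Comparer<i32>>(values: &[i32]) -> Vec<i32> {
    let mut set: SortedSet<i32, C> = SortedSet::new();
    for &v in values {
        set.insert(v);
    }
    set.as_slice().to_vec()
}
```

Writing SortedSet<i32> picks NaturalOrder through the default, while SortedSet<i32, ReverseComparer> is a distinct type, so mixing the two is a compile-time error rather than a runtime surprise.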

I've heard this called the hashtable problem.

This is relevant, I think:

I still haven't applied all necessary fixes (see the discussion; there are quite a few cases where code can currently assume a specific implementation, which is one major source of potential unsoundness), but the RFC and comments taken together go over all problems with orphan rule relaxation that I know of.

There are some parts you could omit by not having import/export, but unfortunately anything that relaxes the orphan rule must either be unsafe itself or take into account some not entirely obvious type system proofs that current Rust supports. (For example, there are two distinct hashtable problems, not just one, and type system proofs can be linked through associated types and multi-trait bounds. Blanket implementations must be treated as 'sealing' a trait for types they cover, too.)

My honest question is: what are the design constraints on orphan implementations? From my understanding, they are the following:

  • Must be backwards compatible. Why? First, it preserves the crate ecosystem, and it is one of Rust's core promises. As a result, it might cause problems for traits that are used to keep an invariant. Consider the following:

    // crate m1
    pub mod M1 {
      pub struct Foo;
      pub trait Tr1 {
        const ID1: u32;
      }
      impl Tr1 for Foo {
        const ID1: u32 = 0;
      }
    }
    
    
    // crate prove?
    pub struct Proof<T: ?Sized>{
      _ph: core::marker::PhantomData<T>
    }
    pub trait Tr2 {
      fn prove() -> Proof<Self>;
      const ID2: u32;
    }
    impl Tr2 for M1::Foo {
        fn prove() -> Proof<Self> {Proof{_ph: core::marker::PhantomData}}
        const ID2: u32 = 0;
    }
    pub fn explode<T: Tr2 + M1::Tr1>() {
      T::prove();
      if T::ID1 != T::ID2 {
        unsafe {core::hint::unreachable_unchecked()}
      }
    }
    

    Crate prove depends on there being only one place to implement Tr2. With orphan implementations you can define unsound functions like the following (although if the trait is being used to maintain an invariant, it should be unsafe as well; granted, it might still cause issues in current codebases):

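    struct FooBar;

    impl M1::Tr1 for FooBar {
        const ID1: u32 = 1;
    }

    impl Tr2 for FooBar {
        // Same body as the impl for M1::Foo above.
        fn prove() -> Proof<Self> {Proof{_ph: core::marker::PhantomData}}
        const ID2: u32 = 2;
    }

    fn main() {
        explode::<FooBar>(); // UB because Tr1::ID1 != Tr2::ID2
    }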
    
  • Should work for the Hash map problem: Addressing "the hashtable problem" with modular type classes · GitHub

  • In Zulip chat, it's mentioned that having a single trait per type is a great way to speed up compilation times in some cases, so maybe that is a good property to have as well.

Edit: list isn't complete ofc. It's mostly a conversation starter.

Compatibility/soundness could be solved by requiring the traits to opt-in to allowing 3rd party implementations.

Traits like ToSql and serde::Serialize could allow being implemented on anything by anyone, and that would be convenient without causing too much chaos.

Would that be backward compatible?

That's what Scala does. A consequence of this is that even if T == U, you don't get that T::Assoc == U::Assoc. That can be quite painful in itself...

We could restrict the opt-out to bin crates

That would be a good start, but I hope we can do better. We could also relax the orphan rules when trait and implementation are in the same cargo workspace, or when the crate containing the implementation has private = true.

Of course, rustc isn't aware of cargo workspaces, so this would require a compiler flag like --relax-orphan-rules=foo,bar.

Having trait definitions permit 3rd party implementations is unfortunately not backwards-compatible for existing traits, no.

If

  • crate a defines Trait,
  • crate b
    • defines Type,

    • implements Trait for Type ① and

    • has a function

      fn f<T: Trait + 'static>() {
          if TypeId::of::<T>() == TypeId::of::<Type>() {
              // ②
          }
      }
      

then code at ② can currently assume to see implementation ① on T. Changing an existing trait to allow third party implementations would be unsound, due to that. (edit: I got this mixed up, see below.) Changing an existing implementation to allow additional implementations would be sound, but that's not useful if such an implementation doesn't exist yet (and is exactly what Specialisation covers, anyway, with its associated default items).
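Spelled out as a self-contained sketch (single file, so the crate boundaries are only implied; tag() and Other are additions for illustration):

```rust
use std::any::TypeId;

// "crate a"
trait Trait {
    fn tag() -> &'static str;
}

// "crate b"
struct Type;

impl Trait for Type {
    // ① the only impl of Trait for Type that coherence allows today
    fn tag() -> &'static str {
        "impl ①"
    }
}

// Some unrelated type that also implements Trait.
struct Other;

impl Trait for Other {
    fn tag() -> &'static str {
        "some other impl"
    }
}

fn f<T: Trait + 'static>() -> Option<&'static str> {
    if TypeId::of::<T>() == TypeId::of::<Type>() {
        // ② T is known to be Type here, so under current rules
        // T::tag() can only resolve to impl ①.
        Some(T::tag())
    } else {
        None
    }
}
```

Under today's rules f::<Type>() can only ever observe impl ①. If a third party could supply another impl of Trait for Type, the TypeId check at ② would no longer pin down which impl T carries.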

Something similar happens with fn f<T: Sealed + Trait>() { … } where b defines Sealed, i.e. b could use that to prove the identity of the impl is among a well-known set, and additionally rule out certain other types that are Sealed but not Trait.

This can be mitigated by adjusting how TypeId::of behaves in and on generics (must take added implementations into account sometimes) and by requiring the implementation of Sealed to opt into use alongside 3rd party implementations instead. Then the opt-in on Trait's definition isn't required, either.

The details aren't trivial, but can be intuitive, I think. One problem is that you can get unexpected behaviour of value-key-type-erasing collections if the type differentiation is too broad though, or very high friction at crate boundaries if you don't except certain common generics like Option, Result, slices, function pointers, … (and adjusting these exceptions would at best always require an edition change, if I'm not entirely mistaken).

Please share specific examples of when you would like to opt out of the orphan rule. I think it would help much more.

These are what I can think of right now:

| General description | Example | Workaround |
| --- | --- | --- |
| External trait + external struct | Trying to derive serde on an external struct that does not implement it | newtype / fork upstream / do not rely on the struct implementing that trait |
| My trait + external struct | Library author wanting to let users implement their trait on others' structs for their use cases | Add a dummy parameter to your trait and let user implement it using a dummy wrapper struct |

Hope this helps, Steven Hé (Sīchàng)

Concrete example of your first case: I have a project right now that uses rustix and unix_path in a no-std context. (It's a fully hosted environment; no-std is only being used to avoid a dependency on the C runtime.) Because rustix and unix_path are independent crates, neither of them provides an impl of rustix::path::Arg for unix_path::UnixPath, and the orphan rule prevents me from providing one, which means I have to write path.as_unix_str().as_bytes() every time I want to pass a path to a rustix function, which is often.

(It's my impression that this sort of thing is the most important reason why people want to relax the orphan rule. I could work around this with a newtype over UnixPath, but getting its ergonomics to match UnixPath's would be more work than continuing to type path.as_unix_str().as_bytes() all over the place.)


This is not technically about the orphan rule, but it's a closely related possible language feature that would require us to solve many of the same problems: Sometimes a crate might wish to override the implementation of a trait by one of its dependencies. This same project provides a good example: In a no-std configuration, rustix's Display impl for rustix::io::Errno prints "os error <number>" for all errors, which is not exactly user friendly; I'd ideally like to override it with an impl that provides the usual human-readable error strings for at least the errno values that I expect to come up in normal operation. Again, this can be dealt with by using a wrapper struct, but at the expense of at least some ergonomics -- right now what I have is a local conversion trait that lets me write

writeln!(stderr, "failed to {}: {}", operation, err.msg())

when err is an Errno, which is not bad but not as nice as

writeln!(stderr, "failed to {operation}: {err}")
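
For anyone curious, the local conversion trait workaround looks roughly like this. It is a sketch only: Errno here is a stand-in struct and not rustix's actual type, ErrnoExt, Msg and describe are made-up names, and the two error numbers shown follow the usual Linux numbering.

```rust
use std::fmt;

/// Hypothetical stand-in for a foreign errno type (not rustix's real API).
#[derive(Clone, Copy)]
struct Errno(i32);

/// Local wrapper that carries the `Display` impl we actually want.
struct Msg(Errno);

impl fmt::Display for Msg {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match (self.0).0 {
            2 => f.write_str("No such file or directory"), // ENOENT on Linux
            13 => f.write_str("Permission denied"),        // EACCES on Linux
            n => write!(f, "os error {}", n),              // fallback for everything else
        }
    }
}

/// Local conversion trait: implementing a local trait for a foreign type
/// is allowed by the orphan rule.
trait ErrnoExt {
    fn msg(&self) -> Msg;
}

impl ErrnoExt for Errno {
    fn msg(&self) -> Msg {
        Msg(*self)
    }
}

/// Formats an error report the way the `writeln!` call with `err.msg()` does.
fn describe(operation: &str, err: Errno) -> String {
    format!("failed to {}: {}", operation, err.msg())
}
```

Every call site still has to remember the .msg() call, which is the ergonomic cost that overriding the Display impl would remove.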