Syntax Sugar and derive-redirection for Newtypes, to work around the Orphan Rule

Motivation

We all love-hate Rust's orphan rule.

Traits that are foreign to our crate cannot be implemented for types that are also foreign to our crate.

That makes perfect sense and protects the ecosystem from a lot of catastrophic silliness, or "unsoundness" as some people insist on calling it.

However, application authors and authors of other leaf-node libraries often despise this rule. Perhaps the biggest inconvenience is that the powerful derive macros that exist for a variety of use cases are not available on foreign types. With the newtype pattern, authors can only implement foreign traits manually, which can be needlessly tedious.
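To make the newtype workaround concrete, here is a minimal sketch using only std. `Vec<u8>` and `fmt::Display` are both foreign to our crate, so the direct impl is rejected; the wrapper name `HexBytes` is made up for illustration.

```rust
use std::fmt;

// `impl fmt::Display for Vec<u8>` would be rejected by the orphan rule:
// neither the trait nor the type is defined in this crate.

// The newtype is local, so a foreign trait can be implemented for it,
// but only by hand: the impl has to be written out manually.
struct HexBytes(Vec<u8>);

impl fmt::Display for HexBytes {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for byte in &self.0 {
            write!(f, "{:02x}", byte)?;
        }
        Ok(())
    }
}

fn main() {
    let bytes = HexBytes(vec![0xde, 0xad, 0xbe, 0xef]);
    println!("{}", bytes);
}
```

For a one-method trait this is fine. The tedium shows up with traits that are normally derived, where the manual impl has to track every field of the wrapped type.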

This is definitely not ideal in the crates.io ecosystem. With a fix, a lot of "serde" feature flags would no longer be necessary, though they would still be convenient. Libraries less popular than serde are usually just unsupported and cannot be combined with other libraries as easily as serde can be, even if the derive macros they define would work and no additional logic is required. This is a very annoying inflexibility, and it prevents the ecosystem from progressing on these matters or even competing with serde (not that that is necessary).

Attempts to remove the glorious orphan rule have often been discussed:

But I am not really a fan of these proposals, as they make existing code so much harder to reason about and would add very complicated new syntax to the type system.

My idea is pretty simple, and could significantly ease any orphan-rule pain.

Syntax Idea

Compiler support for newtype enum / newtype struct!

Simply define: pub newtype struct crate_a::TypeA as MyNewType

Why would it help!?

  • Copy any type and make it local to your crate!
  • Freely define impls for (no longer) foreign traits!
  • The compiler can use the TokenStream from the original type for any derives on the newtype!
  • No danger of changing any behaviour or breaking expectations in upstream code, since it is a new type that can't be passed off as the original!
  • "Inherit" all impls from crate_a, and any impls from crates that implement their own traits on it!
  • (We should discuss if you can override existing impls by crates on the newtype)
  • (We should discuss restrictions on accessing/deriving on private fields and upholding encapsulation)
  • (We should discuss newtypes of newtypes)

In its simplest form, this newtype would be nearly equivalent to:

struct MyNewType(crate_a::TypeA);

impl Deref for MyNewType {
    type Target = crate_a::TypeA;
    /* ... */
}

/*
 impls outputted by the redirected "derive()" 
*/
// the generated code should mostly work because of the deref
// we might need to change the identifier of the struct/union/enum in the original type definition's token stream to "MyNewType", etc.

The behaviour we would observe (overriding with deref, "inheriting" of impls) is pretty close to what we want here. Derive redirection is almost all that we need, even if it would be worth thinking these newtypes through to the end.
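What Deref gives us today can be checked with a small sketch. The module `crate_a` below stands in for the upstream crate, and its field and method are invented for illustration. Method calls reach the inner type through auto-deref, but the newtype itself does not satisfy the inner type's trait bounds, which is the gap the proposal wants to close.

```rust
use std::ops::Deref;

// Stand-in for the upstream crate (hypothetical contents).
mod crate_a {
    #[derive(Debug, Clone, PartialEq)]
    pub struct TypeA {
        pub id: u32,
    }

    impl TypeA {
        pub fn describe(&self) -> String {
            format!("TypeA #{}", self.id)
        }
    }
}

pub struct MyNewType(crate_a::TypeA);

impl Deref for MyNewType {
    type Target = crate_a::TypeA;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

fn main() {
    let value = MyNewType(crate_a::TypeA { id: 7 });

    // Fields and inherent methods are reachable through auto-deref.
    assert_eq!(value.id, 7);
    assert_eq!(value.describe(), "TypeA #7");

    // Trait methods resolve through auto-deref too, but they operate on
    // the inner type: this clone is a `TypeA`, not a `MyNewType`.
    let cloned: crate_a::TypeA = value.clone();
    assert_eq!(cloned, crate_a::TypeA { id: 7 });

    // Trait bounds are NOT inherited. Given
    //     fn needs_clone<T: Clone>(_: &T) {}
    // the call `needs_clone(&value)` does not compile, because
    // `MyNewType: Clone` does not hold.
}
```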

Okay, but hold on, let's look at an example to understand how this would work in practice, and the implications and edge cases of this solution.

Cross-Crate Example


=== CRATE A ===
//! Provides an interesting type

struct TypeA { /* ... */ }


=== CRATE B ===
//! Provides a derivable super useful trait.
//! Think of Serialize, Deserialize
//! But also less popular similar libraries like ts-rs, Tsify, serde-likes that don't enjoy ecosystem-wide support.
//! Or think of Clone or Debug, PartialEq, things that force you to fork and MR just because someone forgot them upstream.

trait SuperUsefulTrait {

}

// in crate_b_proc
#[proc_macro_derive(SuperUsefulTrait)]
pub fn derive_super_useful_trait(_item: TokenStream) -> TokenStream {
    // powerful and useful derive implementation.
}

=== CRATE C ===
//! Third Crate: Defines its own trait and implements it.
//! This is the tricky part once we get to my proposal.

trait BoringTrait {

}

impl BoringTrait for TypeA {
}

=== MY CRATE ===


// NEW SYNTAX: "newtype struct" that uses definition and impls from crate_a. 
// crate_c's impl for crate_c::BoringTrait could also be available (design choice)
// blanket impls will probably already apply directly on the new type.
// can be cast to TypeA using From/Into or "as", depending on what's easier.

newtype struct crate_a::TypeA as TypeAA;

// nice try orphan rule, this struct is not a foreign type any longer!
//
// The reason that this could be awesome:
// derive macro receives the same TokenStream as if derived for crate_a::TypeA directly!
// derived impl is available only on newtype for users of our library.

#[derive(SuperUsefulTrait)]  
newtype struct crate_a::TypeA as TypeAAA;

// allowed:
impl crate_b::SuperUsefulTrait for TypeAAA {
}

// disallowed, already implemented (or other design choice: takes precedence over impl by crate_c):
impl crate_c::BoringTrait for TypeAAA {
}

Conclusion

What do you think? I have no idea how rustc works and how hard this would be, and the syntax may not be the most thought-through final thing ever, but... I think this actually has a shot at easing the pain with the orphan rule, while keeping it in place! The orphan rule is good! The workarounds are actually kind of the way to go, and should be way less of a hassle.

I am extremely interested in what issues arise with this idea! Please keep in mind that I am not that experienced in language design. I value your opinion far more than my own if you stay kind in the replies!


I don't think this can work. Apart from the problem of having access to the TokenStream at all (I don't think the compiler has access to the source of dependencies, only .rlibs), how would you prevent a derive macro from creating instances of the inner type that violate that type's invariants?

Consider for example the case of a NonEmptyVec that's internally just a Vec, but whose interface guarantees that it's never instantiated as empty:

// crate non_empty
struct NonEmptyVec<T> {
    inner: Vec<T>,
}

// no Default impl!

impl<T> NonEmptyVec<T> {
    pub fn new(value: T) -> Self {
        Self { inner: vec![value] }
    }

    pub fn first(&self) -> &T {
        // This is okay because the inner `Vec` can never be empty
        unsafe { self.inner.get_unchecked(0) }
    }

    // other methods..
}

// crate user
#[derive(Default)]
newtype struct non_empty::NonEmptyVec as MaybeEmptyVec;

let MaybeEmptyVec(empty_non_empty) = MaybeEmptyVec::<u8>::default();
empty_non_empty.first(); // oops
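The problem can be reproduced in today's Rust by writing out roughly what `#[derive(Default)]` generates for a struct with these fields. This is only a simulation: the impl lives inside the defining module because the field is private, which is exactly the access a redirected derive would need. `try_first` is a safe stand-in added for this sketch so that the broken invariant shows up as `None` instead of undefined behaviour.

```rust
mod non_empty {
    pub struct NonEmptyVec<T> {
        inner: Vec<T>, // private: the invariant "never empty" lives here
    }

    impl<T> NonEmptyVec<T> {
        pub fn new(value: T) -> Self {
            Self { inner: vec![value] }
        }

        // Safe stand-in for `first` (hypothetical helper for this sketch).
        pub fn try_first(&self) -> Option<&T> {
            self.inner.first()
        }
    }

    // Approximately what `#[derive(Default)]` expands to, given the field list.
    // The upstream author deliberately did not write this impl.
    impl<T: Default> Default for NonEmptyVec<T> {
        fn default() -> Self {
            Self {
                inner: Default::default(), // an empty Vec
            }
        }
    }
}

fn main() {
    let ok = non_empty::NonEmptyVec::new(1u8);
    assert_eq!(ok.try_first(), Some(&1));

    // The derived constructor bypasses `new` and breaks the invariant.
    let broken = non_empty::NonEmptyVec::<u8>::default();
    assert!(broken.try_first().is_none());
}
```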

Does this include private fields? What happens if their type is also private? What about semver compatibility (e.g. if the original crate changes the type fields)?

This is not possible. Suppose that crate_a implements the following traits for a type Bar:

  • Foo<Assoc = Bar>
  • Baz, which requires Foo<Assoc = Self>
  • Qux, which requires Foo<Assoc = Bar>

Then if you newtype Bar in NewBar you can implement Baz or Qux but not both, because they are mutually exclusive for any type that is not exactly Bar.
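A sketch of that conflict, assuming Qux is meant to require Foo<Assoc = Bar> (all names are the placeholder ones from the list above, and the helper functions are invented for the demonstration):

```rust
trait Foo {
    type Assoc;
}

// Baz needs the associated type to be the implementing type itself.
trait Baz: Foo<Assoc = Self> {}

// Qux needs the associated type to be exactly `Bar`.
trait Qux: Foo<Assoc = Bar> {}

struct Bar;

impl Foo for Bar {
    type Assoc = Bar;
}
impl Baz for Bar {} // fine: Assoc == Self == Bar
impl Qux for Bar {} // fine: Assoc == Bar

#[allow(dead_code)]
struct NewBar(Bar);

// A type has exactly one `Foo` impl, so one `Assoc` must be picked.
impl Foo for NewBar {
    type Assoc = NewBar;
}
impl Baz for NewBar {} // fine: Assoc == Self == NewBar

// impl Qux for NewBar {}
// ^ rejected: `<NewBar as Foo>::Assoc` is `NewBar`, not `Bar`.
// Picking `type Assoc = Bar` instead would allow Qux but reject Baz.

fn requires_baz<T: Baz>() {}
fn requires_qux<T: Qux>() {}

fn main() {
    requires_baz::<Bar>();
    requires_qux::<Bar>();
    requires_baz::<NewBar>();
}
```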

"Inheriting" traits also has other problems, e.g. with private ("sealed") traits and unsafe traits that have specific preconditions.

Private fields are definitely a concern. I don't think the newtype idea makes sense for structs with any private fields, to be honest. It's more useful for fully transparent Data Transfer structs with all-public fields.

You are right, it's unacceptable that the type's invariants are compromised, especially if they involve unsafe code. What would a newtype RefCell<> do? It's highly questionable, so I think any number of private fields disqualifies a type for this kind of usage.

Hmmm. I wonder if the trait inheriting thing can be done under-the-hood by converting, running the called trait method with the original type, and converting back to the newtype. (so nothing different than what Deref already does.)
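Written by hand, that "convert, call, convert back" forwarding would look something like the sketch below. The modules stand in for crate_a and crate_c, and the trait methods `describe` and `bumped` are invented for illustration; the open question is whether the compiler could generate such impls automatically.

```rust
// Stand-ins for the upstream crates (hypothetical contents).
mod crate_a {
    pub struct TypeA {
        pub id: u32,
    }
}

mod crate_c {
    pub trait BoringTrait {
        fn describe(&self) -> String;
        fn bumped(&self) -> Self;
    }

    impl BoringTrait for super::crate_a::TypeA {
        fn describe(&self) -> String {
            format!("boring #{}", self.id)
        }

        fn bumped(&self) -> Self {
            super::crate_a::TypeA { id: self.id + 1 }
        }
    }
}

use crate_c::BoringTrait;

struct TypeAA(crate_a::TypeA);

// Forwarding impl: unwrap, call the original impl, wrap the result again.
impl BoringTrait for TypeAA {
    fn describe(&self) -> String {
        self.0.describe()
    }

    fn bumped(&self) -> Self {
        TypeAA(self.0.bumped())
    }
}

fn main() {
    let value = TypeAA(crate_a::TypeA { id: 1 });
    assert_eq!(value.describe(), "boring #1");
    assert_eq!(value.bumped().describe(), "boring #2");
}
```

Forwarding like this works method by method. It does not help with associated types or supertrait bounds that mention the original type, which is where the objection about mutually exclusive traits applies.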

True trait inheritance sounds very undesirable to me in practice. You are totally right that it seems like a can of worms.

I guess the only question that I really pose is:

For fully transparent Data Transfer Structs and Enums with no private fields, is it feasible to allow derive macros on their newtypes to access the TokenStream of the original type? Could it work and would it be worth it?

Perhaps my syntax idea is overly complicated and confusing. How about:

#[transparent_derive(Serialize)]
pub struct MyType( pub crate_a::TypeA );

// leave the user to implement From, Into and Deref

Don't forget to consider structs with all-public fields while using #[non_exhaustive].


This still has some of the problems previously mentioned, for example the fact that rlibs don't store the TokenStreams of the dependencies and how to handle private fields/semver.

In addition to those:

  • what is the macro supposed to see and generate? For example if it sees MyType but with TypeA's fields then the macro will e.g. try to initialize TypeA with those fields, which is invalid. The macro would have to somehow be specialized for transparent_derive, which means this becomes opt-in;
  • how does this handle nesting? For example, what if TypeA contains a field of type crate_a::TypeB? Then you're back at the original problem because the transparent derive on MyType will require Serialize to be implemented for TypeB and you can't do that.

Finally, how does this compare with the "remote derive" pattern (see e.g. Derive for remote crate · Serde)? It seems to solve the same problem but without these issues:

  • it doesn't need the TokenStream of the remote type;
  • it can support workarounds for private fields;
  • the presence of #[remote(...)] signals that this isn't a normal derive;
  • nesting is supported because you can specify how the derive works for them (e.g. with #[serde(with = "...")]).
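The idea behind the remote pattern can be sketched without serde. This is a hand-written, std-only illustration, not serde's actual generated code: the trait `ToText`, the mirror types `InnerDef` and `TypeADef`, and the contents of `crate_a` are all made up. The point is that the field list is repeated locally, so nothing needs the remote type's TokenStream, and each nested field names the mirror that handles it.

```rust
// Stand-in for the upstream crate (hypothetical contents).
mod crate_a {
    pub struct Inner {
        pub value: u8,
    }

    pub struct TypeA {
        pub name: String,
        pub inner: Inner,
    }
}

// Hypothetical stand-in for a derivable trait such as `Serialize`.
trait ToText {
    fn to_text(&self) -> String;
}

// Local mirror of `crate_a::Inner`: it knows the fields because we
// repeated them here, not because it saw the remote definition.
struct InnerDef;

impl InnerDef {
    fn to_text(remote: &crate_a::Inner) -> String {
        format!("Inner({})", remote.value)
    }
}

// Local mirror of `crate_a::TypeA`.
struct TypeADef;

impl TypeADef {
    fn to_text(remote: &crate_a::TypeA) -> String {
        // The nested field is routed through its own mirror, which plays
        // the role of a `with = "..."` style attribute.
        format!("TypeA({}, {})", remote.name, InnerDef::to_text(&remote.inner))
    }
}

// The local wrapper is the only type that implements the trait.
struct MyType(crate_a::TypeA);

impl ToText for MyType {
    fn to_text(&self) -> String {
        TypeADef::to_text(&self.0)
    }
}

fn main() {
    let value = MyType(crate_a::TypeA {
        name: "a".to_string(),
        inner: crate_a::Inner { value: 1 },
    });
    println!("{}", value.to_text());
}
```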

I, for one, just wish it were possible to impl ForeignTrait for ForeignType in any crate, with it only having an effect in that crate. Maybe with the additional feature that if another crate wants that impl, it would be able to use it, and that would only affect the crate with the use.


I always wonder why people want this to be honest... Why not just define a newtype, if you don't intend to use derive on the remote type?

(There may obviously be a lot that I'm missing)

Bullet point 1) You're right. There was a reason for the `newtype struct` after all.

Bullet point 2) That's not an issue. If TypeB is a private type without Serialize, you're out of luck. But that won't often be the case, since private types in public interfaces are an antipattern. If it is the case, compilation will fail on the derived code as normal. If TypeB is public with all-public fields, then you'll have to transparent_derive on it too. Obviously, due to bullet point 1), that's impossible :(

Yeah, so we obviously have to be able to treat MyType as if it WAS TypeA, syntax-wise and in every way (so it can be used in initializers and pattern matching as if it had the same fields). That's why the current way of defining newtypes is not sufficient. However, I still wonder if the entire trait issue can be solved by automatically coercing the newtype to its source type whenever necessary. After all, they ARE functionally identical (in memory and size-wise).
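The "identical in memory" part can already be stated explicitly today. In this sketch (the fields of `crate_a::TypeA` are invented), `#[repr(transparent)]` guarantees that the wrapper has the same layout as its single field; without that attribute the layout of a one-field struct is not formally guaranteed. The conversions are still explicit, and the layout says nothing about which trait impls apply.

```rust
use std::mem::{align_of, size_of};

// Stand-in for the upstream crate (hypothetical contents).
mod crate_a {
    pub struct TypeA {
        pub x: u32,
        pub y: u64,
    }
}

// Same size, alignment and ABI as the wrapped type.
#[repr(transparent)]
struct MyType(crate_a::TypeA);

impl From<crate_a::TypeA> for MyType {
    fn from(inner: crate_a::TypeA) -> Self {
        MyType(inner)
    }
}

impl From<MyType> for crate_a::TypeA {
    fn from(wrapper: MyType) -> Self {
        wrapper.0
    }
}

fn main() {
    assert_eq!(size_of::<MyType>(), size_of::<crate_a::TypeA>());
    assert_eq!(align_of::<MyType>(), align_of::<crate_a::TypeA>());

    // Explicit round trip; today there is no implicit coercion by value.
    let wrapped = MyType::from(crate_a::TypeA { x: 1, y: 2 });
    let back: crate_a::TypeA = wrapped.into();
    assert_eq!((back.x, back.y), (1, 2));
}
```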

I know about Serde's #[remote()]. I want to clarify that my reason for this proposal is that I don't want people to complicate the language to an unusable extent by removing the orphan rule just for such a silly use case.

As for accessing the original's TokenStream: that is the technical challenge here. Maybe you can build a TokenStream from the type information that the current crate's code is being checked against. But if you say it would be impossible, you probably know more than I do.

Good call!