[Pre-RFC] Scoped `impl Trait for Type`

That's very interesting! Perhaps this is too complex for an established language like Rust, but I would love to know if any existing language implements something like this.

I would certainly consider scoped impls if I were creating a language.

There's 𝒢 (link to the section in Prior art), but it's very C++-y so I don't think it's got any of these safety semantics. I lifted the core idea for the container safety more or less from Genus, which is the other prior art I cite. It's done quite differently there though since it aims for Java-style OOP semantics.

Pretty sure both of those are research languages and not actively developed/supported, though.

C# has extension methods that you can bring into scope on foreign types, but generic-bounds-wise it still only uses interfaces that must be part of the class definition, if I'm not mistaken.


The type identity stuff seems pretty subtle. If this sounds good to you, I would suggest that you provide an explicit mechanism to specify a type along with explicitly chosen impls. Then the implicit behavior you describe would desugar to this explicit thing, which would make it easier to follow what's happening.

For example if you allowed the naming of impls, this could look like

use std::str::{impl PartialEq for String as case_sensitive};
impl (case_insensitive) PartialEq for String { ... }
type CaseInsensitiveString = String with impl case_insensitive;
type CaseSensitiveString = String with impl case_sensitive;

You don't even need to allow users to write this in code. Having a desugaring could simply help explain your proposal. However, I'd guess that for complex cases having an explicit syntax would really help.
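To make the explicit-naming idea a bit more concrete, here's a rough approximation in today's Rust, in the spirit of the hasher parameter `S` on `HashMap<K, V, S>`: a zero-sized marker type stands in for a named impl, and a wrapper carries it in its type. This is only a sketch under my own assumptions; `EqStrategy`, `With`, `CaseSensitive` and `CaseInsensitive` are hypothetical names, not anything from the proposal or std, and unlike the proposed desugaring it does make the chosen impl part of the type.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for a named `impl PartialEq for String`:
/// each zero-sized "strategy" type carries exactly one equality.
trait EqStrategy {
    fn eq(a: &str, b: &str) -> bool;
}

struct CaseSensitive;
struct CaseInsensitive;

impl EqStrategy for CaseSensitive {
    fn eq(a: &str, b: &str) -> bool {
        a == b
    }
}

impl EqStrategy for CaseInsensitive {
    fn eq(a: &str, b: &str) -> bool {
        a.eq_ignore_ascii_case(b)
    }
}

/// A `String` bundled with an explicitly chosen equality,
/// roughly `String with impl S` from the suggestion above.
struct With<S> {
    value: String,
    _strategy: PhantomData<S>,
}

impl<S> With<S> {
    fn new(value: impl Into<String>) -> Self {
        With { value: value.into(), _strategy: PhantomData }
    }
}

// `==` dispatches statically to whichever strategy is in the type.
impl<S: EqStrategy> PartialEq for With<S> {
    fn eq(&self, other: &Self) -> bool {
        S::eq(&self.value, &other.value)
    }
}

type CaseInsensitiveString = With<CaseInsensitive>;
type CaseSensitiveString = With<CaseSensitive>;

fn main() {
    assert!(CaseInsensitiveString::new("Rust") == CaseInsensitiveString::new("RUST"));
    assert!(CaseSensitiveString::new("Rust") != CaseSensitiveString::new("RUST"));
}
```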


I should probably work explicit binding into the main text, even though scoped `impl Trait for Type` doesn't need it to function, yes.

That said, there is a problem with this example:

These types aren't just inexpressible but they (intentionally) do not exist as discrete types under this proposal. That is strictly a runtime distinction for the TypeId of PartialEq-bounded opaque generic type parameter types. (It should still render out in error messages where relevant though, yes.)

There are two reasons for this:

First, it keeps crate APIs clean. If a free-standing³ String in a signature is always just a String, then it's easier to not accidentally expose scoped implementations that you use in a module (aside from the lower-visibility warning, but it's most likely easier to not have to deal with this in the first place). This is also important to cleanly provide derive-dependencies to public fields, and for things like a HashSet<String: Eq in case_insensitive> to cleanly interface with String values moving into and out of it.
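For contrast, a case-insensitive `HashSet` in today's Rust roughly needs a wrapper key whose `Hash` agrees with its `Eq`, and every `String` has to be wrapped on the way in and unwrapped on the way out. A minimal sketch, assuming ASCII-only case folding; `CaseInsensitive` is a hypothetical name:

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical wrapper standing in for `String: Eq in case_insensitive`.
#[derive(Debug)]
struct CaseInsensitive(String);

impl PartialEq for CaseInsensitive {
    fn eq(&self, other: &Self) -> bool {
        self.0.eq_ignore_ascii_case(&other.0)
    }
}

impl Eq for CaseInsensitive {}

impl Hash for CaseInsensitive {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash ASCII-lowercased bytes so values that compare equal hash equally.
        for b in self.0.bytes() {
            state.write_u8(b.to_ascii_lowercase());
        }
        // End marker so that composite keys containing these hash unambiguously.
        state.write_u8(0xff);
    }
}

fn main() {
    let mut names: HashSet<CaseInsensitive> = HashSet::new();
    // Every `String` has to be wrapped on the way in...
    names.insert(CaseInsensitive(String::from("Alice")));
    let newly_inserted = names.insert(CaseInsensitive(String::from("ALICE")));
    assert!(!newly_inserted);
    // ...and unwrapped again on the way out.
    let plain: Vec<String> = names.into_iter().map(|name| name.0).collect();
    assert_eq!(plain, vec![String::from("Alice")]);
}
```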

Second, and more importantly, if you need a persistently case insensitive string then what you are dealing with is a semantically distinct instance, which should be a usage-named newtype like:

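#[derive(Debug, Eq)]
struct MentionedUserName(String);

// Mentioned user names compare equal regardless of ASCII case.
impl PartialEq for MentionedUserName {
    fn eq(&self, other: &Self) -> bool {
        self.0.eq_ignore_ascii_case(&other.0)
    }
}

impl PartialEq<str> for MentionedUserName {
    fn eq(&self, other: &str) -> bool {
        // `other` is already a `&str`, so there is no `.0` to access.
        self.0.eq_ignore_ascii_case(other)
    }
}

// `Display` can't be derived with std alone, so it's implemented by hand.
impl std::fmt::Display for MentionedUserName {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.write_str(&self.0)
    }
}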

(There may be some argument for a newtype shorthand in a separate RFC, but I'm not at all confident that it's a good idea to have those inherit all implementations from their wrapped type by default.)

Edit: ³ I just realised references are generics, though. Hm. I think those should have special handling so that they don't cause capturing by themselves, because they're so common in signatures, don't "look" like generics and have no consistency requirements. I think it's reasonable to exclude them, pointers and possibly tuples from causing captures, while generics with <>, function pointers and closure traits do cause it for everything inside, transitively.

That is a bit messy, though.


Regarding proper-naming implementations: I'm very strongly opposed to it, since I think it is squarely detrimental here, mainly in terms of clarity but also syntactically and for ease of use.

By identifying the implementation with its trait and module name instead, that's

  • about as concise when used,
  • more clear about what is being done and what would collide,
  • ensures you get all the information you need by reading a usage example of either form,
  • allows semantically identical implementations to share a name¹,
  • and it removes the need for hard-to-remember² edge case syntax.

¹ e.g. a shared "case_insensitive" with implementations for each of str, String, OsStr and OsString, without extra rules. It's definitely possible, but it would be a bit strange to have one particular kind of named item that doesn't collide in Rust.

² That may be a me-problem though. I have to look up the macro_rules! syntax every time I write one.


Something that may be interesting could be the ability to write

use bevy_reflect_glue::typewise_prelude::{impl * for Type};

for cases where multiple traits are needed, but that's definitely a tradeoff that lets people write more brittle code much more easily.

I just pushed an update (diff) to the draft that now specifies a list of implementation-invariant generics.

Otherwise there were only minor changes, mostly fixes to wording.

This is a fix for this problem:


Implementation-invariant generics

The following generics, which never rely on the consistency of implementation of their type parameters, are implementation-invariant:

  • &T, &mut T (references),
  • *const T, *mut T (pointers),
  • [T; N], [T] (arrays and slices),
  • (T,), (T, U, ..) (tuples),
  • superficially* fn(T) -> U and similar (function pointers),
  • superficially* Fn(T) -> U, FnMut(T) -> U, FnOnce(T) -> U, Future<Output = T>, Iterator<Item = T>, std::ops::Coroutine and similar (closures),
  • Pin<P>, NonNull<T>, Box<T>, Rc<T>, Arc<T>, Weak<T>, Option<T>, Result<T, E>**.

Implementation-invariant generics never capture implementation environments on their own. Instead, their effective implementation environments follow that of their host, acting as if they were captured in the same scope.

The type identity of implementation-invariant generics seen on their own does not depend on the implementation environment.

* superficially: The underlying instance may well use a captured implementation internally, but this isn't surfaced in signatures. For example, a closure defined where usize: PartialOrd in reverse + Ord in reverse is just FnOnce(usize) but will use usize: PartialOrd in reverse + Ord in reverse privately when called.

** but see which-structs-should-be-implementation-invariant.

See also why-specific-implementation-invariant-generics.
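As a loose analogy for the "superficially" footnote in today's Rust, a returned closure can use `std::cmp::Reverse` internally while its signature only mentions `usize`. This is a sketch with a hypothetical `make_cmp` helper; it uses a wrapper type rather than a scoped impl, but the visible signature is similarly unaffected:

```rust
use std::cmp::{Ordering, Reverse};

/// The signature only mentions `usize`; the reversed ordering is an
/// internal detail of the closure body.
fn make_cmp() -> impl Fn(usize, usize) -> Ordering {
    |a: usize, b: usize| Reverse(a).cmp(&Reverse(b))
}

fn main() {
    let cmp = make_cmp();
    let mut v = vec![3, 1, 2];
    v.sort_by(|a, b| cmp(*a, *b));
    // Sorted in descending order, though nothing in the types says so.
    assert_eq!(v, vec![3, 2, 1]);
}
```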


This is a purely pragmatic fix for these types appearing in signatures a lot, and not necessarily an elegant solution. I think it's still in line with other cases where Rust uses compiler or standard-library privilege to pave over certain ergonomic issues, but I'm definitely interested in hearing about ways to do this more naturally (while preserving the current level of separate compilation and coherence).
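One way to read the implementation-invariant list is as a transparent-or-capturing rule over the type tree. The following is a hypothetical, heavily simplified model of that reading (my own, not compiler code): `Ty`, `captures` and the abridged `INVARIANT` list are invented for illustration, and function pointers, closure traits, arrays and slices are left out.

```rust
/// Hypothetical, heavily simplified model of a type as written in a signature.
enum Ty {
    /// A discrete, non-generic type such as `String` or `usize`.
    Discrete(&'static str),
    /// A named generic applied to arguments, e.g. `HashSet<String>` or `Box<T>`.
    Generic(&'static str, Vec<Ty>),
    /// `&T` or `&mut T`.
    Ref(Box<Ty>),
    /// `*const T` or `*mut T`.
    Ptr(Box<Ty>),
    /// `(T, U, ..)`.
    Tuple(Vec<Ty>),
}

/// Named generics from the implementation-invariant list (abridged).
const INVARIANT: &[&str] = &["Pin", "NonNull", "Box", "Rc", "Arc", "Weak", "Option", "Result"];

/// Returns whether `ty` captures the implementation environment somewhere.
/// Invariant constructors are transparent: they defer to what is inside them.
fn captures(ty: &Ty) -> bool {
    match ty {
        Ty::Discrete(_) => false,
        Ty::Ref(inner) | Ty::Ptr(inner) => captures(inner),
        Ty::Tuple(items) => items.iter().any(captures),
        Ty::Generic(name, args) if INVARIANT.contains(name) => args.iter().any(captures),
        // Any other generic captures for everything inside it.
        Ty::Generic(..) => true,
    }
}

fn main() {
    use Ty::*;
    let string = || Discrete("String");
    assert!(!captures(&string()));
    assert!(!captures(&Ref(Box::new(string()))));
    assert!(!captures(&Ptr(Box::new(string()))));
    assert!(!captures(&Tuple(vec![string(), Discrete("usize")])));
    assert!(!captures(&Generic("Option", vec![string()])));
    assert!(captures(&Generic("HashSet", vec![string()])));
    assert!(captures(&Generic("Box", vec![Generic("HashSet", vec![string()])])));
}
```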

A very useful side-effect, though, is that the implementation-invariance of tuples provides a usually sufficient quick-fix path for behaviour-changewarning-typeid-of-implementation-aware-generic-discretised-using-generic-type-parameters from e.g. TypeId::of::<HashMap<K, V>>() to TypeId::of::<(K, V)>(), as the latter now only makes a distinction between innate identity of and bounds-relevant implementations on the type parameters.
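The substitution itself can be shown in today's Rust. Since there are no implementation environments yet, this only demonstrates the mechanics: a key built from `TypeId::of::<(K, V)>()` depends on the type parameters alone rather than on the container. `params_key` is a hypothetical helper name.

```rust
use std::any::TypeId;
use std::collections::HashMap;

/// Hypothetical helper: a key that depends only on the type parameters,
/// not on the container they happen to be used in.
fn params_key<K: 'static, V: 'static>() -> TypeId {
    TypeId::of::<(K, V)>()
}

fn main() {
    let map_key = TypeId::of::<HashMap<String, u32>>();
    let tuple_key = params_key::<String, u32>();
    // The tuple key is a different TypeId from the container's...
    assert_ne!(map_key, tuple_key);
    // ...but it still distinguishes the parameters, including their order.
    assert_eq!(tuple_key, TypeId::of::<(String, u32)>());
    assert_ne!(params_key::<String, u32>(), params_key::<u32, String>());
}
```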

Right, I forgot to mention this earlier: I'm planning to work explicit bindings into the proposal, but it's a larger edit since that seems to require quite a few grammar changes and a few new errors, and is likely to have edge cases I haven't thought of yet.

I'm not entirely sure when I'll find the time, but I was planning to take this RFC fairly slowly from the beginning.

Rust uses types to do monomorphization. If case insensitive strings and case sensitive strings were different types, Rust would be able to, at each call site, inline each PartialEq impl and not pay for dynamic dispatch. This is pretty huge for Rust's claim to zero cost abstractions.
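For reference, this is what the distinct-types approach looks like with today's newtypes: a generic function like the hypothetical `contains` below is typically monomorphised once per concrete `T`, so each copy calls that type's `PartialEq` directly rather than through a vtable. A minimal sketch; the newtype names are illustrative:

```rust
/// Two illustrative newtypes; each carries its own `PartialEq` in its type.
struct CaseSensitive(String);
struct CaseInsensitive(String);

impl PartialEq for CaseSensitive {
    fn eq(&self, other: &Self) -> bool {
        self.0 == other.0
    }
}

impl PartialEq for CaseInsensitive {
    fn eq(&self, other: &Self) -> bool {
        self.0.eq_ignore_ascii_case(&other.0)
    }
}

/// Generic over `T`: each concrete `T` gets its own copy, in which the
/// `==` below is a direct (inlinable) call rather than a virtual one.
fn contains<T: PartialEq>(haystack: &[T], needle: &T) -> bool {
    haystack.iter().any(|item| item == needle)
}

fn main() {
    let sensitive = [CaseSensitive("bob".into())];
    let insensitive = [CaseInsensitive("bob".into())];
    assert!(!contains(&sensitive[..], &CaseSensitive("BOB".into())));
    assert!(contains(&insensitive[..], &CaseInsensitive("BOB".into())));
}
```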

In this scenario (where those are different types), a bare String is, strangely, a generic type: it's generic over its impls. As a generic type, it must either monomorphize wildly or use dynamic dispatch.

This also happens with closures: each one receives a different type to aid monomorphization. If you had a single type for all closures (and this type isn't generic) you would be forced to do dynamic dispatch per Rust rules.
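The closure point can be checked directly: two textually identical closures have different types (and thus different `TypeId`s), which is what lets a generic function be monomorphised per closure, while `&dyn Fn` collapses them into one dynamically dispatched type. A small sketch; `type_id_of`, `apply_static` and `apply_dyn` are hypothetical helpers:

```rust
use std::any::TypeId;

/// Returns the `TypeId` of the (possibly anonymous) type of `value`.
fn type_id_of<T: 'static>(_: &T) -> TypeId {
    TypeId::of::<T>()
}

/// Statically dispatched: one copy is monomorphised per closure type `F`.
fn apply_static<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
    f(x)
}

/// Dynamically dispatched: a single copy; calls go through a vtable.
fn apply_dyn(f: &dyn Fn(i32) -> i32, x: i32) -> i32 {
    f(x)
}

fn main() {
    let a = |x: i32| x + 1;
    let b = |x: i32| x + 1; // textually identical, but a distinct type
    assert_ne!(type_id_of(&a), type_id_of(&b));

    assert_eq!(apply_static(a, 1), 2);
    assert_eq!(apply_dyn(&b, 1), 2);
}
```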

So how should scoped impls be implemented? Should it do dynamic dispatch under the hood, and rely on an optimizer to sometimes devirtualize? With this implementation, how to avoid pessimizing the execution of all impls, not just scoped impls?

Or should it monomorphize each instance of a type with differing scoped impls as a different type? Like, the type String with case_insensitive impl doesn't exist in the surface language, but it's created in a lowering. In this case, this has the potential to cause hidden blowups in code size and compilation times (because there is no generic term to mark the places where monomorphization can occur).


(See also: Cost of additional monomorphised implementation instances)

This needs another lowering layer in the compiler to be implemented efficiently, as purely by-type code generation is too coarse here. 'Monomorphisation +0.5' is a good way to look at it, though.

In essence yes, this is intended to use full static dispatch outside of dyn, but there are a few mitigating factors:

  • The layout of discrete types like String, including fully discretised generics, is guaranteed-identical between all implementation environments.

  • Discrete implementations/functions bind all implementations locally, so they are guaranteed-monomorphic between implementation environments by their definition.

  • Generic implementations are polymorphic, but they are guaranteed-identical between almost¹ all bound-irrelevant implementation differences* (so they can be unified to a large extent without looking too closely at their implementation details).

    ¹ Unfortunately, this is another place where TypeId::of::<Generic<T>>() throws a spanner in the works. It may be advantageous to check/flag this well ahead of code generation, since it's the only call that can be a source of difference here in terms of defined behaviour. This needs to be scanned for anyway since it produces a warning.

Regarding explicit dyn/trait objects:

  • These would need to have full additional implementations generated per trait.
  • Specifically, supertrait coercions could still take advantage of earlier/easier deduplication.

Overall, I'd wager that the end result is very close to what you'll see with existing newtypes, possibly ever so slightly smaller and faster given equal use (because more use-cases can be easily unified before code generation and linking). The practical outcome I'm not sure about, however:

  • This is a lot less painful than newtypes, so it'd likely get used a lot more.
  • Code reuse is much easier with this feature, so there may be fewer reimplementations.

I think that, given a compiler that takes advantage of these properties, it's likely too close for me to call in advance in either direction.


* Edit: I need to check over how exactly I worded that though, and add a clarification. It's possible I misdefined it and accidentally made non-blanket generic implementations aware of top-level implementation environment differences, which they probably shouldn't be without explicit bounds.

Hello! I'm still working on this.
(I took a holiday break, then unrelatedly got sick, then was extremely busy for a while.)

I pushed another update (v5, diff) that (hopefully) clarifies when monomorphisation happens and when it does not.

Changes:

I will likely still need a few weeks to work explicit binding into the proposal properly though, since I still have a stack of pending organisational work sitting next to me here. That said, this version of the draft should be consistent (as far as I can tell, which is to say it's ready to have holes poked into it again).


I published another update (v6, diff), which finally adds explicit inline implementation environment specification to generic arguments. (I ended up using `as` instead of `:` to introduce the ImplEnvironment; it feels a bit more consistent overall to me.)

Changes:

  • Implemented the former Future Possibilities section "Explicit binding" into the main text as "inline implementation environments", mainly in form of grammar extensions.
  • Specified that call expressions capture the implementation environment of their function operand, acting as host for implementation-invariant generics there.
  • Miscellaneous wording clarifications and fixes. (Turns out you can't call a tuple struct's implicit constructor through a type alias, whoops.)

There was a bit of churn in the Unresolved Questions section too, as some things clicked into place and I moved some paragraphs from elsewhere there. I think the pre-RFC is a good reflection of which decisions follow from the feature and which are better left to someone more knowledgeable than me.

It's likely I'll reorder some sections to make the proposal easier to follow, if possible, but other than that it feels solid overall to me now (though I'm under no illusion that it's entirely free of smaller mistakes). I'll most likely submit it next Sunday, giving it another week in case someone notices anything or I remember something I missed.


Looks great to me; something like this would be great for dealing with glue crates.


That's roughly where I started with this (I got very frustrated trying to implement bevy_reflect traits for declarative data-moshing in a JPEG XL decoder), but I kept finding more use-cases.
There are a few postponed RFCs (Hidden trait implementations, Coercible and HasPrefix for Zero Cost Coercions), pre-RFCs (Pre-RFC describing mechanism to optionally loosen some orphan rule constraints) and previous discussions (pretty much anything here that mentions "orphan rule") where scoped implementations either cover the use-case entirely, would make the proposed feature easier to use or implement, or provide a generalised alternative.

Aside from usefulness, I think the most important aspect here is that the RFC leads to readable/"boring" code and that it slots nicely into all other (planned) language features.


Funnily enough, that last search above just let me come across Named and scoped trait implementations as a solution to orphan rules [ok, it wont' work], which probably would have discouraged me somewhat had I seen it earlier. I think I already rediscovered and solved all the issues that came up there, though.

This is now submitted as RFC 3634:

I made a few last-minute changes:

  • Added more guide-level documentation changes
  • Reordered warnings and errors to the end of the reference-level explanation
  • Some small adjustments to wording and formatting

The links in the OP are updated to point to the now-numbered file, but I didn't update links in the other posts here yet (where they don't point to a specific revision anyway). I may do that when I have a bit of downtime and am not as nervous about this anymore.
