[Pre-RFC] Scoped `impl Trait for Type`

That's very interesting! Perhaps this is too complex for an established language like Rust, but I would love to know if any existing language implements something like this.

I would certainly consider scoped impls if I were creating a language.

There's 𝒢 (link to the section in Prior art), but it's very C++-y so I don't think it's got any of these safety semantics. I lifted the core idea for the container safety more or less from Genus, which is the other prior art I cite. It's done quite differently there though since it aims for Java-style OOP semantics.

Pretty sure both of those are research languages and not actively developed/supported, though.

C# has extension methods that you can bring into scope on foreign types, but generic-bounds-wise it still only uses interfaces that must be part of the class definition, iinm.

The type identity stuff seems pretty subtle. If this sounds good to you, I would suggest that you provide an explicit mechanism to specify a type along with explicitly chosen impls. Then the implicit behavior you describe would desugar to this explicit thing, which would make it easier to follow what's happening.

For example if you allowed the naming of impls, this could look like

use std::str::{impl PartialEq for String as case_sensitive};
impl (case_insensitive) PartialEq for String { ... }
type CaseInsensitiveString = String with impl case_insensitive;
type CaseSensitiveString = String with impl case_sensitive;

You don't even need to allow users to write this in code. Having a desugaring could simply help explain your proposal. However I'd guess that for complex cases having an explicit syntax would really help.

I probably should work explicit binding into the main text even though it's optional to scoped impl Trait for Type's function, yes.

That said, there is a problem with this example:

These types aren't just inexpressible but they (intentionally) do not exist as discrete types under this proposal. That is strictly a runtime distinction for the TypeId of PartialEq-bounded opaque generic type parameter types. (It should still render out in error messages where relevant though, yes.)

There are two reasons for this:

First, it keeps crate APIs clean. If a free-standing³ String in a signature is always just a String, then it's easier to not accidentally expose scoped implementations that you use in a module (aside from the lower-visibility warning, but it's most likely easier to not have to deal with this in the first place). This is also important to cleanly provide derive-dependencies to public fields, and for things like a HashSet<String: Eq in case_insensitive> to cleanly interface with String values moving into and out of it.

Second, and more importantly, if you need a persistently case insensitive string then what you are dealing with is a semantically distinct instance, which should be a usage-named newtype like:

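// Note: `Display` has no built-in derive; this assumes a derive macro from a crate such as `derive_more`.
#[derive(Debug, Display, Eq)]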
struct MentionedUserName(String);

impl PartialEq for MentionedUserName {
    fn eq(&self, other: &Self) -> bool {
        str::eq_ignore_ascii_case(&self.0, &other.0)
    }
}

impl PartialEq<str> for MentionedUserName {
    fn eq(&self, other: &str) -> bool {
        // `other` is already a `&str`, so it has no `.0` field to project.
        str::eq_ignore_ascii_case(&self.0, other)
    }
}

(There may be some argument for a newtype shorthand in a separate RFC, but I'm not at all confident that it's a good idea to have those inherit all implementations from their wrapped type by default.)
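To illustrate why inheriting everything from the wrapped type by default would be risky, here is a minimal sketch in today's Rust. It redefines the newtype so the block is self-contained, and `count_distinct_mentions` is a hypothetical helper that is not part of the proposal. If the wrapper reused `String`'s `Hash` unchanged, it would disagree with the case-insensitive `Eq`, so `Hash` has to be written by hand:

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Same shape as the newtype above, redefined so this sketch stands alone.
#[derive(Debug, Clone)]
pub struct MentionedUserName(pub String);

impl PartialEq for MentionedUserName {
    fn eq(&self, other: &Self) -> bool {
        self.0.eq_ignore_ascii_case(&other.0)
    }
}

impl Eq for MentionedUserName {}

// `Hash` must agree with `Eq`: names that compare equal must hash equally,
// so the hasher is fed ASCII-lowercased bytes instead of the raw string.
impl Hash for MentionedUserName {
    fn hash<H: Hasher>(&self, state: &mut H) {
        for byte in self.0.bytes() {
            state.write_u8(byte.to_ascii_lowercase());
        }
    }
}

/// Hypothetical helper: counts distinct mentions, ignoring ASCII case.
pub fn count_distinct_mentions(names: &[&str]) -> usize {
    names
        .iter()
        .map(|name| MentionedUserName(name.to_string()))
        .collect::<HashSet<_>>()
        .len()
}
```

With this, "Alice" and "alice" land in the same `HashSet` slot, which would not be guaranteed if the wrapper had silently inherited `String`'s case-sensitive `Hash`.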

Edit: ³ I just realised references are generics though. Hm. I think those should have special handling to not cause capturing by themselves, because they're so common in signatures, don't "look" like generics and have no consistency requirements. I think it's reasonable to exclude them, pointers and possibly tuples from causing captures to happen, while generics with <> and function pointers, closure traits do cause it for everything inside, transitively.

That is a bit messy though :confused:


Regarding proper-naming implementations: I'm very strongly opposed to it, since I think it is squarely detrimental here, mainly in terms of clarity but also syntactically and for ease of use.

By identifying the implementation with its trait and module name instead, that's

  • about as concise when used,
  • more clear about what is being done and what would collide,
  • ensures you get all the information you need by reading a usage example of either form,
  • allows semantically identical implementations to share a name¹,
  • and it removes the need for hard-to-remember² edge case syntax.

¹ e.g. a shared "case_insensitive" with implementations for each of str, String, OsStr and OsString, without extra rules. It's definitely possible, but it would be a bit strange to have one particular kind of named item that doesn't collide in Rust.

² That may be a me-problem though. I have to look up the macro_rules! syntax every time I write one.


Something that may be interesting could be the ability to write

use bevy_reflect_glue::typewise_prelude::{impl * for Type};

for cases where multiple traits are needed, but that's definitely a tradeoff that lets people write more brittle code much more easily.

I just pushed an update (diff) to the draft that now specifies a list of implementation-invariant generics.

Otherwise there were only minor changes, mostly fixes to wording.

This is a fix for this problem:


Implementation-invariant generics

The following generics that never rely on the consistency of implementation of their type parameters are implementation-invariant:

  • &T, &mut T (references),
  • *const T, *mut T (pointers),
  • [T; N], [T] (arrays and slices),
  • (T,), (T, U, ..) (tuples),
  • superficially* fn(T) -> U and similar (function pointers),
  • superficially* Fn(T) -> U, FnMut(T) -> U, FnOnce(T) -> U, Future<Output = T>, Iterator<Item = T>, std::ops::Coroutine and similar (closures),
  • Pin<P>, NonNull<T>, Box<T>, Rc<T>, Arc<T>, Weak<T>, Option<T>, Result<T, E>**.

Implementation-invariant generics never capture implementation environments on their own. Instead, their effective implementation environments follow that of their host, acting as if they were captured in the same scope.

The type identity of implementation-invariant generics seen on their own does not depend on the implementation environment.

* superficially: The underlying instance may well use a captured implementation internally, but this isn't surfaced in signatures. For example, a closure defined where usize: PartialOrd in reverse + Ord in reverse is just FnOnce(usize) but will use usize: PartialOrd in reverse + Ord in reverse privately when called.

** but see which-structs-should-be-implementation-invariant.

See also why-specific-implementation-invariant-generics.
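The "superficially" footnote has a rough analogue in today's Rust, sketched below. This is not the proposed mechanism: `std::cmp::Reverse` merely stands in for `usize: PartialOrd in reverse`, and both function names are hypothetical. The point is only that the two closures expose the same signature while ordering their inputs differently inside:

```rust
use std::cmp::Reverse;

/// Returns a closure whose signature mentions only `usize`, although its body
/// compares through `Reverse`, i.e. with the ordering flipped.
pub fn make_reversed_less_than() -> impl FnOnce(usize, usize) -> bool {
    |a, b| Reverse(a) < Reverse(b)
}

/// Same signature, natural ordering.
pub fn make_less_than() -> impl FnOnce(usize, usize) -> bool {
    |a, b| a < b
}
```

Callers see `FnOnce(usize, usize) -> bool` in both cases; which ordering is used stays private to the closure body.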


This is a purely pragmatic fix to these types appearing in signatures a lot, and not necessarily an elegant solution. I think it's still in line with other cases where Rust uses compiler- or standard-library-privilege to pave over certain ergonomic issues, but I'm definitely interested in hearing about ways to do this more naturally (while preserving the current level of separate compilation and coherence).

A very useful side-effect, though, is that the implementation-invariance of tuples provides a usually sufficient quick-fix path for behaviour-changewarning-typeid-of-implementation-aware-generic-discretised-using-generic-type-parameters from e.g. TypeId::of::<HashMap<K, V>>() to TypeId::of::<(K, V)>(), as the latter now only makes a distinction between innate identity of and bounds-relevant implementations on the type parameters.
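As a sketch of that quick-fix, consider a type-keyed registry (both function names here are hypothetical). In current Rust the two keys behave the same way; the difference would only appear under the draft, where, as I understand it, the container-based key would additionally depend on captured implementations while the tuple-based key would not:

```rust
use std::any::TypeId;
use std::collections::HashMap;

/// Hypothetical registry key that names the implementation-aware container.
pub fn key_via_container<K: 'static, V: 'static>() -> TypeId {
    TypeId::of::<HashMap<K, V>>()
}

/// Hypothetical registry key that names a tuple of the same type parameters.
pub fn key_via_tuple<K: 'static, V: 'static>() -> TypeId {
    TypeId::of::<(K, V)>()
}
```

Switching from the first form to the second keeps one key per `(K, V)` pair, so existing lookups that only care about the parameter types continue to work.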

Right, I forgot to mention this earlier: I'm planning to work explicit bindings into the proposal, but it's a larger edit since that seems to require quite a few grammar changes and a few new errors, and is likely to have edge cases I haven't thought of yet.

I'm not entirely sure when I'll find the time, but I was planning to take this RFC fairly slowly from the beginning.

Rust uses types to do monomorphization. If case insensitive strings and case sensitive strings were different types, Rust would be able to, at each call site, inline each PartialEq impl and not pay for dynamic dispatch. This is pretty huge for Rust's claim to zero cost abstractions.

In this scenario (where those are different types), a bare String is, strangely, a generic type: it's generic over its impls. As a generic type, it either monomorphizes wildly or does dynamic dispatch.

This also happens with closures: each one receives a different type to aid monomorphization. If you had a single type for all closures (and this type isn't generic) you would be forced to do dynamic dispatch per Rust rules.
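A minimal sketch of the closure point in today's Rust (all function names are made up for illustration): two textually identical closures still get distinct anonymous types, which is what lets a generic function be monomorphized per closure, while a single `dyn Fn` type has to call through a vtable:

```rust
use std::any::{Any, TypeId};

/// Reports the concrete type of whatever is passed in.
pub fn type_id_of<T: Any>(_value: &T) -> TypeId {
    TypeId::of::<T>()
}

/// Static dispatch: one copy of this function is generated per closure type `F`.
pub fn apply_static<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
    f(x)
}

/// Dynamic dispatch: a single copy that calls through a vtable.
pub fn apply_dyn(f: &dyn Fn(i32) -> i32, x: i32) -> i32 {
    f(x)
}

/// Two closures with the same body still have different types.
pub fn closures_have_distinct_types() -> bool {
    let add_one = |x: i32| x + 1;
    let also_add_one = |x: i32| x + 1;
    type_id_of(&add_one) != type_id_of(&also_add_one)
}
```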

So how should scoped impls be implemented? Should they do dynamic dispatch under the hood, and rely on an optimizer to sometimes devirtualize? With that implementation, how would you avoid pessimizing the execution of all impls, not just scoped impls?

Or should it monomorphize each instance of a type with differing scoped impls as different types? Like, the type String with case_insensitive impl doesn't exist in the surface language, but it's created in a lowering. In this case, this has the potential to cause hidden blow-ups in code size and compilation times (because there is no generic term to mark down the places where monomorphization can occur).

(See also: Cost of additional monomorphised implementation instances)

This needs another lowering layer in the compiler to be implemented efficiently, as purely by-type code generation is too coarse here. 'Monomorphisation +0.5' is a good way to look at it, though.

In essence yes, this is intended to use full static dispatch outside of dyn, but there are a few mitigating factors:

  • The layout of discrete types like String, but also including fully discretised generics, is guaranteed-identical between all implementation environments.

  • Discrete implementations/functions bind all implementations locally, so they are guaranteed-monomorphic between implementation environments by their definition.

  • Generic implementations are polymorphic, but they are guaranteed-identical between almost¹ all bound-irrelevant implementation differences* (so they can be unified to a large extent without looking too closely at their implementation details).

    ¹ Unfortunately, this is another place where TypeId::of::<Generic<T>>() throws a spanner in the works. It may be advantageous to check/flag this in advance well ahead of code generation, since it's the only call that's a source for difference here in terms of defined behaviour. This needs to be scanned for anyway since it produces a warning.

Regarding explicit dyn/trait objects:

  • These would need to have full additional implementations generated per trait.
  • Specifically supertrait coercions could still take advantage of earlier/easier deduplication.

Overall, I'd wager that the end result is very close to what you'll see with existing newtypes, possibly ever so slightly smaller and faster given equal use (because more use-cases can be easily unified before code generation and linking). The practical outcome I'm not sure about, however:

  • This is a lot less painful than newtypes, so it'd likely get used a lot more.
  • Code reuse is much easier with this feature, so there may be fewer reimplementations.

I think that, given a compiler that takes advantage of these properties, it's likely too close for me to call in advance in either direction.


* Edit: I need to check over how exactly I worded that though, and add a clarification. It's possible I misdefined it and accidentally made non-blanket generic implementations aware of top-level implementation environment differences, which they probably shouldn't be without explicit bounds.

Hello! I'm still working on this.
(I took a holiday break, then unrelatedly got sick, then was extremely busy for a while.)

I pushed another update (v5, diff) that (hopefully) clarifies when monomorphisation happens and when it does not.

Changes:

I will likely still need a few weeks to work explicit binding into the proposal properly though, since I still have a stack of pending organisational work sitting next to me here. That said, this version of the draft should be consistent (as far as I can tell, which is to say it's ready to have holes poked into it again).