Pre-RFC: Trait bounds on individual enum variants

Summary

Add trait bound clauses to enum variants, allowing several additional concise code patterns and aligning the structurally inferred marker traits with the sum semantics of enums.

Motivation

This is perhaps best motivated by an example:

pub enum Synced<T: Send> {
    /// The contained value is synced with a mutex.
    MaybeUnsync(Mutex<T>),

    /// The value is already synced on its own.
    Sync(T) where T: Sync,
}

fn assert_sync<T: Sync>() { }

fn always_sync<T: Send>() {
    /// Currently impossible, would need `T: Sync`
    assert_sync::<Synced<T>>()
}

The Synced type makes it possible to create a Sync capsule around any type but avoids the overhead of Mutex<_> where it is not necessary. Which of these were chosen is revealed at the site of a match on this enum.

Guide-level explanation

Automatic traits and the construction will now consider each enum variant on its own. Without a trait bound clause, this is equivalent to the current system. But an additional guard clause allows restricting usage of a single variant without affecting the generic argument bounds of the enum type itself. The extra restrictions are then provided back to the programmer through match, by allowing access to the gated impl when matching that variant.

A maybe common occurance is an I/O algorithm that can be implemented more efficiently with BufRead but for which the interface should not depend on it. This kind of function is not adequately captured by current specialization and would require an additional trait and two impls to dispatch. Using enum for this makes everything much clearer and declares the possibilities in a single location that can be freely chosen to be part of SemVer or not. Additionally, the caller may choose not to use the specialized version (e.g. if they determine that some heuristic makes the generic impl more efficient than the specialized version for some kind of reader).

pub enum MaybeBuf<T: Read> {
    Standard(T),
    Buf(T) where T: BufRead,
}

pub fn do_io<T: Read>(read: MaybeBuf<T>) {
    match read {
        MaybeBuf::Standard(reader) => dispatch_standard(reader),
	MaybeBuf::Buf(reader) => dispatch_buf(reader),
    }
}

fn dispatch_standard<T: Read>(reader: T) { ... }
fn dispatch_buf<T: BufRead>(reader: T) { ... }

Another common case is encapsulating a related pair of values while permitting a size optimization in case the representation for both is already present in a single type. This could otherwise be solved with a specialized trait and an automatic impl for generic T but that route may run into conflicting impl blocks, especially if the trait should be implementable by a third-party library as well. Using a constrained enum variant to disambiguiate would not require a generic and broad impl block and be much more concise:

pub trait Value {
    fn get(&self) -> usize;
}

pub trait Named {
    fn name(&self) -> &str;
}

pub enum KeyValue<T: Value> {
    Pair(String, T),
    SelfNamed(T) where T: Named,
}

impl<T: Value> KeyValue<T> {
    fn name(&self) -> &str {
        match self {
	    KeyValue::Pair(name, _) => name,
	    KeyValue::SelfNamed(val) => val.name(),
	}
    }
}

//! Which would otherwise have to look like:
pub strut Kv<T>(pub T);

pub trait KeyValue: Value {
    fn name(&self) -> &str;
}

impl<T: Value + Named> KeyValue for Kv<T> {
    fn name(&self) -> &str {
    	self.0.name()
    }
}

/// Uh oh, dangerously close to conflicting impl.
///
/// tuple (`String`, T) better not be `Value + Named`.
impl<T: Value> KeyValue for Kv<(String, T)> {
    fn name(&self) -> &str {
    	&self.0
    }
}

Now that we’ve seen the use, let us quickly state the straightforward restriction on construction. The implicit functions provided by enum tuple variants are simple extended to also include their own trait bounds. The compiler error message should hint at which bound is violated through something like:

pub enum MaybeBuf<T: Read> {
    Standard(T),
    Buf(T) where T: BufRead,
}

fn main() {
    // Note: stdin() is not buffered like StdinLock.
    MaybeBuf::Buf(stdin());
}

error[E0000]: the trait bound `std::io::Stdin: std::io::BufRead` is not satisfied
 --> src/main.rs:8:8
  |
8 |     MaybeBuf::Buf(stdin());
  |     ^^^^^^^^^^^^^^^^^^^^^^ the trait `std::io::BufRead` is not implemented for `std::io::Stdin`
  |
note: required by enum variant `MaybeBuf::Buf` 

Reference-level explanation

The goal is to elevates the power of enums beyond tag dispatch, improving structural induction for marker traits while also allowing more general type representations. Trait bounds are allowed on each enum variant where they are asserted when constructed and provided to a match expression. Marker traits are inferred by inspecting each variant individually with its bounds before performing the structural induction otherwise.

/// Permit passing a value by reference, and a sized value by value.
///
/// Note that this type is always `Sized` as each variant is `Sized`.
enum ValOrRef<'a, T: ?Sized + 'a> {
    /// `&_` is always sized.
    Ref(&'a T),

    /// This is sized due to the bound.
    Val(T) where T: Sized,
}

impl<'a, T: ?Sized + 'a> ValOrRef<'a, T> {
    /// This requires `Self: Sized`.
    fn get(self) -> T where T: Clone {
        match self {
	    ValOrRef::Ref(t) => t.clone(),
	    ValOrRef::Val(t) => t,
	}
    }
}

An enum variant provides, in a way, a witness for the additional trait bounds mentioned in its variant clause. This way, the generic interface of an enum can offer a less restricted interface with individual variants providing more concrete trait bounds. This provides possibilities similar to runtime specialization.

Within the statement guarded by a match clause that matches Enum::Variant, implicitely make visible the additional trait bounds imposed by that variant. That is, for match statements the corresponding arm, for if let the block, and for irrefutable patterns the following program (although it could be best to feature-gate those individually – only match is the core target of this proposal).

3 Likes

I assume that this proposal is a subset (without existential quantification and type equality constraints) of {-# LANGUAGE GADTs #-} in Haskell. It’s quite useful there and substantially increases the power of algebraic data types. I’ve been meaning to write up a similar proposal myself.

However, this is quite a non-trivial extension of enums that interact with at least traits, regions, type inference, and pattern matching and so this RFC will need substantial additional work. Therefore I think we should move this into the work of the to-be-started WG-traits-lang (cc @aturon, @nikomatsakis, and @eddyb).

4 Likes

I’m not sure if calling GADTs without existential quantification GADTs is entirely justified. The rfc as proposed does not enable new data representations, it makes it possible to forbid certain representations.

If you think of the variants of an enum as one dimension and type impls as another, then all current enum types are rectangles (every allowed type parameter can be combined with any of the enum’s variants). This RFC makes it possible to specify an enum as a proper subset of such a cross product, so that it can become a proper subset of some auto trait impl. The match semantics are also entirely separate from the other changes, just changing the possibilites of structural induction of marker traits would already be powerful in itself (Sized, Send, Sync, Unpin all offer very real advantages if implemented).

I’d think of GADTs on the other hand extending the space in other directions, namely one direction for each optional type parameters of each variant.

To achieve the thing proposed above, you'd write the following in Haskell:

{-# LANGUAGE GADTs #-}

data Expr t where
  Literal :: Num t => t -> Expr t

someExpr :: Expr t
someExpr = undefined

use = case someExpr of
  -- We get out a dictionary witnessing that `Num t` holds allowing `+` to be used.
  Literal x -> Literal $ x + x

What where-clauses alone on enum variants wouldn't allow is:

data Id a b where
  Refl :: Id x x

data Expr t where
  Var :: Var a -> Expr a
  App :: Expr (a -> b) -> Expr a -> Expr b
  Abs :: Var a -> Expr b -> Expr (a -> b)
  Pair :: Expr a -> Expr b -> Expr (a, b)

(you could fake Refl with trait Equal<T> {} impl<T> Equal<T> for T {}).

Therefore it's accurate to call this a subset.

Either I'm confused about what you are proposing, or it does. Consider e.g.

pub enum Cow<'a, B: 'a + ?Sized> {
    Borrowed(&'a B),
    Owned(<B as ToOwned>::Owned) where B: ToOwned,
}

It becomes possible to use Cow<'_, B> for Bs which are not ToOwned. The representation of such Cow<'a, B>s are presumably different because Owned(..) is not a possibility and therefore can be ruled out as a matter of ABI. Thus, the representation of Cow<'a, RandomNotCloneType> is presumably that of &'a RandomNotCloneType since Borrowed(...) is the only inhabited variant.

As with normal ADTs in Haskell.

As with GADTs in Haskell aside for the "auto" part which seems like a needless limitation. GADTs allow you to relax the constraints on the type itself instead moving them to the GADT's constructors and when you pattern match on a constructor, you get a witness out that the constraints hold. This is essentially a small subset of dependent typing (see Stephanie Weirich's talk, https://www.youtube.com/watch?v=wNa3MMbhwS4).

Ah, yes. I hadn't seen the embedding this way around. Well then it does :wink:

2 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.