Pre-RFC: type-level sets


#1

This proposal is a continuation of sum-enums Pre-RFC, which in turn is an alternative to (postponed) RFC 2587.

About terminology

You can consider “type-level sets” (TLS) a temporary name. (I know it’s an unfortunate abbreviation, but let’s continue with it for now) Alternative names are:

  • type sets (probably will be too confusing considering “typeset”?)
  • true unions
  • sum-enums
  • auto-enum
  • amalgamated sum
  • (anonymous) conflating enums
  • normalized variant
  • fibered sum

Summary

  • Introduce a new syntax A | B | C which will create an anonymous TLS type.
  • Utilize type ascription in patterns to perform matching on TLS.
  • Use TLS to represent static impl Trait with several possible variants.

Motivation

There is two main use-cases for this proposal:

  • Removing a need for short-lived “throw-away” enums:
     // `?` desugaring will include `become` keyword (see further), so `?`
     // will work as expected
     fn foo(val: Weak<str>) -> Result<i64, NoneError | ParseIntError> {
         let strref = Weak::upgrade(val)? 
         let num = i64::from_str_radix(strref, 10_u32)?
         Ok(num)
     }
     
     match foo(val) {
         Ok(n) => { .. },
         // equiv. to `Err(NoneError: NoneError)` and `Err(_: NoneError)`
         Err(NoneError) => { .. },
         // for now we can't write `Err(err): Err(ParseIntError)`
         Err(err: ParseIntError) => { .. }
     }
    
  • Allow to return several types using impl Trait by automatically creating a TLS:
    // This function returns a union of 2 anonymous `impl Trait` types,
    // variants are determined via usage of `become` keyword
    fn foo<'a>(data: &'a [u32], f: bool) -> impl Iterator<Item=u32> + 'a {
        if f {
            become data.iter().map(|x| 2*x);
        } else {
            become data.iter().map(|x| x + 2);
        }
    }
    

In this proposal we re-use a reserved become keyword, which was initially planned for tail call optimization. We either can make TLS and TCO compatible, or use an alternative keyword for TLS.

Explanation

Existing enums can be seen as a TLS which implicitly creates a wrapper type for each variant (it’s not quite how it works today of course), so we can generalize this behavior in the following way:

type Foo = u32 | u64
// or alternatively
type Foo = u64 | u32;
type Bar<T> = u32 | T;

fn foo() -> u32 | u64 {
    if condition() {
       return become 1u32;
    } else cond2() {
        become 1u64
    }
}

fn foo<T>() -> u32 | T { .. }

Considering the set nature of TLS, they have the following properties:

  • A | B is equivalent to B | A
  • ((A | B) | (C | D)) is equivalent to A | B | C | D
  • A | B | A is equivalent to A | B
  • A | A is equivalent to A
  • A | B | ! is equivalent to A | B

The main way for creating TLS is become keyword. It can be used for safely converting sub-sets into larger sets:

fn foo() -> u8 | u32 | u64 {
    let val: u8 | u32 = become 1u32;
    if cond() {
        become 2u64
    } else {
        become val
    }
}

Internally u32 | u64 can be represented as:

union TlsUnion_u32_u64 {
    f1: u32,
    f2: u64,
}

struct Tls_u32_u64 {
    // tag is TypeId based, and can be u8/u16/... depending
    // on a number of variants
    tag: u8,
    union: TlsUnion_u32_u64, 
}

Using TypeId-based tags will make it more efficient to convert from A | B to A | B | C when there is no tag collision.

Matching on TLS will be the same as for usual enums:

match val: u32 | u64 | () {
    v: u32 => { .. },
    _: u64 => { .. },
    () => { .. }, // type ascription can be omitted
}

// see motivation example
match foo(val) {
    Ok(n) => { .. },
    Err(NoneError) => { .. },
    Err(err: ParseIntError) => { .. }
}

match val: u32 | u64 | () {
    v: u32 | u64 => { .. },
    () => { .. },
}

match val: u32 | u64 | () {
    v: u64 => { .. },
    // without an explicit type ascription `v` will have type
    // `u32 | u64 | ()`
    v => { .. },
}

In the last example we don not change type of v to u32 | () for several reasons:

  • Simplicity of implementation
  • Potentially incompatible memory layouts
  • Being coherent with the existing match behavior
  • Reducing number of code breakage on match arm changes.

TLS will implement a minimal set of object-safe traits of included types. In other words trait is implemented for TLS only when all its variant types implement it and it’s an object-safe.

Object-safety is a strong restriction, but will make it easier to start. In future this restriction can be relaxed. (e.g. nothing forbids using consuming method on TLS, but associated types and constants probably should be forbidden)

Generic code

One of the issues often mentioned in discussions for similar proposals is problems with generic code, for example:

// what will happen if U == V?
fn foo<T, V>(val: T | V) {
    match val {
        v: U => { .. },
        v: V => { .. },
    }
}

Arguably considering monomorphization, this code is very similar to this code:

// `get_id` is a static method of `MyTrait`
fn foo<U: MyTrait, V: MyTrait>(input: u32) {
    match input {
        n if U::get_id() == n => { .. },
        n if V::get_id() == n => { .. },
        _ => { .. },
    }
}

In other words we can “solve” this problem by utilizing the fact that match arms are evaluated and executed in order, so if U and V (e.g. u32) have the same type foo will get monomorphized into the follwoing function:

fn foo2(val: u32) {
    match val {
        v: u32 => { .. },
        v: u32 => { .. },
    }
}

It’s obvious that only the first arm will be executed and the second one will be always ignores (and removed by optimizer). In practice it’s highly unlikely that elimination of the second arm will cause any sufficient problems.

If one of generic types will be TLS, this case is handled by ability to include TLSes into match arms (see the third match example). If U = A | B and V = B | C, then we’ll get the following code:

// (A | B) | (B | C) == (A | B | C)
fn foo(val: (A | B) | (B | C)) {
    match val {
        v: A | B => { .. },
        v: B | C => { .. },
    }
}

It’s obvious that if variant has type B it will always go to the first arm, and the second arm will always get variant with type C. So code will work without any problems, and result will be predictable.

TLS as a generalized enum

As was mentioned TLS can be viewed as a generalization of enum:

// this enum
enum A {
    Foo,
    Bar(u32),
    Baz{f: u64},
}
// can be (theoretically) desugared as
struct Foo;
struct Bar(u32);
struct Baz{f: u64};

type A = Foo | Bar | Baz;

This will naturally solve problem of enum variant types which is targeted by RFC 2593.

Type inference

Let’s take a look at the following code:

fn call_a_function<A>(value: A | (), f: impl Fn(A | ()) -> A) { .. }

fn main() {
    let x: i32 | () = become 1i32;
    call_a_function(x, |x| Clone::clone(&x));
}

Currently type checker will not be able to handle such cases (see @ExpHP comment here), so in the beginning such non-deterministic cases can result in a compilation error. Later type checker can become smarter in a backwards-compatible way and deduce the minimal type for cases like this.

become vs enum impl Trait vs implicit conversion

Some proposals introduce enum impl Trait syntax, which will be used like this:

fn foo() -> enum impl Display {
    if cond() {
        1u32
    } else {
        2u64
    }
}

I believe this approach unnecessary clutters function signature. Inner working of impl Trait value should be an internal detail, which should not matter to users.

As for implicit conversion proposals, I believe they are too magic for Rust. Additionally they can result in bugs for traits which are implemented for (). For example the following code with an easy to miss mistake will compile without any issues with such approach:

fn foo() -> enum impl Debug {
    if cond() {
        1u32
    } else {
        2u64
    };
}

Thus we need an explicit conversion inside code, with the most straightforward option of prefix keyword. Alternatively postfix variants can be used as well, like expr@convert, expr.convert!(), etc. (see discussions in this issue)

Extension: const variants

One possible extension of this feature is adding const variants:

type Foo = u8 | const 1u32 | const 2u32;
// note that 1u8 != 1u32, if N=1, this TLS get unified into `1u8 | 1u32`
type Bar<const N: u32> = 1u8 | 1u32 | const N;

match val: Foo {
    v: u8 => { .. },
    1: u32 => { .. },
    2: u32 => { .. },
}

match v: Bar {
    1u8 => { .. },
    1u32 => { .. },
    N: u32 => { .. },
}

Unresolved questions

  • Name bikeshedding.
  • Should we automatically implement Into and From traits for TLS variants? (i.e. From trait will be baked into language) What about sub-sets? (i.e. From<A | B> for A | B | C)
  • Interaction with lifetimes.

#2

Overall I feel like I mostly like it.

My one concern here is: is this a big enough need/feature to warrant allocating a new keyword? Wouldn’t a function of some kind do the same trick? After all, even such „core“ concepts like drop are only a function instead of a keyword (I wouldn’t be against an intrinsics-function, if that’s the right terminology ‒ one that the compiler knows, only that it should be good enough on the syntax level).


#3

Original proposal was using Into implementation instead of the keyword, but several people spoke against it. One of the problems was that number of variants is unknown for impl Trait while parsing code, so Into becomes magic trait, baked into compiler. Thus keyword will be a simpler and easier to understand solution, instead of prefix keyword we could use postfix keyword-y variants à la discussed in the await bikeshedding.


#4

If From/Into are implemented, they still must become specially known in order to be implemented for the “compiler generated type”.

Alternative drastic idea I just came up with: enum match, which is allowed to return non-unifying types by wrapping them in some form of anonymous enum.


#5

While I personally think I would like this feature, I would like to wait a year and get more experience with the current type system features. This feels mostly like an ergonomics win, rather than a deal breaker, so waiting and letting the dust and tech debt settle would be my preferred path.


#6

This RFC fails to address why this is better than a true coproduct type, its main competitor. I suspect that the sole reasons are being able to write x: T => match arms, which I don’t buy as a justification for the cons below.

As with other proposals of this form, it seems to require that T | T = T as types for all T, which seems dangerous. In particular, in the following method:

fn foo<T, U>(x: T|U) {
  match x {
    _: T => println!("T"),
    _: U => println!("U"),
  }
}

If you monomorphize foo::<T, T>, it is not obvious that this prints “T”. This clashes with people’s expectations of enums being ADTs; your typesets are not ADTs. It also seems to me like this new unification rule will add a lot of complexity to the compiler for almost no gain; coproduct types do not introduce a new unification rule, and behave consistently with enums.

Finally; what does type K<'a> = &'static T | &'a T unify as? I think being able to recover that the lifetime is 'static is useful, but being able to do that requires that the unification for typesets be distinct from the usual unification (which unifies &'a T and &'b T to &intersect('a, 'b) T).

If avoiding unification requires defining out-of-band types, then all ergonomic wins have been lost.


#7

I don’t plan to publish a full RFC PR in near future, so no objections here. Plus there is this @aturon’s comment on the alternative proposal: https://github.com/rust-lang/rfcs/pull/2587#issuecomment-457382927

But I hope we will polish the proposed design and will collect opinions/proposals regarding this feature.

Can you name a concrete proposal? Do you mean RFC 2587? All anonymous enum proposals in my opinion had somewhat ugly syntax and questionable ergonomics.

I think it’s a weak argument. What will print the following code for N = M?

fn foo<const N: u32, const M: u32>(val: u32) {
    match val {
        N => println!("N"),
        M => println!("M"),
        _ => println!("other"),
    }
}

Everyone who has a bit of experience with match statements understands that match arms are evaluated in order, and in practice if we’ll get io::Error | io::Error both match arms will be equivalent either way, or difference between them will not matter.

I think it should use the usual unification rules. Yes, it’s unfortunate that we can’t extract longer lifetime, but I don’t think it will matter in practice (and if it will, it can be easily solved by a wrapper type) Either way exact rules regarding interaction with lifetimes is an open question right now.


#8

This really scares me.

fn maybe_unsound<T>(i: (&'static str | T)) {
    match i {
        i: &'static str => { do_something_static(i) },
        i: T => { println!("Generic type better be here") },
    }
}

If we don’t require that the lifetimes of the TLS are all unified, you get unsoundness. Either this introduces a T: 'static bound implicitly or it has to live no longer than an input lifetime if unbound.


#9

Correct. The thesis of my comment is that “ugly syntax and questionable ergonomics” is not nearly as bad as “violation of least surprise.”

This is a strawman. const generics do not exist yet, and it is not clear that this example is in any way something average code will do. What I wrote, however, is analogous to the following very reasonable stable Rust:

fn foo<T, U>(x: Result<T, U>) {
  match x {
    Ok(..) => println!("T"),
    Err(..) => println!("U"),
  }
}

And no, actually; when I write a match statement to match over enum variants and nothing else, I don’t think about the order, because it is irrelevant. It is surprising that in the Result example order doesn’t matter, but that in the typeset example, it does.


#10

I expect your example to behave exactly like

enum MaybeStatic<T> {
    Static(&'static str),
    Bound(T),
}

with whatever bounds are inferred on T (which appears to be 'a, not 'static). I’m not sure why you seem to think this is unsound… though it might be a trick of the x: T pattern.

(I’ll point out that if you can prove that the Static variant is active, it is safe to upgrade from MaybeStatic<T> to MaybeStatic<!>, which is 'static)


#11

I have to concur here: I initially treated match as requiring disjointness and was surprised the first time when rustc didn’t catch a problem with non-disjoint arms. (That said, non-disjoint arms definitely are “goes in the first bucket”.)

That would be the sound thing to do, but with unification:

fn promote_to_static_str<T>(input: (&'static str | T)) -> &'static str {
    match input {
        input: &'static str => input,
        input: T => "T",
    }
}

fn main() {
    let mut hello = "Hello, world!".to_string();
    let x = promote_to_static_str::<&str>(&*hello);
    hello.clear(); hello.push_str("Die, vermin!");
    println!("{}", x);
    drop(hello);
    drop(x);
}

Would not the two arms unify, thus transmuting a short lifetime to 'static? Lifetimes don’t exist at runtime. You’d have to prevent this somehow if the two types of explicit &'static str and generic T instantiated with &'concrete str unify.

TL;DR unification is hard enough in Rust today when it only has to care about lifetimes and not distinct generic types


#12

I see no reason those two arms should not unify. If you have an expression that goes from &'a T to &'static T without ever uttering unsafe, the compiler must prove that the output is static before unifying with the other arms. If you instead had

match (input: (&'static T | &'a T)) {
  x: &'static T => x,
  x: &'a T => x,
}

I really hope it unifies to &'a T.

I’ll point out that your example can’t borrowck because the lifetime of &*hello ends when x is dropped, but you try to mutably borrow hello in the meantime.


#13

x is &'static str per the return type of promote_to_static. It doesn’t effect hello's use at all.

And this is an example against this behavior, I don’t like it much either. I suppose the generic arm could “steal” the specific, but that’s definitely going to be surprising. (Whereas going to the specific is just regular surprising.)


#14

Oops, you’re right. The code you wrote is still sound though… there’s no way to return the input ephemeral reference as a static reference, so you just matrerialize one via a literal.

All the lifetimes are known at compile-time; I would be shocked if you could use this to escape the borrow checker in a way enums couldn’t, either.


#15

This would also definitely end up with someone wanting to write the following, I think:

fn a<T>(t: T) {
    match t {
         t: String => println!("String"),
         t: u32 => println!("u32"),
         t: T => println!("T"),
    }
}

Which I’m pretty sure we never want to make work? But if you can “steal” generic paths with specific with TLS, why wouldn’t it?

fn a<T>(t: (String | u32 | T)) {
    ...
}

#16

This is ambiguous with or-patterns + type ascription; (v: u32) | u64 => vs. v: (u32 | u64) =>. It can be solved by picking a parse (the latter in all likelihood), but it isn’t obvious and I feel like this is stealing syntax from ascription and or-patterns.

enum { A(X), B(X), C, D(Y) }.

I concur with @drXor; hitherto I’ve seen nothing to convince me that union types (not union in the C sense…) are worth adding to the language since they are substantially different than the sum types that we’ve got. If a structural variant is to be added, I think it should be structural coproduct types (AKA anonymous enums). Structural coproducts are just easier to implement I believe and require small changes to the surface of the language as well as in people’s minds, thereby having a relatively small complexity cost.


#17

Instead of forcing this to ‘just work’ with matching, could it be possible to use existing enums for matching and provide automated conversion to these with as? This seems not so different from enums that have explicit integer representations (repr(u16)) except the other way around. The explicit enum would resolve confusion around variant order as it provides an explicit one. And since unification of match statements would no longer happen, this can also not lead to unexpected errors.

enum MaybeStatic<T> {
    Static(&'static str),
    Bound(T),
}

fn promote_to_static_str<T>(input: (&'static str | T)) -> &'static str {
    match (input as MaybeStatic<T>) {
        MaybeStatic::Static(input) => input,
        MaybeStatic::Bound(_) => "T",
    }
}

The as conversion would require that the enum has a variant Label(Type) for each type in the input anonymous enum or more variants. Conversion in the other way could also work the other way around, the anonymous enum/type-level set should have an entry for each such type.


#18

Have you thought about how this would interact with variadics ?