Pre-RFC: sum-enums

enum(…) is not a real type: if it were a real type, then order and repetition would matter, but they don’t, so it can’t be one. The best way to go about this is to only allow it in where clauses. This is only necessary for generic enum(…), though; a concrete enum(…) can be used as a real type freely.

You can, additionally, add restrictions to the enum:

fn foo<A, B>() where enum(A, B): Bar {
}
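
For the concrete case, here is a minimal sketch of what a concrete enum(i32, String) could plausibly desugar to in today's Rust (the type and variant names are invented for illustration):

// Hypothetical desugaring of the concrete sum-enum enum(i32, String)
// into an ordinary tagged enum.
enum I32OrString {
    V0(i32),
    V1(String),
}

fn main() {
    let v = I32OrString::V0(42);
    match v {
        I32OrString::V0(n) => println!("i32: {n}"),
        I32OrString::V1(s) => println!("String: {s}"),
    }
}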

I would like to minimize the number of fake types as much as possible. And since neither the other proposal nor this pre-RFC has any, I don’t think it is necessary.


(side-note: do we even have any fake types currently? I’m not talking about special types like PhantomData, I mean actual types that can’t quite be used in generics.)

I don’t think so, and if not I would like to keep it that way.


So let’s not add sum-enums, because the best way to add them is to add this weird “sum-enums must be bounds” thing.

No, it's not the only way, and I've explained it in the proposal.


It’s not the only way, but it’s the most convenient one.

That's a very arguable statement. Personally, I think the approach described in the proposal is the most logical, least intrusive, and easiest to explain and understand.

Oh. That wasn’t there before.

Nope, it was there from the beginning. :wink:

It wasn’t there before the concerns about generics were mentioned. I remember that.

Please, just look at the OP history…

Can you provide an example where this can be a problem? In my understanding, you will not have many chances to use _ inside enum, as the compiler will refuse to compile such code.

Can you point to where this guarantee is formulated? Also note that you can get &A from &enum(A, B) by simply offsetting past the tag. Overall, I don't quite get your concerns from a practical perspective.
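
As a rough analogy in today's Rust (sum-enums don't exist yet, and these names are invented): matching on a reference to a tagged enum already hands out a reference to the payload without copying, which is the same "skip the tag" pointer adjustment described above.

#[allow(dead_code)]
enum AOrB {
    A(String),
    B(i32),
}

// Borrowing the payload out of &AOrB: a safe view past the tag, no copy.
fn get_a(e: &AOrB) -> Option<&String> {
    match e {
        AOrB::A(s) => Some(s),
        _ => None,
    }
}

fn main() {
    let e = AOrB::A(String::from("hi"));
    assert!(get_a(&e).is_some());
}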

As I've noted in the OP and several times in the discussion thread, I am aware of it, and I am not insisting on keeping this name, which was explicitly called "temporary".

@Centril

A question about possibly using A | B | C syntax instead of enum(A, B, C): how will it interact with | used in patterns with type ascription? For example, in cases like this: CONST1: u32 | CONST2: u32. I guess the compiler should be able to make a decision based on what comes after |, but I hope someone more knowledgeable will answer here.

Sorry, the underscores were meant to be inference variables, i.e. enum(Vec<?0> | ?1). This could come from anywhere, as long as that is the most the compiler can figure out about a type at some point in the function body. I can contrive a better example once I'm not on my phone.
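
For readers unfamiliar with the ?0/?1 notation: those are inference variables, the compiler's placeholders for not-yet-known types. A minimal illustration of one in today's Rust:

fn main() {
    let mut v = Vec::new(); // at this point v: Vec<?0>
    v.push(1u8);            // now ?0 = u8, so v: Vec<u8>
    println!("{:?}", v);
}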

If x is assignable as both A and B, you perform unification to form a type C such that A <: C and B <: C. This is the basis of the Rust type checker, as @ExpHP explains. For example, references to the same type unify to a reference with the shorter lifetime. If you unify A and enum(A, B), you expect their union, i.e. enum(A, B). Any other unification result violates the principle of least surprise.
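
A concrete instance of that unification in today's Rust: two string references with different lifetimes unify at the shorter one.

fn pick<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() { x } else { y }
}

fn main() {
    let outer = String::from("outer");
    {
        let inner = String::from("inner!");
        // &outer and &inner unify: 'a becomes the inner scope's lifetime.
        let r = pick(&outer, &inner);
        println!("{r}");
    } // r cannot outlive this block
}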

You cannot. Consider:

let x = &0;
let p: &enum(i32, u32) = x; // coercion

What memory does p point to? Note that x is a pointer into rodata.
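
To make the layout point concrete, here is a minimal sketch in today's Rust, emulating enum(i32, u32) with an ordinary tagged enum (the type name is invented; exact sizes depend on the compiler's layout choices, hence the print rather than an assert):

use std::mem::size_of;

// Emulating enum(i32, u32): the sum needs space for a discriminant,
// so its layout differs from a bare i32.
#[allow(dead_code)]
enum I32OrU32 {
    I32(i32),
    U32(u32),
}

fn main() {
    // Typically prints "4 vs 8": payload alone vs tag + payload after alignment.
    println!("{} vs {}", size_of::<i32>(), size_of::<I32OrU32>());
}

A &i32 points at just the payload, so reinterpreting it as &enum(i32, u32) would read a tag that was never written.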

As an aside, I don't understand the insistence on ascription syntax. Rust is not a language with runtime type information, which is what this syntax makes me expect.


@mcy

A and enum(A, B) have different memory layouts, so of course you can't coerce A into enum(A, B). That's expected. I don't see any practical implications here.

They could have been coercible if every value carried a TypeId tag around, which of course we don't want.
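
For contrast, "carrying a TypeId tag around" is roughly what Box<dyn Any> does in today's Rust, and it shows the cost being rejected here: per-value runtime type information and checked downcasts.

use std::any::Any;

fn main() {
    // dyn Any carries runtime type information and allows checked downcasts.
    let x: Box<dyn Any> = Box::new(0i32);
    if let Some(v) = x.downcast_ref::<i32>() {
        println!("got an i32: {v}");
    }
}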

Ascription syntax in patterns was proposed independently of this proposal, and it helps to make code much clearer in some cases. I just utilize it for the most ergonomic (in my opinion) matching over the union, and I think it fits really nicely.

In cases where multiple interpretations are possible, the compiler should reject the code. And I think we can go even further and forbid _ inside enum(..) completely; it should not hinder the usefulness of the feature in any substantial way.

As I was trying to allude to, the _ can come from any instantiation of a generic function, type, or trait impl. So to clarify, you mean that any type parameter in an enum should be forbidden?

impl<A> Trait for enum(A) { ... } // forbidden
impl<A, B> Trait for enum(A, B) { ... } // forbidden
impl<A> Trait for enum(A, i32) { ... } // forbidden
impl<A> Trait for enum(Vec<A>, i32) { ... } // forbidden

// similarly
fn function(x: enum(A, i32)) { ... } // forbidden
fn function<A>(x: enum(Vec<A>, i32)) { ... } // forbidden

or any bare type parameter?

impl<A> Trait for enum(A) { ... } // forbidden
impl<A, B> Trait for enum(A, B) { ... } // forbidden
impl<A> Trait for enum(A, i32) { ... } // forbidden
impl<A> Trait for enum(Vec<A>, i32) { ... } // ok

// similarly
fn function<A>(x: enum(A, i32)) { ... } // forbidden
fn function<A>(x: enum(Vec<A>, i32)) { ... } // ok

In either case, I'm not sure why you would bother even having the enum(enum(A)) = enum(A) rule.


My gut instinct, which I'm currently trying to work out an example for, is that an idempotent type constructor like enum (i.e. one that satisfies enum(enum(A)) = enum(A)) is incompatible with type inference, unless you do something outlandish like forbidding the above examples.

If you have:

match $expr {
    A : B | C
}

then it can either be parsed as:

match $expr {
    (A : B) | C
}

or:

match $expr {
    A : (B | C)
}

No, I meant forbidding only type parameters inferred via _. Of course we would like to use enum(..) in generic contexts. Can you provide an example which could potentially cause problems, other than the one covered in the OP (i.e. when two match arms match on the same type after monomorphization)?

  • It's logical that A ∪ B ∪ A = A ∪ B, A ∪ ∅ = A and ((A)) = (A) = A. It will be especially expected if we change the syntax to A | B.
  • Optimization.
  • I think it makes matching nicer, e.g. you can match directly on A without traversing the sum-enums tree (see the sketch below).
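
A sketch of that last point with today's enums (all names invented for illustration): a nested sum forces nested patterns, while a flattened sum matches A directly.

#![allow(dead_code)]

enum Inner { A(i32), B(bool) }
enum Nested { Inner(Inner), C(char) }
enum Flat { A(i32), B(bool), C(char) }

fn a_from_nested(x: &Nested) -> Option<i32> {
    match x {
        Nested::Inner(Inner::A(n)) => Some(*n), // must traverse the tree
        _ => None,
    }
}

fn a_from_flat(x: &Flat) -> Option<i32> {
    match x {
        Flat::A(n) => Some(*n), // direct match
        _ => None,
    }
}

fn main() {
    assert_eq!(a_from_nested(&Nested::Inner(Inner::A(7))), Some(7));
    assert_eq!(a_from_flat(&Flat::A(7)), Some(7));
}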

@Centril

Yes, I understand this; this is why I asked the question. So will the parser be able to process such cases, assuming it knows whether C is a constant or a type? Or will this information be unavailable at that moment?

The parser is not aware of such semantic information at that moment. To resolve the ambiguity you simply pick a parse; you assume that it means A : (B | C) and let the user disambiguate if they want the other interpretation.
