Bring enum variants in scope for patterns

Implementing the two most-discussed proposals in rustc is doable. Some details would need to be hashed out, particularly the order of evaluation for the bare-variant version (imo checking for variants last is the most reasonable, non-breaking choice). But thinking this through made me realize that it introduces a very bad effect: adding a new const or a new variant can accidentally, and without warning, change the meaning of a match arm from one to the other. This kind of effect at a distance is uncommon in most of the language.

The problem is mitigated by naming conventions and by the high likelihood of type errors in the match arm, but the issue tracker already has evidence that this confuses people in cases involving struct names, consts and function names. It would also introduce more asymmetry between what can be written in pattern context and in expression context. All of these can be worked around with extra diagnostics, but they make me wary, particularly when I think about the failure modes of the nested enums case.

I can relate to the desire: I find myself wanting this very often, and I really enjoy Self when it is available. At the same time, this can be handled on the tooling side: block editing commands, multiple cursors, making rustc suggestions more appropriate and machine-applicable, and having rust-analyzer provide autocompletion. That leaves the language slightly harder to write in a plain editor than it otherwise would be, but readable and understandable, with fewer special cases for newcomers to learn.
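To make the effect at a distance concrete, here is a minimal sketch of how it can already show up today with consts. The module and function names (before, after, classify, LIMIT) are made up for illustration; the two match expressions are textually identical, and the only difference is whether a const named LIMIT is in scope.

```rust
mod before {
    // No const named LIMIT is in scope here, so `LIMIT` in the pattern is a
    // fresh binding that matches every value. The lints that would normally
    // flag this are silenced so the sketch compiles without warnings.
    #[allow(non_snake_case, unused_variables, unreachable_patterns)]
    pub fn classify(n: u32) -> &'static str {
        match n {
            LIMIT => "first arm",
            _ => "fallback",
        }
    }
}

mod after {
    // Someone later adds this const. The match below is unchanged, but
    // `LIMIT` in the pattern now means "equal to 10" instead of "anything".
    const LIMIT: u32 = 10;

    pub fn classify(n: u32) -> &'static str {
        match n {
            LIMIT => "first arm",
            _ => "fallback",
        }
    }
}
```

In the first module the lints are the only hint that something is off, which is why naming conventions do so much of the work here.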

Others have brought this up in the thread, but to clarify: this only applies to the proposal of eliding the :: prefix (as is done in the OP), right? I don't think it applies to using _::.

I believe that's correct when the interpretation of _:: is limited to the immediate Self::. However, some proposals in this thread apply _:: hierarchically in nested matches to also infer outer Self::s, where @ekuber's concern might be warranted.

Even if _:: can infer other enum types, adding a new enum variant would never change the meaning of an existing match. If I write _::Foo, that should mean "the Foo variant of some enum in scope", but it should also take into account the enum whose type you're matching against that part of the pattern.

(It could, theoretically, require a completely unique variant name and error on any ambiguity. But I don't see why it should do so.)

It's not entirely without warning: I think you will trigger a "nonstandard style" warning when initially writing that code or when adding the new variant. In fact, that is already the only line of defense we have against typos like

enum E { Var1, Var2 }

fn foo(e: E) {
  use E::*;
  match e {
    Var1 => todo!(),
    // note: Val2 vs Var2; the typo turns this arm into a catch-all binding
    Val2 => todo!(),
  }
}

_:: if done properly has nothing to do with Self, match blocks in particular (as opposed to any other context where type inference can happen), or even scope. It should be solely based on type inference. If the type is inferred as some_crate::Foo, _::A should be resolved as Foo::A, regardless of how it got inferred, and regardless of whether or not you have used some_crate::Foo.

I forgot to respond to this.

Personally, I use QWERTY with a freehand keyboard style where keys don't have a fixed finger assignment, as I believe it's faster that way. In particular, it's almost always slower to use the same finger for two different keys being pressed sequentially: you have to finish pulling away from the first key, then move your finger to the next key, before you can even start pressing down on the next key. If you use different fingers, you can overlap the keypresses: you don't even have to be finished pressing down the first key before starting to press down the next key.

(On the other hand, if I need to press the same key twice in a row, I have no choice but to use the same finger, which is another disadvantage of any sigil containing repeated characters such as ::.)

In the case of _::, I would probably reach my ring finger up to _ and use my middle finger for :, but that's slightly uncomfortable. It also pulls my right pinky away from right-shift, which would take a bit of getting used to, although admittedly I shouldn't need right-shift when my left pinky is already on left-shift. An alternative is to do it one-handed with my right hand with middle finger on _, thumb on :, and pinky on shift. This is a bit more comfortable and more convenient in some cases, but it leaves my hand well out of position.

At the end of the day I guess I can create an editor shortcut and move on, but hopefully that explains why I don't like typing _:: . (As I've said, I also don't like how it looks.)

I think aesthetic concerns should be left out of this discussion, because

  • they are subjective
  • everything unfamiliar looks weird at first, until you get used to it.

I'd like to summarize some of the pros and cons. The options are

  1. Don't change anything (write use Foo::* to bring variants into scope)
  2. Automatically bring variants of the correct enum into scope
  3. Allow substituting the enum name with _

If you disagree with something on this list, I can edit this comment.
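For reference, option 1 from the list above is what works today. A minimal sketch, with a hypothetical Event enum and function names chosen only for illustration: the glob import can be placed inside the function body, so the variants do not leak into the rest of the module.

```rust
#[derive(Debug, Clone, Copy)]
enum Event {
    Next(u32),
    Failed,
    Completed,
}

// Fully qualified: every arm repeats the enum name.
fn describe_qualified(ev: Event) -> String {
    match ev {
        Event::Next(x) => format!("next {x}"),
        Event::Failed => "failed".to_string(),
        Event::Completed => "completed".to_string(),
    }
}

// Option 1: a glob import scoped to this function body only.
fn describe_glob(ev: Event) -> String {
    use Event::*;
    match ev {
        Next(x) => format!("next {x}"),
        Failed => "failed".to_string(),
        Completed => "completed".to_string(),
    }
}
```

Both functions behave the same; the second trades one extra line for shorter arms.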

Convenience

With proper IDE support, all three options are equally convenient to write. Outside of an IDE, option 2 wins, but option 3 isn't significantly worse. Since I'm German, I don't know how hard it is to type _:: on an American keyboard, but it can't be worse than typing SomeIdentifier::.

Readability

I'd argue that option 1 is the most readable since the types are visible; if we choose option 2 or 3, an IDE could provide type hints.

If types are repeated often in the patterns or are unimportant for understanding the code, less information can be better. With option 2 or 3, the programmer can choose when types should be explicit and when they don't need to be (as is the case with let bindings).

The main downside is that the programmer might be too lazy to specify the type explicitly, even if it would help readability. Therefore, some people might want a clippy lint that can be enabled to forbid inferred types in patterns.

One advantage of option 3 is that enum patterns can be easily distinguished from struct patterns and variable bindings.

Teachability

Option 1 wins here obviously.

Some programmers would find option 2 confusing, because the name of enums can be omitted in patterns, but not in expressions.

Option 3 has the same problem, but also suffers from the inconsistency that _ works for enums, but not for structs or unions — unless we want to also allow match .. { _ { .. } => () }. However, option 3 has the advantage that it's obvious where type inference takes place.

Diagnostics

I believe that option 2 would have a negative effect on diagnostics, because the same syntax, SomeIdent => .. can have different meanings. It can refer to an enum variant that is brought into scope automatically (unless type inference fails!). It can also refer to an enum variant or struct that is already in scope, or it can be a variable binding. So the compiler would have to do the following:

  • if a type with the name SomeIdent is in scope, use that type
  • otherwise, try to infer the type of the matched expression
    • if the type is an enum, bring its variants into scope
    • otherwise, assume that SomeIdent is a binding
    • if type inference for the expression failed, emit a type annotation needed error

Not only is this difficult to explain, it can also cause problems. For example, adding or removing an import can alter the program's behavior without causing a warning, and misspelling something might cause an incomprehensible compiler error message.
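The lookup order described above can be sketched as a toy model. This is not how rustc is structured; Scrutinee, Meaning and resolve are hypothetical names, and the rule that an identifier which is not a variant of the inferred enum falls back to a binding is my assumption about what option 2 would do.

```rust
/// What is known about the type of the matched expression (simplified).
enum Scrutinee<'a> {
    Enum { variants: &'a [&'a str] },
    NotAnEnum,
    InferenceFailed,
}

/// The possible meanings of a bare `SomeIdent` pattern under option 2.
#[derive(Debug, PartialEq)]
enum Meaning {
    ItemInScope,
    InferredVariant,
    Binding,
    NeedsTypeAnnotation,
}

fn resolve(ident: &str, items_in_scope: &[&str], scrutinee: &Scrutinee<'_>) -> Meaning {
    // 1. Anything already in scope under that name wins.
    if items_in_scope.contains(&ident) {
        return Meaning::ItemInScope;
    }
    match scrutinee {
        // 2. Without a known type, the compiler cannot decide at all.
        Scrutinee::InferenceFailed => Meaning::NeedsTypeAnnotation,
        // 3. A variant of the inferred enum is brought into scope.
        Scrutinee::Enum { variants } if variants.contains(&ident) => Meaning::InferredVariant,
        // 4. Everything else is treated as a binding (assumed fallback).
        Scrutinee::Enum { .. } | Scrutinee::NotAnEnum => Meaning::Binding,
    }
}
```

Even in this toy version, a misspelled variant name quietly lands in the binding case, and adding a name to items_in_scope changes the answer for an otherwise unchanged pattern.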

Compatibility

Option 2 is not backwards compatible if people use lowercase enum variants or uppercase variable names, but the amount of breakage might be acceptable.

Syntax

The syntax of option 3 is more "Rust-y" than that of option 2, because it can be easily explained in terms of type inference. The _ as a type placeholder is already used in other contexts, whereas option 2 is a completely new concept.
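For readers less familiar with the existing placeholder, here is a small sketch of _ in type position as it works today (the function names are made up for illustration). In both cases the compiler fills in the element type from the surrounding context, which is the mental model option 3 would extend to enum paths.

```rust
fn squares(limit: u32) -> Vec<u32> {
    // `Vec<_>`: the element type is inferred from the iterator's items.
    let v: Vec<_> = (1..=limit).map(|n| n * n).collect();
    v
}

fn parse_all(inputs: &[&str]) -> Vec<i64> {
    // `collect::<Vec<_>>()`: the element type, and with it the target type
    // of `parse`, is inferred from the function's return type.
    inputs
        .iter()
        .filter_map(|s| s.parse().ok())
        .collect::<Vec<_>>()
}
```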

I would rather:

match use x {
    Foo => Bar,
    Baz(Qux) => Quux,
}

Since use is already known for bringing items into scope, here it would bring the enum's variants into scope. Or perhaps use match; I haven't thought about the order.

As for typeability, I think _:: works badly on both of the keyboard layouts I know, QWERTY and Dvorak.

But if we add this, we need to think about feature parity with if let: with _::, that would be if let _::A(x) = data.

Can we consider copying Swift verbatim?

  • The syntax is very terse, yet unambiguous.

  • Users may be familiar with this syntax from Swift. C99 and C++20 use .field in initializers, so from C perspective it's not entirely foreign-looking syntax either.

  • This syntax has been successful in Swift, and is widely used there.

  • It's very easy to type. . is probably the most common symbol on keyboards.

AFAIK .Ident syntax isn't used for anything in Rust, so this could be generally applicable in all places, not just match arms. The only downside I can think of is that it's not "rustic", since _:: is a more logical extension of Rust's existing syntax. Enums in Rust are important enough that it may be worth giving them their own "sigil".

I have an alternative suggestion for solving this problem - teach people how to use column editing in their editor.

[Image: column-editing]

With this power, the complexity of writing

MyEnum::VariantA
MyEnum::VariantB
MyEnum::VariantC
MyEnum::VariantD

and

Self::VariantA
Self::VariantB
Self::VariantC
Self::VariantD

and

_::VariantA
_::VariantB
_::VariantC
_::VariantD

becomes identical.

As a result you can easily start preferring the first variant because it's the only variant that is not write-only.

The complexity of switching between them becomes identical, yes, but not of initially writing the arms.

Unless you normally write match arms by hitting enter the number of times there are variants, duplicating your cursor to each line, fixing indentation (if necessary), and then writing Enum:: => todo!(), on every line with the multiple cursors and going back and filling in the variant names?

At that point, it'd be more convenient to write Enum:: => todo!(), once (without duplicated cursors) and duplicate that line enough times (which everyone knows how to do a basic version of, with copy/paste).

Or if you're suggesting writing all of the arms with Variant first, and then adding the Enum:: to each of them later; I suppose that could work if every arm is a single line expression (such as todo!()), but it breaks when an arm goes to block style.

And it ignores the practical fact that I typically don't write all of the arms on a single cargo check cycle, anyway. I'll write a couple arms out completely, have a temporary _ => todo!() arm, and cargo check the existing arms. Multi-editing doesn't help when the different arms are written in different check cycles.
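A minimal sketch of that workflow, using a hypothetical Token enum: the first two arms are finished and can be checked right away, while the temporary catch-all keeps the match exhaustive until the remaining arms are written in a later cycle.

```rust
#[allow(dead_code)]
enum Token {
    Number(i64),
    Plus,
    Minus,
    LeftParen,
    RightParen,
}

fn describe(token: &Token) -> String {
    match token {
        // Arms written and checked in the first cycle.
        Token::Number(n) => format!("number {n}"),
        Token::Plus => "plus".to_string(),
        // Temporary arm: compiles now, panics if reached, and is replaced
        // by the remaining variants later.
        _ => todo!(),
    }
}
```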

And in any case, there are many cases where the enum type name is noise (and you don't really want the variants in scope widely). Consider matching a syn::Expr (using the example straight from the doc page):

let expr: Expr = /* ... */;
match expr {
    Expr::MethodCall(expr) => {
        /* ... */
    }
    Expr::Cast(expr) => {
        /* ... */
    }
    Expr::If(expr) => {
        /* ... */
    }

    /* ... */

or

let expr: Expr = /* ... */;
match expr {
    _::MethodCall(expr) => {
        /* ... */
    }
    _::Cast(expr) => {
        /* ... */
    }
    _::If(expr) => {
        /* ... */
    }

    /* ... */

The enum name there (whether Expr, E, or _) is just noise and aids nothing for readability.

When type ascription is stabilized, it could even be

match some_expression(): Expr {
    ...
}

In Swift this syntax reads intuitively because the language has both a matching naming style for enum variants (camelCase) and a matching path separator (.), so in Rust it would look misleading. But if we really want something implicit without _::, I would rather vote for a small unambiguous keyword like ref/mut, e.g.:

match ev {
    on Next(x) => ...,
    on Err(_) => ...,
    on Completed => ... 
}

and with your example:

let is_timeout = match error {
    on FailedRequest(on ConnectionFailed(on Timeout)) => true,
}

I don't think the naming style has much to do with it. But indeed, to clarify for anyone reading: in Swift the explicit version would be MyEnum.myVariant, so .myVariant is an intuitive shorthand. Not so in Rust where the explicit version is MyEnum::MyVariant.

To summarize the way C/C++ use .field, this:

MyStruct s = {
    .foo = 123,
    .bar = 456,
};

is roughly equivalent to this:

MyStruct s;
s.foo = 123;
s.bar = 456;

Unlike in Swift, it's not just a shorthand, since the first version can be used in more contexts and has a slightly different effect. But the resemblance between the two is certainly intentional. I wouldn't count that as precedent for using . to abbreviate Foo::.

That is why I favor the syntax :Ident (as shorthand for Foo::Ident). It's as terse as .Ident and almost as easy to type, but it does a somewhat better job of hinting what it's short for (though not as good a job as _::).

It also resembles the syntax for Ruby symbols, which is arguably a plus, given that Ruby uses symbols for many of the use cases where Rust uses enums. Though it can also be a minus, as newcomers might guess that :Foo is a true symbols feature rather than 'just' gloss for enums.

Error out when a match arm can be ambiguously interpreted either as a const or a variant?

So far, while there is the possibility of minor breaking changes based on additions changing/breaking type inference, Rust still has a "fully elaborated" form that is resilient to all of them (basically, fully qualified <Type as Trait> UFCS everywhere).

If we introduce new places where minor breaking changes can occur due to overlapping names, there needs to be a fully elaborated form that isn't vulnerable to breakage.

So "just" erroring on a name that is both a variant and a constant isn't enough; we'd also need a fully elaborated (e.g. const X) form that isn't prone to breakage.

Plus, it's the named binding case that's more potentially problematic (though that one does have a fully elaborated form, x @ _). That isn't a conflict between two names in scope, so it's a lot more difficult (impossible?) to diagnose.
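For anyone who hasn't seen the two "fully elaborated" forms mentioned here, a small sketch in today's Rust follows. The traits and types (Shape, Label, Circle) are hypothetical; the point is only that the qualified call names the trait explicitly and that an ident @ _ pattern is syntactically always a binding.

```rust
trait Shape {
    fn name(&self) -> &'static str;
}

trait Label {
    fn name(&self) -> &'static str;
}

struct Circle;

impl Shape for Circle {
    fn name(&self) -> &'static str {
        "shape"
    }
}

impl Label for Circle {
    fn name(&self) -> &'static str {
        "label"
    }
}

// `c.name()` would be ambiguous with both traits in scope; the fully
// qualified form states which trait's method is meant.
fn names(c: &Circle) -> (&'static str, &'static str) {
    (<Circle as Shape>::name(c), <Circle as Label>::name(c))
}

// `value @ _` can only be read as a binding, never as a const or variant
// pattern, so this arm matches every input.
fn always_binds(n: u32) -> u32 {
    match n {
        value @ _ => value + 1,
    }
}
```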

That would be inconsistent with the current behavior in similar cases elsewhere and still be a breaking change as existing code could suddenly start erroring.
