[Pre-RFC] Inferred Enum Type

Summary

Allow inference of base enum type names via a _::Variant syntax.

Motivation

Matching over enums with many variants and/or with large type names causes lots of repetition in code that could be handled by the compiler.

If this is allowed in match arms, it seems intuitive to extend it to assignments using enums and all other expressions.

Enum variants can already be imported directly, by e.g. use MyEnum::*;, however this can only import at module scope, causing name pollution and possible type name clashes (especially since enum variants are often named very generically).

Another approach possible today is use MyEnum as E. However, doing this is even more indirect and possibly more confusing than inferring the type name. A single-letter alias also has the same plain-text editor downside as this RFC, which arguably weakens that objection to the RFC.

Example

The author recently wrote this code as part of an overhaul of errors to an enum type for the http-types crate:

pub fn associated_status_code(&self) -> Option<StatusCode> {
    match self {
        HeaderError::SpecificityInvalid => Some(StatusCode::BadRequest),
        HeaderError::DateInvalid(_) => Some(StatusCode::BadRequest),
        HeaderError::TransferEncodingUnnegotiable => Some(StatusCode::NotAcceptable),
        HeaderError::TransferEncodingInvalidEncoding(_) => Some(StatusCode::BadRequest),
        HeaderError::TraceContextInvalid(_) => Some(StatusCode::BadRequest),
        HeaderError::ServerTimingInvalid(_) => Some(StatusCode::BadRequest),
        HeaderError::TimingAllowOriginInvalidUrl(_) => Some(StatusCode::BadRequest),
        HeaderError::ForwardedInvalid(_) => Some(StatusCode::BadRequest),
        HeaderError::ContentTypeInvalidMediaType(_) => Some(StatusCode::BadRequest),
        HeaderError::ContentLengthInvalid => Some(StatusCode::BadRequest),
        HeaderError::AcceptInvalidMediaType(_) => Some(StatusCode::BadRequest),
        HeaderError::AcceptUnnegotiable => Some(StatusCode::NotAcceptable),
        HeaderError::AcceptEncodingInvalidEncoding(_) => Some(StatusCode::BadRequest),
        HeaderError::AcceptEncodingUnnegotiable => Some(StatusCode::NotAcceptable),
        HeaderError::ETagInvalid => Some(StatusCode::BadRequest),
        HeaderError::AgeInvalid => Some(StatusCode::BadRequest),
        HeaderError::CacheControlInvalid => Some(StatusCode::BadRequest),
        HeaderError::AuthorizationInvalid(_) => Some(StatusCode::BadRequest),
        HeaderError::WWWAuthenticateInvalid(_) => Some(StatusCode::BadRequest),
        HeaderError::ExpectInvalid => Some(StatusCode::BadRequest),
        _ => None, // Contextually, there are more which end up becoming InternalServerError.
    }
}

This could be re-written as such:

pub fn associated_status_code(&self) -> Option<StatusCode> {
    match self {
        _::SpecificityInvalid => Some(_::BadRequest),
        _::DateInvalid(_) => Some(_::BadRequest),
        _::TransferEncodingUnnegotiable => Some(_::NotAcceptable),
        _::TransferEncodingInvalidEncoding(_) => Some(_::BadRequest),
        _::TraceContextInvalid(_) => Some(_::BadRequest),
        _::ServerTimingInvalid(_) => Some(_::BadRequest),
        _::TimingAllowOriginInvalidUrl(_) => Some(_::BadRequest),
        _::ForwardedInvalid(_) => Some(_::BadRequest),
        _::ContentTypeInvalidMediaType(_) => Some(_::BadRequest),
        _::ContentLengthInvalid => Some(_::BadRequest),
        _::AcceptInvalidMediaType(_) => Some(_::BadRequest),
        _::AcceptUnnegotiable => Some(_::NotAcceptable),
        _::AcceptEncodingInvalidEncoding(_) => Some(_::BadRequest),
        _::AcceptEncodingUnnegotiable => Some(_::NotAcceptable),
        _::ETagInvalid => Some(_::BadRequest),
        _::AgeInvalid => Some(_::BadRequest),
        _::CacheControlInvalid => Some(_::BadRequest),
        _::AuthorizationInvalid(_) => Some(_::BadRequest),
        _::WWWAuthenticateInvalid(_) => Some(_::BadRequest),
        _::ExpectInvalid => Some(_::BadRequest),
        _ => None, // Contextually, there are more which end up becoming InternalServerError.
    }
}

Guide-level explanation

When using an enum type in an expression or match arm, the type name of the enum can be replaced with _ if the type name is already concrete and known from elsewhere.

Examples

enum CompassPoint {
    North,
    South,
    East,
    West
}
let mut direction = CompassPoint::West;
direction = _::East;
match direction {
  _::East => { ... }
  _::West => { ... }
  _ => { ... }
}
matches!(direction, _::South | _::North);
fn function(cp: CompassPoint) {}

let direction = _::West;
function(direction);
function(_::South);

Reference-level explanation

When using an enum type in an expression or match arm, the type name of the enum can be replaced with _ if the type name is already concrete and known from elsewhere.

(TODO: write this part up with more specifics...)

Drawbacks

For users of Rust who do not use an IDE or an editor with a tool such as rust-analyzer, or, for those viewing code hosted online, such as on GitHub, it may be less clear what the type in question is, especially if it originates far away from the code doing the match.

Rationale and alternatives

Enum variants can already be imported directly, by e.g. use MyEnum::*;, however this can only import at module scope, causing name pollution and possible type name clashes (especially since enum variants are often named very generically).

The placeholder _ is chosen as it is already widely used in other places to indicate inferred types (such as in let v: Vec<_> = iter.collect();) and also lifetimes, and does not cause any conflicts in this position.

As an alternative, it may be possible that the _ could be dropped and just the leading :: used instead. The Swift language's similar feature uses just the . for this same purpose, with the leading type name omitted.

It is also possible that this feature would be deemed too unclear in enough situations as described in drawbacks.

Prior art

Inference in other contexts

Inference for type names in other contexts is already very widely used in Rust. As an example, it is especially common to infer the inner type of the indicated container when using the Iterator::collect() function. However, the placeholder does not currently appear in any position where it is followed by more specific information (the variant of an enum in this case).
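As a small illustration of the existing inference mentioned above, here is a sketch using today's stable syntax. The function names squares and unique_lengths are made up for this example; only the Vec<_> and HashSet<_> annotations are the point.

```rust
use std::collections::HashSet;

/// The element type of the `Vec` is inferred from the iterator,
/// so `_` stands in for `u32` here.
fn squares(input: &[u32]) -> Vec<u32> {
    let v: Vec<_> = input.iter().map(|x| x * x).collect();
    v
}

/// The same placeholder works for other containers.
fn unique_lengths(words: &[&str]) -> HashSet<usize> {
    let lengths: HashSet<_> = words.iter().map(|w| w.len()).collect();
    lengths
}
```

In both cases the container is named and only the inner type is left to the compiler, which is the opposite arrangement from _::Variant, where the outer type would be inferred and the inner name spelled out.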

Swift

A nearly identical feature exists in the Swift programming language, as described in its reference. In fact, the shorthand seems to be the suggested way of writing switch statements in Swift.

enum CompassPoint {
    case north
    case south
    case east
    case west
}

var direction = CompassPoint.west
direction = .east // Type is known here so we can omit

Unresolved questions

  • Is this too unclear in plain text editors, especially when viewed on GitHub?

Future possibilities

Should the _ be deemed unnecessary in the future, it could be dropped as an additive feature.


This is resulting from these two rust rfcs issues:

I'm hoping to get this up on rust-lang/rfcs in, let's say, under a week. Please let me know if there are any egregious issues with this RFC or if somehow ::Variant would be preferable and easily possible.

18 Likes

From a post I previously made on IRLO, it was generally agreed upon that _ to infer a type was reasonable in patterns, whether that type was a struct or enum. Construction was generally agreed upon as something to be avoided, at least at first.

Here's the thread in question. Doesn't seem like it was almost a year ago, already! You just happened to beat me to writing an actual RFC.

Speaking from experience, try not to set deadlines (even if they don't really mean anything). It's a lot easier to hash things out on IRLO in my experience.

10 Likes

Interesting - I suppose these are two different ways to go about the feature "more gradually" - i.e. either allow inference of all types in matches, or allow enum type inference everywhere.

To be honest I had only been thinking about enums, but I have also written a lot of StructName { } match / de-structuring code. I'd like it in both places but I think it is easier to argue that it should be allowed on enum variants first, since as you can already import enum variants directly, _::Variant is essentially a convenience to avoid module-scope name pollution.

Note that patterns exist in more places than just match. For example, just yesterday I wrote let Self { tcx } = self;. If this were allowed as part of patterns then I could have done let _ { tcx } = self;.
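To make the point about pattern positions concrete, here is a sketch in today's syntax of three places besides match where a type name appears in a pattern. Context, Shape, and the function names are hypothetical stand-ins, not taken from any real codebase.

```rust
struct Context {
    tcx: u32,
    depth: usize,
}

enum Shape {
    Circle { radius: f64 },
    Square { side: f64 },
}

impl Context {
    fn tcx_plus_depth(&self) -> u32 {
        // Irrefutable struct pattern in a `let`, as in the post.
        let Self { tcx, depth } = self;
        *tcx + *depth as u32
    }
}

// Pattern in a function parameter.
fn sum_pair((a, b): (i32, i32)) -> i32 {
    a + b
}

// Refutable pattern in `if let`.
fn radius_of(shape: &Shape) -> Option<f64> {
    if let Shape::Circle { radius } = shape {
        Some(*radius)
    } else {
        None
    }
}
```

Any rule that allows _ in patterns would presumably have to say something about each of these positions, not only match arms.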

I suggest you take a look at the thread I linked. That's what I was intending to go off of when writing the RFC myself.

More IRLO threads about this:

(There are probably more. It's been discussed many, many times.)

1 Like

Besides everything that has been brought up, you could also try your hand at implementing these purely for error recovery. We already suggest a type if you write fn foo() -> _, doing so for match foo { _::Bar => would be reasonable too in my eyes, and that would let you see what machinery would be needed to actually make this part of the language and would also give the language team an opportunity to see where things would break down.

14 Likes

This is not the case. It's perfectly okay to use MyEnum::*; at function scope.

29 Likes

Please be respectful and constructive.

12 Likes

And even within block scope...
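A minimal sketch of both scopes, reusing the CompassPoint enum from the pre-RFC. The functions opposite and is_vertical are invented for the example.

```rust
#[derive(Debug, PartialEq)]
enum CompassPoint {
    North,
    South,
    East,
    West,
}

fn opposite(p: CompassPoint) -> CompassPoint {
    // Glob import visible only inside this function body.
    use CompassPoint::*;
    match p {
        North => South,
        South => North,
        East => West,
        West => East,
    }
}

fn is_vertical(p: &CompassPoint) -> bool {
    let result = {
        // Glob import visible only inside this block.
        use CompassPoint::*;
        matches!(p, North | South)
    };
    // A bare `North` would not resolve out here.
    result
}
```

Neither import adds any names at module level, so the name pollution concern from the motivation section applies only to the enclosing function or block.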

9 Likes

Well, I always get this clippy lint when using glob at function scope. Then I accept the quickfix to automatically replace use MyEnum::*; with use MyEnum::{VariantA, VariantB, VariantC, VariantD, ..}; but the more variants, the less satisfying the solution of course :\

I think that this example is not motivating enough. For this specific instance, it would be much cleaner to import the entire set of variants (use MyEnum::* at the beginning of the function) and then you can avoid all common prefixes in the match.

I believe better examples would showcase how using _:: can avoid the ambiguity / conflicts possible with use. I had one example in an RFC thread:

struct Fahrenheit(f32);
struct Celcius(f32);

enum Temperature {
    Fahrenheit(Fahrenheit),
    Celcius(Celcius),
}

impl Temperature {
    fn flip(self) -> Self {
        match self {
            Self::Fahrenheit(f) => Self::Celcius(Celcius(todo!("math"))),
            Self::Celcius(f) => Self::Fahrenheit(Fahrenheit(todo!("math"))),
        }
    }
}
8 Likes

Then turn off this clippy lint, at least for this line. This is not a good enough reason for a language addition.

I don't like glob imports for the whole file, but when they're scoped they're fine, IMHO.
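Silencing the lint for a single scoped import might look like the sketch below. It assumes the lint being discussed is clippy::enum_glob_use, which the thread does not name, and Level and score are hypothetical.

```rust
enum Level {
    Low,
    Mid,
    High,
}

fn score(level: &Level) -> u8 {
    // Assumption: the lint in question is `clippy::enum_glob_use`.
    // The attribute applies to this one `use` item only.
    #[allow(clippy::enum_glob_use)]
    use Level::*;
    match level {
        Low => 1,
        Mid => 2,
        High => 3,
    }
}
```

Plain rustc accepts the clippy:: attribute without complaint, so the annotation is harmless for builds that never run Clippy.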

6 Likes

It's a pedantic lint and allow-by-default. Not all Clippy lints are meant to be slavishly followed.

9 Likes

I would like to ask @petrochenkov to explain in more detail why they think this feature would lead to the production of write-only code.

I have never needed to write anything like the example that leads off the proposal myself, but I can easily imagine encountering something like it while digging through unfamiliar code trying to pin down a bug, and it doesn't seem to me that either form (repeated HeaderError:: and StatusCode:: or repeated _::) is particularly more readable than the other. In the _:: case, the ergonomic advantage of not having to type HeaderError:: and StatusCode:: over and over again is clear, and when reading the inferred type is easy enough to dope out, but the visual noise level is about the same either way, IMHO.

So I'm guessing @petrochenkov is imagining the proposed feature being used in contexts where the inferred type would not be easy to dope out, and I suggest that the discussion would be aided by some concrete examples of such contexts.

I quite often end up using overlapping enums (for compat with C error codes), such as

enum Whatever {
  NoError = 0,
  SomeError,
  // ...
}
enum WithOutNoError {
  SomeError = Whatever::SomeError as isize,
  // ...
}

I would never use this _::, and it really isn't a pain to add a use Whatever::*; or use Whatever as W. Consider the case where you have some enum Foo { Bar } and for whatever reason someone adds some enum Salad { Bar }: both a use and an explicit Whatever:: will cope nicely with the addition, but _::Bar?

Edit: Even if type inference can somehow cope, it is a convenience only for writers, to the detriment of anyone reading the code.

6 Likes

It is true that reading code happens more than writing code, but during development code gets rewritten a lot. Having facilities to allow people to quickly rewrite code during development is a good thing. Maybe the solution is not to change the language, but to change the refactoring tools, but either way it is not a good idea to completely disregard the pain point of writing code in verbose languages. I dislike not being able to grep for things to understand a new codebase so I would be very much against anything that is entirely implicit, but inference is constrained enough that the hoops you'd need to jump are fewer than I'd be annoyed by.

2 Likes

To me, the awkward thing about adding this is that it could easily lead to needing to change the conventions for how variants are named.

Something like make_webrequest(CookieHandling::Skip) is perfectly readable, but make_webrequest(_::Skip) is pretty confusing. So because the latter syntax doesn't exist, a variant name of Skip is reasonable today, but might not be good later.

But if people end up wanting to do _::SkipCookies, that's the kind of repetitive naming (CookieHandling::SkipCookies) that having scoped enums is supposed to avoid in the first place.

That said, there are some contexts where something else helps make it clear. Like if it was RequestOptions { cookie_handling: _::Skip, .. }, that seems completely fine, and less annoying than today's RequestOptions { cookie_handling: CookieHandling::Skip, .. }.
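For reference, the struct-literal case as it has to be written today could look like the sketch below. Only RequestOptions, cookie_handling, and CookieHandling::Skip come from the post; the other variants, the redirect_handling and timeout_secs fields, and skip_cookies are assumptions added to make it compile.

```rust
#[derive(Debug, Default, PartialEq)]
enum CookieHandling {
    #[default]
    Keep,
    Skip,
}

#[derive(Debug, Default, PartialEq)]
enum RedirectHandling {
    #[default]
    Follow,
    Refuse,
}

#[derive(Debug, Default)]
struct RequestOptions {
    cookie_handling: CookieHandling,
    redirect_handling: RedirectHandling,
    timeout_secs: u32,
}

fn skip_cookies() -> RequestOptions {
    // The field name already says which enum is meant;
    // the type name on the right-hand side repeats it.
    RequestOptions {
        cookie_handling: CookieHandling::Skip,
        ..Default::default()
    }
}
```

Here the field name carries the context that a bare function argument lacks, which is why the inferred form reads better in this position than in make_webrequest(_::Skip).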

I don't know, though, whether trying to enforce such things in the grammar would be good. Maybe there's some nice simple rules that could be errors if not followed. Maybe it should be left up to conventions, with clippy lints to push people towards best practices. I'd definitely like to see some starting point proposals in both directions to compare and maybe mix together.

22 Likes

I want to reply to this in particular, because my take at least is that the current solutions work fine for me, but for some reason or another people proposing this either don't like it or don't know about it. I'm not sure what to do about it since in my initial response I mentioned them!

Here is how I would write the OP's case (being averse to glob imports):

pub fn associated_status_code(&self) -> Option<StatusCode> {
    use HeaderError as HE;
    use StatusCode as SC;
    match self {
        HE::SpecificityInvalid => Some(SC::BadRequest),

It seems like in this thread (above) there is some confusion regarding when use is allowed etc. I don't know whether better documentation would help, or whether editor tooling could recommend collapsing redundant enum verbosity into such a use as an assist.

Anyhow my opinion is that the language has ways to avoid this pain but I honestly don't understand why people don't seem to use it.
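A self-contained version of the aliasing approach, cut down to a few variants so it compiles on its own. The enums here are simplified stand-ins for the http-types ones: the String payload and the Other variant are assumptions, not the real definitions.

```rust
#[derive(Debug, PartialEq)]
enum StatusCode {
    BadRequest,
    NotAcceptable,
}

enum HeaderError {
    SpecificityInvalid,
    DateInvalid(String),
    AcceptUnnegotiable,
    Other,
}

impl HeaderError {
    fn associated_status_code(&self) -> Option<StatusCode> {
        // Function-local aliases; nothing leaks into the module.
        use HeaderError as HE;
        use StatusCode as SC;
        match self {
            HE::SpecificityInvalid => Some(SC::BadRequest),
            HE::DateInvalid(_) => Some(SC::BadRequest),
            HE::AcceptUnnegotiable => Some(SC::NotAcceptable),
            _ => None,
        }
    }
}
```

Unlike a glob import, the aliases keep every variant qualified, so a reader can still tell the two enums apart on each line.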

8 Likes

I think having this option available doesn't force people to use it. If you want to write CookieHandling::Skip to be more self-documenting, you can. If you have several enums that largely exist for self-documenting purposes (e.g. enum Something { No, Yes }), you could write some_func(_::No, _::Yes, _::Yes), but that would defeat the purpose of having made the self-documenting enums in the first place, so presumably you won't want to do that. If you wanted that, you could just as easily have made that function accept bools, and written some_func(false, true, true).

I don't think that's an argument against this feature; I think that's an argument against using it everywhere that it's possible to use.

6 Likes

Spitballing warning: Nothing in here is to the point where I'd actually propose it seriously, but hopefully something might trigger better ideas in others.

That snippet quoted above makes me think that it would be possible to allow this:

RequestOptions { _: CookieHandling::Skip, .. }

Now, there's some obvious problems -- like Foo { _: 4, _: 5 } can't be allowed -- but in some ways it's interesting.

Any restrictions it would have are basically the same ones that already exist in trait solving. You could implement it builder-style with

impl SetField<CookieHandling> for RequestOptions {
    fn set(mut self, ch: CookieHandling) -> Self { self.cookie_handling = ch; self }
}

And then you'd be able to do

RequestOptions::default()
    .set(CookieHandling::Skip)
    .set(RedirectHandling::Follow)

which would have the same inference behaviour restrictions and readability problems (you probably don't want .set(4)) as the RequestOptions { _: CookieHandling::Skip, _: RedirectHandling::Follow, .. } idea.
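The builder-style idea can be tried out in current Rust. The sketch below is one possible way to fill in the pieces the post leaves out: the SetField trait definition, the enum variants other than Skip and Follow, and the build function are all assumptions.

```rust
#[derive(Debug, Default, PartialEq, Clone, Copy)]
enum CookieHandling {
    #[default]
    Keep,
    Skip,
}

#[derive(Debug, Default, PartialEq, Clone, Copy)]
enum RedirectHandling {
    #[default]
    Refuse,
    Follow,
}

#[derive(Debug, Default)]
struct RequestOptions {
    cookie_handling: CookieHandling,
    redirect_handling: RedirectHandling,
}

/// Hypothetical trait: one impl per field type,
/// so the argument's type selects the field.
trait SetField<T> {
    fn set(self, value: T) -> Self;
}

impl SetField<CookieHandling> for RequestOptions {
    fn set(mut self, ch: CookieHandling) -> Self {
        self.cookie_handling = ch;
        self
    }
}

impl SetField<RedirectHandling> for RequestOptions {
    fn set(mut self, rh: RedirectHandling) -> Self {
        self.redirect_handling = rh;
        self
    }
}

fn build() -> RequestOptions {
    RequestOptions::default()
        .set(CookieHandling::Skip)
        .set(RedirectHandling::Follow)
}
```

This only works while every field has a distinct type. Two fields of the same type would need two conflicting impls, which mirrors the Foo { _: 4, _: 5 } problem noted above.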

So, I don't really know where I'm going with this, but it's another thing that one could imagine using _, and maybe exploring what's good or bad about it could help elaborate what's good or bad about other uses of _ too.

(I guess you could also imagine this as like an "unordered tuple structs" feature? Which would be product types without order or field names, and thus the only way to get things out of them would be type-annotated patterns.)

3 Likes