[Pre-RFC] Inferred Enum Type

trentj · February 12, 2022, 11:00am

I'd like to find out exactly how this feature in pattern position interacts with pattern ergonomics. In the "motivating" example, self (the scrutinee of the match) is &HeaderError, but the rhetorical compiler apparently has no difficulty resolving _ to HeaderError (no &). Am I to infer that _ can only be resolved to a "sufficiently deref'd version" of the scrutinee expression? What direction does the arrow of inference point here - can we determine from the scrutinee (without looking at the pattern itself) what _ must mean, or is there some kind of try-try-again algorithm (like we already have for . resolution, except that wouldn't work here, because patterns)? Does _:: feature as a unique syntactic element in that _ can only be resolved specifically to an enum (seems to be what people are assuming, but the syntax is misleading), or are associated constants also on the table, and if both are possible, how's that resolved?
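For reference, here is a minimal sketch of how the explicitly named version behaves today. The variants of `HeaderError` are invented for illustration (the original example is not reproduced in this thread), and `describe` is a hypothetical method. The scrutinee is `&HeaderError`, while the patterns name `HeaderError` with no `&`, so any `_::` resolution would presumably have to look through the reference in the same way.

```rust
// Hypothetical enum: the variants are assumptions made up for this sketch.
#[derive(Debug)]
pub enum HeaderError {
    Missing(String),
    TooLong { len: usize },
}

impl HeaderError {
    pub fn describe(&self) -> String {
        // `self` has type `&HeaderError`, yet the patterns are written
        // against `HeaderError`. Default binding modes turn `name` into
        // a `&String` and `len` into a `&usize`.
        match self {
            HeaderError::Missing(name) => format!("missing header {}", name),
            HeaderError::TooLong { len } => format!("header too long ({} bytes)", len),
        }
    }
}
```

This only shows the existing behavior for named paths; it says nothing about which way inference would run for `_`.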

re: social pressure

When pattern ergonomics made ref partially redundant, there emerged quite a bit of social pressure to avoid ref patterns entirely, despite the fact that it still works fine, and is occasionally still more readable than matching on a borrow. But because pattern ergonomics was the new thing around, ref became, unnecessarily, a marker for "old" code. I feel that introducing _:: will make use Enum::*; (which is perfectly fine as long as you don't give undue weight to overzealous clippy lints) a similar example of dated code, despite the fact that it's arguably still better than the "new" thing in many cases.

Despite the stated motivation being to reduce repetitive code, the "motivating example" is actually longer (and, in my opinion, noisier) using this rhetorical feature than it would be if the author had simply added a use HeaderError::*; which works in Rust since 1.0. So, even if I accepted "it makes code shorter/prettier" as sufficient justification for adding a language feature (I don't), this one doesn't seem to meet even that relaxed criterion.
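A sketch of the glob-import alternative, assuming a `HeaderError` with made-up variants and a hypothetical `code` method (neither comes from the original example). Placing the `use` inside the function keeps the variant names out of the rest of the module.

```rust
// Hypothetical enum: the variants are assumptions made up for this sketch.
pub enum HeaderError {
    Missing,
    Malformed,
    TooLong,
}

impl HeaderError {
    pub fn code(&self) -> u16 {
        // The glob import is scoped to this function body only.
        use HeaderError::*;
        match self {
            Missing => 400,
            Malformed => 422,
            TooLong => 431,
        }
    }
}
```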

Despite some mention of inference farther up in the conversation, I'm not able to picture a non-contrived example of where you'd want the slight additional expressiveness of smart inference in _::, and the OP's motivating example doesn't take advantage of that, either. In particular, it seems strange for a module to expose an enum's variants without exposing the enum itself - that it's supported for struct fields is more accident than design, and "we already do that for struct fields!" doesn't inherently motivate it for enum variants. Anyway, if we're to solve that problem, I think I'd rather have a more general typeof mechanism, which would be far more practical and relevant in many more situations than just matching on unnameable enums.

In short, I don't think this suggested addition pulls its weight (but I'm open to being convinced by arguments that don't center wholly on code brevity).

CAD97 · February 12, 2022, 1:59pm

It's much more obvious with deeper matches, such as with sea-of-enum style trees, e.g.:

match item {
    _::Fn(_ {
        sig: _ { ident, generics, inputs, output, .. },
        ..
    }) => todo!(),
    _ => bail!(),
}

You don't need to know the types here. (This specific example only does one enum, but that's just the specific path I chose for this example. There are other, more enum heavy paths to take.) In fact, in the advised usage pattern you don't see the names of any of the struct types, and the enums are special in that you see their type by virtue of how match works.
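For comparison, a sketch of what the same match looks like today with every type name spelled out. `Item`, `ItemFn`, `Signature` and `summarize` are simplified stand-ins invented here, not the API of any real crate.

```rust
// Simplified stand-in types, invented for this sketch.
pub struct Signature {
    pub ident: String,
    pub generics: Vec<String>,
    pub inputs: Vec<String>,
    pub output: Option<String>,
    pub is_async: bool,
}

pub struct ItemFn {
    pub sig: Signature,
    pub body: String,
}

pub enum Item {
    Fn(ItemFn),
    Const(String),
}

/// Summarizes a function item, or returns `None` for anything else.
pub fn summarize(item: Item) -> Option<String> {
    match item {
        // Today each level of the pattern has to name its type.
        Item::Fn(ItemFn {
            sig: Signature { ident, generics, inputs, output, .. },
            ..
        }) => Some(format!(
            "fn {} ({} generics, {} inputs, returns {})",
            ident,
            generics.len(),
            inputs.len(),
            output.unwrap_or_else(|| "()".to_string()),
        )),
        _ => None,
    }
}
```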

If I'm being honest, I see myself using this more for struct types than enum types, if it works for any well known type in pattern position. And there, you'd've probably not seen the type name in the first place; I'd've potentially just bound the whole object and used field access rather than destructure it, even if destructuring makes sense, due to the extra salt of importing and including the name in the pattern.

(Even though this could work without having visibility of the type name, it should probably require the type name to be sufficiently pub. The expression position ones also worry me a bit about how they'd impact priv-in-pub patterns, as well, where a type is pub to acknowledge the lint but not actually exported.)

trentj · February 12, 2022, 2:30pm

Sure, I see that you could do it, but your example is only made shorter, not more abstract/expressive, by omitting the names of the types. I don't see an advantage in expressiveness of _ {..} over simply using the name of the struct. Unless, of course, the struct name is not visible in this context, which is a thing you can do, but doesn't seem like a useful technique or something to encourage people to do.

In other words, in what context does it make sense to destructure a struct or enum where you can't name it? Maybe for macros? I think a more general typeof mechanism is still the superior solution though.

jkugelman · February 12, 2022, 4:39pm

Boy, I don't know. I find all those _'s quite an eyesore. Some redundancy is good for readability.

Let's say I want to decipher sig's type. It's elided on the right, so I guess I'll look up to see what struct it's contained in. Oops, the struct name is elided as well. Okay, well to figure out what struct this is I need to know what Fn is, but the enum name is elided. Darn it again. I guess I have to go see what item is, then work my way back down the type hierarchy once I do? That's a lot of effort just to figure out sig.

Natural language has a fair amount of redundancy -- for good reason. If you miss a word you can almost always figure it out from context. We shouldn't go too far trying to eliminate redundancy from Rust.

Nemo157 · February 12, 2022, 6:29pm

Without even knowing the exact context of that code, I can make a well-educated guess at the three _ being Item, Fn and FnSig respectively (or very similar synonyms to these if I've forgotten the exact names). IMO there are cases like this where the type tree is so commonly used in certain types of projects that being able to elide this redundancy is worthwhile: the enhanced readability for developers experienced in the context outweighs the reduced readability for newer developers (and one of the most prominent features of IDE integrations is the ability to view elided/inferred types).

josh · February 12, 2022, 7:27pm

FWIW, I have a lot of sympathy for this position.

While I do like the idea of having _ inference available in a few places, I would only want to add it if 1) it's perceived as a genuine improvement to both writing and reading code in the places it's available, not just an abbreviation people can live with, and 2) we take this kind of social pressure into account, and make sure that the places where it's used are benefits and the places where it isn't a benefit discourage it (either via lint or by making it not available in that context, depending on what makes sense).

I would not want to end up in a situation where a feature is simultaneously not a benefit and incurs social pressure to use it anyway.

scottmcm · February 12, 2022, 7:40pm

To argue the opposite side from my previous post, this does help the "but I want to grep for HttpMethod" scenario a bit. It still doesn't let them find the actual uses, especially if it's in a crate-local prelude-style module, but even then they could delete it and get compiler errors at the uses.

I agree, but to me that's the argument for why parameters and return types need to have their type specified exactly. They're quite often redundant -- whole-program inference SML-style demonstrates that quite clearly.

Inference is "figure it out from context", by definition.

If it's not clear enough to the reader, then that says it should be split into more functions or otherwise add more type annotations. But that's already the case today. Especially if, as CAD97 mentioned, you just use field access. Since you can use all the fields and call all the methods you want without ever needing to annotate a type anywhere.

So if let a = foo().a; is fine -- which we have to assume it is because it's allowed today and people do it all the time without complaint -- why wouldn't let .{ a, .. } = foo(); also be fine?
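To make the comparison concrete, here is a sketch of the two forms that compile today. `Config` and the body of `foo` are hypothetical, chosen only so the snippet is self-contained.

```rust
// Hypothetical type and function, invented for this sketch.
pub struct Config {
    pub a: u32,
    pub b: String,
}

pub fn foo() -> Config {
    Config { a: 7, b: "unused".to_string() }
}

pub fn via_field_access() -> u32 {
    // No type name appears here: the type of `a` is inferred from `foo()`.
    let a = foo().a;
    a
}

pub fn via_destructuring() -> u32 {
    // The equivalent irrefutable pattern has to name `Config` today.
    let Config { a, .. } = foo();
    a
}
```

Both functions read the same field; only the second one requires `Config` to be nameable at the use site.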

(Aside: I'm arguing a little bit on both sides in this thread. My goal is to try to tease out whatever the differences are between things proposed here and things that are already accepted. I find that's the best way to make progress on something that has lots of gut reactions -- we can at least make progress on agreeing on the distinctions, even if we get to different conclusions from weighing those distinctions differently.)

Agreed! I'll add that it's often much easier to find things in rustdoc anyway -- especially if you're not in one of those IDEs that gives type hints.

I wrote some code the other day that was basically this, following rustfmt:

match &data.terminator().kind {
    TerminatorKind::SwitchInt {
        discr: Operand::Constant(constant),
        switch_ty,
        targets,
    } =>

I'd be quite happy to save the vertical space to get to

match &data.terminator().kind {
    .SwitchInt { discr: .Constant(constant), switch_ty, targets } =>

Because if I'm familiar with the area, terminator().kind is plenty for me to know that "yup, it's a TerminatorKind".

And if I'm not familiar, I can ask rustdoc and it takes me right there: https://doc.rust-lang.org/nightly/nightly-rustc/?search="SwitchInt"

At least, that's what I do to find things in the other places that already rely on uses. For example, here was the definition of that method:

    fn reachable_blocks_in_mono_from(
        &self,
        tcx: TyCtxt<'tcx>,
        instance: Instance<'tcx>,
        set: &mut BitSet<BasicBlock>,
        bb: BasicBlock,
    ) {

Where do those come from? I dunno, I'll ask rustdoc. I'd probably want to go there to know what to do with the type anyway.

And I have no interest in looking through the uses. They look like this:

use crate::mir::coverage::{CodeRegion, CoverageKind};
use crate::mir::interpret::{Allocation, ConstValue, GlobalAlloc, Scalar};
use crate::mir::visit::MirVisitable;
use crate::ty::adjustment::PointerCast;
use crate::ty::codec::{TyDecoder, TyEncoder};
use crate::ty::fold::{TypeFoldable, TypeFolder, TypeVisitor};
use crate::ty::print::{FmtPrinter, Printer};
use crate::ty::subst::{Subst, SubstsRef};
use crate::ty::{self, List, Ty, TyCtxt};
use crate::ty::{AdtDef, Instance, InstanceDef, Region, ScalarInt, UserTypeAnnotationIndex};
use rustc_hir::def::{CtorKind, Namespace};
use rustc_hir::def_id::{DefId, CRATE_DEF_INDEX};
use rustc_hir::{self, GeneratorKind};
use rustc_hir::{self as hir, HirId};
use rustc_target::abi::{Size, VariantIdx};

use polonius_engine::Atom;
pub use rustc_ast::Mutability;
use rustc_data_structures::fx::FxHashSet;
use rustc_data_structures::graph::dominators::{dominators, Dominators};
use rustc_data_structures::graph::{self, GraphSuccessors};
use rustc_index::bit_set::{BitMatrix, BitSet};
use rustc_index::vec::{Idx, IndexVec};
use rustc_serialize::{Decodable, Encodable};
use rustc_span::symbol::Symbol;
use rustc_span::{Span, DUMMY_SP};
use rustc_target::asm::InlineAsmRegOrRegClass;
use std::borrow::Cow;
use std::convert::TryInto;
use std::fmt::{self, Debug, Display, Formatter, Write};
use std::ops::{ControlFlow, Index, IndexMut};
use std::slice;
use std::{iter, mem, option};

use self::graph_cyclic_cache::GraphIsCyclicCache;
use self::predecessors::{PredecessorCache, Predecessors};
pub use self::query::*;

Most of which I didn't add.

There's just so many that they're basically useless to me as a human.

And, oh look, one of them is a * anyway:

rust-lang/rust/blob/5d8767cb229b097fedb1dd4bd9420d463c37774f/compiler/rustc_middle/src/mir/mod.rs#L47

use std::borrow::Cow;
use std::convert::TryInto;
use std::fmt::{self, Debug, Display, Formatter, Write};
use std::ops::{ControlFlow, Index, IndexMut};
use std::slice;
use std::{iter, mem, option};

use self::graph_cyclic_cache::GraphIsCyclicCache;
use self::predecessors::{PredecessorCache, Predecessors};
pub use self::query::*;

pub mod coverage;
mod generic_graph;
pub mod generic_graphviz;
mod graph_cyclic_cache;
pub mod graphviz;
pub mod interpret;
pub mod mono;
pub mod patch;
mod predecessors;

Ltrlg · February 12, 2022, 8:26pm

If they are willing to edit files and launch a compiler instead of only using grep, maybe rust-analyzer’s “find all references” should be on the table as well, which I expect to find _:: if this happens.

workingjubilee · February 12, 2022, 10:11pm

I simply don't use rust-analyzer because even my best efforts find it challenged at addressing the code I write, which is often code pushing the leading edge of the compiler abilities, which do not have support in r-a, and I increasingly am reducing my usage time of the text editor it has the best support for (VS Code). It also often actually delivers overall worse diagnostics due to the fact that no text editor I am aware of fully supports the features that would be required to actually deliver good ones.

CAD97 · February 13, 2022, 1:10am

Ok, so direct comparison. What's different that you don't like in the proposed

match item {
    _::Fn(_ {
        sig: _ { ident, generics, inputs, output, .. },
        ..
    }) => todo!(),
    _ => bail!(),
}

where today I could write the following—with the exact same amount of type info (if not less)—and nobody would complain:

if let Some(item) = item.as_fn() {
    let ident = &item.sig.ident;
    let generics = &item.sig.generics;
    let inputs = &item.sig.inputs;
    let output = &item.sig.output;
    todo!()
} else {
    bail!()
}

Aloso · February 13, 2022, 1:27am

I argue that we shouldn't give this argument too much weight. grep is already often inadequate for finding uses of items, especially types. There are many limitations:

- Items can be renamed on imports
- Grep may find a lot of false positives, which adds noise to the search results:
  - The same name may be used for different things
  - The name may appear in strings and comments
- Because of type inference, the type of bindings is often not visible to grep

While grepping can work well for fields or enum variants, it is quite unreliable for types. This is not necessarily a problem though, because IDEs can offer more reliable ways to search for usages of a type.
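A small sketch of these limitations in today's Rust. The `http` module, the `parse` function and `is_safe` are hypothetical; `HttpMethod` is borrowed from the grep scenario mentioned earlier in the thread.

```rust
mod http {
    #[derive(Debug, PartialEq)]
    pub enum HttpMethod {
        Get,
        Post,
    }

    pub fn parse(s: &str) -> Option<HttpMethod> {
        match s {
            "GET" => Some(HttpMethod::Get),
            "POST" => Some(HttpMethod::Post),
            _ => None,
        }
    }
}

// Renamed on import: the uses below are spelled `Method`, so a grep for
// the original name misses them.
use http::HttpMethod as Method;

pub fn is_safe(s: &str) -> bool {
    // `m` is an `Option` of the enum, but the type name is never written
    // on this line because it is inferred.
    let m = http::parse(s);
    // A comment that mentions HttpMethod, like this one, is a false positive.
    m == Some(Method::Get)
}
```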

The bigger problem is that humans don't see the type. However, this doesn't seem to be much of a problem in practice, at least in Java, where you can omit the enum type in switch statements.

jkugelman · February 13, 2022, 1:46am

It's a fair question. It's foremost a gut-level, aesthetic reaction. I can try to justify it but to be clear I didn't reason my way into my opinion.

I think part of it is that every time I see _ it's like seeing a foreign word in a piece of text. It makes me tap the brakes. I have to stop and think, "What does that word mean?" Imagine the first snippet had four question marks. It'd be pretty distracting. _ feels like that. It's supposed to be this unobtrusive nothing-symbol, but it actually draws attention to itself because it's not a normal alphanumeric identifier.

Another part is that it reminds me of Perl and its overuse of sigils. Rust is quite good about not being too symbol heavy. We have enough ::<>s and |_|s and @s in the language as it is.

If this feature were to be added, I actually prefer @scottmcm's .Variant syntax. .Item is easier on the eyes than _::Item.

scottmcm · February 13, 2022, 6:23am

I think I first mentioned this syntax in Auto infer namespaces on struct and enum instantiations - #6 by scottmcm, but all I did is shamelessly steal it from Swift https://docs.swift.org/swift-book/LanguageGuide/Enumerations.html#ID147.

iago-lito · February 13, 2022, 7:54am

The most compelling argument I've seen against this feature is that it may make code less greppable, since the enum name will appear in fewer places. And I do think that's an important argument to balance.

I would object that the problem with the feature making the code "less greppable" is not that the EnumName becomes _. It's that we use lexical grep search in the first place. When searching for every occurrence of EnumName, I would expect the best/recommended solution to be some project-level semantic search instead. Why not some rust-analyzer feature that makes both occurrences of EnumName and the relevant _ show up?

iago-lito · February 13, 2022, 8:06am

Well, I understand why any automated code analyzer tool would be challenged by the code you are writing then. But the situation you describe seems the very kind of situation you would like to avoid writing _::Variant for the very purpose of remaining able to grep the enum name lexically. This is not an argument in favour of not making this option available to other "regular" types of code, right?

workingjubilee · February 13, 2022, 7:49pm

And yet my code, no matter how bleeding edge it is, has to interface with a large project, most of which is written in a more conventional Rust style: the Rust compiler. And I have to interface with what I would frankly say is a random sequence of modules in the compiler and standard library, each time. And if the tool fails on the combined set of my code and that, then the tool fails entirely, for my purposes. So I take my lack of tooling into a more conventional project, yet retain an inability to manage any lexical peculiarities that are justified primarily by some other tooling being used to prop it up.

kornel · February 14, 2022, 12:28pm

I must admit, it is an eyesore. How about allowing omitted type names only in nested structs in patterns:

match item {
    Item::Fn({
        sig: { ident, generics, inputs, output, .. },
        ..
    }) => todo!(),
    _ => bail!(),
}

Looking at it from the perspective of C/C++ syntax, it's similar to the syntax of aggregate initializers.

ekuber · February 14, 2022, 7:15pm

The parser could unambiguously recognize { ident : as the start of an anonymous struct... if type ascription wasn't a thing on nightly.

wiogit · February 14, 2022, 7:58pm

Sometimes I want to find usages of a struct or enum by a crate. I don't know if rustdoc has a feature for this, so what I've done is use the search in the Git repository on GitHub. This is pretty helpful if the crate documentation is not clear on some details. This could be a reason for people wanting grep to work.

I consider matching an enum is analogous to destructuring a struct, which also requires the type name.

let foo = Bar { x: 10 };
...
// Explicit version
let Bar { x } = foo;
// Inferred version
let _ { x } = foo;

The analogy is somewhat flawed, because enum matching is so much more common than struct destructuring, and it requires repeating the type name for every matched variant. A compromise might be a solution that requires the enum type name to be used only once.

scottmcm · February 14, 2022, 8:05pm

I've been thinking more about this example, and found something to distinguish at least one part of it: field access syntax is the expression form of an irrefutable pattern, in a way.

So that says to me that one might say that this is unambiguously fine for structs, but it's not necessarily as obvious for enum variants.

Though maybe that's an indication that we're lacking syntax for enums. The existence of that as_fn (and related things like Result::ok) makes me imagine a world where instead of needing to make the method, it's just, say, if let Some(thing) = item.Fn { instead -- it's not like there are any fields on enums right now.^†
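For context, a sketch of the kind of hand-written accessor this refers to, next to `Result::ok`, which plays the same role for `Result`. `Item`, `ItemFn`, `fn_name` and `parse_len` are hypothetical names made up for the example.

```rust
// Hypothetical types, invented for this sketch.
pub struct ItemFn {
    pub name: String,
}

pub enum Item {
    Fn(ItemFn),
    Other,
}

impl Item {
    /// Hand-written accessor: `Some` for the `Fn` variant, `None` otherwise.
    pub fn as_fn(&self) -> Option<&ItemFn> {
        match self {
            Item::Fn(f) => Some(f),
            _ => None,
        }
    }
}

/// Uses the accessor instead of matching on the variant directly.
pub fn fn_name(item: &Item) -> Option<&str> {
    let f = item.as_fn()?;
    Some(f.name.as_str())
}

/// `Result::ok` turns the `Ok` variant into `Some` and drops the error.
pub fn parse_len(s: &str) -> Option<usize> {
    s.parse::<usize>().ok()
}
```

Each such accessor has to be written per variant (and per borrow flavor), which is the boilerplate the thought experiment is about.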

And even if it was only irrefutable patterns where it was ok, that'd still be nice for things like avoiding the Type::Array(ArrayType { element_type, length }) => repetition that comes from making distinct types to put in the enum variants. Having Type::Array(.{ element_type, length }) => makes that that much less annoying, while still being quite clear.

Come to think of it, we already have pattern examples of not needing to specify types for irrefutable patterns: both _ and bindings already work exactly like that!

^† I do see a bunch of potential practical problems with this, like how it can't be a place projection the way the other things are. And that means the distinction between .as_fn() and .as_mut_fn() and .into_fn() might be trickier to encode. But it's more a thought experiment, not a fleshed-out proposal, so some unresolved questions aren't a problem.
