[IDEA] Implied enum types

The answer is fairly simple: it's valid to use _::Variant if and only if Default::default() would be valid (presuming the enum implements Default), i.e. when the type is uniquely constrained. The variant name would not impact inference. It's the exact same rule used everywhere else in the language's Hindley-Milner-style type inference.

It is unfortunate that inferred enum types won't work for generic functions like HashMap::get (at least without inference impacting defaults), but it's consistent with inference in the rest of the language.
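To make the analogy concrete, here's a small sketch of how the equivalent Default::default() calls behave today (the Mode enum and set_mode function are made up for illustration, not part of any proposal):

use std::collections::HashMap;

#[derive(Default)]
enum Mode {
    #[default]
    Fast,
    Thorough,
}

fn set_mode(mode: Mode) {
    let _ = mode;
}

fn main() {
    // Uniquely constrained: the parameter type is exactly `Mode`, so
    // `Default::default()` resolves (and, under this idea, `_::Fast` would too).
    set_mode(Default::default());

    // Not uniquely constrained: `HashMap::get` is generic over the borrowed key
    // type, so this needs an annotation today, and `_::Variant` would as well.
    let map: HashMap<String, i32> = HashMap::new();
    // map.get(&Default::default()); // ERROR: type annotations needed
    let _ = map.get("key");
}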

2 Likes

While it's consistent, it also has a SemVer hazard hidden in it. Right now, you get away with implementing traits on enums because the only cases it breaks are those involving associated functions in traits, like Default::default(); and if you don't implement a trait with associated functions yourself, then it's unlikely that your consumers will get caught by the hazard (because of the orphan rules, the only way to get caught by it is to implement a local trait on two different non-local types, and the break happens at that point).

With this change, that stops being true - implementing any trait on your enum can result in a previously unique constraint on a different enum no longer holding. A hazard that used to be a breaking change only if you implemented a trait with associated functions on your type now applies to any trait implementation. In turn, that means you cannot add to the set of traits implemented by any enum unless you bump the major version, since implementing any trait could result in a previously fully constrained case somewhere in user code gaining a second option.

First off, inference breakage is considered minor breakage and doesn't require a major version increment. Otherwise literally any API addition would be a major breaking change, as it can break name resolution and/or inference, especially when glob imports are involved.

Most relevantly, if downstream has an extension trait whose method names clash with the newly implemented trait, and has both traits in scope, implementing the trait will make method lookup ambiguous. Adding a trait implementation is already (minor) breakage.
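For example, a local sketch of that clash (Ext stands in for the downstream extension trait, Upstream for the trait that gains the new implementation; all names are made up):

struct Gadget;

trait Ext {
    fn describe(&self) -> &'static str {
        "from Ext"
    }
}

trait Upstream {
    fn describe(&self) -> &'static str {
        "from Upstream"
    }
}

impl Ext for Gadget {}
impl Upstream for Gadget {} // adding this impl is the (minor) breaking change

fn main() {
    // Gadget.describe(); // ERROR: multiple applicable items in scope
    println!("{}", Ext::describe(&Gadget)); // the disambiguated spelling still works
}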

But secondly, no; calling func(_::Variant) for fn test<T: Trait>(_: T) will never compile, because the type isn't uniquely constrained. It doesn't matter if only one type implements the trait. There isn't some extra magic on associated enum variants that will make them work when trait associated methods won't, because it's the exact same question posed to type inference.
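Today's analogue already behaves this way; a minimal sketch with a made-up Trait and a single implementor Only:

trait Trait {}

#[derive(Default)]
struct Only;

impl Trait for Only {}

fn test<T: Trait>(_: T) {}

fn main() {
    // ERROR: type annotations needed. `T` is not uniquely constrained, even
    // though `Only` is the sole implementor of `Trait` in the whole program.
    // test(Default::default());

    test(Only::default()); // naming the type works
}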

(To be explicit, the answers to your hard questions are 1. Yes, 2. No.)

1 Like

That disagrees with the following code:

use std::borrow::Borrow;
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialOrd, Ord, PartialEq, Eq)]
enum KeyType {
    One,
    Two,
}

impl Default for KeyType {
    fn default() -> Self {
        Self::One
    }
}

#[derive(Clone, Copy, Debug, PartialOrd, Ord, PartialEq, Eq)]
enum KeyWrapper {
    Alpha(KeyType),
    Beta(KeyType),
}

impl Default for KeyWrapper {
    fn default() -> Self {
        Self::Alpha(KeyType::One)
    }
}

impl Borrow<KeyType> for KeyWrapper {
    fn borrow(&self) -> &KeyType {
        match self {
            Self::Alpha(ty) => ty,
            Self::Beta(ty) => ty,
        }
}

fn main() {
    let mut map = BTreeMap::new();
    map.insert(KeyWrapper::Alpha(KeyType::One), "unus");
    map.insert(KeyWrapper::Alpha(KeyType::Two), "duo");
    
    let lookup_key = Default::default();
    
    println!("{:?}", map.get(&lookup_key));
}

(see Rust Playground).

As-is, this code does not compile, because there are two types that lookup_key could be. If you remove the impl Borrow<KeyType> for KeyWrapper, the code compiles.

And this is the problem case, as far as I'm concerned. I've added an extra trait implementation, which causes lookup_key to go from "can only be KeyWrapper" to "can be KeyType or KeyWrapper", and thus causes type inference of Default::default() to fail. To avoid this, you'd need a different rule for enums than is used elsewhere; otherwise, adding a trait implementation could cause this very same class of breakage.

1 Like

My understanding is that this is because there is a code spelling that works for both old and new code (e.g., specifying the enum type explicitly). Is that what is being relied upon here?
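For concreteness, the "spelling that works for both old and new code" looks like this in a stripped-down sketch of the earlier playground (hypothetical Key/Wrapper types): the explicit annotation compiles whether or not the extra Borrow impl exists.

use std::borrow::Borrow;
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, Default, PartialOrd, Ord, PartialEq, Eq)]
struct Key(u8);

#[derive(Clone, Copy, Debug, Default, PartialOrd, Ord, PartialEq, Eq)]
struct Wrapper(Key);

// Removing or adding this impl does not break the annotated lookup below.
impl Borrow<Key> for Wrapper {
    fn borrow(&self) -> &Key {
        &self.0
    }
}

fn main() {
    let mut map = BTreeMap::new();
    map.insert(Wrapper(Key(1)), "unus");

    // The explicit annotation is the spelling that works both before and after.
    let lookup_key: Wrapper = Default::default();
    println!("{:?}", map.get(&lookup_key));
}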

1 Like

Here is a Google Doc with my proposal: Inferred Types - Google Docs

1 Like

It looks like you put some effort in. I have to disagree with using it for structs, though. I mean, it could add consistency, but it could confuse programmers, because another struct could be converted into this struct with into().

One of the code snippets where I don't like how this interacts is with let x : &dyn Any = &_::Foo. When the enum in question is assigned to a value which is type-erased, at the very least you must hunt through all enums in all the imports to discover the origin... It could also be problematic if the imports change such that _::Foo changes type; hopefully you would notice, but is this the kind of thing you want on your mental checklist when changing imports?

How does it interact with discovering/enumerating unused imports, when an import can now appear unused but actually be used only in an inferred fashion...

Personally, I'd expect that to fail because it's not constrained by inference -- there's nothing there to constrain it to a particular Foo.

1 Like

Globally, or locally to the expression? I.e., would something like the following allow it to be accepted in some cases, and rejected in others...

let _: Z = _::Foo;
let x : &dyn Any = &_::Foo;
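For comparison, here's how the direct Default::default() analogues of those two lines behave today (Z here is just a made-up unit struct, and i32 a stand-in concrete type):

use std::any::Any;

#[derive(Default)]
struct Z;

fn main() {
    // Analogue of the first line: the annotation pins the type, so it resolves.
    let _: Z = Default::default();

    // Analogue of the second line: coercion to `&dyn Any` doesn't pin down a
    // concrete type, so this is rejected today.
    // let x: &dyn Any = &Default::default(); // ERROR: type annotations needed
    let x: &dyn Any = &i32::default(); // naming a type works
    let _ = x;
}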

Have we gone over SemVer? I.e., what happens if an imported crate adds a Foo enum variant, or a new enum containing a variant Foo with the same name as one used with this feature?

I've been imagining this as "you can use .Variant to get Foo::Variant in the same set of places where inference today can figure out that Default::default() is <Foo as Default>::default()".

More specifically, something like "when you see .Variant, treat it as a type variable, run inference, then if inference generates an exact type, get the variant from that type, otherwise error and suggest that they write the actual type themselves".

I've written about this in more detail in the Zulip discussion: https://rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/RFC.20inferred.20types/near/362000314.

5 Likes

The only reason I have to dislike this version is that implementing a trait can change whether or not Default::default() infers a unique type (see the code sample in post #64 on this thread, where the existence or absence of impl Borrow<KeyType> for KeyWrapper changes whether let lookup_key = Default::default(); is inferred correctly as KeyWrapper, or rejected as ambiguous).

Hence my preference for a version of "any use statement that imports an enum also implies use $enum::*;" - this is subtly different, in that it means that if Variant names variants in both Foo and Bar, and I could write Foo::Variant or Bar::Variant, rustc will complain about the ambiguity. This is easier to explain to a confused developer than the inference case, even if the complaint happens because Bar adds a new variant.

That said, though, I'm willing to accept either, because I cannot find a way to add a new trait implementation that causes rustc to change which unique type it infers - only ways to make it go from inferring a fixed type to complaining about ambiguity. As long as the people adding the feature are aware that this is possible, and have decided that the risk of confusion because a trait implementation elsewhere in the codebase makes a type ambiguous is OK, I'm willing to trust their judgement.

A couple of points for you, that I hope you can use to steel-man your proposal against my side:

First, there's an edge case I'd like you to describe and explain how you handle, because while I don't think it's evidence for or against your proposal, it does change the semantics of Rust in a possibly surprising fashion.

Imagine I have a crate bar, with an exported function: pub fn flavour(day: chrono::Weekday) -> Flavour. bar thus depends on the crate chrono, in order to refer to chrono::Weekday. Assume that bar does not re-export anything from chrono; that means that in today's Rust, for crate foo to call bar::flavour(), it needs to have chrono as a dependency, so that Rust knows how enum Weekday is defined. How do you intend to handle this?

There's two reasonable answers: one is to say that foo can call bar::flavour(.Mon), because while foo does not depend on chrono, bar does - and we'll treat this as-if bar re-exported enum Weekday in order to make this work. The other is to say that chrono::Weekday is not visible in crate foo, and thus that it does not compile until foo depends on chrono.

Secondly, the alternative way to deal with the original problem is to extend glob imports, rather than type inference, and I'd like you to call this out as an alternative option (even if you don't expand it).

You can treat today's import machinery as having a name in one of three states:

  1. Not imported; you have to use a path to refer to it unambiguously. E.g. ::chrono::Weekday::Mon.
  2. Explicitly imported; Mon becomes unambiguously ::chrono::Weekday::Mon, and if another use statement tries to explicitly import another Mon, Rust generates an error, because there's only ever one "strong" import for a name.
  3. Glob imported; this imports things if there is no explicit import for the same name, so use ::chrono::Weekday::* will define Mon as ::chrono::Weekday::Mon, but only if nothing else tries to import a Mon into scope. If two glob imports both import the same name, then the name is treated as "not imported".
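A minimal sketch of those three states as they exist today (cal_a and cal_b are made-up modules standing in for crates like chrono):

mod cal_a {
    pub struct Mon;
}
mod cal_b {
    pub struct Mon;
}

use cal_a::*;
use cal_b::*; // two glob imports of `Mon`: the name ends up "not imported"
// use cal_a::Mon; // an explicit import like this would win over both globs

fn main() {
    // let _ = Mon; // ERROR: `Mon` is ambiguous (two glob candidates)
    let _ = cal_a::Mon; // a full path (state 1) always works
}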

The weaker form of the extension is to say that when I do use foo, this not only explicitly imports foo (as today), but also recursively glob imports foo::*, using the glob import rules for ambiguity.

The stronger form is to extend the "unambiguous" versus "ambiguous" distinction with resolution rules; you could track the "depth" of the import to work out which one wins. In this case, use chrono::Weekday::Mon has depth 0 for Mon, because there's no globbing or implicit imports involved. use chrono::Weekday::* has depth 1 for Mon, because it's a glob import. use chrono::Weekday has depth 2 for Mon (one level to make it match the glob import, one because Rust has inferred the import). And use chrono has depth 3 for Mon (one level to get to Weekday, then the other two to match use chrono::Weekday). A name is ambiguous if there are two definitions of it at the same depth; otherwise, the smallest depth wins (so depth 0 beats depth 1, depth 3 beats depth 4).

This is a very different path to the same intended effect; it'll have different tradeoffs from the one based on type inference, and it's thus a strong alternative to compare against in order to explain why the type-inference-based option is better (are ambiguities better handled in the type inference version? Is a change to the types of a function less likely to result in surprising compiler behaviour? etc., etc.).

1 Like

I agree this is suboptimal, but I think it's generally fine: it's an existing thing that already applies to literals, it's something we already know we'd like to do something about (and whatever we do about it, I'd expect to also apply here), and it's actually the less impactful version of ambiguity-from-new-addition.

For example,

trait Blah {}

impl Blah for u32 {}
//impl Blah for u64 {} // UNCOMMENT TO ERROR

fn foo(_: impl Blah) {}

fn main() {
    foo(1);
}

So I think it's fine for bar(.Variant) to fail to compile in the same kinds of situations where bar(Default::default()) will fail to compile and bar("variant".parse().unwrap()) will fail to compile, both for the same kind of trait ambiguity reasons.

It meets the "there's a way to fully quality it so that it works before and after the trait impl addition" bar for being deemed a "minor change".

And I think it'd be rare that it's an issue in practice. The best uses for this are things like make_request(.Get, url) or Config { algorithm: .Keccak, .. }, where there are no traits involved at all.

If there were a different rule, something like "it looks at the enums that are used in the current scope and picks the one with that variant", then it would still have non-local breakage if an external #[non_exhaustive] enum adds a new variant that happens to be named the same as some other variant that's in scope. (Basically that's the same awkward breakage that we keep hitting with trait resolution.) And the only way to avoid that would be to not use things, which seems bad. Whereas a convention like "hey, only use this in non-generic contexts" (like a field or argument with a known exact type) would keep the ambiguity from ever coming up, if for some reason a codebase was particularly worried about it.

And "I need to look up all the variants for all the enums that are used in this scope" to find out what .Variant means" is more annoying and less "locality of reasoning" than knowing something by looking up the definition of the function that was called. So I would call it a plus that println!("{:?}", .Foo) never works, in the inference version, because "it's something that's Debug" will never be enough to infer the type, rather than "well, it works for now because you happen to have only used one thing with a variant of that name" -- this is basically the trait method resolution problem, as keeps causing problems on iterators. (Subjectively, I think that the trait method resolution conflicts have been worse overall than the inference ambiguity problems, so given the choice I'd not add more mechanisms that work analogously to trait method resolution.)

So, personally, I'm quite happy for the errors to be "hey, the type for .Variant here isn't constrained so you need to write ConcreteType::Variant"-style, just like how you get "hey, the type for <_>::default() here isn't constrained so you need to write ConcreteType::default()" errors today.

4 Likes

How, exactly, is "look at the use statements - only one enum can be in scope with this variant" less local than "look at the use statements - only one function with this name can be in scope, and then you can look up this function to determine what its parameter types are defined as, and then you can determine what the type inference algorithm did to determine that in this particular place, the function parameter has only one unique type"?

One is a simple case of "look at what's imported - it must be one of those" - the other is "look at what's imported. Then find the thing you imported, look at it, look at what's imported in that location, and repeat".

I'm assuming it's not "manually import every single variant you're planning on using by its textual name (not a glob)", because then this feature wouldn't be a feature, it's what already exists.

If I'm trying to figure out the make_request(.Get, "https://rust-lang.org/") example, it's either

  • Look up the make_request in the use list, which will be there textually (assuming no globs), and that will say the parameter type, or at worst it's a trait that has only one implementation, since otherwise it wouldn't compile.

  • Look at every type use in scope (assuming no globs), hoping that everyone's following the naming conventions so that I know that I only need to look at the ones with UpperCamel names, and look up the variants they have, and stop when I find the first one with that name.

If for some reason I need to do that manually, I'd much rather do the former to figure out what type .Get is. The envelope of stuff that matters is much tighter with it.

Since it's common for use lists to end up being like

use crate::build::expr::as_place::PlaceBuilder;
use crate::build::scope::DropKind;
use rustc_apfloat::ieee::{Double, Single};
use rustc_apfloat::Float;
use rustc_ast::attr;
use rustc_data_structures::fx::FxHashMap;
use rustc_data_structures::sorted_map::SortedIndexMultiMap;
use rustc_errors::ErrorGuaranteed;
use rustc_hir as hir;
use rustc_hir::def::DefKind;
use rustc_hir::def_id::{DefId, LocalDefId};
use rustc_hir::{GeneratorKind, Node};
use rustc_index::{Idx, IndexSlice, IndexVec};
use rustc_infer::infer::{InferCtxt, TyCtxtInferExt};
use rustc_middle::hir::place::PlaceBase as HirPlaceBase;
use rustc_middle::middle::region;
use rustc_middle::mir::interpret::ConstValue;
use rustc_middle::mir::interpret::Scalar;
use rustc_middle::mir::*;
use rustc_middle::thir::{
    self, BindingMode, Expr, ExprId, LintLevel, LocalVarId, Param, ParamId, PatKind, Thir,
};
use rustc_middle::ty::{self, Ty, TyCtxt, TypeVisitableExt};
use rustc_span::symbol::sym;
use rustc_span::Span;
use rustc_span::Symbol;
use rustc_target::abi::FieldIdx;
use rustc_target::spec::abi::Abi;

I have no interest in looking at everything in there that might be a type. I'd much rather start from what I'm calling, not from everything in scope.

4 Likes

If the code compiles and does what you expect, then in both cases you can just look up make_request to see what the type is. The type inference based version has a slight advantage here, since it's more likely to end up in this case than the "automatic recursive use" based version: it'll know that the type of make_request constrains the first argument to enum Method, whereas the automatic recursive use version can get confused because it has both Method::Post and CommunicationPreference::Post in sight.

If the code does not compile, then I'd expect that the error messages make it clear to the user why the compiler couldn't resolve the type, and what the user can do to fix the problem. In both cases, the compiler knows what types it considered possible, and can list them, along with the type the function actually wants, and the user can resolve the ambiguity manually. This one's a wash between the two; type inference wins when the function type is not generic (since the error message will occur because the one and only possibility didn't have a variant with that spelling), while I expect automatic recursive use to do better when the function is generic, since it won't pull in surprising types that you didn't expect to be possibilities here - and thus if there's only one option locally, even if there are multiple options in the program as a whole, it'll not complain about ambiguity.

The final case is where the code compiles, but does not do what you expect. There's two sub-cases here to consider:

  1. You have a fundamental misconception about the type of make_request, and the correct type of the first parameter is not the type you expect. I would argue that automatic recursive use is less likely to hit this case, since to hit this case, you need to both have a use statement bringing the correct type into play, and to not have a use statement bringing the type you expect into play, whereas type inference will find the correct type, even if this surprises you as a programmer. Indeed, depending on implementation details, type inference may even find the correct type when you have no way of writing it unambiguously (since the correct type can come from a crate not in your direct dependencies).
  2. The type is a generic type, and you've ended up with a type that meets all the constraints, but isn't the one you expected. This one's a hard judgement call - type inference will only work if there's one and only one type that meets all the constraints, but that type could come from anywhere in the program; on the other hand, automatic recursive use might find a type when there's two or more that could be correct, but at least the type is guaranteed to be something you can identify from your use statements.

BTW, I think I see a fundamental disconnect in the way we're talking about this feature - I am mostly interested in the cases where the feature confuses the user (either with surprising compile failures, or with code that succeeds in compiling where the user expected it to fail). I expect that anyone implementing this feature will get it right for the cases where it doesn't confuse the user. A lot of your comments read to me like you think the feature needs defending in the cases where it works well - so for the record, I agree that when it works well, it's a useful feature. My concern is over the unexpected consequences when it goes wrong, not over when it works well.

This is also why I think it'd be great as an IDE feature - with IDE features, you don't need to care as much about surprising consequences, since the person who faces the surprise is the person who makes use of the feature (by definition), and not a later maintainer, or someone working elsewhere in the codebase.

3 Likes

Wow, thanks!

1 Like

I would say the point of my comparisons is to poke at the details of those. If those surprising failures/successes are things that already exist in Rust today, then presumably they're not fatal for this feature either.

For example, since this works:

mod level1 {
    mod level2 {
        #[derive(Default, Debug)]
        pub struct Foo { pub x: i32, pub y: i32 }
    }
    pub fn takes_foo(f: level2::Foo) {
        dbg!(f);
    }
}

fn main() { 
    //level1::takes_foo(level1::level2::Foo { x: 1, y: 2 });
    //^ ERROR: module `level2` is private

    level1::takes_foo({
        let mut foo = Default::default();
        if false { 
            foo
        } else {
            foo.x = 7;
            foo.y = 10;
            foo
        }
    });
    //^ Works, with inference finding the unnameable type
}

It seems like it would be fine for level1::takes_foo(.{ x: 7, y: 10 }) (or similar) to work, even though one might have a misconception about where it came from or it might have come from something you never mentioned -- or indeed can't mention.

So what I'm looking for is not that whether it could go wrong -- I agree it could -- but that it's somehow worse than other similar things, or that the way it goes wrong is similar to something else that's been recurringly troublesome.

5 Likes

I strongly disagree with this - some language features are more commonly used than others, and a rarely used feature that introduces surprises is a radically different beast to a feature that's likely to be commonly used.

I very rarely see Default::default() relying on type inference - it's simply something people don't write in the codebases I work in, unless the only thing they know about the type is that it implements Default. In practice, since std::default::Default is part of the prelude, people write things like Foo::default(), which Rust interprets as <Foo as Default>::default(), so type inference doesn't come into play.
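For concreteness, the contrast in spellings being described (Foo is just a stand-in type):

#[derive(Default, Debug)]
struct Foo {
    x: i32,
}

fn main() {
    let a = Foo::default(); // resolved as <Foo as Default>::default(); no inference involved
    let b: Foo = Default::default(); // the rarer spelling, relying on the annotation for inference
    println!("{a:?} {b:?}");
}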

Remember, after all, that all of the questions in Rust Quiz are "surprising failures/successes that exist in Rust today" - to a very large extent, we don't care that they're surprising because they're also cases that you'd rarely (if ever) see in real code as opposed to Rust mindbenders.

Therefore, to show that it's not fatal, you either need to show that the feature you're comparing to is as commonly found as you'd expect this feature to be (i.e. that my bias that it's rarely seen is because the codebases I look at are unrepresentative), or that this feature is not going to lead to surprises. Just showing that it introduces the same surprises as existing features is not enough - since the existing features may well be things that you only encounter in places like dtolnay's Rust Quiz that have been built to be confusing.