Interaction of user-defined and integral fallbacks with inference


#1

So, RFC 213 was accepted some time ago, and included provisions for integrating type parameter defaults into inference. These are hotly desired by some on the libs team (@Gankro, I’m looking at you) for enabling more ergonomic use cases. @jroesch recently implemented the missing support (with one bugfix pending review). Before we ungate this feature, though, I wanted to raise the question of what the proper interaction with integral fallback ought to be, to make sure we are all in agreement.

This decision is interesting not only because we want to avoid as many surprises as possible, but because it also has some impact on backwards compatibility. That said, crater runs that we have done indicate zero regressions (and by this I mean both compile errors and new runtime behavior). Read on for details.

Example the first

In this first example, we have a user-defined fallback of u64 used with an integer literal:

fn foo<T=u64>(t: T) { ... }
//     ~~~~~
//       |
//   Note the presence
//   of a user-supplied default here.

fn main() { foo::<_>(22) }
//                ^
//                |
//    What type gets inferred here?

The question at hand is what type gets inferred for the type parameter T. On the one hand, the user specified a default of u64. On the other, integer literals typically fall back to i32. So which should we pick?
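The integer-literal half of this question can be observed on stable Rust today, without the gated feature. The following is a minimal sketch; `has_type` is a hypothetical helper introduced here for illustration, not something from the RFC or the branch.

```rust
use std::any::TypeId;

/// Returns true if the argument's inferred type `T` is exactly `Expected`.
/// (`has_type` is a hypothetical helper for illustration only.)
pub fn has_type<Expected: 'static, T: 'static>(_value: T) -> bool {
    TypeId::of::<T>() == TypeId::of::<Expected>()
}

/// Nothing constrains the literal's type here, so integer fallback picks i32.
pub fn unconstrained_literal_is_i32() -> bool {
    has_type::<i32, _>(22)
}

/// A suffixed literal carries its own type, so no fallback is involved.
pub fn suffixed_literal_is_u64() -> bool {
    has_type::<u64, _>(22u64)
}
```

Both functions return true on current stable compilers. The open question in this thread is what should happen when a user-supplied default such as `T=u64` competes with that i32 fallback.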

There are a couple of possibilities here:

  1. Error. The most conservative route would be to report an error if there are multiple defaults and they are not all the same type. This might be unfortunate since one of the reasons people want type default fallback is to help inform integer literal inference a bit.

  2. Prefer the integer literal default i32. We could give i32 preference. Nobody I’ve spoken to actually expects this behavior, but it does have the virtue of being backwards compatible. (More on backwards compatibility below)

  3. Prefer the user default, u64. In informal polls, this is what everyone expects. It is also what the RFC specifies.

The branch as currently implemented takes option 2. I think that for this specific example, u64 is definitely the less surprising result – as I said, at least for people I’ve spoken to it is universally what is expected. An error is however the most conservative option.

Example the second

OK, let’s consider a twist on the previous example. In this case, the user-defined fallback is not an integral type:

fn foo<T=char>(t: T) { ... }
//     ~~~~~
//       |
//   Note the presence
//   of a user-supplied default here.

fn main() { foo::<_>(22) }
//                ^
//                |
//    What type gets inferred here?

Now the question is a bit different. The type variable has one default, char, but it is also connected to an integer literal type (with fallback i32). Integer literals are naturally incompatible with char.

So, again there are several choices:

  1. Error due to multiple defaults. Again, the most conservative route would be to error, as there are multiple defaults (char, i32) that apply to a single unresolved variable.

  2. Prefer the integer literal default (i32). This is perhaps somewhat less surprising than it was before, given that char is clearly not a good choice.

  3. Error due to preferring user-defined default. If we were to indiscriminately prefer the user-defined default, then we’d get an error, because the type of an integer literal cannot be char. This is what the RFC chose, both because it seemed like a clearer strategy to reason about and because of concerns about future compatibility with more flexible literals (see section below).

I’m not sure what is less surprising in this example. For one thing, I didn’t do a lot of polling. =) I can imagine that people expect i32 as the answer here. However, the concerns about more flexible literals (discussed below) are perhaps valid as well.

Implementation strategies

There are various impl strategies we might adopt. Here are the outcomes for each example:

| Strategy       | Example 1 | Example 2 |
| -------------- | --------- | --------- |
| Unify all      | Error     | Error     |
| Prefer literal | i32       | i32       |
| Prefer user    | u64       | Error     |
| DWIM           | u64       | i32       |

  • Unify all: always unify the variables with all defaults. This is the conservative choice in that it gives an error if there is any doubt.
  • Prefer literal: always prefer the integer literal (i32). This is the maximally backwards compatible choice, but I think it leads to very surprising outcomes.
  • Prefer user: always prefer the user-defined choice. This is simple from one point of view, but does lead to a potentially counterintuitive result for example 2.
  • DWIM: At one point, @nrc proposed a rule that we would prefer the user-defined default, except in the case where the variable is unified with an integer literal, and the user-defined default is non-integral. This is complex to say but leads to sensible results on both examples.
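
The table can be reproduced with a toy model. This is only a sketch under simplifying assumptions: it considers a single type variable that is unified with an integer literal and has one user-supplied default, and every name in it (`Ty`, `Strategy`, `FallbackError`, `resolve`) is made up for illustration. It is not how the compiler implements any of these strategies.

```rust
/// The handful of types that appear in the two examples.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Ty {
    I32,
    U64,
    Char,
}

impl Ty {
    fn is_integral(self) -> bool {
        matches!(self, Ty::I32 | Ty::U64)
    }
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Strategy {
    UnifyAll,
    PreferLiteral,
    PreferUser,
    Dwim,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FallbackError {
    /// The user default and the literal fallback disagree.
    ConflictingDefaults,
    /// The chosen default is not a type an integer literal can have.
    LiteralCannotHaveType(Ty),
}

/// The fallback that integer literals get when nothing else applies.
const LITERAL_FALLBACK: Ty = Ty::I32;

/// Picks a type for a variable that is unified with an integer literal
/// and has `user_default` as its user-supplied default.
pub fn resolve(strategy: Strategy, user_default: Ty) -> Result<Ty, FallbackError> {
    match strategy {
        // Every default must agree, otherwise report an error.
        Strategy::UnifyAll => {
            if user_default == LITERAL_FALLBACK {
                Ok(user_default)
            } else {
                Err(FallbackError::ConflictingDefaults)
            }
        }
        // The literal fallback always wins.
        Strategy::PreferLiteral => Ok(LITERAL_FALLBACK),
        // The user default always wins, even if the literal cannot have that type.
        Strategy::PreferUser => {
            if user_default.is_integral() {
                Ok(user_default)
            } else {
                Err(FallbackError::LiteralCannotHaveType(user_default))
            }
        }
        // The user default wins unless it is non-integral.
        Strategy::Dwim => {
            if user_default.is_integral() {
                Ok(user_default)
            } else {
                Ok(LITERAL_FALLBACK)
            }
        }
    }
}
```

Calling `resolve` with `Ty::U64` corresponds to example 1 and with `Ty::Char` to example 2. Note that the `is_integral` check is exactly the part that would need a trait lookup if literals became more flexible.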

Backwards compatibility and phasing

You might reasonably wonder what the impact of this change will be on existing code. This is somewhat worrisome because changing fallback could lead to existing programs silently changing behavior (like, now using a u64 instead of i32) rather than failing to compile. We did a crater run with the “unify all” strategy. This strategy has the virtue of causing a compilation error if there is any ambiguity at all, so code cannot change semantics. No regressions were found. From this I conclude that the danger is minimal to nil, but YMMV.

Nonetheless, when phasing in the change, it would probably be good to start with a warning cycle that will warn if code might change semantics (or, depending on what strategy we choose, become an error) in the next release. This can be achieved by simulating the “unify all” strategy.

Future, more liberal forms of literals

One of the reasons that the original RFC opted to prefer user-defined defaults is that, in the future, I expect we may try to change integer literals so that they can be inferred not only to integral types but also to user-defined types like BigInt. At that point, any rule that attempts to differentiate between an “integral” type and other user-defined type becomes rather more complicated, probably involving a trait lookup of some kind. Adding trait lookups into the processing of defaults seems like it would push an already unfortunately complex system rather over the edge to me.
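
To make the concern concrete, here is a purely hypothetical sketch of what trait-driven literals could look like. Nothing here exists in the language: the trait `FromIntegerLiteral`, the function `literal`, and the stand-in `BigInt` type are all assumptions invented for illustration.

```rust
/// Hypothetical trait: a type that an integer literal could be inferred to.
/// The name and signature are assumptions, not an existing or proposed API.
pub trait FromIntegerLiteral: Sized {
    fn from_integer_literal(value: u64) -> Self;
}

impl FromIntegerLiteral for i32 {
    fn from_integer_literal(value: u64) -> Self {
        value as i32
    }
}

impl FromIntegerLiteral for u64 {
    fn from_integer_literal(value: u64) -> Self {
        value
    }
}

/// Stand-in for a user-defined big integer (not a real library type).
/// Stores little-endian 32-bit limbs.
#[derive(Debug, PartialEq, Eq)]
pub struct BigInt(pub Vec<u32>);

impl FromIntegerLiteral for BigInt {
    fn from_integer_literal(value: u64) -> Self {
        BigInt(vec![value as u32, (value >> 32) as u32])
    }
}

/// Plays the role of a literal whose type is chosen by inference.
pub fn literal<T: FromIntegerLiteral>(value: u64) -> T {
    T::from_integer_literal(value)
}
```

With an explicit annotation, `let b: BigInt = literal(22);` picks the `BigInt` impl. In a world like this, deciding whether a user default is "integral" is no longer a check against a fixed list of primitive types; it means asking whether the default implements the trait, which is the extra trait lookup discussed above.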

My conclusion

I’ve personally not made up my mind, but I think I roughly order the choices like so:

  1. Use user-defined default always, as specified in the RFC
  2. Always error when there is any ambiguity
  3. DWIM
  4. Prefer i32

What pushes me over the edge is that the first two have the virtue of being extensible later. That is, we can convert the error cases into the “DWIM” rule if we decide to do so, but we cannot change back the other way. I am somewhat concerned that the “always error” rule will rule out a lot of use cases, and hence I lean towards the option espoused in the RFC.


#2

Here is an open PR that implements the behavior for #1. I also have a couple of other branches with different strategies implemented.


#3

I have always been of the opinion that integer fallback is a “last desperate stab” fallback. Also, we only ever expected it to show up in benchmarks/tests/examples where it probably doesn’t actually matter. Ok, it might matter for all of the above due to overflow (or perf in the case of benchmarks), but there’s a lot of stars aligning for it to matter.

Big fan of DWIM, but the interaction with user defined integer literals is certainly unfortunate. I agree that in practice, “always use the user-defined default” should be adequate.


#4

I concur that preferring the user-defined default seems like the most sensible choice. In particular, similarly to @Gankro, I view integer fallback as a “fallback” and not a “default”. As such I would only expect it to be chosen if there is no other type information at all.


#5

DWIM seems like the really intuitive and obvious option to me, but I haven’t thought much about the implications of user-defined literals for this process. Informally, I conceive of the default parameter inference as inferring this as “not the default parameter,” and then after that the i32 fallback check recognizing this as an unknown integer and assigning it to be an i32.

It wouldn’t be the end of the world to always use the user-defined default, though. Code like this would be rare, and resolving the error just requires a type annotation. It would be one of many examples of rustc behaving more thoughtfully than I would.


#6

I want to say error in both cases until we have more flexible numerics solved, and then revisit which option is better in the future.


#7

That’s a good point – there have been attempts in the past to automatically upcast constants (e.g. int32 to int64, int to float). I’m a bit wary of this, but we should be careful to avoid surprising results if both changes should land.


#8

I would expect DWIM, so I’d go with erroring; in cases where confusion is likely, clearly indicating this to the user is probably a good idea.

If nothing else, it’d provide the chance to find out what users expect to happen (particularly if the error message indicates to the user “defaults may change; see $URL”, which gives them a chance to register their intuition).


#9

Why not unify all applicable? In Ex1 it would fail, as the compiler does not know whether it should use u64 or i32. In Ex2 char is not an option, because the literal “22” definitely is not a char. So only i32 is left after unification and should be applied.


#10

“Unify all applicable” could actually work for both if the i32 fallback happened after unification.

  • The first would unify u64 with integer literal, giving a u64.
  • The second would only have integer literal applicable, which would only later fall back to i32.

However, it’s not clear how to resolve “all applicable” in the general (forward-looking) case.


#11

I prefer just not unifying integer-literal variables with non-integers (and identically for floats), as we do in every other place in Rust. Integers are basically Int<i32>, Int<i64>, etc. and integer variables are Int<_>. We already need to handle unification of int variables (e.g. for trait selection).

Basically,

let defaultable = ty_infers.filter(|ty| {
    infcx.can_eq(ty, infcx.default(ty)).is_ok()
}).collect::<Vec<_>>(); // the collect is significant
for ty in defaultable {
    infcx.eq_types(_, _, ty, infcx.default(ty));
}

#12

If I understand you, this is the “DWIM” rule (just to clarify).


#13

I do want to clarify one thing from my original post as well: I don’t want to give the impression that we could easily phase in overloadable literals. @arielb1 is right that there are other places where we take some explicit advantage of things being “int-like” – though I think relatively few. (In fact, I’ve been wanting to go and investigate purging those as well, in preparation for the possibility of overloadable literals.) That said, adding another roadblock is not necessarily wise.


#14

No, it also behaves differently when you have e.g.

struct Foo<T=Option<()>>(T);
fn main() {
    let x = Foo::<_/*=$0*/>(None::<_/*=$1*/>);
}

Here $0 gets unified with Option<$1>, and then the whole thing later gets unified with Option<()>, which succeeds.
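
A toy structural unifier shows the sequence of events. This is a sketch only: `Ty`, `Subst`, `resolve` and `unify` are hypothetical names, the type language is cut down to what the example needs, and there is no occurs check, so it is not how rustc's inference is implemented.

```rust
use std::collections::HashMap;

/// A tiny type language: inference variables, `()`, and `Option<_>`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Ty {
    Var(u32),
    Unit,
    Option(Box<Ty>),
}

/// Bindings from inference variables to types.
#[derive(Default, Debug)]
pub struct Subst {
    bindings: HashMap<u32, Ty>,
}

impl Subst {
    /// Replaces bound variables by their bindings, recursively.
    pub fn resolve(&self, ty: &Ty) -> Ty {
        match ty {
            Ty::Var(v) => match self.bindings.get(v) {
                Some(bound) => self.resolve(bound),
                None => ty.clone(),
            },
            Ty::Unit => Ty::Unit,
            Ty::Option(inner) => Ty::Option(Box::new(self.resolve(inner))),
        }
    }

    /// Structural unification. Returns false on a mismatch.
    /// No occurs check: unifying $0 with Option<$0> is not handled.
    pub fn unify(&mut self, a: &Ty, b: &Ty) -> bool {
        let (a, b) = (self.resolve(a), self.resolve(b));
        match (a, b) {
            (Ty::Var(x), Ty::Var(y)) if x == y => true,
            (Ty::Var(x), other) | (other, Ty::Var(x)) => {
                self.bindings.insert(x, other);
                true
            }
            (Ty::Unit, Ty::Unit) => true,
            (Ty::Option(x), Ty::Option(y)) => self.unify(&x, &y),
            _ => false,
        }
    }
}
```

Unifying `Var(0)` with `Option(Var(1))` first, and then unifying `Var(0)` with the default `Option(Unit)`, succeeds and binds `Var(1)` to `Unit`. The default is applied to a variable that is already partially constrained, which is a case the integral/non-integral wording of DWIM does not cover.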


#15

We take advantage of IntVar not unifying with non-integers basically every time we conditionally unify - most importantly, trait and method lookup.


#16

I prefer and would expect the “always use the user-defined default” rule. Like @Florob, I expect integer fallback to only be used as a last resort when there is absolutely no other type information (including defaults) available. If the second example were to compile, I would expect it to call foo<char> with “SYNCHRONOUS IDLE” (Unicode code point 22), with the literal statically verified to be a valid Unicode scalar value. Given that initializing a char with an integer literal is not supported, I would expect example two to be an error. I would find it resolving to a call to foo<i32> both counterintuitive and confusing.
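
For reference, this is what the explicit conversion looks like on stable Rust today. It is a minimal sketch, and the wrapper name `char_from_code_point` is made up for illustration.

```rust
/// `let c: char = 22;` is rejected by the compiler (mismatched types),
/// so the conversion has to be spelled out and is checked at runtime.
/// Returns None when the value is not a valid Unicode scalar value.
pub fn char_from_code_point(code_point: u32) -> Option<char> {
    char::from_u32(code_point)
}
```

`char_from_code_point(22)` yields `Some('\u{16}')`, while a surrogate such as `0xD800` yields `None`.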


#17

I was a big fan of the DWIM approach, but the BigInt literal thing might sway me. However, I would like to understand why it is a problem. I imagine the tricky case is where the default is BigInt and the value is an integer literal. The question is then should we infer i32 or BigInt, and we would like to get the answer BigInt; the problem being that IntVar<_> and BigInt don’t unify, so we would end up using the integer fallback. Is this the problem?

I’m not sure how we plan to implement more liberal integer literals. But let’s assume there is a trait with a lang item. Wherever we have an integer literal we need to know what types it might be; at the moment this is u8, i64, etc. In the future, we also include any types in scope which implement the lang item trait, so now the list is u8, i64, …, BigInt (or whatever). So now we try the DWIM procedure: first check if there is a type (there isn’t), then check if the default works (which it does, because the default type unifies with one of the literal types). Finally we would use the integer fallback, but we don’t have to in this case.

I believe we’d have to do the trait lookup in any case to type the integer literal, so it doesn’t seem any more complex to use DWIM with or without more liberal integer literals.

TBH, I think always using the default is also totally acceptable though, so I won’t object to option 1.


#18

Thanks for the writeup; it’s a tough one.

Given the widely varying intuitions people seem to have, I’m with a few others who’ve voted for option #2: make such ambiguity an error for now, until we have more experience with defaults influencing inference in general.


#19

Can you elaborate on what you are trying to show me via this example? Is it simply this point:

Yes, I am aware of this. What I am not sure of is the practical impact. That is, how often does it happen that we have e.g. an impl on a struct and on i32 (but no other type), such that we are able to unify the integer literal with i32 but ignore the struct. It’d be interesting to run some experiments on this and see what patterns break. (That could inform the decision here, as well.)


#20

I think your summary was roughly accurate, yes. And yes, we have to do a trait lookup no matter what – but the way that fallback works is that we first resolve all outstanding trait references as best we can, then we do fallback, then repeat the process. I feel uncomfortable about fallback and type inference as it is; injecting further trait lookups into the process seems to me to be over the top. Imagine for example if the types may involve other variables that are potentially being inferred:

fn foo<T=u32,U=SomeType<T>>(...)

Now suppose that neither T nor U are constrained, so we are weighing fallbacks to u32 and SomeType<$T> (where $T is the type variable we created to represent T). I guess we can apply our usual procedure of testing whether a trait match could possibly succeed to decide whether SomeType<$T> might be an integral type, but you can see the confusion.

Of course similar situations can arise even today:

fn foo<T=u64,U=T>(x: &[T], y: U) { }
fn main() { foo(&[], 44); }

Here there are four type variables, let’s call them $T, $U, $22, and $44. $U is unified with $44. In order to detect ambiguous cases, we do not apply fallbacks in any particular order but rather consider all cases simultaneously. So we would have to examine the fallback for $U and we would find that it is $T (which is not yet unified). We don’t yet know what $T will become, so it’s tricky to know whether to apply the fallback. (In this case, since $T will ultimately become u64, we probably should, but it could as well be char.) I think though this is not irreconcilable: we can presumably trace $T to its fallback(s) (or, if it is unified, to the type it is unified with) and try to reach a decision that way. Just complicated.
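
The "trace $T to its fallback" idea can be sketched as a walk over a map of fallbacks. This is an illustrative model only: `Fallback`, `trace_fallback` and the string variable names are assumptions, and it ignores variables that are already unified with a type.

```rust
use std::collections::{HashMap, HashSet};

/// Concrete types used in the example.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Ty {
    U64,
    Char,
}

/// A variable's fallback is either a concrete type or another variable.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Fallback {
    Concrete(Ty),
    Var(&'static str),
}

/// Follows fallback-to-variable links until a concrete type is found.
/// Returns None if some variable has no fallback or the chain loops.
pub fn trace_fallback(
    fallbacks: &HashMap<&'static str, Fallback>,
    start: &'static str,
) -> Option<Ty> {
    let mut seen = HashSet::new();
    let mut current = start;
    loop {
        // Seeing a variable twice means the fallbacks form a cycle.
        if !seen.insert(current) {
            return None;
        }
        match fallbacks.get(current)? {
            Fallback::Concrete(ty) => return Some(*ty),
            Fallback::Var(next) => current = *next,
        }
    }
}
```

With `T` falling back to `U64` and `U` falling back to `T`, tracing `U` gives `Some(Ty::U64)`, at which point the integral check for $44 can be made. If `T` instead fell back to `Char`, the same walk would report `Char` and the literal could not take that type.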