Fixing binary operators for forwards compatibility


#1

Hello all,

I’ve been working on a change to how we handle binary operators in the compiler. It turns out that the current code is making too many assumptions baked in from the old days when we didn’t support overloading. Basically, right now we are saying that if the type of the LHS is a builtin integer, like i32, then the type of the RHS must match. This leads it to accepting too much and too little. I’ve experimented now with several approaches to fixing this problem and I wanted to summarize briefly what I tried and try to get people’s opinion about what behavior to encode.

The most conservative – and simplest – version that I tried requires that both the LHS and RHS must have “known” types. This means that they cannot be pure type variables (e.g., the result of Zero::zero). The reason for this it that it allows the compiler to decide whether this use of + is a “builtin” operator (e.g. i32+i32) or one that requires an impl (e.g. i32+MyInt). Requiring that the types be known does break some code though, mostly generic code that is using Zero::zero and other such operands. The fix is to use UFCS form. I personally find the resulting code easier to understand, but YMMV. You can see the affect on the stdlib and rustc in this comment.

We can fix these cases by taking a different approach (which I also implemented, or mostly). In the other approach, we don’t require that types are known when we type-check +; instead we defer the work and figure out the types at the end (this requires some special plumbing to make type inference work out, but it’s not that imp’t). This means that those examples above continue to type-check. There are still some bits of existing code that will not work in the new approach, but they are largely constructs that weren’t supposed to work in the first place, or nonsense like 1 + return 22.

However there is a catch. The more permissive approach doesn’t work with the current (unstable) SIMD support, because the result of i32x4==i32x4 doesn’t have type bool, as PartialEq requires, but rather i32x4. This is also related to RFC 1008, which would like to make the return type more general. The problem is that even in the more permissive approach, we assume that we know the return type of a == b (bool). This assumption is not backwards compatible. If we required more type info up front, we could in principle in the future change the behavior to “try” PartialEq first and fallback to PolyEq or whatever the more complex variant is, though I’m not sure we’d ever want to do this – and this sort of “try and fallback” stuff can be unreliable, so if we did want to generalize ==, I might prefer to do via a distinct operator anyhow.

Anyway, I’m feeling a bit torn on which approach to take.


#2

What about the alternative? Not having builtin operators, but implementing the traits (and through them the operators for those types? There are many examples of traits implemented for builtin types (Int, FromPrimitive, …)


#3

After having given this some thought in the shower and on the train, I am leaning towards saying that I will implement it in the more permissive way, which just enough logic to make SIMD types continue to work for now (but we probably should plan to switch them to a distinct operator). Basically my feeling is that neither approach really offers substantially more room to change to other return types in the future; the only question is what happens in corner cases where we know nothing about the types at hand. Either way, we can offer a tiered fallback, but it works a bit more smoothly in the one case – I still dislike offering a tiered fallback in the first place, as any such scheme inevitably has more edge cases.

@ker, regarding your suggestion of just using this traits, this is essentially what my more permissive variant does. The fact that the compiler substitutes builtin versions for i32+i32 instead of actually dispatching to a trait is almost invisible to you except via lower compilation times (since we generate tighter code). The only way that you can tell which variant is used is in constants, where only built-in operations are accepted (for now, at least).


#4

Without knowing the implementation details, I’d suppose that in “ideal” variant sugared a + b and its desugared UFCS version Add::add(a, b) would always behave identically (except for, maybe, constant expressions, as you mentioned). Is this what the “permissive” approach implies?


#5

I believe (part of) the issue is that the UFCS form isn’t necessarily known, since, e.g., a == b could either be PartialEq::eq(a, b) or PolyEq::eq(a, b).


#6

So the PR is here: https://github.com/rust-lang/rust/pull/23673

The TL;DR of what I wound up implementing is that Add::add(a, b) behaves the same as a+b with the caveat that if a is a SIMD type, we will use the SIMD variant of the operators (this is important for == and friends which don’t obey the trait). In order for us to test whether a is SIMD, this means that the type of a must not be a complete unknown (i.e., Zero::zero() + 1_i32 does not compile). This last part is true today too.

In the future I think we will:

  1. Change the SIMD operations to use other operators for ==. However, making multiple variants on == might be an option too.
  2. Add some form of auto-ref that will convert to Add::add(a, b) or Add::add(&a, &b) based on whether there are any impls that could possibly match the former.

(Note that SIMD support is an unstable extension – I was basically just trying to preserve existing behavior.)


#7

One minor concern I have with treating all binary operations as overloaded is that in the hypothetical case that we end up overloading binary operations on primitives for “widening” or whatever other reason:

impl Add<u32> for u32 { .. }
impl Add<u16> for u32 { .. }

Then code like this (x:u32) + 1 will fail to compile as the RHS could be u16 or u32. However, I guess this won’t be a real problem if we decide to make default type params drive type inference, because, for example, the previous code will pick u32 for the integer literal on the RHS (cf trait Add<Rhs=Self>).

Overall, I prefer treating all operators as overloaded over fully resolving both LHS and RHS, because the former feels more consistent (all operators are just/mostly trait dispatch) and has better type inteference.

re: == on SIMD, my opinion is that, for now, we should remove the == operator on SIMD types and instead provide a fn simd_eq(T, T) -> T intrinsic to perform the same operation. The reason being that the return type of simd == simd is not a bool and it doesn’t match the current definition of the == operator / PartialEq trait. We can remove the intrinsic / add back the == operator after we made a decision on PartialEq with overloaded return type vs PolyEq vs another operator. But I guess it’s not a big concern right now since SIMD is unstable.