Idea: PlacePattern

I like this idea; an explicit keyword makes the meaning more obvious.

If we want to use an existing keyword, we could write let (in x, in y) = foo();, or, more verbosely, use become.

Overall I think there's something here, but we need to dig into the specifics.

in feels more obvious (and the brevity is desirable) and iirc we reserved become for tail calls.

Does this include executing derefs as necessary on e.g. place.field?

I would very much like to retain patterns as a pure and total fragment of the language so I would like to restrict this to types which are !Deref and !DerefMut or at least types which morally are DerefPure (i.e. Box<T> is OK). Moreover, place expressions like foo.bar().baz or foo.await?.bar would be forbidden as well.
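
To make the purity concern concrete, here is a minimal sketch in today's Rust of a user-defined Deref impl with an observable side effect. The names Counting, Inner, and read_field_twice are made up for this example. The only point is that c.field silently runs deref, which is the kind of hidden work an in place.field pattern would have to rule out (or limit to DerefPure-like types).

```rust
use std::cell::Cell;
use std::ops::Deref;

// Hypothetical types, for illustration only.
struct Inner {
    field: u32,
}

struct Counting {
    inner: Inner,
    derefs: Cell<u32>, // counts how often `deref` has run
}

impl Deref for Counting {
    type Target = Inner;

    // An impure `deref`: it has a side effect every time it is called.
    fn deref(&self) -> &Inner {
        self.derefs.set(self.derefs.get() + 1);
        &self.inner
    }
}

// `Counting` has no member named `field`, so each `c.field` goes through
// auto-deref and therefore calls `Counting::deref` once.
fn read_field_twice(c: &Counting) -> (u32, u32) {
    let a = c.field;
    let b = c.field;
    (a, b)
}
```

After one call to read_field_twice, the counter in derefs has been bumped twice, even though the source text only mentions a field access.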

Also, by "valid place expression" I assume this is a semantic distinction rather than a syntactic one. That is, the parser would accept the full expression grammar. Note that this is ambiguous with or-patterns as in:

let in x | x = 0;

may be interpreted as either in (x | x) (bit-or) or (in x) | x. Ambiguities can usually be resolved by favoring a specific parse and I think the most convenient one would be in (x | x) in terms of how libsyntax (parser/pat.rs specifically) is structured today (cc @petrochenkov).

This also leads to interesting questions (cc @varkor @arielb1 @pnkfelix)

  • What does (in x) | x mean?
  • What about A(mut x, in x)?
  • What about (in x, in x) = (0, 0);?

If we want to limit place expressions to pure expressions, I think just having a grammar of In Ident+ % Dot for place patterns covers everything that would semantically be allowed. That would solve the ambiguity question, I believe, and doesn't unreasonably complicate the grammar.
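
As a rough illustration of what Ident+ % Dot would accept, here is a small string-level checker. It is only a sketch under simplifying assumptions: ASCII identifiers, no keyword handling, no tuple indices such as x.0, and it works on text rather than on rustc's token stream. The function names are invented for this example.

```rust
/// Returns true if `s` has the shape `ident ( "." ident )*`.
/// Simplified on purpose: ASCII only, keywords are not rejected,
/// and tuple fields like `x.0` are not treated as identifiers.
fn is_dotted_ident_path(s: &str) -> bool {
    !s.is_empty() && s.split('.').all(is_simple_ident)
}

/// A segment is accepted if it starts with a letter or `_` and
/// continues with letters, digits, or `_`.
fn is_simple_ident(segment: &str) -> bool {
    let mut chars = segment.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {
            chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
        }
        _ => false,
    }
}
```

With this shape, foo.bar.baz is accepted while foo.bar().baz and x | x are rejected, which is how the restricted grammar sidesteps the or-pattern ambiguity.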

That said, I'm mildly against making Box more special than it already is. I would prefer to preserve the potential of eventually making Box just another library type, or at the very least "just" a #[lang] item rather than a special kind of type.

That's not a sufficient condition. You also need to prevent place.field from performing a deref in typeck or in match checking. However Ident+ % Dot would also be a wholly new grammatical category since, afaik, we have no such syntactic restriction today.

To make Box<T> just another library type we would need DerefPure which is the same lang item that we would use for the restriction needed here as well.

This is similar to (x, x) = (0, 0): is the "temporality" of patterns defined?

  • some people may want to use (x, x) as a fallible pattern allowing a structural equality check within a pattern; I personally don't think that such semantics is that clear, and would rather have a keyword or sigil for that (maybe some prefix = somewhere?);

  • if there were a clear temporality, let (x, x) = (42, 0); would create two bindings, with the latter shadowing the former, so that x ends up referring to the binding holding 0. Note that there could / should be lints against this.

    • With that in mind (in x, in x) would be valid with the same temporality reasoning (x is overwritten twice). Again, we may want a lint to warn on this;

    • But Ok(in x) | Err(x) (I prefer to think about refutable patterns first) should error in the same way that Ok(x) | Err(y) does (they create different bindings);

Anyway, this is just how I personally would "intuitively" think about this (if it were to work); I am not saying that's how it should work (maybe having all of these error is the better solution).

Overall I think you've captured the "how things fall out from existing language semantics" well. What you didn't deal with was let (mut x, in x) = (0, 1);. It seems to me that this would define one binding x and then proceed to overwrite it with 1.

(PS: let (x, x) = (0, 0); is an error; this check is defined in https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/late/struct.LateResolutionVisitor.html#method.resolve_pattern_inner -- hopefully the logic should be readable or I need to make more improvements to it)

IMO this is unnecessarily mysterious and verbose. It should just be

(a, b) = (2, 3);

like in many other languages.

In all three cases:

error[E0416]: identifier `x` is bound more than once in the same pattern

Similarly, I don't see any use case for combining in with |, so let's just not allow that unless a specific use case arises that makes sense.

  1. Note that the case you quoted uses more verbose syntax (place) than later in the thread (in).

  2. Are you suggesting that arbitrary irrefutable patterns can appear on the LHS of an assignment, and just leaving off the let will change them from a new variable to an assignment to an existing variable?

  3. in would have the additional advantage (or disadvantage) that you could do this in a match too.

I can absolutely see the argument for doing this as a compound assignment, as long as we stay far far away from overloaded assignment. If this only works structurally, with patterns, that seems acceptable. I can also see the argument for allowing this in pattern syntax.

The point of the ast::PatKind::AssignIn(P<Expr>) form is to avoid binding an identifier and instead assign to it using the value at the given place the pattern is matching on. So at least that error message does not make sense and we would need to invent another one ("assigning with an in pattern to the identifier x bound in the same pattern").

To do so, we would need to change LateResolutionVisitor::resolve_pattern_inner to store the fact that an in $expr pattern is writing to something within the same pattern (using bindings: &mut SmallVec<[(PatBoundCtx, FxHashSet<Ident>); 1]>).

However, I don't see a compelling reason to insert additional logic into resolve just because an edge case feels slightly weird, if it can be given well-defined semantics and allowing it is easier than banning it. I believe that's the case at least in librustc_resolve, and NLL should hopefully just reject any problematic combinations such as (ref mut x, in x) (those checks are necessary anyway, because NLL should be sound on its own!).

Easy:

let Ok(in x) | Err(in x) = 0;

But I also don't think language design should be done by whitelisting the specific parts of a general logic that we like. That just makes for a bunch of ad-hoc rules rather than composability. Rather, there should be compelling and strong reason not to allow composition with or-patterns.

Being able to both update a value and bind a new variable in the same expression is very useful, I have a few state machines where I would love to be able to write

let (in self.state, done) = match mem::replace(&mut self.state, State::Invalid) {
   ...
}
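
For comparison, here is a sketch of how that state-machine step has to be written in today's Rust, without the in pattern. State, Machine, and the transition rules are invented for this example; only the mem::replace idiom comes from the snippet above. The extra next binding and the trailing self.state = next; are what let (in self.state, done) = ... would fold into the pattern.

```rust
use std::mem;

// Hypothetical state machine, for illustration only.
#[derive(Debug, PartialEq)]
enum State {
    Start,
    Running(u32),
    Finished,
    Invalid, // placeholder left behind while a step is in progress
}

struct Machine {
    state: State,
}

impl Machine {
    /// Advances the machine by one step; returns true once it has finished.
    fn step(&mut self) -> bool {
        // Bind both results to fresh names first ...
        let (next, done) = match mem::replace(&mut self.state, State::Invalid) {
            State::Start => (State::Running(0), false),
            State::Running(n) if n >= 2 => (State::Finished, true),
            State::Running(n) => (State::Running(n + 1), false),
            State::Finished => (State::Finished, true),
            State::Invalid => unreachable!("state is restored before step returns"),
        };
        // ... then store the new state by hand.
        self.state = next;
        done
    }
}
```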

Yep, the code is quite clear, good job (I don't usually look at rustc code since I haven't decided to go and make the dive, so in the meantime I remain at a qualitative level of comments :sweat_smile:)

Yes, that's what I wanted to say but somehow I got diverted by the (in x) | x case


I could imagine, given:

fn mb_foo (_: bool) -> Option<Foo>;
const DEFAULT_FOO: Foo;

writing the following pattern:

let mut foo: Foo = DEFAULT_FOO;
if let Some(in foo) = mb_foo(true) {}
if let Some(in foo) = mb_foo(false) {}

as

let mut foo = DEFAULT_FOO;
match (mb_foo(true), mb_foo(false)) {
    | (Some(in foo), Some(in foo))
    | (_           , Some(in foo))
    | (Some(in foo), _           )
    | _
    => {}
}
  • and with nested | patterns (if that could ever be possible):

    let mut foo = DEFAULT_FOO;
    match (mb_foo(true), mb_foo(false)) {
        (
            (None | Some(in foo)),
            (None | Some(in foo)),
        ) => {}
    }
    

The example may look contrived, and the usage of patterns may not be as readable as the more "explicitly" imperative way, but who knows, it may come in handy.
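
For reference, the same "last Some wins" behaviour can be written with ordinary bindings in today's Rust. This is only a sketch: Foo is assumed to be u32, DEFAULT_FOO is given an arbitrary value, and the two mb_foo calls are replaced by plain Option arguments so the function is self-contained.

```rust
// Stand-ins assumed for this sketch; the thread leaves both unspecified.
type Foo = u32;
const DEFAULT_FOO: Foo = 0;

/// Mirrors the two consecutive `if let Some(in foo) = ...` statements:
/// the second value wins if present, otherwise the first, otherwise the default.
fn pick(first: Option<Foo>, second: Option<Foo>) -> Foo {
    let mut foo = DEFAULT_FOO;
    match (first, second) {
        // Both alternatives bind `v`, so a single arm can do the assignment.
        (_, Some(v)) | (Some(v), None) => foo = v,
        (None, None) => {}
    }
    foo
}
```

The in version moves the foo = v assignment into the pattern itself, which is why its match arms can have an empty body.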

It's already being implemented :grin:

Sure, the message and ID would need changing.

I'm not suggesting that the edge case "feels slightly weird", and I have no problem with allowing both in assignments and non-in bindings in the same let. I'm suggesting that allowing both to the same name seems excessively error-prone.

We should absolutely define an ordering semantic for patterns, so that things like in x, in x.y can work reliably and deterministically. I do, however, think in x, in x deserves at least a warn-by-default lint, and mut x, in x deserves an error-by-default lint.

That's a compelling argument, thank you.

I started to say that this seems to suggest it's acceptable if both sides assign to the same names, but then this use case occurred to me:

let Some(in lastval) | _ = func()

And I can very easily imagine myself using that, in preference to:

let lastval = func().unwrap_or(lastval)

or:

if let Some(v) = func() { lastval = v }

I'm fine with lints for weird edge cases since they are not a part of the language spec and they don't complicate resolve (they can be implemented as a separate pass).

I wonder if that would actually expose the pattern matching / pattern lowering algorithm too much and possibly have negative consequences re. being able to reorder for optimization purposes. cc @matthewjasper

Not necessarily. Pattern match into new locations, assign those locations into the existing locations in order, and let the optimizer coalesce the locations if it can.
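
A hand-written sketch of that lowering, in today's Rust, might look as follows. The function names and the tmp0/tmp1 temporaries are made up; this models the proposed semantics and is not what rustc actually emits.

```rust
/// Models the hypothetical `let (in *x, in *y) = value;`.
fn assign_pair(x: &mut i32, y: &mut i32, value: (i32, i32)) {
    // Step 1: match into fresh locations (no existing place is touched yet).
    let (tmp0, tmp1) = value;
    // Step 2: assign to the existing places, in source order.
    *x = tmp0;
    *y = tmp1;
}

/// Models the hypothetical `let (in *x, in *x) = value;`.
/// With in-order assignment, the right-most value is the one that sticks.
fn assign_same_place_twice(x: &mut i32, value: (i32, i32)) {
    let (tmp0, tmp1) = value;
    *x = tmp0;
    *x = tmp1;
}
```

Because the match itself only writes to temporaries, the compiler stays free to reorder the matching work; only the final assignments have a defined order.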

I think it should be allowed to assign to a variable that was created or changed in the same pattern. Example:

let (mut x, Ok(in x) | _) = it;

This is equivalent to

let mut x = it.0;
if let Ok(ok) = it.1 {
    x = ok;
}

In cases like this, the assignments should be performed left-to-right. That's the obvious solution, because expressions are evaluated left-to-right as well, and patterns in a match are also matched from left to right.

In principle this could be

(self.state, let done) = ...

That would imply a broader merging of the expression and pattern syntax.

Then we would have two syntaxes for the same purpose: let (foo, bar) and (let foo, let bar). Some people might find this confusing (especially people who are just learning the language). We could deprecate let (foo, bar) in a new edition, but I believe many Rustaceans wouldn't like that.

So, it sounds like there are two separate ideas here, both of which seem worth considering and potentially RFCing.

  1. The in pattern: anywhere you could put an identifier for a fresh variable name, you can instead put in followed by an existing path to store the matched value into that existing path as though via an assignment. You can put multiple in matches and non-in matches in the same expression, and they'll get assigned or created (respectively) left-to-right. You can use in with | patterns, and only the in matches in the matching branch of the | will take effect. This works in any pattern-matching context (though it requires parentheses for disambiguation in a for loop, for (in x) in ...).

  2. @comex's suggestion to allow simple aggregate assignment of tuples, or potentially other irrefutable patterns. Note that we need to carefully define this if we want (x, y) = (y, x) to do the right thing. What about (x, x.a) = (m, n); what behavior should that have? Can we define semantics that do the right thing for both of those? I think the rule we want is that the entire right-hand side is evaluated first, then all the assignments take place. (I think we want the same rule for let and in patterns in (1) above.)
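
The "evaluate the whole right-hand side, then assign" rule can be sketched like this. Point and both function names are invented for the example. The swap uses tuple destructuring assignment, which recent stable compilers accept (this thread predates it); the second function spells out the rule with explicit temporaries rather than relying on any particular compiler behaviour.

```rust
// Hypothetical type, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    a: i32,
    b: i32,
}

/// `(x, y) = (y, x)`: the right-hand tuple is built from the old values
/// before either variable is overwritten, so this is a real swap.
fn swap(mut x: i32, mut y: i32) -> (i32, i32) {
    (x, y) = (y, x);
    (x, y)
}

/// Models `(x, x.a) = (m, n)` under the proposed rule:
/// evaluate the right-hand side first, then assign left to right.
fn whole_then_field(x: &mut Point, m: Point, n: i32) {
    let (tmp0, tmp1) = (m, n);
    *x = tmp0; // x becomes m ...
    x.a = tmp1; // ... and then its field `a` is overwritten with n
}
```

Under this reading, (x, x.a) = (m, n) leaves x equal to m except for the field a, which holds n.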

With my language team hat on (but not speaking for the rest of the language team): I'd love to see RFCs for both of these. I can imagine accepting one or accepting both, as they have somewhat different use cases. (1) works well in general pattern matching; (2) works well for simple assignment of multiple values.
