Pre-pre-RFC: match ergonomics for container types — restricted method calls in patterns

ekuber · November 11, 2020, 5:46pm

This has been something that comes up every now and then, but I was musing about it yet again yesterday.

box patterns are a nightly-only feature that lets you write:

    let b = Some(Box::new(5));
    match b {
        Some(box n) if n < 0 => println!("Box contains negative number {}", n),
        Some(n) => println!("Box contains non-negative number {}", n),
        None => println!("No box"),
    }

This is very convenient for some deeply nested structures, and I use it often in some parts of the compiler. But this feature in particular still has open questions around syntax and behavior. It is also blocked on the question of how it could be generalized to other container types, like Arc or String.
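
For comparison, here is a minimal sketch of how the same match can be written on stable Rust today for this one-level case. The `describe` function is a hypothetical name used only for illustration; it relies on `Option::as_deref` to look through the `Box`.

```rust
/// Hypothetical helper: the `box` pattern example rewritten for stable Rust.
fn describe(b: Option<Box<i32>>) -> String {
    // `Option::as_deref` borrows the option and yields `Option<&i32>`,
    // so the arms can inspect the boxed value without a `box` pattern.
    match b.as_deref() {
        Some(n) if *n < 0 => format!("Box contains negative number {}", n),
        Some(n) => format!("Box contains non-negative number {}", n),
        None => "No box".to_string(),
    }
}
```

This only helps when the container sits directly inside an `Option`; deeper nesting still needs nested matches.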

One of the concerns that people will rightly have is around "implicit behavior": having something akin to match ergonomics for a feature that performs arbitrary logic would rightly be out of the question. This means that we can't make this as convenient as it could be, with no new syntax at all, but we can get close. Another concern is that box is not general enough: it only works with Box.

I first thought that some new trait with an associated type for its output could be added and made easy to derive, so that the borrowed inner value could be extracted by the match pattern. But that would require it to be either applied automatically (hiding arbitrary execution) or invoked through some top-level method call (which means that the trait needs to be implemented by most things). If we were to do the latter, then we would need that auto-derivable trait on all types, and I do not think that's worth it.

Another alternative would be to introduce deref or some other keyword in patterns in order to opt into calling Deref::deref() on the field, and let the type system figure out what the resulting type would be, relying on match ergonomics to make this nice to use. But adding this kind of special syntax just turns the box pattern into something slightly more convenient.

Which is how I arrived at the following thought: method calls aren't syntactically allowed in patterns, for good reason. But we could allow them in the AST, and whitelist specific methods that are for<'a> fn(&'a self) -> &'a _. Allowing this could let us write:

match value {
    E::B { owned_string.as_str(): "" } => println!("empty string"),
    E::B { owned_string: s } => println!("string: {}", s),
    _ => {}
}

instead of the current need to nest the match expressions

match value {
    E::B { owned_string: s } => match s.as_str() {
        "" => println!("empty string"),
        _ => println!("string: {}", s),
    }
    _ => {}
}
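
To make the nested version concrete, here is a self-contained sketch that compiles on stable Rust. The enum `E` and the `describe` function are assumptions standing in for the types in the snippets above, and the arms return strings instead of printing.

```rust
// Hypothetical enum standing in for the `E` used in the snippets above.
enum E {
    A,
    B { owned_string: String },
}

fn describe(value: &E) -> String {
    match value {
        // `s` is a `&String`; `as_str()` produces the `&str` that a
        // string literal pattern can be compared against.
        E::B { owned_string: s } => match s.as_str() {
            "" => "empty string".to_string(),
            _ => format!("string: {}", s),
        },
        // Every other variant is ignored, as in the original `_ => {}`.
        _ => String::new(),
    }
}
```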

The method call syntax is currently not supported at all in rustc, although I am planning to accept it syntactically purely for diagnostic purposes. But I think that having a way to do this will enable some design patterns that are more prevalent in other languages to be much more ergonomic.

Thoughts? Pointers to previous discussions on the subject (there is potentially some context I might be missing)?

Edit: note that these cases were briefly considered in the original RFC for match ergonomics.

djc · November 11, 2020, 6:06pm

I find this syntax to be pretty far out of my comfort zone. I do recognize that there are many places where the lack of (de)ref coercion is painful -- apart from sort of "elicited errors" I get while refactoring, I would say this is probably one of my top sources of compile errors. That said, this feels like one particular instance of needing more (de)ref coercion to happen.

For that stuff, I think this issue is canonical? Tracking issue for experiments around coercions, generics, and Copy type ergonomics · Issue #44619 · rust-lang/rust · GitHub

petrochenkov · November 11, 2020, 6:12pm

This is one of use cases for pattern matching expressions!

if value is E::B { owned_string: s } && !s.is_empty() {
    println!("string: {}", s);
} else {
    println!("empty string");
}
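
Since `is` expressions are not part of the language, a rough stable equivalent of the snippet above uses a match guard. This is only a sketch; `E` and `describe` are hypothetical names.

```rust
// Hypothetical enum standing in for the `E` used in the thread.
enum E {
    A,
    B { owned_string: String },
}

fn describe(value: &E) -> String {
    match value {
        // The guard plays the role of `&& !s.is_empty()`.
        E::B { owned_string: s } if !s.is_empty() => format!("string: {}", s),
        // Like the `else` branch above, this also covers non-`B` variants.
        _ => "empty string".to_string(),
    }
}
```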

ekuber · November 11, 2020, 6:20pm

Pattern matching expressions are certainly a good fit for this case, but they wouldn't help with the cases that I had in mind, where I would like, in a single step, to either destructure something behind deref (values behind P<T> also come to mind) or take advantage of the slice destructuring syntax on Vecs, to give two examples. But again, I would also like to see pattern matching expressions incorporated into the language, as I feel they serve their own purpose as well.
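
As an illustration of the Vec case, this sketch shows the two steps that stable Rust needs today: destructure the struct first, then call `as_slice()` before slice patterns apply. `Node` and `first_and_last` are hypothetical names.

```rust
// Hypothetical container type with a `Vec` field.
struct Node {
    children: Vec<i32>,
}

fn first_and_last(node: &Node) -> Option<(i32, i32)> {
    // Step 1: destructure the struct to reach the `Vec`.
    let Node { children } = node;
    // Step 2: slice patterns work on `[T]`, not on `Vec<T>`, so an explicit
    // conversion is needed before matching.
    match children.as_slice() {
        [] => None,
        [only] => Some((*only, *only)),
        [first, .., last] => Some((*first, *last)),
    }
}
```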

ekuber · November 11, 2020, 6:24pm

That's fair! I came up with it while trying to keep in line with the rest of the language and without introducing new keywords, as I was walking the dog

I fear that, because arbitrary code can be executed in autoderef, having that happen implicitly in patterns could make the performance characteristics of the match expression harder to reason about, but at the same time my fears might be completely unfounded. We could also have an initial implementation and try things out in nightly before settling on one behavior or another.

rpjohnst · November 11, 2020, 6:55pm

Putting method calls in patterns this way (limited set or no) seems like an inversion of what patterns normally mean: calls are expressions, and using them in patterns would really only make sense if their meaning could somehow be "reversed," the same way constructors used in patterns become destructors. box patterns follow this: the expression box x constructs a Box containing x, and the pattern box x deconstructs the Box.

I suspect deref or similar is the best direction here, in the long term. Deref is the closest thing to the "reverse" of Box::new/Rc::new/etc. There is also more previous discussion than the match ergonomics RFC: https://github.com/rust-lang/rfcs/pull/1646.

There, the concern about arbitrary code execution is addressed via a new DerefPure marker trait. Generally the ecosystem is already pretty good about implementing Deref in "reasonable" ways, so something like this seems like it could work. Perhaps today we might consider const impl Deref instead?
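
A minimal sketch of what such a marker trait could look like. The `DerefPure` below is defined locally purely for illustration, and `peek` and `is_zero` are hypothetical helpers; none of this is meant to reflect the RFC's exact design.

```rust
use std::ops::Deref;
use std::rc::Rc;

/// Hypothetical marker trait: an implementor promises that `deref` has no
/// side effects and always returns the same place.
unsafe trait DerefPure: Deref {}

// Assumption: the standard pointers satisfy that promise.
unsafe impl<T: ?Sized> DerefPure for Box<T> {}
unsafe impl<T: ?Sized> DerefPure for Rc<T> {}

/// What a `deref` pattern would be allowed to do for any `P: DerefPure`.
fn peek<P: DerefPure>(p: &P) -> &P::Target {
    p.deref()
}

/// Matching "through" any pure pointer to an `i32`.
fn is_zero<P: DerefPure<Target = i32>>(p: &P) -> bool {
    match *peek(p) {
        0 => true,
        _ => false,
    }
}
```

The trait is `unsafe` to implement so that code relying on the promise can trust it; whether that is the right trade-off is exactly what is being debated here.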

IIUC another issue with "just using Deref" is moving things out from behind the pointer. Previously I've seen proposals for a DerefMove trait to go along with Deref and DerefMut, though I don't know if that's been as fully fleshed out anywhere.

steffahn · November 11, 2020, 7:03pm

For some prior art, Haskell’s view patterns and pattern synonyms come to mind here. I’m not sure how well they could be adapted into Rust though. Especially Rust’s ownership vs. borrowing and the possibility of side effects in Rust make this not quite straightforward.

- Regarding ownership, pattern matching in Rust usually has the capability to first only inspect the structure of the matched value to decide whether a match is successful, but then it can move out of the matched value afterwards. (Perhaps for Deref* traits this could be solved by adding some DerefMove, but the interface for generalizing this to other kinds of [explicit!] "arbitrary code" in patterns is not quite clear.)
- Regarding side-effects, when you have a bunch of similar patterns in sequence (only differing in some deeply nested position), you wouldn’t want to repeat the calls to deref-methods for each match arm, but re-use the results instead. However, potential side-effects from a deref wouldn’t be repeated then, which might be confusing (admittedly, you probably wouldn’t want to have any real side-effects from Deref impls anyway).
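
The "inspect first, move afterwards" behavior can be sketched with a hypothetical `DerefMove` trait. Nothing like it exists in the standard library; the trait, its `Box` impl, and `take_non_empty` are all assumptions made for illustration.

```rust
use std::ops::Deref;

/// Hypothetical trait for moving the pointee out of its pointer.
trait DerefMove: Deref + Sized {
    fn deref_move(self) -> Self::Target
    where
        Self::Target: Sized;
}

impl<T> DerefMove for Box<T> {
    fn deref_move(self) -> T {
        // `Box` is the only pointer with built-in move-out-of-deref.
        *self
    }
}

/// Roughly what an arm like `deref Some(s) if !s.is_empty() => ...`
/// might expand to.
fn take_non_empty(b: Box<Option<String>>) -> Option<String> {
    // Phase 1: inspect through a shared borrow; nothing is moved yet.
    let matched = match &*b {
        Some(s) => !s.is_empty(),
        None => false,
    };
    // Phase 2: only once the arm is selected, move the pointee out.
    if matched {
        b.deref_move()
    } else {
        None
    }
}
```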

CAD97 · November 11, 2020, 8:41pm

Speaking of degenerate Deref impls, I have a type made just to prove a point, where it

- stores two Boxes,
- Derefs alternately between them,
- but also destroys and recreates the Boxes before dereffing them (using interior mutability) so the two different targets don't even stay in the same place.

It was written to show what kind of degenerate behavior unsafe has to allow for (thus what things unsafe traits like DerefStable or ErasablePtr are requiring you not do), but if the trait(s) to opt into derefs in patterns are not unsafe, I'll expect reasonable results from doing it with this degenerate type.
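
A much simplified sketch of such a type, to show why it matters for patterns. This is not the actual type described above: the hypothetical `Flip` only alternates between its two boxes and does not recreate them.

```rust
use std::cell::Cell;
use std::ops::Deref;

/// Simplified stand-in for a degenerate `Deref` impl.
struct Flip {
    first: Box<i32>,
    second: Box<i32>,
    use_second: Cell<bool>,
}

impl Flip {
    fn new(first: i32, second: i32) -> Self {
        Flip {
            first: Box::new(first),
            second: Box::new(second),
            use_second: Cell::new(false),
        }
    }
}

impl Deref for Flip {
    type Target = i32;

    fn deref(&self) -> &i32 {
        // Interior mutability lets `deref(&self)` change observable state.
        let current = self.use_second.get();
        self.use_second.set(!current);
        if current {
            &*self.second
        } else {
            &*self.first
        }
    }
}

/// Two consecutive derefs of the same value disagree.
fn observe_twice(f: &Flip) -> (i32, i32) {
    (**f, **f)
}
```

If each match arm called `deref` again, different arms could see different targets with a type like this, which is why the number of calls would become observable.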

If custom-deref-code does run when matching a pattern, there would probably be an expected difference between the pattern Outer(deref MyBox( Enum::A | Enum::B )) and one with a root-level alternation. The root-level alternation I'd expect to semantically "start again" at the root of the match, whereas the deeper alternation would happen without retraversing the path to that place.

steffahn · November 11, 2020, 10:04pm

I tried "translating" some sufficiently complex example:

    struct S;
    // used to better distinguish the use of `Option` in the translation
    enum Maybe<T> {
        Just(T),
        #[allow(dead_code)]
        Nothing,
    }
    use Maybe::*;
    impl S {
        fn test1(&self) -> bool {
            false
        }
        fn test2(&self) -> bool {
            true
        }
        fn consume(self) -> i32 {
            1337
        }
    }
    let x = Box::new(Just(Box::new(S)));
    fn qux(_: &Box<S>) -> bool {
        false
    }
    fn bar(_: &Box<Maybe<Box<S>>>) -> bool {
        false
    }
    fn consume_box(_: Box<S>) -> i32 {
        -1000000
    }
    
    /*
    let _r = match x {
        b if bar(&b) => {drop(b); 42}
        deref Nothing => 123,
        deref Just(s) if qux(&s) => consume_box(s),
        b if bar(&b) => {drop(b); 420}
        deref Just(deref S) if false => 0,
        deref Just(deref s) if s.test1() => s.consume(),
        b if bar(&b) => {drop(b); 4200}
        deref Just(deref s) if s.test2() => s.consume(),
        _ => 0,
    }
    */
    // translation in playground linked below

(See the actual translation here.)

using a DerefMove trait. Bad Deref implementations like the ones @CAD97 described can/will lead to panics in this translation. Also note how it is possible to avoid duplicating any calls to deref.

I’m ignoring things like drop order here, so the translation might not be exactly what one would ultimately want; also, an actual compiler can probably use something like goto to avoid code duplication, so the trick using Option like I did may not be necessary.

NoamB · November 12, 2020, 3:53pm

Are trait fields still being discussed? If so, having a DerefPure trait with a target field, and having the box pattern access it would be a simple solution.

I would define trait fields as single-expression methods with allow-listed expression types (take ref, deref, field access, etc), but I'm sure there is prior art that discusses it more thoroughly.

rpjohnst · November 12, 2020, 4:48pm

That sounds less like trait fields and more like Swift Key-Paths. Which would be great to have in general, IMO, and perhaps trait fields ought to be defined in terms of them! It certainly makes pattern matching and DerefPure simpler.

ekuber · November 12, 2020, 5:59pm

Looking at this, would it not be a reasonable restriction (in order to avoid the introduction of DerefMove) to say "match ergonomics that go through auto-deref can only borrow and not modify the destructured value". That would handle 99% of the cases I've personally seen.

zackw · November 12, 2020, 7:15pm

Something else that feels related, to me, is Common Lisp's concept of "generalized references", which are an extensible mechanism for turning accessor functions (like (cadr xs)) into "places" that can be written to (e.g. (setf (cadr xs) v) replaces the head of the second element of xs). Being Lisp, it's all done with macros.

tema2 · November 12, 2020, 11:23pm

match value {
E::B { *owned_string: "" } => println!("empty string"),
E::B { owned_string: s } => println!("string: {}", s),
_ => {}
}

We could dereference a field in the pattern, meaning that we call deref (or its pure version) on the data. Is it confusing?

Another example:

let val = Rc::new(5u32);
match val {
    *a if a < 2 => println!("{} is lower than 2!", a),
    *a => println!("{} is at least 2!", a),
}
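
For reference, a sketch of how the `Rc` example has to be written on stable Rust today, with the dereference on the scrutinee rather than in the pattern. `describe` is a hypothetical name, and the arms return strings instead of printing.

```rust
use std::rc::Rc;

fn describe(val: &Rc<u32>) -> String {
    // `**val` goes through the `&` and then through the `Rc`.
    // `u32` is `Copy`, so binding `a` copies the value out.
    match **val {
        a if a < 2 => format!("{} is lower than 2!", a),
        a => format!("{} is at least 2!", a),
    }
}
```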

dhm · November 13, 2020, 2:47pm

For completeness, @tema2, your suggestion would thus exhibit the following behavior:

match value {
    E::B { *owned_string: ref s } => { /* s: &str */ },
    …
}

I am not against it; I actually find it the "destructuring dual" of &* for a String.

But if we go further:

struct Foo<'__> { field: &'__ String }

// How do we bind to a `str` / how do we `&**` ?
match value /* : Foo<'__> */ {
    Foo { **field: ref s } // ?
      // why not: `&**field: s` ?
    Foo { field: &&(ref s) } // ? (Using `&` as a `Deref` generalization)
}

The one that is most consistent with the current destructuring syntax would be the one in that last branch, and thus your original example would rather be:

match value {
    E::B { owned_string: &"" } => println!("empty string"),
    E::B { owned_string: s } => println!("string: {}", s),
    _ => {}
}

Design

The syntax would be to match <place> against a & <pat> to actually match <pat> against *<place> through Deref impls if needed (i.e., **&place), and &mut for the DerefMut case.

This means that the current thing: box (ref x) could also be written thing: &(ref x).
- In effect, the box pattern would be redundant except for its magic DerefMove capability, which, again, matches the situation for places: *-dereferencing a Box vs. *-dereferencing any other smart pointer.
For sure, some special trait would be needed, maybe even a lang item to begin with, so that the derefs are just performing basic pointer arithmetic, with a reasoning similar to structural_eq's design (may need the language to feature Deref{,Mut} derives to offer extensibility).

Advantages

- Consistent with the current way destructuring pattern matching is performed (a.k.a. "backwards");
- no "methods noise" / problem of calling non-pure user-provided code within patterns;
- It is overly reduced to begin with (lang item idea), which avoids having to deal with pathological designs; the motivating use case is mainly dealing with GADTs that use Box, Rc or Arc (or classic Rust references) for their pointer indirection + the Vec / String case too, all of which could be covered by a built-in lang-blessed construction.

Drawbacks

- Gives more meaning to the & and &mut patterns than they currently have; while not technically breaking (AFAIS), it may be edition boundary worthy?
- If the lang-item design is chosen, it would either require built-in #[derive(Deref{,Mut})] to be added, or the lang to be loosened to some carefully designed trait, in order to allow for extensibility over user-defined types. This implies that it may take a good while before that point is reached, whereas some more clever design could already feature such extensibility.
- EDIT: if DerefMove were added to the language, what would be the pattern-destructuring syntax for it? There is no owned counterpart of & / &mut …

H2CO3 · November 14, 2020, 11:15am

At this point, this honestly looks like just another low-hanging fruit for refactoring, and certainly not warranting adding a language feature. The level of complexity with the interaction of method calls and implicit Derefs and collections and unsafe and and and and… shouldn't be disguised as a mere pattern match. I would consider that an anti-pattern (no pun intended).

tema2 · November 15, 2020, 1:38am

Just a plain binding? It performs a move.

That's the purpose of the DerefPure proposed earlier.

jjpe · November 15, 2020, 2:33pm

How would you choose between using Deref and DerefMove? For that matter how would you know which one is in effect?

dhm · November 15, 2020, 4:49pm

When talking about DerefMove, we are talking about moving the pointee out of its containing pointer:

let p = Box::new(String::new());
let s = *p; // `DerefMove` magic (currently only for `Box`).

In nightly Rust, you can move the pointee out of its containing pointer using the box pattern:

let p = Some(box String::new());
if let Some(box s) = p { /* s: String */ }

So my remark was regarding the extension of this box / *-move magic to other types: there has been some talk about it, it would, should it ever happen, lead to a DerefMove trait. In that case, we'd need an appropriate pattern to perform what box <pat-by-move> currently does.
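
The disparity between `Box` and other pointers can be seen on stable Rust in this small sketch. `unbox` and `unrc` are hypothetical helper names.

```rust
use std::rc::Rc;

fn unbox(p: Box<String>) -> String {
    // Built-in move-out-of-deref; only `Box` supports this today.
    *p
}

fn unrc(p: Rc<String>) -> Option<String> {
    // Writing `*p` here would be rejected: the value cannot be moved out
    // of an `Rc`. `Rc::try_unwrap` is the library-level substitute, and it
    // succeeds only when `p` is the sole strong reference.
    Rc::try_unwrap(p).ok()
}
```

A `DerefMove` for `Rc` would have to decide what happens in the shared case, which `Box` never has to consider.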

For the non-move case, we can currently do:

let p = Box::new(String::new());
let s = &*p; // `s: &String`

If "we translate the place-ops on the RHS as their pattern-destructuring-ops on the LHS", we'd get:

let s = &( *p );
// `&` on a place => `ref` on a pattern.
let ref s = *p;
// read-`*` on a place => `&` on a pattern.
let &ref s = p;

which is legal syntax, and even legal semantics for let p = &String::new();
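
A small sketch checking that claim for a plain reference: the place-based and pattern-based forms yield the same borrow. The function names are hypothetical.

```rust
fn via_place(p: &String) -> &String {
    // `&` applied to the place `*p`.
    let s = &(*p);
    s
}

fn via_pattern(p: &String) -> &String {
    // `&` in the pattern undoes the reference, and `ref` re-borrows the
    // pointee, so `s` is again a `&String`.
    let &ref s = p;
    s
}
```

Both functions return a reference to the same `String`, so comparing the results with `std::ptr::eq` gives `true`.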

That's a good point, and one that should not be overlooked. Still, I think there is some wiggle room to allow a very limited set of this feature (e.g., at least for the very pervasive Box, Rc and Arc), so as not to worry about code being executed in a pattern while already greatly improving ergonomics:

What's being suggested in this thread is to reduce this disparity between &<place> and &mut <place> having pattern-destructuring equivalents (and box on nightly) but other stdlib pointers not having them.

What I am suggesting, in order to reduce churn, is to have the same extension that Deref let us apply to places (*-Deref or *-DerefMut) to also be applicable to pattern-destructuring, at least for #[structural_deref] types, in a similar fashion to us being able to pattern-match against constants of arbitrary types, provided they feature #[structural_eq]ality.

Whether that involves a #[structural_deref] lang annotation or a DerefPure trait is a detail that can be sorted out later on. The idea is that, precisely because arbitrary code execution should not be happening in a pattern (in that regard I agree with you, @H2CO3), similar to Copy allowing bit-wise memcpy-es, I would expect some way to mark a wrapper around a pointer to express that it can be "bit-wise"-dereferenced, with no user-code involved.

It is thus:

- actually filling a hole in the language (place "operators" can be arbitrarily nested, but pattern-destructuring ones can't);
- filling a hole that is an actually demanded feature; I've several times heard people on URLO who were used to working with GADTs in other languages complain about how "cumbersome" it was having to do this in Rust, since indeed you often have to nest these matches because the *-operator can only be applied to a place, not within a pattern.

I personally do agree with that sentiment, and I know that if I were to work with a language using big GADTs, I would seriously consider using another language, just because of those ergonomics. Indeed, compare:
```
match place {
    SomeType::Variant {
        field: ref to_be_derefed, // `Box`, `Rc` or `Arc`
        other_stuff,
    } => match **to_be_derefed {
        SomeOtherType::OtherVariant { ref foo, ref bar } => {
            /* … */
        },
        _ => error(…, "expected a Variant { … OtherVariant }"),
    },
    _ => error(…, "expected a Variant { … OtherVariant }"),
}
```
vs.
```
match place {
    SomeType::Variant {
        field: &SomeOtherType::OtherVariant { ref foo, ref bar },
        other_stuff,
    } => {
        /* … */
    },
    _ => error(…, "expected a Variant { … OtherVariant }"),
}
```
We can observe excessive rightward drift in the former, and a duplication of the code handling the default case, which thus may need to suddenly be outlined just to counteract that. And that's without mentioning the case where we may want to inspect two or more fields inside Variant that happen to be enums too. We suddenly have to be doing things like match (**first, **snd) and tuple patterns start appearing where we used to have named fields.
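
Here is a compilable sketch of the "today" side of that comparison, including the tuple workaround for two boxed fields. All type and function names (`Inner`, `Outer`, `Two`, `total`, `both_leaves`) are hypothetical.

```rust
enum Inner {
    Leaf { foo: i32, bar: i32 },
    Other,
}

enum Outer {
    Variant { field: Box<Inner>, other_stuff: i32 },
    Empty,
}

fn total(place: &Outer) -> Result<i32, &'static str> {
    match place {
        // `field` is a `&Box<Inner>`, so a second match on `**field` is needed.
        Outer::Variant { field, other_stuff } => match **field {
            Inner::Leaf { foo, bar } => Ok(foo + bar + *other_stuff),
            // The fallback has to be written once per nesting level.
            _ => Err("expected a Variant { … Leaf }"),
        },
        _ => Err("expected a Variant { … Leaf }"),
    }
}

struct Two {
    first: Box<Inner>,
    snd: Box<Inner>,
}

fn both_leaves(two: &Two) -> bool {
    // With two boxed fields, the named fields turn into a tuple scrutinee.
    matches!(
        (&*two.first, &*two.snd),
        (Inner::Leaf { .. }, Inner::Leaf { .. })
    )
}
```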

The current ergonomics of the language in that regard are thus ~~horrendous~~ quite bad, so I don't think we should undermine efforts to improve them.

If we use this example to also compare how a destructuring-pattern op may "hide" stuff going on, vs. a place op, we can see that the outer * in **to_be_derefed has become a & in field: &SomeOtherType. Same amount of sigil(s), so they have the same "visibility" (again, provided we trust the & pattern op not to be running arbitrary code, of course, but that has already been ruled out in my proposal).

amosonn · November 16, 2020, 12:17am

Using some way to mark "reasonable" Deref-s does indeed sound like a prerequisite to this (regardless of their flavour). However, even for reasonable implementations, I would find &x matching on what essentially isn't a &, i.e. Box(x), too surprising. In places we do indeed require the * of &* to mark the use of Deref (albeit not in method calls). The deref keyword seems a better balance to me: we still don't have arbitrary code running in matches, but we do mark the sites where some non-trivial type-casts are made.

Regarding DerefMove, this seems like it would improve usability, but is quite orthogonal, so it can be discussed separately; for now, as mentioned above, there can be feature-parity with places, allowing moves out of boxes but not of other "smart pointers". This does require a bit of thinking regarding & vs deref etc, so that whatever solution is chosen to mark a moving deref in places, can be used in patterns.

One more thought, which seems relevant but I'm not entirely sure how (Maybe supporting my point above of not using & for these matches? Maybe weakening it?). The difference between &/&mut and other pointers is that they project, and the others don't: if I have a &x, I can get a &x.y. If I have a Box(x) though, I can get a &x and then a &x.y, but I can't get a Box(x.y). This means, on the one hand, that deref-ing is a destructive action that needs to be marked; but on the other hand, that once it is deref-ed, we only have &/&mut-s anyway, so we might as well match that way.
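
The projection difference can be sketched as follows, assuming a hypothetical `Pair` type. A reference to the whole projects to a reference to a field, while a `Box` of the whole can only be turned into a `Box` of a field by moving out and allocating again.

```rust
// Hypothetical struct used only to illustrate projection.
struct Pair {
    x: i32,
    y: String,
}

fn project_ref(p: &Pair) -> &String {
    // `&Pair` projects directly to `&String`.
    &p.y
}

fn project_box(b: &Box<Pair>) -> &String {
    // Going through the `Box` still only yields a plain reference.
    &b.y
}

fn reboxed_field(b: Box<Pair>) -> Box<String> {
    // There is no `Box<Pair>` to `Box<String>` projection: the field is
    // moved out of the old allocation and placed in a new one.
    let Pair { x: _, y } = *b;
    Box::new(y)
}
```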
