Pre-pre-RFC: match ergonomics for container types — restricted method calls in patterns

Pattern matching expressions are certainly a good fit for this case, but they wouldn't help with the cases I had in mind, where I would like, in a single step, to either destructure something behind a deref (values behind P<T> also come to mind) or take advantage of the slice destructuring syntax on Vecs, to give two examples. But again, I would also like to see pattern matching expressions incorporated into the language, as I feel they serve their own purpose as well.

That's fair! I came up with it while trying to keep in line with the rest of the language and without introducing new keywords, as I was walking the dog :sweat_smile:

I fear that, because autoderef can execute arbitrary code, having that happen implicitly in patterns could make the performance characteristics of a match expression harder to reason about, but at the same time my fears might be completely unfounded. We could also have an initial implementation and try things out on nightly before settling on one behavior or another.

Putting method calls in patterns this way (whether restricted to a limited set or not) seems like an inversion of what patterns normally mean: calls are expressions, and using them in patterns would only really make sense if their meaning could somehow be "reversed", the same way constructors used in patterns become destructors. box patterns follow this: the expression box x constructs a Box containing x, and the pattern box x deconstructs the Box.

I suspect deref or something similar is the best direction here in the long term. Deref is the closest thing to the "reverse" of Box::new/Rc::new/etc. There is also earlier discussion beyond the match ergonomics RFC: https://github.com/rust-lang/rfcs/pull/1646.

There, the concern about arbitrary code execution is addressed via a new DerefPure marker trait. Generally the ecosystem is already pretty good about implementing Deref in "reasonable" ways, so something like this seems like it could work. Perhaps today we might consider const impl Deref instead?
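To make the marker-trait idea concrete, here is a minimal sketch. The trait is defined locally and only borrows the name DerefPure from that RFC discussion; it is an assumption for illustration, not a stable std API. The unsafe impl is the promise that deref runs no user-visible code and keeps returning the same address, and deref_is_stable is a hypothetical helper that only accepts types carrying the marker.

```rust
use std::ops::Deref;
use std::rc::Rc;
use std::sync::Arc;

/// Hypothetical marker trait (defined locally, not a stable std API).
/// Implementing it promises that `deref` has no side effects and keeps
/// returning the same address as long as the pointer itself is not mutated.
pub unsafe trait DerefPure: Deref {}

// SAFETY: each of these derefs to an allocation it owns or shares,
// without running any user-provided code.
unsafe impl<T: ?Sized> DerefPure for Box<T> {}
unsafe impl<T: ?Sized> DerefPure for Rc<T> {}
unsafe impl<T: ?Sized> DerefPure for Arc<T> {}
unsafe impl DerefPure for String {}
unsafe impl<T> DerefPure for Vec<T> {}

/// Only marked types are accepted. The body checks the promised property by
/// dereferencing twice; a compiler trusting the marker could deref once instead.
pub fn deref_is_stable<P: DerefPure>(p: &P) -> bool {
    let first: *const P::Target = &**p;
    let second: *const P::Target = &**p;
    std::ptr::eq(first, second)
}
```

A pattern-level feature would presumably put the same kind of bound on the scrutinee type instead of on a function parameter.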

IIUC another issue with "just using Deref" is moving things out from behind the pointer. Previously I've seen proposals for a DerefMove trait to go along with Deref and DerefMut, though I don't know if that's been as fully fleshed out anywhere.
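As a rough sketch of what such a trait might look like, assuming a locally defined DerefMove (hypothetical; nothing like it exists in std today) with its own associated output type: Box can implement it with the built-in *-move, and so can a user-defined owning wrapper. Rc shows why the design is not obvious, since moving out is only possible when the Rc is unique (compare Rc::try_unwrap).

```rust
/// Hypothetical trait: consume the pointer and move the pointee out.
pub trait DerefMove {
    type Output;
    fn deref_move(self) -> Self::Output;
}

impl<T> DerefMove for Box<T> {
    type Output = T;
    fn deref_move(self) -> T {
        *self // the built-in move out of a Box
    }
}

/// A hypothetical user-defined owning pointer.
pub struct MyBox<T>(pub Box<T>);

impl<T> DerefMove for MyBox<T> {
    type Output = T;
    fn deref_move(self) -> T {
        *self.0
    }
}

/// Generic code can now move out of any pointer implementing the trait.
pub fn take_name<P: DerefMove<Output = String>>(p: P) -> String {
    p.deref_move()
}
```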

3 Likes

For some prior art, Haskell’s view patterns and pattern synonyms come to mind here. I’m not sure how well they could be adapted into Rust though. Especially Rust’s ownership vs. borrowing and the possibility of side effects in Rust make this not quite straightforward.

  • regarding ownership, pattern matching in Rust usually has the capability to first only inspect the structure of the matched value to decide whether a match is successful, but then it can move out of the matched value afterwards. (Perhaps for Deref* traits this could be solved by adding some DerefMove, but the interface for generalizing this to other kinds of [explicit!] "arbitrary code" in patterns is not quite clear.)
  • regarding side-effects, when you have a bunch of similar patterns in sequence (only differing in some deeply nested position), you wouldn’t want to repeat the calls to deref-methods for each match arm, but re-use the results instead. However, potential side-effects from a deref wouldn’t be repeated then, which might be confusing (admittedly, you probably wouldn’t want to have any real side-effects from Deref impls anyways).
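The side-effect point can be observed with a deref that counts its own calls. In this sketch, Counted and the two classify_* functions are hypothetical names. The naive lowering derefs once per arm that gets tried, while the hoisted one derefs exactly once, so the observable count depends on how a compiler would lower deref patterns.

```rust
use std::cell::Cell;
use std::ops::Deref;

/// Pointer-like wrapper whose `deref` has an observable side effect (a counter).
pub struct Counted<T> {
    value: T,
    pub derefs: Cell<u32>,
}

impl<T> Counted<T> {
    pub fn new(value: T) -> Self {
        Counted { value, derefs: Cell::new(0) }
    }
}

impl<T> Deref for Counted<T> {
    type Target = T;
    fn deref(&self) -> &T {
        self.derefs.set(self.derefs.get() + 1);
        &self.value
    }
}

/// Naive lowering: every arm that is tried dereferences again.
pub fn classify_naive(p: &Counted<Option<i32>>) -> &'static str {
    if let Some(0) = **p {
        "zero"
    } else if let Some(_) = **p {
        "some"
    } else {
        "none"
    }
}

/// Hoisted lowering: dereference once, then match on the result.
pub fn classify_hoisted(p: &Counted<Option<i32>>) -> &'static str {
    match **p {
        Some(0) => "zero",
        Some(_) => "some",
        None => "none",
    }
}
```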

Speaking of degenerate Deref impls, I have a type made just to prove a point, where it

  • stores two Boxes,
  • alternately derefs to each of them,
  • but also destroys and recreates the Boxes before dereffing them (using internal mutability) so the two different targets don't even stay in the same place.

It was written to show what kind of degenerate behavior unsafe has to allow for (thus what things unsafe traits like DerefStable or ErasablePtr are requiring you not do), but if the trait(s) to opt into derefs in patterns are not unsafe, I'll expect reasonable results from doing it with this degenerate type.
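A simplified version of such a degenerate type might look like the following. This is a sketch and not the type described above: Flip is a hypothetical name, and it only alternates between two boxes without destroying and recreating them. test_then_read shows why a pattern that tests through one deref and binds through another would see two different values.

```rust
use std::cell::Cell;
use std::ops::Deref;

/// Alternates between two boxed values on every `deref`
/// (simplified: the boxes themselves stay in place).
pub struct Flip<T> {
    a: Box<T>,
    b: Box<T>,
    next_is_b: Cell<bool>,
}

impl<T> Flip<T> {
    pub fn new(a: T, b: T) -> Self {
        Flip { a: Box::new(a), b: Box::new(b), next_is_b: Cell::new(false) }
    }
}

impl<T> Deref for Flip<T> {
    type Target = T;
    fn deref(&self) -> &T {
        let use_b = self.next_is_b.get();
        self.next_is_b.set(!use_b);
        if use_b { &*self.b } else { &*self.a }
    }
}

/// "Test" the value through one deref, then "bind" it through a second one.
pub fn test_then_read(p: &Flip<i32>) -> (bool, i32) {
    let tested = **p == 1;
    let read = **p;
    (tested, read)
}
```

With Flip::new(1, 2), the test sees 1 while the binding reads 2.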


If custom deref code does run when matching a pattern, there would probably be an expected difference between the pattern Outer(deref MyBox( Enum::A | Enum::B )) and one with a root-level alternation. With the root-level alternation I'd expect matching to semantically "start again" at the root of the match, whereas the deeper alternation would happen without retraversing the path to that place.

I tried "translating" a sufficiently complex example:

    struct S;
    // used to better distinguish the use of `Option` in the translation
    enum Maybe<T> {
        Just(T),
        #[allow(dead_code)]
        Nothing,
    }
    use Maybe::*;
    impl S {
        fn test1(&self) -> bool {
            false
        }
        fn test2(&self) -> bool {
            true
        }
        fn consume(self) -> i32 {
            1337
        }
    }
    let x = Box::new(Just(Box::new(S)));
    fn qux(_: &Box<S>) -> bool {
        false
    }
    fn bar(_: &Box<Maybe<Box<S>>>) -> bool {
        false
    }
    fn consume_box(_: Box<S>) -> i32 {
        -1000000
    }
    
    /*
    let _r = match x {
        b if bar(&b) => {drop(b); 42}
        deref Nothing => 123,
        deref Just(s) if qux(&s) => consume_box(s),
        b if bar(&b) => {drop(b); 420}
        deref Just(deref S) if false => 0,
        deref Just(deref s) if s.test1() => s.consume(),
        b if bar(&b) => {drop(b); 4200}
        deref Just(deref s) if s.test2() => s.consume(),
        _ => 0,
    }
    */
    // translation in playground linked below

(See the actual translation here.)

The translation uses a DerefMove trait. Bad Deref implementations like the ones @CAD97 described can (and will) lead to panics in it. Also note how it is possible to avoid duplicating any calls to deref.

I’m ignoring things like drop order here, so the translation might not be exactly what one would ultimately want; also, an actual compiler can probably use something like goto to avoid code duplication, so the trick using Option that I used may not be necessary.
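For Box specifically, stable Rust can already express part of this, because *x moves out of a Box. The sketch below covers only the deref arms of the commented-out match (the bar/qux guard arms and the `if false` arm are left out), and each Box is dereferenced exactly once. translate is a hypothetical name.

```rust
pub struct S;

pub enum Maybe<T> {
    Just(T),
    Nothing,
}
use Maybe::*;

impl S {
    fn test1(&self) -> bool { false }
    fn test2(&self) -> bool { true }
    fn consume(self) -> i32 { 1337 }
}

/// Deref-only subset of the commented-out match, using the built-in
/// move out of `Box` instead of a `DerefMove` trait.
pub fn translate(x: Box<Maybe<Box<S>>>) -> i32 {
    match *x {
        Nothing => 123,
        Just(b) => {
            let s = *b; // second (and last) deref
            if s.test1() {
                s.consume()
            } else if s.test2() {
                s.consume()
            } else {
                0
            }
        }
    }
}
```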

Are trait fields still being discussed? If so, having a DerefPure trait with a target field, and having the box pattern access it would be a simple solution.

I would define trait fields as single-expression methods with allow-listed expression types (take ref, deref, field access, etc), but I'm sure there is prior art that discusses it more thoroughly.

That sounds less like trait fields and more like Swift Key-Paths. Which would be great to have in general, IMO, and perhaps trait fields ought to be defined in terms of them! It certainly makes pattern matching and DerefPure simpler.

Looking at this, would it not be a reasonable restriction (in order to avoid the introduction of DerefMove) to say "match ergonomics that go through auto-deref can only borrow and not modify the destructured value". That would handle 99% of the cases I've personally seen.
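For comparison, borrow-only matching through Deref is what one writes by hand today: &**shape turns &Rc<Shape> into &Shape, and default binding modes make every binding a reference, so nothing is moved out of the Rc. Shape and describe are hypothetical names; the restriction above would let the pattern look through the Rc directly.

```rust
use std::rc::Rc;

pub enum Shape {
    Circle { radius: f64 },
    Named { name: String, sides: u32 },
}

/// Borrow-only destructuring through `Rc` with today's syntax.
/// `radius`, `name` and `sides` are all bound by reference.
pub fn describe(shape: &Rc<Shape>) -> String {
    match &**shape {
        Shape::Circle { radius } => format!("circle r={radius}"),
        Shape::Named { name, sides } => format!("{name} with {sides} sides"),
    }
}
```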

Something else that feels related, to me, is Common Lisp's concept of "generalized references", which are an extensible mechanism for turning accessor functions (like (cadr xs)) into "places" that can be written to (e.g. (setf (cadr xs) v) replaces the second element of xs with v). Being Lisp, it's all done with macros.
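The closest everyday Rust analogue is probably an accessor that returns &mut, which yields a writable place. This is only an illustration; cadr_mut is a hypothetical name chosen after the Lisp accessor.

```rust
/// Returns the *place* of the second element (if any), so the caller
/// can read it or write through it, loosely like `(setf (cadr xs) v)`.
pub fn cadr_mut<T>(xs: &mut [T]) -> Option<&mut T> {
    xs.get_mut(1)
}
```

Usage: `if let Some(place) = cadr_mut(&mut v) { *place = 20; }` overwrites the second element of v in place.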

match value {
    E::B { *owned_string: "" } => println!("empty string"),
    E::B { owned_string: s } => println!("string: {}", s),
    _ => {}
}

We could dereference a field in the pattern, meaning that we apply deref (or its pure version) to the data. Is that confusing?
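For comparison, this is roughly what the hypothetical `*owned_string: ""` arm has to look like today: bind the String by reference, then test it in a guard. E is reconstructed from the example above (with an extra unit variant for the `_` arm), and the function returns strings instead of printing so the behavior is easy to check.

```rust
pub enum E {
    A,
    B { owned_string: String },
}

/// Today's spelling: `owned_string` is bound as `&String` and compared in a guard.
pub fn describe_e(value: &E) -> String {
    match value {
        E::B { owned_string } if owned_string.is_empty() => "empty string".to_string(),
        E::B { owned_string } => format!("string: {owned_string}"),
        _ => String::new(),
    }
}
```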

Another example:

let val = Rc::new(5u32);
match val {
    *a if a < 2 => println!("{} is lower than 2!", a),
    *a => println!("{} is at least 2!", a),
}
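For reference, the same match already works today if the deref is moved onto the scrutinee, because u32 is Copy and the bindings simply copy out of the Rc. compare_to_two is a hypothetical name; it returns the message instead of printing it.

```rust
use std::rc::Rc;

/// The `*a` arms above, with the deref applied to the scrutinee instead.
pub fn compare_to_two(val: &Rc<u32>) -> String {
    match **val {
        a if a < 2 => format!("{a} is lower than 2!"),
        a => format!("{a} is at least 2!"),
    }
}
```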

For completeness, @tema2, your suggestion would thus exhibit the following behavior:

match value {
    E::B { *owned_string: ref s } => { /* s: &str */ },
    …
}

I am not against it; I actually find it the "destructuring dual" of &* for a String.


But if we go further:

struct Foo<'__> { field: &'__ String }

// How do we bind to a `str` / how do we `&**` ?
match value /* : Foo<'__> */ {
    Foo { **field: ref s } // ?
      // why not: `&**field: s` ?
    Foo { field: &&(ref s) } // ? (Using `&` as a `Deref` generalization)
}

The one that is most consistent with the current destructuring syntax would be the one in that last branch, and thus your original example would rather be:

match value {
    E::B { owned_string: &"" } => println!("empty string"),
    E::B { owned_string: s } => println!("string: {}", s),
    _ => {}
}

Design

  1. The syntax would be to match <place> against a & <pat> to actually match <pat> against *<place> through Deref impls if needed (i.e., **&place), and &mut for the DerefMut case.

    This means that current thing: box (ref x) could also be written thing: &(ref x).

    • In that effect, the box pattern would be redundant except for its magic DerefMove capability, which, again, features parity with *-dereferencing a Box vs. *-dereferencing any other smart pointer.
  2. For sure, some special trait would be needed, maybe even a lang item to begin with, so that the derefs are just performing basic pointer arithmetic, with a reasoning similar to structural_eq's design (the language may need to feature Deref{,Mut} derives to offer extensibility).
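As a baseline for point 1, this is what the & pattern does today: it only peels a real reference (&T), but it nests like any other pattern. split is a hypothetical name; the commented-out line shows the Box case that the proposal would make legal and that is rejected today.

```rust
/// `&` peels the `&(u32, String)`; `n` copies the `u32`, `ref s` borrows the `String`.
pub fn split(pair: &(u32, String)) -> (u32, &str) {
    let &(n, ref s) = pair;
    (n, s.as_str())
}

// Rejected today (type mismatch: expected `Box<_>`, found a reference pattern):
// fn split_boxed(pair: Box<(u32, String)>) { let &(n, ref s) = pair; }
```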

Advantages

  • Consistent with the current way destructuring pattern matching is performed (a.k.a. "backwards");

    • no "methods noise" / problem of calling non-pure user-provided code within patterns;
  • It is deliberately minimal to begin with (the lang item idea), which avoids having to deal with pathological designs; the motivating use case is mainly dealing with GADTs that use Box, Rc or Arc (or classic Rust references) for their pointer indirection, plus the Vec / String case, all of which could be covered by a built-in, lang-blessed construction.

Drawbacks

  • Gives more meaning to the & and &mut patterns than it currently has; while not technically breaking (AFAIS), it may be edition boundary worthy?

  • if the lang-item design is chosen, it would either require built-in #[derive(Deref{,Mut})] to be added, or the lang item to be loosened into some carefully designed trait, in order to allow for extensibility over user-defined types. This implies that it may take a good while before that point is reached, whereas some more clever design could offer such extensibility from the start.

  • EDIT: if DerefMove were added to the language, what would be the pattern-destructuring syntax for it? There is no owned counterpart of & / &mut

At this point, this honestly looks like just another low-hanging fruit for refactoring, and certainly not warranting adding a language feature. The level of complexity with the interaction of method calls and implicit Derefs and collections and unsafe and and and and… shouldn't be disguised as a mere pattern match. I would consider that an anti-pattern (no pun intended).

Just a plain binding? That performs a move.

That's the purpose of the previously proposed DerefPure.

How would you choose between using Deref and DerefMove? For that matter how would you know which one is in effect?

When talking about DerefMove, we are talking about moving the pointee out of its containing pointer:

let p = Box::new(String::new());
let s = *p; // `DerefMove` magic (currently only for `Box`).

In nightly Rust, you can move the pointee out of its containing pointer using the box pattern:

// requires #![feature(box_patterns)] at the crate root
let p = Some(Box::new(String::new()));
if let Some(box s) = p { /* s: String */ }

So my remark was regarding the extension of this box / *-move magic to other types: there has been some talk about it, it would, should it ever happen, lead to a DerefMove trait. In that case, we'd need an appropriate pattern to perform what box <pat-by-move> currently does.


For the non-move case, we can currently do:

let p = Box::new(String::new());
let s = &*p; // `s: &String`

If "we translate the place-ops on the RHS as their pattern-destructuring-ops on the LHS", we'd get:

let s = &( *p );
// `&` on a place => `ref` on a pattern.
let ref s = *p;
// read-`*` on a place => `&` on a pattern.
let &ref s = p;
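The three spellings above can already be checked side by side for a plain reference, the one pointer type where the & pattern works today (with p: Box<String> the last one is rejected). three_ways is a hypothetical name; all three bindings should point at the same String.

```rust
/// The place-op to pattern-op translation, applied to a `&String`.
pub fn three_ways(p: &String) -> bool {
    let a = &*p; // place ops: `&` of `*p`
    let ref b = *p; // `&` on the place becomes `ref` in the pattern
    let &ref c = p; // read-`*` on the place becomes `&` in the pattern
    std::ptr::eq(a, b) && std::ptr::eq(b, c)
}
```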

That's a good point, and one that should not be overlooked. Still, I think there is some wiggle room to allow a very limited set of this feature (e.g., at least for the very pervasive Box, Rc and Arc), so as not to worry about code being executed in a pattern while already greatly improving ergonomics:

What's being suggested in this thread is to reduce this disparity between &<place> and &mut <place> having pattern-destructuring equivalents (and box on nightly) but other stdlib pointers not having them.

What I am suggesting, in order to reduce churn, is to have the same extension that Deref let us apply to places (*-Deref or *-DerefMut) to also be applicable to pattern-destructuring, at least for #[structural_deref] types, in a similar fashion to us being able to pattern-match against constants of arbitrary types, provided they feature #[structural_eq]ality.

  • Whether that involves a #[structural_deref] lang annotation or a DerefPure trait is a detail that can be sorted out later on. The idea is that, precisely because arbitrary code execution should not be happening in a pattern (in that regard I agree :100: with you, @H2CO3), similar to Copy allowing bit-wise memcpy-es, I would expect some way to mark a wrapper around a pointer to express that it can be "bit-wise"-dereferenced, with no user code involved.

It is thus:

  • actually filling a hole in the language (place "operators" can be arbitrarily nested, but pattern-destructuring ones can't);

  • such a hole corresponds to an actually demanded feature; several times I've heard people on URLO who were used to working with GADTs in other languages complain about how "cumbersome" it is to do this in Rust, since you often have to nest these matches: the *-operator can only be applied to a place, not within a pattern.

    I personally do agree with that sentiment, and I know that if I were to work with a language using big GADTs, I would seriously consider using another language, just because of those ergonomics. Indeed, compare:

    match place {
        SomeType::Variant {
            field: ref to_be_derefed, // `Box`, `Rc` or `Arc`
            other_stuff,
        } => match **to_be_derefed {
            SomeOtherType::OtherVariant { ref foo, ref bar } => {
                /* … */
            },
            _ => error(…, "expected a Variant { … OtherVariant }"),
        },
        _ => error(…, "expected a Variant { … OtherVariant }"),
    }
    

    vs.

    match place {
        SomeType::Variant {
            field: &SomeOtherType::OtherVariant { ref foo, ref bar },
            other_stuff,
        } => {
            /* … */
        },
        _ => error(…, "expected a Variant { … OtherVariant }"),
    }
    

    We can observe excessive rightward drift in the former, and a duplication of the code handling the default case, which thus may need to suddenly be outlined just to counteract that. And that's without mentioning the case where we may want to inspect two or more fields inside Variant that happen to be enums too. We suddenly have to be doing things like match (**first, **snd) and tuple patterns start appearing where we used to have named fields.

    The current ergonomics of the language in that regard are thus quite bad (if not horrendous), so I don't think we should undermine efforts to improve them.

    If we use this example to also compare how a destructuring-pattern op may "hide" stuff going on, vs. a place op, we can see that the outer * in **to_be_derefed has become a & in field: &SomeOtherType. Same amount of sigil(s), so they have the same "visibility" (again, provided we trust the & pattern op not to be running arbitrary code, of course, but that has already been ruled out in my proposal).
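The tuple workaround mentioned above looks roughly like this on a small boxed expression type. Expr and fold_add are hypothetical names. Because * cannot appear inside a pattern, the boxed children are dereferenced in a second match on a tuple, and the fallback arm appears twice; the proposed form is shown only as a comment.

```rust
pub enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
}

/// Folds `Num + Num` by hand with today's syntax.
pub fn fold_add(e: &Expr) -> Option<i64> {
    match e {
        Expr::Add(lhs, rhs) => match (&**lhs, &**rhs) {
            (Expr::Num(a), Expr::Num(b)) => Some(a + b),
            _ => None,
        },
        _ => None,
    }
    // With the proposal (hypothetical syntax), a single arm would do:
    // Expr::Add(&Expr::Num(a), &Expr::Num(b)) => Some(a + b),
}
```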

Using some way to mark "reasonable" Deref-s indeed sounds like a prerequisite to this (regardless of their flavour). However, even for reasonable implementations, I would find &x matching on what essentially isn't a &, i.e. Box(x), too surprising. In places we indeed require the * of &* to mark the use of Deref (albeit not in method calls). The deref keyword seems a better balance to me: we still don't have arbitrary code running in matches, but we do mark the sites where some non-trivial type-casts are made.

Regarding DerefMove, this seems like it would improve usability, but is quite orthogonal, so it can be discussed separately; for now, as mentioned above, there can be feature-parity with places, allowing moves out of boxes but not of other "smart pointers". This does require a bit of thinking regarding & vs deref etc, so that whatever solution is chosen to mark a moving deref in places, can be used in patterns.

One more thought, which seems relevant but I'm not entirely sure how (Maybe supporting my point above of not using & for these matches? Maybe weakening it?). The difference between &/&mut and other pointers is that they project, and the others don't: if I have a &x, I can get a &x.y. If I have a Box(x) though, I can get a &x and then a &x.y, but I can't get a Box(x.y). This means, on the one hand, that deref-ing is a destructive action that needs to be marked; but on the other hand, that once it is deref-ed, we only have &/&mut-s anyway, so we might as well match that way.
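A small illustration of the projection point, using only real std APIs: a &Point projects to a &i32 for free, and a RefCell guard can also project via Ref::map, which narrows the borrow to one field while keeping it alive. There is no corresponding operation that turns a Box<Point> into a Box<i32> pointing into the same allocation. Point and the two functions are hypothetical names.

```rust
use std::cell::{Ref, RefCell};

pub struct Point {
    pub x: i32,
    pub y: i32,
}

/// Plain references project: `&Point` -> `&i32`.
pub fn project_ref(p: &Point) -> &i32 {
    &p.y
}

/// `Ref` guards project too: `Ref<Point>` -> `Ref<i32>`.
pub fn project_guard(cell: &RefCell<Point>) -> i32 {
    let guard: Ref<'_, Point> = cell.borrow();
    let y: Ref<'_, i32> = Ref::map(guard, |p| &p.y);
    *y
}
```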

This is an interesting point.

Note that in C++, you can project shared_ptrs, because they allow an aliasing constructor (#8). Arc/Rc just made different design tradeoffs.

So it might be something valuable to keep even if things like Box can't.

Oh -- and pin-project is a thing, which would be an interesting thing to have work through this feature...

That actually sounds quite cool! Of course, then you need more complex syntax, to specify what kind of projection you want - or alternatively, making the "cast" (via Deref, or maybe even AsRef) explicit, so that the projection is simply that of the type which you currently matched on.

I think it would just be leaning in to binding modes. So if you match a Pin<T>, you'd get out Pin<U>s, the same as how if you match a &T you get &Us. (That would need GATs, I assume, to actually make work well.) And if you want to get &Us instead of SharedPtr<U>s, then you'd match on &*p or whatever.
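For a sense of what such a binding mode would have to generate, here is a hand-written Pin projection of the kind pin-project produces. Pair and the project_* functions are hypothetical; PhantomPinned makes Pair !Unpin so the distinction matters. The SAFETY comments state the assumptions this sketch relies on: `pinned` is treated as structurally pinned and `counter` is not.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// A `!Unpin` type, so projecting its fields needs care.
pub struct Pair {
    pub pinned: String,
    pub counter: u32,
    _pin: PhantomPinned,
}

impl Pair {
    pub fn new(s: &str) -> Self {
        Pair { pinned: s.to_string(), counter: 0, _pin: PhantomPinned }
    }
}

/// Structural projection: `Pin<&mut Pair>` -> `Pin<&mut String>`.
pub fn project_pinned(this: Pin<&mut Pair>) -> Pin<&mut String> {
    // SAFETY: `pinned` is never moved out of a pinned `Pair`, and `Pair`
    // has no `Drop` impl that could move it.
    unsafe { this.map_unchecked_mut(|p| &mut p.pinned) }
}

/// Non-structural projection: `counter` is handed out as a plain `&mut u32`.
pub fn project_counter(this: Pin<&mut Pair>) -> &mut u32 {
    // SAFETY: we never move the `Pair` itself, only expose one field mutably.
    unsafe { &mut this.get_unchecked_mut().counter }
}
```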

I suppose that wouldn't solve the match-a-Box<T> problem that was the original point of this thread, though :confused: