Somewhat Random Idea: Deref patterns

Currently there is no way to match a type through a deref boundary without introducing nested matches. The one exception is Box with feature(box_patterns), which I am not a fan of, because it makes Box inherently magic, similar to box_syntax.

I'm wondering whether a feature could be supported to "transparently" dereference and match through the boundary in patterns. This would be much more general than box_patterns, and as a result would not make Box inherently magic.

An example:

let v = "x".to_string();
match v {
   *"x" => println!("x"),
   _ => println!("Not x")
}

(The syntax is not necessarily final.) (Yes, I know the above example is trivial. Less trivial examples would include cases where the Deref type is nested within a structure or variant.)
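
For reference, here is a minimal sketch of the nested-match workaround that stable Rust needs today when the Deref type sits inside a variant. The `Expr` enum and the `classify` function are hypothetical names made up for this illustration; they do not come from the proposal.

```rust
// Hypothetical type for illustration: Deref types nested inside variants.
enum Expr {
    Ident(String),
    Neg(Box<Expr>),
}

// Without deref patterns, every Deref boundary needs its own nested `match`.
fn classify(e: &Expr) -> &'static str {
    match e {
        // `name` is a `&String`; matching its contents needs a second match.
        Expr::Ident(name) => match name.as_str() {
            "x" => "the identifier x",
            _ => "some other identifier",
        },
        // `inner` is a `&Box<Expr>`; `&**inner` steps through the Box by hand.
        Expr::Neg(inner) => match &**inner {
            Expr::Ident(name) => match name.as_str() {
                "x" => "negated x",
                _ => "negated identifier",
            },
            Expr::Neg(_) => "double negation",
        },
    }
}
```

Calling `classify(&Expr::Neg(Box::new(Expr::Ident("x".to_string()))))` goes through three levels of `match` to reach a single string literal, which is the nesting the proposal wants to avoid.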

Open Questions:

  • In which cases should we use DerefMut, or (the unstable) DerefMove, rather than simple Deref?
  • Does anyone have a nicer syntax we could use? To be honest, the syntax I showed above looks like it's dereferencing the string.
  • Would a different trait be required, one that promises structural stability of the pointee across a Deref{,Mut,Move}?

I would very much like a solution to this problem. I don't have a strong preference for any particular syntax.

Some past discussions:

  • RFC 462 proposed adding a deref keyword, by analogy with the ref keyword.
  • RFC issue 2099 proposed generalizing & patterns to work on any Deref type.
  • RFC 809 proposed generalizing the box keyword to work with types other than Box<T>. (See also this forum thread.)

Is there an example that motivates why any keyword/sigil (* / deref / & / box) is necessary for this?

Why is it not as simple as:

match /*String*/ {
    "x" => ...
    _ => ...
}

match /*Option<String>*/ {
    Some("x") => ...
    _ => ...
}

match /*Option<Box<Option<Box<String>>>>*/ {
    Some(Some("x")) => ...
    Some(Some(other)) => ... // other: Box<String>
    Some(None) => ...
    None => ...
}

I'd always figured this was where "match ergonomics" was going (generalizing from & / &mut to any Deref implementing type).
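
For comparison, the closest stable approximation of the second match above might look like the sketch below. It relies on `Option::as_deref`, which peels only one Deref layer inside an `Option`, so it does not extend to the nested `Box` case. The function name `describe` is made up for illustration.

```rust
// Stable Rust today: `Option::as_deref` turns `&Option<String>` into
// `Option<&str>`, so the string literal pattern can be used directly.
fn describe(opt: &Option<String>) -> &'static str {
    match opt.as_deref() {
        Some("x") => "x",
        Some(_) => "something else",
        None => "nothing",
    }
}
```

The explicit `as_deref()` call plays the role that the sigil-free proposal would leave to the compiler.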


I'm personally a big fan of default binding modes, but I think it's also useful that match patterns aren't running user code. The big question is the interaction between or patterns and custom deref impls.

I believe the last time we discussed this someone suggested that it might be useful to treat this similarly to potential field access through traits.

Specifically, the "DerefPure" used to match against patterns wouldn't be allowed to run custom user callbacks. (Use an if let guard to add a sequence point for the side effect.) Instead, it would essentially be limited to field accesses and reference/pointer derefs through some kind of "field path" system.

I wish I had a link to it, because I remember it being a lot more convincing than what I just fumbled through trying to remember it 🙃

On the other hand, Deref is already expected to be a "semipure" and "cheap" operation, since it's automatic in deref coercion already, so adding auto(de)ref to pattern matching contexts isn't that far of a departure from the status quo of Deref.
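
To illustrate that user-defined `deref` already runs at implicit points today, here is a small sketch. The `Counted` type, its `calls` counter, and the helper functions are hypothetical and exist only to make the implicit calls observable.

```rust
use std::cell::Cell;
use std::ops::Deref;

// Hypothetical wrapper that counts how often its `deref` runs.
struct Counted {
    value: String,
    calls: Cell<u32>,
}

impl Deref for Counted {
    type Target = String;
    fn deref(&self) -> &String {
        self.calls.set(self.calls.get() + 1);
        &self.value
    }
}

fn takes_str(s: &str) -> usize {
    s.len()
}

// Returns how many times `Counted::deref` ran without an explicit `*`.
fn implicit_deref_calls() -> u32 {
    let c = Counted { value: "hello".to_string(), calls: Cell::new(0) };
    // Deref coercion: `&Counted` is coerced to `&str` at the call site.
    let _ = takes_str(&c);
    // Auto-deref in method calls: `len` is found on `String` through `Counted`.
    let _ = c.len();
    c.calls.get()
}
```

Neither call site mentions `deref`, yet the counter is non-zero afterwards. That is the sense in which auto(de)ref in patterns would extend existing behaviour rather than introduce something new.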

I expect a decent opposition to any proposals to add more auto(de)ref points to the language from "team explicit," the same as for default binding modes, though. (And to be clear, this is a good thing, if we want to find the best solution for localized clarity!)

My only real strong opinion is that a prefix keyword is probably the wrong choice for the majority use case. It's the same reason I dislike the bookkeeping required without default binding modes; I want to see the shape of the data, not have to tell the compiler that yeah, I have a reference (&) and I want to get a reference out (ref).

Is there a way to articulate what value users get from Rust not running user code in match patterns? What usefulness are you asserting?

I'm skeptical of an assertion like this that isn't backed up by actual value. People make that same assertion about all kinds of things in various languages: "I think it's useful that field access (.) does not run user code", "I think it's useful that builtin arithmetic operators (+) do not run user code."


I'm not certain, tbh. Before the last time this was discussed, I was full in on extending auto(de)ref to patterns, but I'm not super sure anymore.

The biggest one, though, is actually semantics of when and how often deref is called.

let b: MyBox<(Result<_,_>,)>;

match b {
    (Ok(_),) | (Err(_),) => (),
}
match b {
    (Ok(_) | Err(_),) => (),
}
match b {
    (Ok(_),) => (),
    (Err(_),) => (),
}

Are the patterns in the first two matches identical? At what point (and how many times) is the custom deref called? The answer to this isn't super clear, and I'd need a good answer before I'm super comfortable adding auto(de)ref in patterns.

If the first derefs once and the second derefs twice (I think the most consistent interpretation), is there a way to get only a single deref without the 1-tuple? I think you'd have to make another match to do so.
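
The question of how often deref is called can be made concrete with explicit derefs in today's Rust. The sketch below assumes a hypothetical `MyBox` whose `deref` bumps a counter, and writes out two conceivable desugarings by hand. It does not claim that either one is what a deref-pattern feature would actually generate.

```rust
use std::cell::Cell;
use std::ops::Deref;

// Hypothetical smart pointer whose `deref` has an observable side effect.
struct MyBox<T> {
    value: T,
    derefs: Cell<u32>,
}

impl<T> MyBox<T> {
    fn new(value: T) -> Self {
        MyBox { value, derefs: Cell::new(0) }
    }
}

impl<T> Deref for MyBox<T> {
    type Target = T;
    fn deref(&self) -> &T {
        self.derefs.set(self.derefs.get() + 1);
        &self.value
    }
}

// One conceivable desugaring: deref once, then match on the result.
fn deref_once(b: &MyBox<(Result<i32, i32>,)>) -> u32 {
    match &**b {
        (Ok(_) | Err(_),) => {}
    }
    b.derefs.get()
}

// Another conceivable desugaring: every alternative derefs on its own.
fn deref_per_alternative(b: &MyBox<(Result<i32, i32>,)>) -> u32 {
    if let (Err(_),) = &**b {
        // first alternative matched
    } else if let (Ok(_),) = &**b {
        // second alternative matched
    }
    b.derefs.get()
}
```

For a box holding `(Ok(1),)`, the two functions report different counts, so the choice between them is visible to any `Deref` impl with side effects.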

Also, yeah, exhaustiveness checking.

But I flip-flop on it constantly.


How would exhaustiveness checking work with such a pattern? It would seem there needs to be a catch-all branch in case the result of Deref changes between applications. Or would it only be run once? But then what's the sequencing order, and how do you ensure that borrow checking of if-clauses works? Dereferencing a Box is pure and has no side effects, hence the question does not arise for that pattern. (It's also clear for ., +, etc.) And how does it work when two branches with a deref pattern surround another branch without one? Then the deref virtually must be executed twice.
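
A sketch of why exhaustiveness is in doubt when deref may run more than once. The `Flaky` type is hypothetical and deliberately ill-behaved, and the two `if let` checks emulate by hand two arms that each deref separately.

```rust
use std::cell::Cell;
use std::ops::Deref;

// Hypothetical, deliberately ill-behaved type: every call to `deref`
// flips which of its two values it hands out.
struct Flaky {
    a: Option<u8>,
    b: Option<u8>,
    flip: Cell<bool>,
}

impl Deref for Flaky {
    type Target = Option<u8>;
    fn deref(&self) -> &Option<u8> {
        let use_a = self.flip.get();
        self.flip.set(!use_a);
        if use_a { &self.a } else { &self.b }
    }
}

// `Some(_)` and `None` together cover every `Option`, yet neither check
// has to succeed when the target changes between the two derefs.
fn falls_through(f: &Flaky) -> bool {
    if let Some(_) = &**f {
        false
    } else if let None = &**f {
        false
    } else {
        true // reached only because the two derefs disagreed
    }
}
```

With `a: None`, `b: Some(1)` and `flip` starting at `true`, the first check sees `None` and the second sees `Some(1)`, so control reaches the final branch even though the two patterns look exhaustive.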


I think that deref should be called a minimal number of times. At least with immutable references that means: at most once. For mutable ones you may need an immutable deref followed by a mutable one. Compare this comment:

Pre-pre-RFC: match ergonomics for container types — restricted method calls in patterns - #9 by steffahn

I also think a maximally ergonomic solution, potentially without keywords, sounds like a good idea. We should however try to flesh out the details of such an approach in order to be able to really evaluate it. One thing that comes to mind just now is: Wouldn't it be surprising if

match &whatever {
    Ok(foo) => { /* … */ }
    bar => { /* … */ }
}

resulted in foo being &T whereas bar still is &SmartPointer<Result<T, E>>? With current reference-only ergonomics, we have the much more consistent situation of bar always being &Result<T, E>.
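
The reference-only behaviour referred to here can be checked on stable Rust. In the sketch below the explicit type annotations exist only to make the binding types visible, and the function name `bindings` is made up.

```rust
// Today's reference-only match ergonomics, with annotations that pin down
// the type of each binding.
fn bindings(whatever: Result<String, u8>) -> usize {
    match &whatever {
        Ok(foo) => {
            let foo: &String = foo; // a reference to the field
            foo.len()
        }
        bar => {
            let bar: &Result<String, u8> = bar; // the scrutinee itself
            bar.is_err() as usize
        }
    }
}
```

Both bindings see the same `Result` layer here. The concern above is that with a smart pointer in between, the catch-all binding would keep the pointer type while `foo` would not.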


I don't think there is necessarily an example. However, I think it would be a good idea to be explicit about when patterns do interesting stuff.

So the idea is that you would want something like that? Reading through your points, I actually agree. Would the issue be solved by a

pub unsafe trait DerefPure : Deref{}

Which makes the behaviour undefined if Deref{,Mut}::deref{,_mut} has observable side effects or modifies any state read by the function or as a result of the pattern. Would this need to be an unsafe trait? It could be a safe one that makes it a logic error if the invariant does not hold. Then again, I'm not particularly a fan of logical invariants that aren't ensured by unsafe (even if breaking the invariant need not necessarily lead to UB). I suppose if it's UB, then we can assume deref patterns can be exhaustive. In any case, we could leave it unspecified how many times deref is called, and if this alters observable side effects, that's the user's problem. The borrow-checking problem can be fun though.
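
A minimal sketch of how such a marker trait might be declared and used. Everything here is an assumption for illustration: `DerefPure` is defined locally rather than taken from any library, and `Wrapper` and `is_some_zero` are made-up names.

```rust
use std::ops::Deref;

/// Hypothetical marker trait, declared locally for this sketch.
///
/// # Safety
/// Implementors promise that `deref` has no observable side effects and
/// keeps returning the same target while the value is not mutated.
pub unsafe trait DerefPure: Deref {}

// Hypothetical wrapper whose `deref` is a plain field access.
pub struct Wrapper<T>(pub T);

impl<T> Deref for Wrapper<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.0
    }
}

// SAFETY: `deref` only borrows a field, so repeated calls always agree.
unsafe impl<T> DerefPure for Wrapper<T> {}

// Code relying on purity may deref once and reuse the result freely.
pub fn is_some_zero<P>(p: &P) -> bool
where
    P: DerefPure<Target = Option<u32>>,
{
    let target: &Option<u32> = &**p;
    matches!(target, Some(0))
}
```

Making the trait `unsafe` moves the proof obligation to the `unsafe impl`. That is the trade-off weighed above against a safe trait whose violation is merely a logic error.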

I would love to have this.

I do think it needs a sigil of some kind to indicate the deref. I wouldn't want to see this happen without any indication, and in particular without the compiler forcing types to match exactly by default.

As an exception, doing this for literals only without a sigil seems fine. I don't see any harm in matching Option<String> against Some("hello"). My concern is primarily for inferring a deref'd type for a variable; if you match that using Some(x) you should always get a String.

I've wanted this many times for strings, for boxes, for vectors, and others. I would love to see a language MCP for this.


Unfortunately, I don't think limiting it to a single deref and a single deref_mut is possible, because match arms are semantically checked in order. It'd be possible to treat each "run" of immut or mut derefs with a single deref call (and without explicit ref/ref mut it'd always be a single run, I think?), but in the general case you have to fall back to calling deref/mut for each arm so that the arms can stay strictly sequential. Consider this example:

use rand::Rng;

fn its<T>(_: T) {}

fn main() {
    let x = &mut Ok::<_, ()>(0);
    let mut rng = rand::thread_rng();

    match x {
        Ok(ref     n) if  n % 2 == rng.gen() => its::<&i32>(n),
        Ok(ref mut n) if *n % 2 == rng.gen() => its::<&mut i32>(n),
        Ok(ref     n) if  n % 2 == rng.gen() => its::<&i32>(n),
        Ok(ref mut n) if *n % 2 == rng.gen() => its::<&mut i32>(n),
        Ok(ref     n) if  n % 2 == rng.gen() => its::<&i32>(n),
        Ok(ref mut n) if *n % 2 == rng.gen() => its::<&mut i32>(n),
        _ => {}
    }
}

(And yes, those are actually &i32, not just coerced &mut _ -> &_; changing the annotation fails to compile.)

It's for ordering reasons like this that I think this might need to be restricted to "DerefPure" types. Even if it's restricted to something super noisy like deref(Rc) $pat, a restriction to DerefPure still allows the compiler to collapse multiple derefs into one without changing the semantics, as well as know the value isn't changing underneath your feet for exhaustiveness checking.

Given that, would the following requirements for implementing DerefPure be ok?

Note that the above limits should allow almost every type in the rust standard library that implements Deref to also implement DerefPure. In particular:

  • All of the smart pointers (Box, Rc, Arc, as well as Ref and RefMut for RefCell) can implement DerefPure
  • All of the collections that deref into a slice-like type (Vec, String, CString, OsString, and PathBuf) can implement DerefPure. This is permitted because intervening accesses through a mutable reference are allowed to affect address stability.
  • All lock guard types can implement DerefPure.
  • Pin<P> can implement DerefPure if P implements DerefPure. Note that, at least for P::Target: !Unpin, the address stability rule is obtained freely from the pinning guarantee.
  • ManuallyDrop<T> can implement DerefPure, as intervening moves are allowed to break address stability.
  • [Edit: After doing a quick check of Deref in std::ops - Rust, I've determined that every type in that list, other than https://doc.rust-lang.org/nightly/std/lazy/struct.Lazy.html, and https://doc.rust-lang.org/nightly/std/lazy/struct.SyncLazy.html, as well as Pin<P> for P: !DerefPure should be able to implement DerefPure. ]

With DerefPure defined in such a way, the compiler could perform as many or as few calls to deref and deref_mut as it wants, including only 1 call (by coercing the returned reference to a pointer, and dereferencing it as necessary, internally). (The structural stability guarantee also allows matching through a deref boundary to be exhaustive, and not require a wildcard pattern.)


I don’t know if you read my earlier post on how to translate these things, the one I linked above. But following the example there, a rough translation could look something like this. [I changed the setting to allow for any smart pointer and instantiated it with `Box`.]

The important part is that guards don’t allow mutable access to the variables. I did some testing and just found out that there’s still the problem that the variable n is still considered to be a mutable reference even inside of the guard (example playground), and it just doesn’t allow for n or *n to be mutated or moved out of inside of the guard. [Error messages are e.g. “cannot borrow `*n` as mutable, as it is immutable for the pattern guard”.] So my translation, as linked above, is still inaccurate.

I’m not sure whether e.g. casting (transmuting) a &&T to &&mut T is a safe thing to do. If it is, then this last problem with the guards could be easily solved, because then you’d just need to replace every n with *m where m: &&mut T is the result of such a cast.

Hm.. might not be enough yet, since, apparently, you can tell that the immutable reference you get to see in the guard is really the same one as you get in the body of the match arm. This is possible through them having the same lifetime; it’s e.g possible to keep immutable re-borrows from the guard around and get effects such as in the following example:

#![feature(box_patterns)]
fn foo() {
    let (mut x, y) = ((1,), 2);
    let mut x_r = Box::new(x);
    let mut y_r = &y;
    let _v = *y_r;
    match x_r {
        box (ref mut r,)
            if {
                y_r = &*r;
                true
            } => {
                let _v = *y_r;
                *r = 1;
                // not okay here
                // let _v = *y_r;
            }
        _ => {}
    }
}

This cannot be compiled in the fashion I suggested above.

On the other hand, what sane code needs to rely on the fact that a by-reference capture has a particularly long lifetime in a guard? What I’m saying is: I wouldn’t be sad if the above code stopped compiling for a deref-based pattern-matching feature (instead of the box pattern).


Edit: fun fact, I didn’t even know it was possible that multiple subsequent patterns in a guard can create conflicts in the borrow checker, but here we go

let mut x = Some(42);
let mut z = &0;
match &mut x {
    Some(ref y) if {
        z = y;
        false
    } => {}
    Some(ref mut _y) => {
        println!("{}", *z);
    }
    _ => {}
}

error[E0502]: cannot borrow value as mutable because it is also borrowed as immutable
  --> src/main.rs:9:10
   |
5  |     Some(ref y) if {
   |          ----- immutable borrow occurs here
...
9  |     Some(ref mut _y) => {
   |          ^^^^^^^^^^ mutable borrow occurs here
10 |         println!("{}", *z);
   |                        -- immutable borrow later used here

Also noteworthy is how this code

let mut x = Some(42);
let mut z = &0;
match &mut x {
    Some(ref y) if {
        z = y;
        false
    } => {}
    Some(ref _y) => {
        println!("{}", *z);
    }
    _ => {}
}

prints 42, yet still complains (incorrectly, I guess) about

   Compiling playground v0.0.1 (/playground)
warning: value assigned to `z` is never read
 --> src/main.rs:6:9
  |
6 |         z = y;
  |         ^
  |
  = note: `#[warn(unused_assignments)]` on by default
  = help: maybe it is overwritten before being read?

That's quite a complex trait, and are we sure the requirements (some of which are unprecedented in other unsafe traits) make it sound with respect to all borrow checking rules? How does that complexity compare to if let in match arms, which offers at least superficially comparable functionality? That is already implemented in nightly:

let v = "x".to_string();
match v {
   v if let "x" = &*v => println!("x"),
   _ => println!("Not x")
}

And on multiple branches/Deref we can probably rely on the optimizer to detect the pure function and hoist its application, without changes to the memory or execution model of Rust.
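
For readers without a toolchain that supports `if let` guards (the thread describes them as nightly-only at the time), two stable spellings of the same check are sketched below. The function names are made up for illustration.

```rust
// An ordinary `if` guard that derefs the String explicitly.
fn with_plain_guard(v: String) -> &'static str {
    match v {
        v if &*v == "x" => "x",
        _ => "Not x",
    }
}

// Deref once up front, then match on the resulting `&str`.
fn with_as_str(v: String) -> &'static str {
    match v.as_str() {
        "x" => "x",
        _ => "Not x",
    }
}
```

Both versions spell out the deref, either once per guard or once before the match, so the question of how often it runs does not arise.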

I think it should be implemented as a binding mode (in addition to ref and ref mut), for example:

Pattern            Equivalent expression
ref x              &x
ref mut x          &mut x
deref x            *x
deref deref x      **x
ref deref x        &*x
ref mut deref x    &mut *x
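
The right-hand column is ordinary Rust today. The sketch below writes several of those expressions out on a `Box<i32>`, with the proposed pattern spelling shown only in comments; the `deref` binding mode itself is hypothetical and the function name `equivalents` is made up.

```rust
// The "equivalent expression" column, written out on a `Box<i32>`.
fn equivalents() -> (i32, i32) {
    let mut x: Box<i32> = Box::new(1);
    let before = {
        let r: &Box<i32> = &x; // proposed spelling: `ref x`
        let d: &i32 = &*x;     // proposed spelling: `ref deref x`
        **r + *d
    };
    {
        let m: &mut i32 = &mut *x; // proposed spelling: `ref mut deref x`
        *m += 10;
    }
    let after: i32 = *x; // proposed spelling: `deref x` (copies, i32 is Copy)
    (before, after)
}
```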

Deref coercion should only apply to literal patterns and patterns with type ascription:

match s1: String {
    ref "test" => {} // equivalent to `ref deref "test"`
    ...
}
match s2: &String {
    s: &str => {} // equivalent to `&(ref deref s)`
    ...
}

I think you’re mixing up the order here. It should be deref ref x and deref ref mut x. Similar to how & ref x and &mut ref mut x work today.


A Zulip thread for this has been opened. I'm going to cross-link it here: https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Deref.20patterns


And for those that find Zulip’s interface unusable and are just reading along: https://zulip-archive.rust-lang.org/213817tlang/30328Derefpatterns.html

match outputs::<Option<String>>() {
    opt @ Some("example") => {
        assert_eq!(opt, Some("example")); // Error, mismatched types
    },
    ...
}

I don't understand asking to start with something which hides semantics: while terseness can come in handy once a pattern has become pervasive enough, people need to have internalized the "verbose" pattern first (I'm pretty sure there is a quote out there that phrases this better). Many programmers already struggle with patterns, and using sugar to "hide" their complexity usually backfires the moment a more subtle situation arises, such as the one above. Granted, deref is quite a verbose annotation (I wouldn't like having to write deref "…" | deref "…" etc.), but I do think that starting with some sigil is the best way to make the whole process more likely to be accepted and integrated (the language would greatly benefit from the feature being added in some form, and I find that jumping straight to sugar will just slow the whole thing down due to all these corner cases and clarity considerations).

Then, once the need for sugar outweighs the effort needed to clarify and handle all the corner cases that it causes, as well as the surprising situations that it leads to, a follow-up change could be considered, suggested, and discussed.

For instance, I personally see that opt @, in the example above, must necessarily bind to the original expression (hence causing the type error), whereas other programmers have expressed that the "natural" situation there for them is for opt @ to bind to the deref'd pattern (and thus have opt: Option<&str>).

  • For those thinking that with the latter interpretation everything Just Works™, and that I am the one wrong, consider enum NonGeneric { Some(String), … }, and tell me what the type of opt @ NonGeneric::Some("foo") would then be.
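
For reference, this is how `@` bindings behave in today's Rust, where no deref is involved: the binding always has the scrutinee's type. The sketch is illustrative only; the function names are made up, and the `Other` variant is a placeholder for the variants elided in the enum above.

```rust
// `name @ pattern` binds the whole matched value, so its type is the
// scrutinee's type rather than anything derived from the sub-patterns.
fn at_binding(input: Option<String>) -> usize {
    match input {
        opt @ Some(_) => {
            let opt: Option<String> = opt; // the original type
            opt.map_or(0, |s| s.len())
        }
        None => 0,
    }
}

// Enum from the discussion. `Other` is a placeholder variant. There is no
// `NonGeneric<&str>` that a "deref'd" `@` binding could have as its type.
enum NonGeneric {
    Some(String),
    Other,
}

fn at_binding_non_generic(input: NonGeneric) -> bool {
    match input {
        whole @ NonGeneric::Some(_) => {
            let _whole: NonGeneric = whole; // can only be the enum itself
            true
        }
        NonGeneric::Other => false,
    }
}
```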

The situation with no sigils is thus ambiguous and not yet intuitive, so it is, at the very least, a decision that should be left for a follow-up change: let's first have deref patterns with some sigil / annotation for a few months before discussing further changes.

FWIW, taking &, a 1-long sigil, to represent deref, the examples I've mentioned are less confusing / surprising, and the incurred "sigil noise" remains low:

match outputs::<Option<String>>() {
    opt @ Some(&"example") => {
        assert_eq!(opt, Some("example")); // Error, mismatched types
    },
    ...
}
// as well as:
    opt @ NonGeneric::Some(&"example") => …
//                         ^^^^^^^^^^
//             hints at `Some(impl Deref<Target = str>)`,
//             which is correct, contrary to `Some(&str)`.