[Pre-RFC] Unify references and make them generic over mutability

Motivation

Because Rust has two distinct reference types, many types end up with paired get and get_mut accessor functions that are usually exact copies of each other, differing only in mutability. This proposal is meant to address that unwanted code duplication.

Unified References

Instead of being two separate types, both types of references could be variants of one generic reference type that is generic over mutability.

Conceptually (not real Rust code), this could look something like:

struct Reference<'t, const MUTABILITY: bool, T> {/* ... */}
type &'t T = Reference<'t, false, T>;
type &'t mut T = Reference<'t, true, T>;

No Duplicate Code Generation

Preferably, mutability parameters would be similar to lifetimes in that the compiler guarantees it won't emit two different functions for calls that differ only in mutability.

Misusing const M: bool like above would therefore be problematic: it would allow implementing different behavior depending on mutability inside a function that is generic over mutability. So we would have to introduce a new kind of generic parameter for mutability.

Conceptually (still not real Rust code), this could look something like:

struct Reference<'t, mut M, T> {/* ... */}
type &'t T = Reference<'t, const, T>;
type &'t mut T = Reference<'t, mut, T>;

Proposed Syntax

To do anything useful with generic unified references, we would need some accompanying syntax for taking borrows that are generic over mutability. It seems natural to make this syntax analogous to &mut foo:

struct T(u32);
impl T {
    fn get<mut M>(&M self) -> &M u32 {
        &M self.0
    }
}

Borrowing Rules

Used generically, a reference would have the restrictions of both &mut T and &T:

let t = T::new();
let t: &M T = &M t;

// Not Copy/Clone, because it might be a unique reference:
// let t = *&t; // Error

// Not writable of course, because it might be a shared reference
// *t = T::new(); // Error

// Cannot be mutably borrowed, because it might be shared
// let _: &mut T = &mut *t; // Error

// It can however be generically borrowed:
let a: &M T = &M *t;

// But only one active generic borrow, similar to mutable borrows:
// let b: &M T = &M *t; // Error
a.foo();

// And of course you can have multiple shared borrows active:
let a = &*t;
let b = &*t;
a.foo();

Misc

Another use could be an iterator that is generic over mutability:

struct MyIterator<'t, mut M, T> {
    /* */
}
impl<'t, mut M, T> Iterator for MyIterator<'t, M, T> {
    type Item = &'t M T;
    /* ... */
}

There is a reason why Rust doesn't already have it: shared references are Copy and support subtyping, and exclusive references require reborrowing and are invariant. These are semantic requirements that are quite different. Rust's generics are checked at definition time, not at use time like macros, so code generic over mutability would have to be maximally restrictive, and you wouldn't be able to unify code beyond simplest cases.
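
To make the difference concrete, here is a minimal sketch in today's Rust. The function names read_twice and bump_twice are made up for illustration. It only shows the Copy versus reborrow distinction, not the variance difference.

```rust
/// A shared reference is `Copy`, so it can be duplicated freely.
fn read_twice(r: &u32) -> u32 {
    let a = r; // copies the reference
    let b = r; // `r` is still usable
    *a + *b
}

/// An exclusive reference is not `Copy`. Writing `let first = r;` would
/// move `r`, so each use has to be an explicit reborrow.
fn bump_twice(r: &mut u32) -> u32 {
    {
        let first = &mut *r; // reborrow, ends with this block
        *first += 1;
    }
    let second = &mut *r; // a second, non-overlapping reborrow
    *second += 1;
    *r
}

fn main() {
    let mut x = 1;
    println!("{}", read_twice(&x));
    println!("{}", bump_twice(&mut x));
}
```

Code that is generic over mutability would have to be written in the style of bump_twice, even when it is instantiated with a shared reference.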

This is especially true for mutable iterators, which often already don't work with borrow checking and require unsafe to override behavior of references.
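
As a rough illustration, here are two hand-written slice iterators in current Rust. Iter and IterMut are hypothetical names, not the standard library types. Slices are a friendly case where the mutable version can be written safely, but even here the two next bodies differ: the shared one relies on Copy, the mutable one has to move the slice out of self first.

```rust
/// Shared version: `&[T]` is `Copy`, so it can be read through `&mut self`.
struct Iter<'a, T> {
    rest: &'a [T],
}

impl<'a, T> Iterator for Iter<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<&'a T> {
        let (first, rest) = self.rest.split_first()?;
        self.rest = rest;
        Some(first)
    }
}

/// Mutable version: the slice must be taken out of `self` before splitting,
/// otherwise the returned reference could not outlive the `&mut self` borrow.
struct IterMut<'a, T> {
    rest: &'a mut [T],
}

impl<'a, T> Iterator for IterMut<'a, T> {
    type Item = &'a mut T;

    fn next(&mut self) -> Option<&'a mut T> {
        let slice = std::mem::take(&mut self.rest); // leaves an empty slice behind
        let (first, rest) = slice.split_first_mut()?;
        self.rest = rest;
        Some(first)
    }
}

fn main() {
    let mut data = [1, 2, 3];
    let iter_mut = IterMut { rest: &mut data };
    for x in iter_mut {
        *x *= 10;
    }
    let total: i32 = Iter { rest: &data }.sum();
    println!("{total}");
}
```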


If you find a pattern like this helpful you can always write it yourself like this:

pub enum Reference<'a, T> {
    Mut(&'a mut T),
    Ref(&'a T)
}

I'd love to see something like this happen.

I'm actually currently working on a crate that would allow this on the type level (it is not yet published). It will also need some syntax support or macro magic, and it would mostly be useful only when several other libraries start using it.

This does prevent merging shared and mutable references into one type, but it doesn't prevent the existence of a third reference type (generic). The generic version would have to be reborrowed like a mutable reference, and couldn't be used for mutation like a shared reference.

Only the code that "knows" the mutability of the reference would be able to use it without restrictions.

I don't understand how this is a problem at all. This here is already possible:

struct T<const M: bool>;
impl Copy for T<false> {}
impl Clone for T<false> {
    fn clone(&self) -> Self {*self}
}

Thank you for mentioning this. As in the other cases I mentioned, using a reference generically would of course mean being the most restrictive, so it wouldn't have subtyping when used generically.

However, I don't see this as something making it impossible to unify references. Why would the compiler not be able to implement this for a specific generic argument and not another one? As far as I'm aware this isn't something that is expressible in normal syntax and is purely implemented inside the compiler in the first place.

I already mentioned this being very restrictive in my proposal, but those simple cases are exactly the ones I would like this to support.

Alright, but this isn't something this was meant to solve. However, once you have the mutable iterator written using unsafe code, there then wouldn't be any reason to write a separate one for shared references that I'm aware of.


This is a very different pattern from what I proposed though. It would still completely lack any syntax support for borrowing, there would be no guarantee from the compiler against duplicating code (or worse, introducing lots of runtime checks), and the type being passed is twice the size of an ordinary reference.
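
A small sketch of what using such an enum looks like in practice. The get and get_mut methods are assumptions added for illustration; only the enum itself comes from the suggestion above. Every access goes through a runtime match, and writing has to be fallible.

```rust
use std::mem::size_of;

pub enum Reference<'a, T> {
    Mut(&'a mut T),
    Ref(&'a T),
}

impl<'a, T> Reference<'a, T> {
    /// Reading works for both variants, but needs a runtime `match`.
    pub fn get(&self) -> &T {
        match self {
            Reference::Mut(r) => &**r,
            Reference::Ref(r) => &**r,
        }
    }

    /// Writing can only succeed for the `Mut` variant, so it returns `Option`.
    pub fn get_mut(&mut self) -> Option<&mut T> {
        match self {
            Reference::Mut(r) => Some(&mut **r),
            Reference::Ref(_) => None,
        }
    }
}

fn main() {
    let mut x = 1;
    let mut r = Reference::Mut(&mut x);
    if let Some(v) = r.get_mut() {
        *v += 1;
    }
    println!("value: {}", r.get());
    // The enum carries a discriminant next to the pointer.
    println!(
        "enum: {} bytes, plain reference: {} bytes",
        size_of::<Reference<'static, u32>>(),
        size_of::<&u32>()
    );
}
```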

I'm sure you could do something similar with it, but it would likely be worse in every aspect than just duplicating the code as is being done right now.

Something better would already be possible currently by introducing a trait and implementing it for both shared and mutable references. However, it still lacks the syntax support for borrowing (which can be implemented by macros of course) and still has no guarantee for the compiler not to duplicate code.

That is probably what this is doing:

This is something I'm also considering writing just to try things out. However, I agree, solutions like that would not be very useful until adopted widely, especially by the standard library.

The problem is you can't use T<M> and have it Copy. You can only use Copy when you specifically limit it to only T<false>, which takes away the ability to use the same code with T<true>. This is equivalent to just using & instead of being generic over lifetimes. Generic code, as long as it's actually generic and supports both types of references, will have to be maximally restricted: no Copy, and it must require reborrowing.
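
A minimal sketch of that restriction with today's const generics. Handle, duplicate_shared and pass_through are hypothetical names, with Handle standing in for the T<const M: bool> type from the snippet under discussion.

```rust
#[derive(Debug, PartialEq)]
struct Handle<const M: bool>(u32);

// Only the `false` instantiation is `Copy`.
impl Copy for Handle<false> {}
impl Clone for Handle<false> {
    fn clone(&self) -> Self {
        *self
    }
}

/// Concrete code that names `Handle<false>` may copy freely.
fn duplicate_shared(h: Handle<false>) -> (Handle<false>, Handle<false>) {
    (h, h)
}

/// Generic code has no `Copy` bound available, so it can only move the value.
fn pass_through<const M: bool>(h: Handle<M>) -> Handle<M> {
    // Returning `(h, h)` here would be rejected: use of moved value `h`.
    h
}

fn main() {
    let (a, b) = duplicate_shared(Handle::<false>(7));
    println!("{:?} {:?}", a, b);
    println!("{:?}", pass_through(Handle::<true>(3)));
}
```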


While that is true, it is actually enough that T<false> is Copy (or, equivalently, you can convert T<false> into a true &-reference).

The reason is that the main (only?) use case of generic mutability is the ability to create library functions that map GenRef<'a, M, T> into GenRef<'a, M, U> (instead of having to create separate &'a T to &'a U and &'a mut T to &'a mut U variants). The caller code would create a generic reference from a normal one, call one or more generic mappers on it, then cast back to the reference type it started with. Note that the caller code "remembers" the mutability all the way through, so it can use the "partially implemented" conversion methods. It is the library code that has the generic and hence restricted variant.
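
One way such a mapper could be approximated in current Rust, purely as a sketch: the Mutability trait, the Shared and Unique markers, and the map helper are all invented names, and this is not the unpublished crate mentioned above. It uses a generic associated type in place of GenRef<'a, M, T>. Note that map still needs both projections, and the compiler will generate one copy of get per marker, so none of the code-generation guarantees from the proposal apply.

```rust
/// Hypothetical type-level stand-in for "const" or "mut".
trait Mutability {
    type Ref<'a, T: 'a>;

    /// Maps a reference to one of its parts. Both projections are required
    /// because one function cannot serve both reference kinds today.
    fn map<'a, T: 'a, U: 'a>(
        r: Self::Ref<'a, T>,
        shared: fn(&T) -> &U,
        unique: fn(&mut T) -> &mut U,
    ) -> Self::Ref<'a, U>;
}

struct Shared;
struct Unique;

impl Mutability for Shared {
    type Ref<'a, T: 'a> = &'a T;

    fn map<'a, T: 'a, U: 'a>(
        r: Self::Ref<'a, T>,
        shared: fn(&T) -> &U,
        _unique: fn(&mut T) -> &mut U,
    ) -> Self::Ref<'a, U> {
        shared(r)
    }
}

impl Mutability for Unique {
    type Ref<'a, T: 'a> = &'a mut T;

    fn map<'a, T: 'a, U: 'a>(
        r: Self::Ref<'a, T>,
        _shared: fn(&T) -> &U,
        unique: fn(&mut T) -> &mut U,
    ) -> Self::Ref<'a, U> {
        unique(r)
    }
}

struct Foo(u32);

/// A single "library" accessor replacing a get/get_mut pair.
fn get<'a, M: Mutability>(foo: M::Ref<'a, Foo>) -> M::Ref<'a, u32> {
    M::map::<Foo, u32>(foo, |f| &f.0, |f| &mut f.0)
}

fn main() {
    let mut foo = Foo(1);
    // The caller knows the mutability and gets ordinary references back.
    *get::<Unique>(&mut foo) += 41;
    println!("{}", get::<Shared>(&foo));
}
```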

Oh, I have to show you my implementation, maybe that will clarify it. I'll try to get it into a "shareable" state and link it here sooner or later.

Alright. But that is just what I had already mentioned in the original post and is exactly how this is supposed to work. I thought you had seen some additional problems arising from that that I hadn't considered.

:melting_face: I feel this is the kind of feature that does have its niche, but the niche is so small that it's not worth changing the language for. I don't think you can unify non-trivial cases without introducing more confusion.

Plus I think it's even more error prone when you compose unsafe with this feature. Not every unsafe implementation around &mut can be translated to & without error. Unsafe code might be relying on &mut being unique!

Some other things I think need to be addressed:

  1. By current convention, I would assume get takes an immutable reference. This proposal breaks that.
  2. This affects method resolution and type inference (and probably borrow checking).

I would like to disagree with this being such a small niche. The get/get_mut pair pattern is virtually everywhere and we haven't even begun to explore other uses for this.

I would consider something like a hashmap lookup to be a non-trivial case that could be implemented using this pattern.

Regarding confusion, I can't really say until I see an example where this causes confusion.

Using unsafe always requires special consideration. If the code relies on the reference being unique, this pattern simply cannot be used and would not be the correct tool. I don't see this as a good argument against though.

Those are good points and need consideration.

First of all, purely this feature, without any other changes to std, would not change any behavior of existing code and would still be useful for library implementations.

I was going to suggest implementing the changes to getter functions in std in a new edition, but thinking about this again, I think the solution is very straightforward and can guarantee that no current code will break.

What has to be done is to always default mutability to const when not explicitly stated. This is similar to how variable definitions always default to const as the safer and less surprising option and mut has to be stated explicitly and will not be inferred.

Let's consider some code. The old pattern is:

struct Foo(u32);
impl Foo {
    fn get(&self) -> &u32 {&self.0}
    fn get_mut(&mut self) -> &mut u32 {&mut self.0}
}

The new pattern would be:

struct Foo(u32);
impl Foo {
    fn get<mut M>(&M self) -> &M u32 {&M self.0}
    // Can also provide this for backwards-compatibility:
    fn get_mut(&mut self) -> &mut u32 {self.get::<mut>()}
}

It should always be required to explicitly use the mutable version:

let mut foo = Foo(1);
foo.get::<mut>();
Foo::get::<mut>(&mut foo);

If we let the compiler infer the mutability instead, we will indeed run into problems. I think the two problematic cases would be these:

let mut foo = Foo(1);
foo.get(); // Ambiguous! Is this passing &mut self or &self?
Foo::get(&mut foo); // Bad! Different behavior than currently.

When inferring the mutability, the first call would be ambiguous, as both &foo and &mut foo would be eligible to be passed as the self parameter.

Using the old pattern, the second call coerces the mutable reference to a shared reference, so it returns a shared reference, but one that still keeps foo mutably borrowed.

With the new pattern, when inferring the mutability, this will now pass a mutable reference returning a mutable reference that is of course a mutable borrow. This would be a change in behavior.

However, when defaulting the mutability to const, nothing changes from the current behavior.
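
The current-Rust half of this can be checked today. A minimal sketch of the old pattern, showing that passing &mut foo to a &self method yields a shared reference while foo remains mutably borrowed:

```rust
struct Foo(u32);

impl Foo {
    fn get(&self) -> &u32 {
        &self.0
    }
}

fn main() {
    let mut foo = Foo(1);
    // `&mut foo` is coerced to `&Foo`, so `r` is an ordinary `&u32`...
    let r: &u32 = Foo::get(&mut foo);
    // ...but `foo` stays mutably borrowed for as long as `r` is live.
    // Uncommenting the next line is rejected by the borrow checker:
    // let other = &foo;
    println!("{}", r);
}
```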

Maybe there are some other cases that I haven't considered though? I would like to see an example where this actually breaks current behavior.

Some consideration may be needed when a function this applies to already has other generic parameters. Calls to those would break if the arguments are explicitly stated, because they would now be missing the mutability parameter that was added. But I would argue that in most cases this will not be an issue, and there could be a work-around by introducing a separate function with a different name until a new edition is released.

Is ambiguous mutability really a problem though?

In this case, get should be inferred as get<const> otherwise the second println would not work.

let mut foo = Foo(1);
let r1 = foo.get();
let r2 = r1;
println!("r1 = {}", r1);
println!("r2 = {}", r2);

And in this one get must be inferred as get<mut>, otherwise the assignment doesn't work.

let mut foo = Foo(1);
let r = foo.get(); // EDIT: I forgot if `r` must be declared as `let` or `let mut`
*r = 3;

And obviously this code doesn't compile, since r1 needs to be a mutable borrow for the assignment, but r2 is still alive during the assignment, which would require both to be immutable.

let mut foo = Foo(1);
let r1 = foo.get();
let r2 = r1;
*r1 = 3;
println!("r2 = {}", r2);

If the result is that there's still two functions, just now they're spelled get() and get::<mut>() instead of get() and get_mut(), that's a pretty unambiguous downgrade for the caller side.

Prior art: bitvec heavily uses wyz::comu to track mutability at the type level, but primarily for raw pointers and entirely as an implementation detail.


That's hard to argue with indeed. The improvements would be mostly on the implementing side and code-generation.

On the other hand, get_mut can still be supported if that's preferred on the caller side and just directly forward the call to get::<mut>. This would still benefit from the guarantees about code-generation and if it's not a completely trivial case, like my example, also reduce code-duplication (Even if it doesn't look like it's helping in the trivial case, providing the generic implementation in the trivial case is what enables building more complex things with it somewhere else).

Other than getter-functions, I think this can also be useful elsewhere though. I already mentioned the iterator, but I'll need to come up with an actual complete example for that.

The point is that while inferring mutability might be possible, this could lead to some surprising outcomes.

This is why we also want to explicitly state mut in a let-binding.

Thinking about it, this would also need a generic let M foo binding, which I haven't really considered yet.

I forget whether the "deducing this" proposal for C++23 was already mentioned as prior art.

At the risk of appearing self-important, rather than repeat myself, I'll link to myself.

tl;dr - there really isn't all that much useful code that can abstract over mutability. Unifying & and &mut is at best a mild improvement in trivial cases, and no improvement or even harmful in nontrivial cases, so the motivation for a new first-class language feature is thin.


Thank you for linking that previous discussion.

I've seen multiple times now the argument that this is such a niche feature that it's not worth it. I don't agree, especially as it keeps being brought up and as this type of code-duplication is virtually everywhere.

Your example over there is an excerpt from BTreeMap which uses some internal methods called reborrow and borrow_mut, arguing that those are fundamentally different, which prevents something like this from being useful.

Interestingly, within the BTreeMap implementation there is already an unsafe reborrow_mut function with a FIXME attached that basically states that the unsafety of that function is due to other methods implemented on the type and not a fundamental problem, and comes with a suggestion on how to fix that: reborrow_mut, which is basically identical to reborrow.

It seems to me the difficulty with that specific implementation of BTreeMap is that there are two conceptually distinct ways to modify a map like that: you can either modify the structure by adding or removing values, or you can just mutate existing values. The internal implementation details fail to distinguish those two cases and thus end up with more unsafe functions than actually required. This feature would of course only be useful for mutating existing values.
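
The two kinds of modification are already visible in the public BTreeMap API, which is all this small sketch shows. It says nothing about the internal node types, and bump_values and add_entry are made-up helper names.

```rust
use std::collections::BTreeMap;

/// Mutates existing values only: the shape of the tree is untouched.
fn bump_values(map: &mut BTreeMap<u32, u32>) {
    for value in map.values_mut() {
        *value += 1;
    }
}

/// Modifies the structure: may allocate or split nodes internally.
fn add_entry(map: &mut BTreeMap<u32, u32>, key: u32, value: u32) {
    map.insert(key, value);
}

fn main() {
    let mut map = BTreeMap::new();
    add_entry(&mut map, 1, 10);
    bump_values(&mut map);
    println!("{:?}", map);
}
```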

I don't think that was a good counterexample and to the contrary, I believe this is exactly the type of code that will benefit from this.

And the issue isn't really that there are two different functions in this one place; if that were all, I wouldn't care at all. The problem is that this trickles down the entire hierarchy of abstractions, causing code duplication everywhere along the way and adding up to a far larger amount of duplicated code.

As for the argument that it is hard to write correct unsafe code that plays together with this feature, I don't get that at all. The requirements on the unsafe code are very straightforward.

The argument you quoted there from yet another discussion is about an iterator not unlike the one I brought up as an example of how this could be useful. The argument is that there is a specific mistake that could be made in the implementation that would merely be semantically incorrect on an iterator returning shared references, but would be unsound for one returning mutable references, and that therefore it's not a good idea to unify both cases.

I think that is an extremely bad argument as it doesn't talk about the correct implementation (which can absolutely be unified this way), but a theoretical mistake that someone could possibly make leading to different outcomes. Of course we don't want to unify the theoretical, incorrect iterator, but the actual, correct one! And not having both cases unified will not save us from the consequences of making that specific mistake when implementing the mutable version, the mistake will be exactly as bad without.

Also from that same post,

So far I have not seen this demonstrated. If you believe it is possible to add parametric mutability to simplify BTreeMap, rather than just telling me I'm making an extremely bad argument, show me how!

I'm sorry that I came across like that, but please read again! The thing I called an extremely bad argument wasn't your argument, but just something you had quoted from somewhere else and was in no way related to your example about the BTreeMap implementation and I'd be happy to help with improving that!

What I did call an extremely bad argument though was what you had quoted from someone else from a previous discussion. The argument and example being presented there is completely unrelated to being generic over mutability. Generic mutability won't cause the issue being presented there, it won't make it more likely to happen and it won't make it more severe, and yet it is presented there as a very good reason why we can't have this.

I'll get back to you about the BTreeMap implementation later, I'd like to get this out of the way first though.