Adding functions to temporarily take ownership of a mutable reference to std::mem

Link to playground with the code presented

While programming in rust I have found myself frequently including these 2 utility functions in my code:

fn update<S>(src: &mut S, map: impl FnOnce(S) -> S) {
    let value = unsafe { std::ptr::read(src) };
    unsafe { std::ptr::write(src, map(value)) };
}

fn update_with<S,T>(src: &mut S, map: impl FnOnce(S) -> (T,S)) -> T {
    let value = unsafe { std::ptr::read(src) };
    let (result, new_value) = map(value);
    unsafe { std::ptr::write(src, new_value) };
    result
}

These functions let you temporarily take ownership of a mutable reference.

From the perspective of functional programming, these are very similar to the modify and state functions of the state monad, but they can also be viewed as a generalization of std::mem::replace.

As an example of why this is useful, look at the following definition of a singly linked list:

enum List<T> {
    Nil,
    Cons(Box<(T, List<T>)>),
}

If you wanted to implement push_front for this data structure you would have to write:

fn push_front(&mut self, x: T) {
    match self {
        Self::Nil => {
            *self = Self::Cons(
                (x, Self::Nil).into()
            )
        }
        Self::Cons(cons) => {
            let old_cons = std::mem::replace(
                    cons,
                    // temporarily have the tail
                    // be empty so we can aquire
                    // ownership of `cons` which
                    // will become the real tail.
                    (x, Self::Nil).into(),
                );
            // get a reference to the empty tail
            let (_, xs) = cons.as_mut();
            // set the actuall tail using the
            // aquired `head`.
            *xs = Self::Cons(old_cons);
        }
    }
}

This is ridiculously complicated for something that, if you had ownership of self, you could write in one line as self = Self::Cons((x, self).into()).

If we use the update function we can just write:

fn push_front(&mut self, x: T) {
    update(
        self,
        move |slf| Self::Cons((x, slf).into())
    )
}

Which is both slightly more performant (because we don't have to temporarily set the tail to be empty) and less tricky to write.

update_with is for cases where you want to update something and also return a value in the process. For example, if you wanted to implement pop_front you would have to write:

fn pop_front(&mut self) -> Option<T> {
    // this time we cannot even pattern
    // match so we have to set the entire
    // list to be empty.
    let slf = std::mem::replace(self, Self::Nil);
    match slf {
        Self::Nil => {
            // in this case we just replaced
            // nil with nil so we dont have
            // to reset slf.
            None
        }
        Self::Cons(cons) => {
            let (x, xs) = *cons;
            // set the actual new value of self
            std::mem::replace(self, xs);
            Some(x)
        }
    }
}

Which is, again, pretty unnecessarily complex. Though, if you tried to implement this with update you would get:

fn pop_front(&mut self) -> Option<T> {
    let mut result: Option<T> = None;
    update(
        self,
        |slf| match slf {
            Self::Nil => Self::Nil,
            Self::Cons(cons) => {
                let (x, xs) = *cons;
                result = Some(x);
                xs
            }
        }
    );
    result
}

Which is still not ideal - what if the return type wasn't an Option? You would have to use unwrap.

Using update_with the function looks like:

fn pop_front(&mut self) -> Option<T> {
    update_with(
        self,
        |slf| match slf {
            Self::Nil => (None, Self::Nil),
            Self::Cons(cons) => {
                let (x, xs) = *cons;
                (Some(x), xs)
            }
        }
    )
}

Which is, again, way clearer.

A few questions:

  • Do you know of something like this already in the standard library?
  • If not, do you think this is worth adding to std::mem?
  • Do you have any ideas for better names for these functions? update and update_with is just the best I could come up with, and doesn't really make it immediately obvious what they do.
2 Likes

replace_with

RFC 1736

The fundamental question is what do you do when the callback panics. Your code is unsound in that case, as it permits observing a moved-from value.

20 Likes

These are unsound if map panics. (For example, update::<&mut T> can store the &mut T it gets as argument in a global variable and then panic, and after catching the unwind, you now have two copies of the original &mut T, in the global variable and in src.)

7 Likes

I mean, for this particular example, you could already just do something like the following, right?

use std::mem;

#[derive(Default)]
enum List<T> {
    #[default]
    Nil,
    Cons(Box<(T, List<T>)>),
}
impl<T> List<T> {
    fn push_front(&mut self, x: T) {
        *self = Self::Cons((x, mem::take(self)).into())
    }
}

Edit: Regarding “performance”, the difference between the above and your “update”-using version is minimal (just a single setting the value to 0). [Also, technically, if the allocator is able to panic when the new Box is created, then your update-using variant can actually run into unsoundness].

In any case, for actual improvements, I seem to getting particularly terse & nice assembly by doing the Box allocation earlier, through something like this:

impl<T> List<T> {
    fn push_front(&mut self, x: T) {
        let b = Box::new_uninit();
        *self = Self::Cons(Box::write(b, (x, mem::take(self))))
    }
}

With this improvement, the difference between usage of mem::take and your update function, unsurprisingly, also goes away in optimized assembly (the intermediate setting to zero is no longer necessary for soundness, and the optimizer can figure out it’s an unnecessary extra step).

Edit2: Even your more convoluted function optimized down to the same simpler assembly once the allocation happens early enough:

impl<T> List<T> {
    fn push_front(&mut self, x: T) {
        let b = Box::new_uninit();
        match self {
            Self::Nil => *self = Self::Cons(Box::write(b, (x, Self::Nil))),
            Self::Cons(cons) => {
                let old_cons = std::mem::replace(
                    cons,
                    // temporarily have the tail
                    // be empty so we can aquire
                    // ownership of `cons` which
                    // will become the real tail.
                    Box::write(b, (x, Self::Nil)),
                );
                // get a reference to the empty tail
                let (_, xs) = cons.as_mut();
                // set the actuall tail using the
                // aquired `head`.
                *xs = Self::Cons(old_cons);
            }
        }
    }
}

See here for some comparisons: https://godbolt.org/z/d6n461ao7

8 Likes

Also possibly worth noting: this API is unsound in combination with partial_borrow, even if you corectly handle the unwinding case

partial-borrow relies for soundness on the fact that Rust does not permit the programmer to obtain an owned value T, if they are given only reference &mut T. (Assuming some type T which doesn’t specifically allow this.)

However, there are some Rust libraries which use unsafe to extend the Rust language to provide this facility. For example, the combination of partial-borrow with replace_with is unsound.

You can get an owned value out of a mutable borrow in safe Rust, using std::mem::replace

Generally speaking, if this situation happens, then at least one of the two crates must be in the wrong. But the stdlib is never in the wrong.

std::mem::replace requires you have an owned T in order to move out of an &mut T. This crate does not give you access to any owned Ts, so that doesn't work for this case.

Yep, I had a feeling there was some unsoundness issue i had missed but I couldn't find it.

Skimming through the discussion in the rfc page, it does seem like this function is a bit too big of a rabbit-hole to get merged into std.

At the end of the day I think this is a type system issue, because rust simply does not have the ability to express the idea of "a function that takes in a valid pointer and may invalidate it in some cases" (which could be accomplished if rust had true linear types).

At least it's nice to know there are crates that do this so I don't have to rewrite the same (unsound) code every time, I think I'll use replace_with for any future endeavors.

Yeah, I was being a bit intentionally nitpicky about performance, but it's about the principle of it you know?

I assumed the rust compiler optimized out those differences anyway, but it is still cool to see exactly how it does it and when.

As for using using Box::write, I am personally not the biggest fan - I know that technically not dropping a value is considered safe, but I still don't love having it be unchecked. I think you could design a safer version of this API by having an Uninit<T> type that is guarantied to be uninitialized and then adding appropriate methods, but that's just pondering.

It does, if you use &mut MaybeUninit<T>, which is allowed to be invalid. But &mut T is not allowed to be invalid.

Possibly, but only if we go with a version that requires you to prove you can't panic-and-unwind if you're currently holding any linear type.

1 Like

I think that's not really what @DanielFH referred to. I believe they were talking about something like the hypothetical &own T reference, which could allow moving out, without having to wrap the referenced value in MaybeUninit beforehand.

2 Likes

More importantly, with &mut MaybeUninit<T> as the argument, the body of function can't assume it contains a valid T when it's passed in.

1 Like

You can already sort of have this with a function that takes an owned T and returns another one (the only difference is that this is not a pointer anymore, but it mostly doesn't matter at the type level), however that leads you exactly to your starting point. The issue is that this new kind of function (or kind of pointer) is incompatible with the current definition of functions taking a &mut references.

In the end you need to either virally propagate the new function type up the call stack (e.g. making push_front also take and return Self) or bridge the two (with replace_with or similar, but with their caveats).

The only aspect I can see being improvable here is the ergonomics of taking and returning owned values, since a new kind of function/reference could allow you to specify the type only once and use method syntax to call them.

Also note that "a function that takes in a valid pointer and may invalidate it in some cases" is a bit too generic: if the cases are not specified then it might as well always invalidate the pointer! What we need in this case is a function that always invalidates the pointer if it doesn't return normally (i.e. it returns through unwinding).

1 Like

I should probabeley clarify what I had in mind when I wrote that. My idea was to have the update function return a linear LifetimeGuard<'a> type, that contained a single valid: bool field saying weather the reference was still valid or not, then make it so the only way to consume the guard would be a fn take(self) -> Result<(), Self> method on it, thus basically forcing you to either use unreachable!() to assume the method succeeded or to abort without consuming the gaurd in the case of an error.

BUT while writing this, I actually tried to write out what this could look like, and I realized that you could basically achieve the same thing by having update take in an extra callback that gets called in case the first one panics, thus forcing you to directly deal with that case in the same ways - which is exactly what the replace_with crate does (and also std::mem::take to a certain extent).

I think a MaybeUninit could sort of be used here, but as HjVT pointed out that doesn't let you assure that the reference you pass in points to an initialized value, maybe if you could say 'a function that takes in a &mut T and converts it to a &mut MaybeUninit<T>' (which I think was possible with the type-state system that rust scrapped, but I haven't red that much about it so I'm not sure), but even then I still think that it is not ideal because a MaybeUninit implies we don't know in which cases the value is uninitialized and so requires unsafe, but we do know exactly in which cases the value becomes invalid - when the function panics.

This would achieve what I was talking about, because you could have a function that takes in &own T and returns Result<&own T, &own MaybeUninit<T>>, but there are still complications there because there is no way of guarantying the address you take in and the address you get back are the same - for that I think you would need some form of dependent types, or at least full row polymorphism, so that you could associate between the address being passed in and the one being returned. With row polymorphism, for example, you could have a pointer type Ptr<A,T> - where A is a different, unique type for each address and T is the type being pointed to, then have the update function take in Ptr<A,T> and return Result<Ptr<A,T>, Ptr<A,MaybeUninit<T>>, but with the way generics currently work in rust that would require the number of addresses used by any given data structure to be known at compile time.

Worth noting that there is actually a programming language that works like that, it's called Linear ML, it's an ml-family language that uses linear types in order to optimize functions like T -> T into in place mutations. Sadly, it does seem abandoned.

I don't necessarily agree, while at the type system level you are essentially saying the same thing, moving entire objects back and forth from functions is definitely not equivalent to passing arround a single word. Of course the compiler can optimize away such differences most of the time but not always, and generally I would say that it is preferable to only rely on optimizations that are certain (like the null pointer optimization or generic templating).

That sounds like the state monad, which you could definitely implement in rust as a trait with a run_state function (maybe there are some crates that do this, but all of the ones I could find put your run_state function in a boxed closure which isn't ideal), and a macro for do notation.

Also, because rust allows variable shadowing, you can just write -

let state = /* ... */;
let (x, state) = f(state);
let (y, state) = g(state);
let (z, state) = h(state);

- without using any macros or monads, which isn't too bad.

This would infect the call signature up to the level where you diverge, I suppose? So it is just as infectious as rewriting everything from fn(&mut T, ..) -> .. to fn(T, ..) -> (T, ..).

I only see two "recovery" strategies here:

  1. Abort
  2. Conjure a value of T to stick in the &mut T

If option #2 is viable, then you likely could have rewritten your code to initially get an owned Foo out of &mut Foo via std::mem::replace(&mut foo, conjure_foo()), which you hinted to in your example with Self::Nil. I don't really see the downside to this, after enough optimization passes it is unlikely that this even appears in codegen.

1 Like

My point was showing how this is already not very ergonomic without going into semantics arguments which would likely make it even less ergonomic.

This is what I was referring to when I said it's not very ergonomic: you have to repeat state over and over and it becomes impossible to do chaining. Moreover shadowing only works as long as you don't need to update the state in another scope, for example in an if. In that case you'll need to mutate the existing state binding and you'll be forced to do something like:

let mut state = /* ... */;

if some_condition {
    let x;
    (x, state) = f(state);
    // use x ...
}
1 Like

To be pedantic, you could write:

let state = /* ... */;

let state =
    if some_condition {
        let (x, state) = f(state);
        // use x ...
        state
    } else { state }

But I do agree with your overall point that writing rust programs in this style isn't very pleasant.

Oh, uuh, it would be nice to specify that some attribute macro is unsafe, that is, if you use the proc macro you are opting in into a more narrow set of constraints. And then use it as #[unsafe(my_macro)]

Otherwise, if both partial_borrow and replace_with are supposed to provide fully safe APIs, then one of them must be unsound. That's because the user of a safe API shouldn't need to think about soundness issues, and safe APIs are supposed to be composable, regardless of any implementation details that might use unsafe Rust.(note that there are proc macros that expand to unsafe code but are otherwise safe to use, like pin-project, so if your macro is supposed to be safe to use the fact that it expands to unsafe code is an implementation detail).

Unsafe APIs, on the other hand, don't necessarily compose. And since the user can't use those two crates together or it may lead to UB, one of them must be unsafe to use. Since replace_with provides only a safe API, I must conclude that the user of your crate should be required to spell out the word "unsafe" somewhere, to acknowledge your library imposes further invariants beyond what a safe API can require. This would at least help grepping and auditing the code.

This is a known issue: Soundness conflicts · Issue #379 · rust-lang/unsafe-code-guidelines · GitHub

When combined with the safe subset of Rust, partial_borrow and replace_with are each sound in isolation, but they do not compose soundly.