Correct term for a “thing that can be moved but also mutated”

In Rust, a “value” is a concrete sequence of bytes with an associated type. (The corresponding term in C/C++ is “rvalue”.) A “place” is some memory location that can hold a value. (The corresponding term in C/C++ is “lvalue”.) An “assignment expression” in Rust moves a value into a specified place.

Some places, but not all, are tracked by Rust’s borrow checker. These are called “move paths”. Examples of move paths are variables, i.e. named places on the stack. But there exist other move paths, notably fields of structs and tuples. For example, it is possible to have an initialized variable pair of type (String, String) and move out of pair.0. The borrow checker will be aware that the move path pair.0 is now uninitialized, while pair.1 is still initialized. This tracking does not occur for elements of arrays or other collections.
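A minimal sketch of the partial move described above. The function name split_pair is made up for illustration; the commented-out lines show what the compiler is expected to reject.

```rust
// Moves the two fields out of a tuple one at a time.
fn split_pair(pair: (String, String)) -> (String, String) {
    // Move out of the first field only; the path `pair.0` is now uninitialized.
    let first = pair.0;

    // `pair.1` is tracked separately and can still be read here.
    println!("second field still usable, length {}", pair.1.len());

    // Using `pair` as a whole at this point would be rejected as a use of a
    // partially moved value:
    // drop(pair);

    // Array elements are not tracked this way; moving one out by index is an error:
    // let arr = [String::new(), String::new()];
    // let x = arr[0];

    let second = pair.1;
    (first, second)
}

fn main() {
    let pair = (String::from("hello"), String::from("world"));
    let (a, b) = split_pair(pair);
    println!("{a} {b}");
}
```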

The Rust borrow checker tracks the lifetimes of things as they are moved between move paths. It is common to say that what is moved are values, but the things that the borrow checker tracks are not values: mutating a thing does not change its identity for the borrow checker.

Is there a specific term in Rust parlance for “the thing that can be moved but also mutated”? In the context of other programming languages I’d call such things “instances” or “objects”, but these terms do not seem to be used in Rust. (For example the term “object” is typically used in the context of dyn.)

If there is no term, is it because this way of thinking is not considered useful? I admit that I might be splitting hairs here, but I find that being able to clearly name concepts helps with understanding, and not being able to name concepts might betray a lack of understanding.


It's better not to name C and C++ in the same category here since they are very different. The latter has many more 'value categories' and its semantics of values is tied to constructors and such (for instance, the additional category of prvalue is used to define where compilers are required to elide them). I do suppose the comparison to C's lvalue and rvalue is roughly correct, except that the target of Rust's assignments can also be an assignee expression that denotes multiple target memory addresses: a value can be destructured and its parts written to distinct memory locations during assignment. Close enough to syntactic sugar for our discussion though.
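To illustrate the point about assignee expressions, here is a small sketch of a destructuring assignment in which one tuple value is written to two distinct places. Point and rotate are hypothetical names chosen for the example.

```rust
struct Point {
    x: i32,
    y: i32,
}

// Rotates the point by 90 degrees counter-clockwise.
fn rotate(p: &mut Point) {
    // The right-hand side is evaluated to a single tuple value first;
    // its two parts are then written to the places `p.x` and `p.y`.
    (p.x, p.y) = (-p.y, p.x);
}

fn main() {
    let mut p = Point { x: 1, y: 2 };
    rotate(&mut p);
    println!("({}, {})", p.x, p.y);
}
```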

I'm not aware of that term being used. The initialization state is tracked for variables, which are separate allocations in an unspecified memory location. Tracking does not occur for elements of arrays.

No. The borrow checker solves a constraint problem between lifetimes. Lifetimes are introduced for instance by taking a reference to something. That's all. The relation to values occurs only in the form of types with drop-glue, a property that introduces additional constraints when such a type occurs as (part of) a variable and that variable is dropped at the end of a scope. (There's an internal attribute for discharging some constraints, i.e. we can manipulate the exact nature of the constraint through the type system, but that goes too deep). As far as the borrow checker is concerned, there is no such thing as value identity.

Things that can be moved and mutated are variables that are declared mut. The concept closest to value identity would be Pin-ned values, but the core language has little influence on these (there are some unsettled details of alias analysis, since we want these to point to self-referential values). Rather, whoever creates a pinned value must uphold a contract that the value is dropped before its memory is re-used. But it's not the type itself or a special expression or magic that creates the value identity / a tie to a memory location, it's the act of constructing a Pin.

Can you explain in more detail what you think is tracked, here?

To my mind, all that is involved here are places and values. When you move a value from place A to place B, no lifetime relationship exists between them (in fact, the value must not be borrowed at this time, so there can be no lifetimes “of” it); rather, place A is marked empty/uninitialized and place B is marked filled/initialized. These are separate operations happening simultaneously. In particular, there is no significant difference between b = a; moving a value from one place to another, and b = f(a); moving a value out of one place and a potentially different value into another place.
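A short sketch of this view, in which a move only changes which places are initialized. The helper shout stands in for the f of the post and is an assumption made for the example.

```rust
// Stand-in for `f`: consumes one value and returns a different one.
fn shout(s: String) -> String {
    s.to_uppercase()
}

fn demo() -> (String, String) {
    let mut a = String::from("hello");

    // `b = a`: place `a` becomes uninitialized, place `b` becomes initialized.
    let b = a;

    // Place `a` can be filled again; nothing links it to what it held before.
    a = String::from("again");

    // `c = f(a)`: a value leaves `a`, and a potentially different value enters `c`.
    let c = shout(a);

    (b, c)
}

fn main() {
    let (b, c) = demo();
    println!("{b} {c}");
}
```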


I wasn’t either, but it is used in the compiler internals documentation: Move paths - Rust Compiler Development Guide


I was re-reading the first chapter of the great book “Rust for Rustaceans” (the first chapter is available for download there), to see whether I’d notice something noteworthy now that I’ve been using the language for a couple of years.

At the beginning of the chapter the author carefully introduces some terms. In particular, he defines the term “value” as follows: “A value in Rust is the combination of a type and an element of that type’s domain of values”. While there seems to be no formal definition of “value” in Rust, this definition agrees with the term’s usage in the Rust Reference for example: A value is what a place (e.g. a variable) holds at some given point in time. If the place is mutated, its value changes to some other one, but a value like 9i32 itself is immutable.

On pages 3 & 4 in the section “High-Level Model” the author develops a picture of how the borrow checker works in terms of “flows”. In his words each flow “tracks the lifetime of a particular instance of a value”. My understanding of his picture was that mutating a variable does not establish a new “flow”, i.e. that the “particular instance of a value” lives on even when mutated. I’m not so sure anymore (explanation further below after the second quote of your post).

But even if Gjengset is perfectly consistent about his usage of the term “value”, sloppy usage seems widespread: For example, the Rust book says: “This makes it very clear that the change function will mutate the value it borrows.” Before starting to re-read chapter 1 of “Rust for Rustaceans” I actually believed that in Rust jargon “value” is just a synonym for what in C++ would be called an object or instance.

Let’s consider this toy program

fn main() {
    let mut s = String::from("Hello");
    s.push_str(" world");
    let mut m = s;
    m.push_str("!");
    dbg!(&m);
}

There is clearly a thing, an instance of String, that gets created, mutated, moved into a different variable, and mutated again. That thing is not a value, it’s not a variable, it’s not a place, but IMHO it clearly has an identity of its own that justifies giving it a proper name.

I would call it “instance” or “object”, but as I said above these do not seem to be the appropriate terms in Rust jargon.

What you write agrees with my mental model of the borrow checker. But then Gjengset’s picture of “flows, each one tracing the lifetime of a particular instance of a value” includes “exclusive (&mut) flows”.

But I believe now that in Gjengset’s picture of “flows” it does not matter whether mutation constitutes a new flow or not. An “exclusive flow” must continue up to a statement where a mutation takes place, but whether it continues after that as a new flow or as the old one has no observable consequences, since there cannot be any concurrent borrows anyway.


Thanks for the explanation. I wrote the sentence that you quote under the influence of Gjengset’s picture of how the borrow checker works, and after a cursory reading of the Rust Compiler Development Guide’s description. (@kpreid, that is indeed where I came across the concept of “move path”.)

But ultimately, my question is more about why there seems to be no term like “instance of a type” that is widely used in Rust. As I tried to show in my post above, I believe that the term “value” is sometimes abused in this sense.

It seems to me now that the answer to my question is that since Rust is so focused on values (e.g. moving is defined as copying the bytes of a value, and borrowed “instances of types” are protected from mutation, etc.), the concept of “instance/object”, while it can be defined, is not so terribly useful after all in Rust: it’s OK to speak about “mutating a value”, since whenever this happens, there can be no other reference to it anyway... This is very different from Python, say, where objects are one of the most important concepts of the language (not in the OOP sense of instance of a class, but in the more general sense of instance of a type).

I think the concept that this thread is about is a) practically useful and necessary for certain hypothetical future language extensions that I would like to propose, but b) not definable in current Rust and thus there isn't a common name for it.

I personally use the term "object identity" for this concept (but I don't think it's standard). In current Rust, an object identity only really applies to things that are stored in an immutable let binding or referenced by shared references (i.e. things that can't be mutated by exterior mutability) – the concept is still useful because the thing in question might have interior mutability. (But being the target of a shared reference prevents it moving, so you don't get the full power of the concept.)

Pin is close to being able to create an object identity, but doesn't quite manage it, due to the existence of Pin::set. (The fact that Pin::set is considered safe is evidence that a Pin is a slightly different concept than that of an object identity, as you can overwrite the contents of a pin with another object identity of the same type.)
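A minimal sketch of the Pin::set point: the pinned memory location stays the same, but safe code can still overwrite what lives there. The type Pinned and its field names are invented for the example.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// A type that opts out of `Unpin`, so a pinned instance cannot be moved out.
struct Pinned {
    label: String,
    _pin: PhantomPinned,
}

// Returns whether the address is unchanged, plus the label found afterwards.
fn overwrite_pinned() -> (bool, String) {
    let mut p: Pin<Box<Pinned>> = Box::pin(Pinned {
        label: String::from("first"),
        _pin: PhantomPinned,
    });
    let before = &*p as *const Pinned;

    // Safe: the old contents are dropped in place and a new value is written
    // to the same pinned location.
    p.set(Pinned {
        label: String::from("second"),
        _pin: PhantomPinned,
    });

    let after = &*p as *const Pinned;
    (before == after, p.label.clone())
}

fn main() {
    let (same_address, label) = overwrite_pinned();
    println!("same address: {same_address}, label: {label}");
}
```

So the address is stable, but whatever one would like to call the identity of the contents is not.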

There are experiments at the moment with ways to create mutable references that don't allow overwriting (only mutating) – this would allow an object identity concept, because you could have mutating methods that take a non-overwritable mutable self (thus guaranteeing that the method leaves the object identity alone rather than swapping it with a different one). Once Rust has those, object identity will be a concept that the language is able to talk about, and thus it would benefit from a proper, widely-accepted name. (I don't think the experiments have decided whether this should be a new type of reference, or a new type of field projection, or a property of a type that prevents moving out of &mut references to the type – but any of those would introduce an object-identity concept.)


What you’re observing here is not identity but ownership, or uniqueness. Types such as String which rely on ownership generally have the property that the number of values of that type is conserved except in operations which explicitly create or destroy them. But whether they are the “same value” doesn’t actually matter. For example, these two functions are essentially equivalent:

fn do_thing_1(mut s: String) -> String {
    s.clear();
    s.shrink_to_fit();
    s.push_str("hello world");
    s
}

fn do_thing_2(_s: String) -> String {
    // The input string is never read; it is dropped when the function returns.
    String::from("hello world")
}

It makes no difference that do_thing_1 “mutates” a string and do_thing_2 throws it out and creates a new one; in both of them, a string is input and a string is output, and in both of them, the heap allocation is discarded and recreated[1].

Ownership is like particle physics: we can look at some piece of space over some time and say that we started with a hydrogen atom there and we ended with a hydrogen atom there, but there is no truth about whether it is the same hydrogen atom except for the specific narrow claim “nothing interacted with it, so it must be the same as it was”. Particles don’t have identities and values don’t have identities; rather, they both have conservation laws.

What you can actually say about a String:

  • In the program state, there exists exactly one String value which has this particular heap pointer,
  • until such time as an operation is performed on it that drops, reallocates, or forgets it.

Identity is an interpretation of this uniqueness, not a thing that actually exists in Rust.


  1. In both cases, the new heap allocation may or may not have the same address as the old heap allocation.


I agree that this is an accurate statement of how Rust currently works – in safe Rust, any two values of the same type are considered interchangeable in the sense that if, at any point, a value of a type is replaced by a different value of the same type, it shouldn't break any of the language's safety guarantees.

But I think that this is a shame, and should be changed. Imagine that I have some code that starts with a given value x of a particular type T, and cares about the fact that, at all times, x contains a value that could have been produced from x's original value via calling methods of T's public API. (If T's public API does not allow for arbitrary changes, this will be a subset of the possible values of T, rather than the full thing.) This concept doesn't actually require a concept of object identity to define: we can just say "these are the T values that are reachable from x's original value, those are the T values that aren't, x must always have a reachable value but we don't care how it was created". This sort of "reachable via the public API" restriction is very useful in proving unsafe code to be sound, and in proving safe code to be correct or non-panicking.

On the other hand, despite not needing a concept of object identity to define, this pretty much needs a concept of object identity to practically make use of. Code that's doing this sort of thing would love to be able to hand out &mut x to callbacks that take an &mut T and be able to rely on the fact that x is only changed to reachable values. But in current Rust you can't do that, because anything that gets hold of an &mut T can replace that value with any T value that it has access to, including values that weren't reachable from the previous value. So instead, you would need some way to say "here's an &mut T, but please don't change the value in an unreachable way", which is most easily accomplished by banning moves/swaps/overwrites, and at that point you have a sort of identity concept.
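A sketch of the problem with handing out &mut T, under the assumption of a hypothetical Counter type whose public API only ever lets the count grow. A callback holding &mut Counter can still put the counter into a state that is unreachable from its previous value.

```rust
mod counter {
    // The field is private: outside this module the count can only change
    // through `increment`, so it can only grow.
    pub struct Counter {
        n: u32,
    }

    impl Counter {
        pub fn new() -> Self {
            Counter { n: 0 }
        }
        pub fn increment(&mut self) {
            self.n += 1;
        }
        pub fn get(&self) -> u32 {
            self.n
        }
    }
}

// Stays within the values reachable from the current one.
fn well_behaved(c: &mut counter::Counter) {
    c.increment();
}

// Uses only the public API, yet replaces the counter wholesale.
fn overwriting(c: &mut counter::Counter) {
    *c = counter::Counter::new();
}

// Returns the count before and after invoking the callback.
fn run(callback: fn(&mut counter::Counter)) -> (u32, u32) {
    let mut x = counter::Counter::new();
    x.increment();
    x.increment();
    let before = x.get();
    callback(&mut x);
    (before, x.get())
}

fn main() {
    println!("{:?}", run(well_behaved));
    println!("{:?}", run(overwriting));
}
```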

(Some context: not only do I want to make use of this identity/reachability concept for doing safety proofs, I want to be able to make use of it to do safety proofs at the type system level and that are verified by the type checker. This would need a name for the object-identity concept because it would appear as part of the type, so the name of the concept would probably need to become a contextual keyword so that it can be identified in type names.)

I entirely agree with you. I just don’t think that, if you look hard enough at Rust, you will find an existing term that is the correct term. Unique ownership of things is the closest that today’s Rust comes, and that unique ownership is not even a single language feature, but a strategy that many parts of the language, and many library crates including but not limited to the standard library, collaborate to maintain.

Thanks! This is a very clear way to think about it. Indeed, I may sprinkle calls to

fn replace_by_clone<T: Clone>(thing: &mut T) {
    std::mem::swap(thing, &mut thing.clone());
}

across a Rust program and this will typically not change its behavior, except in cases like

fn check(v: &mut Vec<i32>) {
    let old = v.as_ptr();
    // replace_by_clone(v);
    if old != v.as_ptr() {
        println!("Vec reallocated!");
    }
}

This leads me to believe that the compiler is not free to insert something like calls to replace_by_clone on its own (for example as an optimization). The language only seems to guarantee that inserting replace_by_clone anywhere may not lead to undefined behavior in safe Rust.

(In any case inserting calls to replace_by_clone does not change anything for the borrow checker, which neatly answers my original question whether the borrow checker tracks object identity in some way.)
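A runnable sketch of that observation. Here replace_by_clone is spelled as a plain assignment, which should be equivalent to the swap version for this purpose, and buffer_moved is a hypothetical variant of check that returns its finding instead of printing it.

```rust
// Replaces the value behind the reference by a clone of itself.
fn replace_by_clone<T: Clone>(thing: &mut T) {
    // The clone is created while the original is still alive;
    // the original is dropped by the assignment.
    *thing = thing.clone();
}

// Reports whether the Vec's heap buffer has a different address afterwards.
fn buffer_moved(v: &mut Vec<i32>, replace: bool) -> bool {
    let old = v.as_ptr();
    if replace {
        replace_by_clone(v);
    }
    old != v.as_ptr()
}

fn main() {
    let mut v = vec![1, 2, 3];
    println!("without replace: moved = {}", buffer_moved(&mut v, false));
    println!("with replace:    moved = {}", buffer_moved(&mut v, true));
    println!("contents: {v:?}");
}
```

For a non-empty Vec the clone's buffer is allocated while the old one still exists, so the two addresses differ, even though the contents compare equal.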

This is a great way to think about it, but in light of the above it seems to be only an approximation (otherwise the compiler would be free to insert calls to replace_by_clone at will). It would be interesting to muse about what would be gained and lost by making this a hard guarantee of the language. (My intuition tells me that it would make the language too abstract for general systems programming.)

To try to understand what you are talking about, are there 1 or 2 "things" in the following code?

    let s = String::from("Hello");
    let m = s + " world";

It depends on what Add::add does: it takes possession of the thing stored in s and is free to return the same or a new thing. We could know what happens by observing whether drop is called on the Vec underlying the String.

Now the interesting thing is that for the snippet

    let mut s = String::from("Hello");
    s += " world";
    let m = s;

the situation is the same. Although AddAssign::add_assign takes a mutable reference to the thing, it is free to swap it against something different and drop the original. So whether there are 1 or 2 things depends on the implementation of String in this case just as well.
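One way to peek at what the standard String actually does in both snippets is to compare heap pointers. This is only a sketch: it relies on the buffer having enough spare capacity and on the standard library's current behavior of reusing the buffer, which as far as I know is documented for + on String but is still a property of the implementation rather than of the language.

```rust
// `s + " world"`: does the result use the same heap buffer as `s` did?
fn add_reuses_buffer() -> (bool, String) {
    let mut s = String::with_capacity(64); // enough room, so no regrowth is needed
    s.push_str("Hello");
    let before = s.as_ptr();
    let m = s + " world";
    (before == m.as_ptr(), m)
}

// `s += " world"` followed by a move into `m`.
fn add_assign_reuses_buffer() -> (bool, String) {
    let mut s = String::with_capacity(64);
    s.push_str("Hello");
    let before = s.as_ptr();
    s += " world";
    let m = s;
    (before == m.as_ptr(), m)
}

fn main() {
    println!("{:?}", add_reuses_buffer());
    println!("{:?}", add_assign_reuses_buffer());
}
```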

Which shows that my claim that

There is clearly a thing, an instance of String, that gets created, mutated, moved into a different variable, and mutated again.

is only an interpretation. What really happens depends on the implementation of the type, and both cases are indistinguishable anyway (whether drop is called or not, the type might reallocate its internal buffer, or not).

This brings up @kpreid’s comparison with quantum physical processes where it also does not matter whether the same particle is scattered back, or whether it was swapped against another one.

The quantum particle analogy goes a long way, IMHO. Just like in physics, small objects have no identity (in physics this goes up at least to large molecules like buckyballs), but huge objects like cats acquire identity, although fundamentally they are just a combination of small ones.

If the thing above was not just an instance of String, but a huge database, internally represented by a complicated tree of values, it does clearly matter whether it only gets mutated, or dropped and reconstructed. But this is an implementation detail of the type and the language does not care.

Hence the answer to my original question in this thread seems to be that there is no name for “thing that can be moved but also mutated” in Rust, since this is not a useful concept at the level of the language. Still, I found the discussion very interesting!

I think "object" is just fine in casual parlance, the meaning will be understood.

As others alluded to already, object identity can be a bit subjective in Rust, but the related concepts of address stability and non-trivial drop behavior are more formalized.

On the Vec underlying the String, or on the String? It seems like a struct shouldn't lose "identity" if you just call drop on one member of the struct and replace it with something new.

Counting pending drops still doesn't fully define what we'd mean by identity. When you do mem::swap(&mut a, &mut b), do the identities switch between a and b or not? Either way you get two drops later, so that won't distinguish it.
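A small sketch of what mem::swap does at the level that can actually be observed: the heap buffers travel with the values, and both variables still get dropped exactly once later. The function name is made up for the example.

```rust
// Returns true if, after the swap, each variable holds the other's buffer
// and the other's contents.
fn swap_moves_buffers() -> bool {
    let mut a = String::from("left");
    let mut b = String::from("right");
    let (pa, pb) = (a.as_ptr(), b.as_ptr());

    std::mem::swap(&mut a, &mut b);

    a.as_ptr() == pb && b.as_ptr() == pa && a == "right" && b == "left"
}

fn main() {
    println!("buffers exchanged: {}", swap_moves_buffers());
}
```

Whether one describes this as "the identities switched" or "the contents switched" is exactly the interpretation that the drop count cannot settle.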

There isn't really a difference between these two cases. Strings can also be huge. In the process of destroying a huge database with trees inside, you can also extract some of the trees and reuse them later when reconstructing a "new" database.

I think so, yes. There isn't a useful distinction between "creating a new value and destroying the old value" vs "moving a value", those are essentially same operations.

What if Vec::into_raw_parts is called on it and then a "new" Vec is created using Vec::from_raw_parts?

What if the original Vec is passed to core::mem::forget and a new Vec is created from scratch?
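A sketch of the first scenario. It takes the Vec apart via ManuallyDrop and the pointer, length and capacity accessors, as a stand-in for Vec::into_raw_parts (which may not be available on every toolchain), and reassembles it with Vec::from_raw_parts. The function names are hypothetical.

```rust
use std::mem::ManuallyDrop;

// Disassembles a Vec into its raw parts and builds a "new" Vec from them.
fn rebuild(v: Vec<i32>) -> Vec<i32> {
    // Prevent the original Vec from freeing the buffer.
    let mut v = ManuallyDrop::new(v);
    let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());

    // SAFETY: ptr, len and cap come from a live Vec<i32> that will never be
    // dropped or used again, so the new Vec becomes the sole owner of the buffer.
    unsafe { Vec::from_raw_parts(ptr, len, cap) }
}

// Checks that the rebuilt Vec has the same buffer and contents.
fn same_buffer_after_rebuild() -> bool {
    let v = vec![1, 2, 3];
    let before = v.as_ptr();
    let rebuilt = rebuild(v);
    before == rebuilt.as_ptr() && rebuilt == [1, 2, 3]
}

fn main() {
    println!("same buffer: {}", same_buffer_after_rebuild());
}
```

No drop runs on the original here, and the allocation is the same, so neither "count the drops" nor "compare the addresses" tells us whether this is one Vec or two.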

String is a newtype around Vec<u8>. It fully delegates dropping to its only element of type Vec<u8>. But of course one could implement a trivial Drop for String and observe its method being called.
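Since Drop cannot be implemented for the standard String from outside, here is a sketch with a hypothetical newtype Tracked that counts its drops. It contrasts mutating in place with discarding and recreating, in the spirit of do_thing_1 and do_thing_2 above.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Global counter of how many `Tracked` values have been dropped so far.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Tracked(String);

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Mutates the incoming value and hands the same one back.
fn mutate_in_place(mut t: Tracked) -> Tracked {
    t.0.clear();
    t.0.push_str("hello world");
    t
}

// Discards the incoming value and creates a fresh one.
fn replace(_t: Tracked) -> Tracked {
    Tracked(String::from("hello world"))
}

// Returns how many drops happened inside `f`, plus the resulting contents.
fn drops_during(f: fn(Tracked) -> Tracked) -> (usize, String) {
    let before = DROPS.load(Ordering::SeqCst);
    let out = f(Tracked(String::from("x")));
    let during = DROPS.load(Ordering::SeqCst) - before;
    (during, out.0.clone())
}

fn main() {
    println!("{:?}", drops_during(mutate_in_place));
    println!("{:?}", drops_during(replace));
}
```

The caller gets equal contents either way; only the drop counter reveals which route was taken.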

Yes, in my picture identities do switch when values are swapped.

Sure you can do that, but re-assembling something complex from a few parts is not the same as building a clone of the thing from scratch part-by-part.

I’m more into simulations than databases, so let’s say that there is a value representing a simulation with a huge internal state. Cloning that simulation value could take several seconds, the computer might even start thrashing. In this case cloning and dropping the original is clearly distinguishable from not doing anything.

Another example of an object that “acquires identity” would be an object to which an OS process is attached. This is possible without violating the requirement x == x -> x.clone() == x, if technical details of the attached process itself (PID etc.) are not considered relevant for equality.

Calling forget means that the object becomes unreachable, but continues to exist, including all of its resources. So from an object identity point I’d say that it is equivalent to pushing it onto a Vec called nirvana and leaving it there.


How about these? Which functions swap identities and which don't?

use std::mem;

struct TwoStrings {
    a: String,
    b: String,
}

fn swap_a(x: &mut TwoStrings, y: &mut TwoStrings) {
    mem::swap(x, y);
}

fn swap_b(x: &mut TwoStrings, y: &mut TwoStrings) {
    mem::swap(&mut x.a, &mut y.a);
    mem::swap(&mut x.b, &mut y.b);
}

fn swap_half(x: &mut TwoStrings, y: &mut TwoStrings) {
    mem::swap(&mut x.a, &mut y.a);
}

impl TwoStrings {
    fn swap_c(&mut self, y: &mut TwoStrings) {
        mem::swap(self, y);
    }
    
    fn swap_d(&mut self, y: &mut TwoStrings) {
        mem::swap(&mut self.a, &mut y.a);
        mem::swap(&mut self.b, &mut y.b);
    }
    
    fn swap_half2(&mut self, y: &mut TwoStrings) {
        mem::swap(&mut self.a, &mut y.a);
    }
}

The same questions arise in the real world. They do not seriously undermine the usefulness of the concept of identity there.

I agree that in the discrete world of a computer program the concept of identity is much more problematic. If in the real world we were able to perfectly clone objects unobserved, the situation would be similar to the one within a Rust program.

I found this thread quite enlightening. Naively, I had expected that the concept of object identity could be defined for Rust just as it can be defined for languages similar to Python, where each object is uniquely identified by its address throughout its lifetime. As it turns out, while it can be a useful concept in Rust as well (to reason about a program), it is only an approximate one.


Sure, on that I agree, but is it the same as the original?

From a technical perspective, I would consider these different instances of Vec, one was consumed and another one was created.

However, I suspect that you don't really care about that; instead you care about the allocation that's logically owned by the Vec and was passed from the original to the new one, or more generally about resources (allocations, file descriptors, and other "things" with some kind of external identity). These generally have no type-level presence beyond requiring some drop glue for a "proper" implementation.
