Rc::clone(&r) or r.clone()?

But as @quinedot pointed out, that doesn't require Rc as it equally applies to &RefCell.

We are talking about the clone function not shared states in general. And we don't really have a shared state if that shared state is not mutable.

I can create such examples all day. In fact, in many ways the example I gave is considerably less contrived (In terms of the likelihood a programmer would miss the mistake). Even so, for all the indirection my example afforded, it's still not so likely in the Rust ecosystem.

When I asked for examples from you, I mean something more likely to occur in a rust codebase then the example I gave immediately thereafter (which was already a stretch). It's a bit subjective, but changing the in-directed state of RefCell should of course result in changed referenced state of a RefCell, there's no fixing that - that's the "feature" RefCell allows. RefCells stay unique but can effect change outside themselves. That pattern isn't disappearing on the whole.

The question to answer is this: is this actually a problem? If so, how often does it present?

My feeling so far as been "yes, but it rarely matters." An example where a programmer doesn't understand interior mutability doesn't violate Rust's safety guarantees and isn't that likely to occur... That's basically okay?

I dunno. I'm not pretending to be an expert on such matters. We've shown time and again that mutability is a double-edged sword. Performance on one hand and side effects on the other. Rust has sided with performance and dulled the other side of it's blade with affine types. If you want the full story for safety, Agda's less safe cousin (Haskell) is used in production.

What's the argument we're having at this point? I feel like we're moving the goal posts...

1 Like

RefCell and cheating mutability is a very interesting subject, but I'm not sure it is related to this topic.

I'm a bit worried that maybe the first post of the topic was not clear enough. I tried to edit it and make it more clear.

Indeed, Clone does not imply a deep clone. There's a crate with a DeepClone trait, deep-clone, but nobody uses it. Because of this, this trait gets reinvented in the ecosystem, like in blend2d's DeepClone. Trouble is, two crates that define their own DeepClone trait (or worse, define deep_clone() as an inherent method) don't compose.

I think it makes sense to add DeepClone to the stdlib, yes.

1 Like

If the goal is to actually replace Clone (with a DeepClone::clone(...)), that's a non-starter and there's no point in continuing this topic.

If the goal is a companion trait with guarantees, it's still very unlikely to go anywhere in my opinion, and definitely unlikely to go into std. But you could attempt it in a crate and see.

Very unlikely as...
  • Interior mutability is absolutely at the heart of this topic, as it's the only and very thing that allows the behavior you're unhappy with

  • To guarantee it can't happen, you would have to transitively guarantee no uncloned interior mutability types post-deep-clone, which is very tricky to do without breaking arguably reasonable code that can exist today, such as round-tripping through *const () or usize or anything those can transitively be round-tripped through

  • As such, the guarantee also has to be weaker than you'd probably like in order to be useful, e.g., implemented for Vec<usize> ("no interior mutability without first converting the type of me or my contents or something")

  • It has to be an unsafe trait, or any "guarantees" have no teeth, ala Ord

  • And in combination with the weaker guarantee, the trait can probably not be safely derive-able by similar reasoning

  • It also can't be generically implemented for &T or other borrowing types more generally... without a TransitivelyNoInteriorMutability trait which faces roughly the same barriers as DeepClone to implement

    • But that won't happen in std anyway because then it'd be a breaking change to add an interior mutability field to an existing struct, which has already been grounds for rejection of (much weaker) RFCs (Freeze or ShallowlyNoInteriorMutability)

But if you made such a trait anyway, would anyone use it, for say Rc<T>? More to the point, is it true we wouldn't need Rc::clone anymore.

  • An Rc with zero observable shared state would pretty much just be a Box<T: DeepClone>. Almost no one would use a pure Rc::deep_clone for the things they use clone for, as shared ownership is the point of using Rc.

  • You could tweak the guarantees to allow this observability on the Rc itself after a deep_clone, but now the shape of the guarantee is getting quite nebulous ("no interior mutability of my contents, whatever that means, unless you change my type or theirs or something, but I might use interior mutability myself for my non-contents...")

  • But even then, that only gives you "Rc::clone is the same as Rc::deep_clone" when T: TransitivelyNoInteriorMutability

    • And that won't be implemented for in std for the same reasons as &T, so you would just never know without already having deep knowledge of the types involved

Rust focused on removing aliasing and not removing (interior) mutability. If you don't control the generic types involved, there's no practical way to guarantee you don't have it.

But on the other hand, if someone passes you a type with interior mutability, they shouldn't be surprised when it gets mutated. (Especially if they're the one who implemented your Mutate trait!)

3 Likes

The Documentation is very important. Currently in a section of the book, it says that clone is a deep copy:

https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html#variables-and-data-interacting-with-clone

Having that in the book, of course people will assume that clone is a deep copy, and they will not use the DeepClone trait.

I'm just talking about adding a trait. I'm not going to implement some type of magical deep clone. Any person will implement that trait for his types if it is applicable.

And about implementing for Rc. maybe this? impl<T> DeepClone for Rc<T> where T: DeepClone {...}

Some questions about the design of impl<T> DeepClone for Rc<T> where T: DeepClone {...}:

Rc with interior mutability can easily form cycles. Will the impl infinite-loop on cycles? That'd be a pretty big pitfall.

If I'm going to be making deep clones, I would usually rather have a DeepClone that duplicates the shape of the graph (cloning the cycles into new cycles, and not duplicating any nodes that weren't duplicated in the original graph). To do that, you have to maintain an "environment" (e.g. HashMap indexed by address, mapping each node to the "new version of the same node"). You might need to include an environment-reference in the method parameters of deep_clone().

You could try to avoid that by using a thread_local to store the HashMap – but what if one of the interior nodes' DeepClone impls wants to make an unrelated deep clone of some of the same data (say, for logging purposes)? This wouldn't usually happen, but if/when it does, it would produce pretty confusing errors, since you're not expecting DeepClone to behave like that.

Also, a chain of Rcs can easily be too big for the call stack. If I was going to make a DeepClone impl worthy of the standard library, I would want it to be resilient to that situation as well. So my "deep clone environment" would maintain its own stack, and instead of having every impl do recursive calls directly, my impls would push thunks (delayed computations) onto that stack.

There are multiple possible choices for this design, and it would probably be valuable to have a (non-std) crate that provides generally-useful defaults. If there's any route to inclusion in std, it would definitely start with someone building that.

3 Likes

By writing the trait bound T: DeepClone I was trying to say that if we don't implement the DeepClone trait for problematic types like RefCell that will mostly solve the problem. We don't need to implement DeepClone for every type that can be somehow deep copied; We just need to make sure that it is not implemented for problematic types.

In case that we want to implement cloning for a data structure with loops I suppose we'll need to do a BFS or DFS and make a copy of every node we visit and we'll have to keep a visited flag inside every node to make sure we are not visiting a node multiple times.

I'm confused: if it isn't implemented for types with interior mutability, why do we need to clone the inside of an Rc at all?

We don't need to copy them. It is really simple. DeepClone is a trait. When you have a type and you want to create deep/independent copies, if the type has implemented the trait and has a deep_clone function you can safely use it an get deep copies.

Let's say I have an Rc. If I call its deep_clone and compiler doesn't give an error I know that I'm getting safe independent copies. I may not know anything about internal mutability and those stuff I just know that I'm getting what I need. Interestingly it really doesn't copy anything since my Rc is not mutable.

If it's not actually copying the inside, then DeepClone is a pretty confusing name for it. It would need to have a different name, like perhaps your suggestion of "IndependentClone" (I'm not sure that's the ideal name, but it's clearer than "Deep" or "Safe", which have other meanings.)

1 Like

If Rc allows for shared ownership, why would a programmer ever need to deep clone one? He/she could just have passed the object by reference (or even by value), because that will be deep cloned anyway. If I ever use a Rc, that's because I would like to avoid deep clones, even in presence of interior mutability. If a deep clone is ever needed, I just clone the object inside the Rc after dereferencing.

Deep cloning in general could be more useful, but I don't think it would be a great addition to the std since in 99.9999999% of cases Clone is just enough for the job.

But how can you "feel" that you are not really getting a bit by bit duplicated data? You have no way to tell the difference as a user of the function.

Assume that I have an object which contains a string. If that string is a constant string like "Hello!" why should I duplicate that string inside memory every time I'm making a copy of that object?.

Edit: after more thinking I can understand why you say DeepClone can be confusing. The problem is that a blind deep copy of the data can produce an invalid copy in some circumstances.

I think I have to clarify the subject once more time. Currently we have a Clone trait defined in the std lib.Naturally a clone means an independent copy of something and most implementations do provide that independent/deep copy. However this contract is broken in the official implementation of the Clone trait for reference counting pointers (a.k.a shared pointers in cpp). and the implementation does not provide an independent clone of the object. This can be confusing and dangerous. As a solution they have created a recommendation to use all problematic clone functions like an associated function. (for example Rc::clone() and not a method r.clone) to have some visual distinction. I don't believe that's a good fix, and I say a new trait should be added for correct/safe cloning and this new trait must not be misused for implementing different logic. The old Clone trait can be used like before and becomes a trait for implementing costume copy semantics and shallow copies.

The only justification given in the book to the convention is that "when looking for performance problems in the code, we only need to consider the deep-copy clones and can disregard calls to Rc::clone", i.e. it just helps readability a little. It's not presented as as a solution or fix to a problem with Rc's Clone impl and I'd be surprised if that was the intent of the people who came up with it.

In addition to interior mutability, there is at least one other questionable (but not unsafe) way that a user could observe the difference: They could use the memory addresses of sub-objects as data to identify the sub-objects by.

Even without that, the fact that it literally isn't a deep clone seems like enough of a reason to not name it DeepClone.

Also, having 2 traits "Clone" and "DeepClone" could mislead users by making them think Clone is a shallow clone, which it is not.

To expand on what Heliozoa said – the Rc::clone recommendation is not, and is not intended to be, a solution to this problem. It's only solving the problem where a reader might assume that a clone operation is expensive, by explicitly reminding the reader that this is an Rc clone, which is cheap.

1 Like

Rc::clone is exactly a shallow copy.

I don't think so. let's say I'm a developer that knows nothing about Rc I'm reading a code, I see r.clone() and by mistake I assume that it is a deep clone. Which one is more important?

  • I assume mistakenly that this operation takes 5 nanoseconds longer than what it really does.
  • I assume wrongly that an independent copy was created and likely will misinterpret the rest of the code.

For which one you would recommend an awkward visual distinction that enforcing it will require a lot of documentation and linter and pain and.....

Yes, but most of the time (any time it doesn't contain Rc, etc), it is a deep copy. A user familiar with, say, Python, might assume that all copies not labeled "deep" only go down one level.

Definitely not the one that does not achieve the purpose. A struct that contains several fields, one of which is Rc, has the same issue of non-independence, but there is no recommendation to use special syntax for that (and indeed it wouldn't be possible, because the reader of the code wouldn't recognize that "SomeStruct::clone" has the same special behavior as Rc::clone).

Also, you should trust that the Rust Book has accurately described its own intent.

...And has the same issue regarding performance. I assure you that cloning a struct containing three u64 fields and an Rc has exactly the same performance impact as cloning a single Rc. So if you want to make it clear that cloning that struct is not expensive, you have to apply the same recommendation to it.