But as @quinedot pointed out, that doesn't require `Rc`, as it equally applies to `&RefCell`.
We are talking about the `clone` function, not shared state in general. And we don't really have shared state if that shared state is not mutable.
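For concreteness, this is the behavior under discussion: cloning an `Rc<RefCell<_>>` duplicates the handle, not the data, so a mutation through one handle is visible through the other.

```rust
use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    let a = Rc::new(RefCell::new(vec![1, 2, 3]));
    let b = a.clone(); // duplicates the handle, not the vector

    b.borrow_mut().push(4);
    // Both handles observe the mutation, so the copies are not independent.
    assert_eq!(*a.borrow(), vec![1, 2, 3, 4]);
}
```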
I can create such examples all day. In fact, in many ways the example I gave is considerably less contrived (in terms of the likelihood a programmer would miss the mistake). Even so, for all the indirection my example afforded, it's still not so likely in the Rust ecosystem.
When I asked for examples from you, I meant something more likely to occur in a Rust codebase than the example I gave immediately thereafter (which was already a stretch). It's a bit subjective, but changing the indirected state of a RefCell should of course result in changed referenced state of that RefCell; there's no fixing that - that's the "feature" RefCell allows. RefCells stay unique but can effect change outside themselves. That pattern isn't disappearing on the whole.
The question to answer is this: is this actually a problem? If so, how often does it present?
My feeling so far has been "yes, but it rarely matters." An example where a programmer doesn't understand interior mutability doesn't violate Rust's safety guarantees and isn't that likely to occur... That's basically okay?
I dunno. I'm not pretending to be an expert on such matters. We've shown time and again that mutability is a double-edged sword. Performance on one hand and side effects on the other. Rust has sided with performance and dulled the other side of its blade with affine types. If you want the full story for safety, Agda's less safe cousin (Haskell) is used in production.
What's the argument we're having at this point? I feel like we're moving the goal posts...
RefCell and cheating mutability is a very interesting subject, but I'm not sure it is related to this topic.
I'm a bit worried that maybe the first post of the topic was not clear enough. I tried to edit it and make it more clear.
Indeed, `Clone` does not imply a deep clone. There's a crate with a `DeepClone` trait, deep-clone, but nobody uses it. Because of this, the trait gets reinvented in the ecosystem, like in blend2d's `DeepClone`. Trouble is, two crates that define their own `DeepClone` trait (or worse, define `deep_clone()` as an inherent method) don't compose.
I think it makes sense to add `DeepClone` to the stdlib, yes.
If the goal is to actually replace `Clone` (with a `DeepClone::clone(...)`), that's a non-starter and there's no point in continuing this topic. If the goal is a companion trait with guarantees, it's still very unlikely to go anywhere in my opinion, and definitely unlikely to go into `std`. But you could attempt it in a crate and see.
Very unlikely as...
- Interior mutability is absolutely at the heart of this topic, as it's the only and very thing that allows the behavior you're unhappy with
- To guarantee it can't happen, you would have to transitively guarantee no uncloned interior-mutability types post-deep-clone, which is very tricky to do without breaking arguably reasonable code that can exist today, such as round-tripping through `*const ()` or `usize` or anything those can transitively be round-tripped through (see the sketch after this list)
- As such, the guarantee also has to be weaker than you'd probably like in order to be useful, e.g., implemented for `Vec<usize>` ("no interior mutability without first converting the type of me or my contents or something")
- It has to be an `unsafe` trait, or any "guarantees" have no teeth, a la `Ord`
- And in combination with the weaker guarantee, the trait can probably not be safely `derive`-able by similar reasoning
- It also can't be generically implemented for `&T` or other borrowing types more generally... without a `TransitivelyNoInteriorMutability` trait, which faces roughly the same barriers as `DeepClone` to implement
  - But that won't happen in `std` anyway, because then it'd be a breaking change to add an interior-mutability field to an existing struct, which has already been grounds for rejection of (much weaker) RFCs (`Freeze` or `ShallowlyNoInteriorMutability`)
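To make the round-tripping point concrete, here's a deliberately contrived sketch (the `Sneaky` type is invented for this example): interior mutability can hide behind a plain `usize`, where no type-based transitive check can see it.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical type for illustration only.
struct Sneaky {
    addr: usize, // actually a pointer to shared, interior-mutable data in disguise
}

fn main() {
    let shared = Rc::new(Cell::new(0_u32));

    // Round-trip a pointer through usize; any field-type-based check now
    // sees only a plain integer and cannot tell interior mutability hides here.
    let sneaky = Sneaky {
        addr: Rc::into_raw(Rc::clone(&shared)) as usize,
    };

    // A naive "deep clone" of Sneaky would just copy the usize and still alias `shared`.
    let ptr = sneaky.addr as *const Cell<u32>;
    unsafe { (*ptr).set(42) };
    assert_eq!(shared.get(), 42);

    // Reconstruct and drop the Rc we turned into a raw pointer, to avoid a leak.
    unsafe { drop(Rc::from_raw(ptr)) };
}
```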
But if you made such a trait anyway, would anyone use it, for say `Rc<T>`? More to the point, is it true we wouldn't need `Rc::clone` anymore?
- An `Rc` with zero observable shared state would pretty much just be a `Box<T: DeepClone>`. Almost no one would use a pure `Rc::deep_clone` for the things they use `clone` for, as shared ownership is the point of using `Rc`. (See the snippet after this list.)
- You could tweak the guarantees to allow this observability on the `Rc` itself after a `deep_clone`, but now the shape of the guarantee is getting quite nebulous ("no interior mutability of my contents, whatever that means, unless you change my type or theirs or something, but I might use interior mutability myself for my non-contents...")
- But even then, that only gives you "`Rc::clone` is the same as `Rc::deep_clone`" when `T: TransitivelyNoInteriorMutability`
  - And that won't be implemented in `std` for the same reasons as `&T`, so you would just never know without already having deep knowledge of the types involved
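To make the first point concrete, here's a plain-std snippet (no hypothetical trait involved): the reference count is shared state observable on the `Rc` itself, and a "deep" copy is just a fresh allocation, i.e. what `Rc::new`/`Box` already give you.

```rust
use std::rc::Rc;

fn main() {
    let a = Rc::new(String::from("hi"));

    let b = Rc::clone(&a); // shared ownership: the count is observably shared state
    assert_eq!(Rc::strong_count(&a), 2);

    // What a deep clone would amount to: a brand-new allocation.
    let c = Rc::new((*a).clone());
    assert_eq!(Rc::strong_count(&c), 1); // independent of `a` and `b`
    assert_eq!(*c, *b);                  // same contents, no sharing
}
```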
Rust focused on removing aliasing and not removing (interior) mutability. If you don't control the generic types involved, there's no practical way to guarantee you don't have it.
But on the other hand, if someone passes you a type with interior mutability, they shouldn't be surprised when it gets mutated. (Especially if they're the one who implemented your `Mutate` trait!)
The documentation is very important. Currently, in a section of the book, it says that `clone` is a deep copy:
Having that in the book, of course people will assume that clone is a deep copy, and they will not use the `DeepClone` trait.
I'm just talking about adding a trait. I'm not going to implement some kind of magical deep clone. Anyone can implement that trait for their own types where it's applicable.
And about implementing it for `Rc`, maybe this? `impl<T> DeepClone for Rc<T> where T: DeepClone {...}`
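A minimal sketch of what I mean (nothing here exists in std; the trait and impls are just the shape being proposed):

```rust
use std::rc::Rc;

// The proposed trait; nothing like this exists in std today.
pub trait DeepClone {
    fn deep_clone(&self) -> Self;
}

// The suggested impl: deep-cloning an Rc clones the pointee into a
// fresh allocation instead of bumping the reference count.
impl<T: DeepClone> DeepClone for Rc<T> {
    fn deep_clone(&self) -> Self {
        Rc::new((**self).deep_clone())
    }
}

// A leaf type without interior mutability can simply defer to Clone.
impl DeepClone for String {
    fn deep_clone(&self) -> Self {
        self.clone()
    }
}
```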
Some questions about the design of `impl<T> DeepClone for Rc<T> where T: DeepClone {...}`:
Rc with interior mutability can easily form cycles. Will the impl infinite-loop on cycles? That'd be a pretty big pitfall.
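For example (with a made-up `Node` type), two reference-counted nodes can point at each other, and a `deep_clone` that blindly recurses through them would never terminate:

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

fn main() {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    *a.next.borrow_mut() = Some(Rc::clone(&b)); // cycle: a -> b -> a

    // A deep_clone that naively follows `next` would loop forever here.
}
```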
If I'm going to be making deep clones, I would usually rather have a DeepClone that duplicates the shape of the graph (cloning the cycles into new cycles, and not duplicating any nodes that weren't duplicated in the original graph). To do that, you have to maintain an "environment" (e.g. a HashMap indexed by address, mapping each node to the "new version of the same node"). You might need to include an environment reference in the method parameters of `deep_clone()`.
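A rough sketch of what threading such an environment could look like, assuming a hypothetical `DeepCloneWithEnv` trait and a `CloneEnv` keyed by address (illustrative names, not a real API):

```rust
use std::any::Any;
use std::collections::HashMap;
use std::rc::Rc;

// Maps the address of an original node to its already-made clone,
// so shared nodes stay shared in the copy (shape preservation).
pub struct CloneEnv {
    seen: HashMap<*const (), Rc<dyn Any>>,
}

impl CloneEnv {
    pub fn new() -> Self {
        CloneEnv { seen: HashMap::new() }
    }
}

pub trait DeepCloneWithEnv: Sized + 'static {
    fn deep_clone_with(&self, env: &mut CloneEnv) -> Self;
}

// Deep-clone an Rc node, consulting the environment first.
// (True cycles would additionally need a placeholder inserted before
// recursing; that part is omitted from this sketch.)
pub fn deep_clone_rc<T: DeepCloneWithEnv>(rc: &Rc<T>, env: &mut CloneEnv) -> Rc<T> {
    let key = Rc::as_ptr(rc) as *const ();
    if let Some(existing) = env.seen.get(&key) {
        return Rc::clone(existing)
            .downcast::<T>()
            .expect("environment entry had an unexpected type");
    }
    let cloned = Rc::new(rc.deep_clone_with(env));
    let erased: Rc<dyn Any> = Rc::clone(&cloned);
    env.seen.insert(key, erased);
    cloned
}
```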
You could try to avoid that by using a `thread_local` to store the HashMap – but what if one of the interior nodes' DeepClone impls wants to make an unrelated deep clone of some of the same data (say, for logging purposes)? This wouldn't usually happen, but if/when it does, it would produce pretty confusing errors, since you're not expecting DeepClone to behave like that.
Also, a chain of Rcs can easily be too big for the call stack. If I was going to make a DeepClone impl worthy of the standard library, I would want it to be resilient to that situation as well. So my "deep clone environment" would maintain its own stack, and instead of having every impl do recursive calls directly, my impls would push thunks (delayed computations) onto that stack.
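As a simplified illustration of the non-recursive idea (using a plain loop over a hypothetical `Node` chain rather than the full thunk-stack design), a long `Rc` chain can be cloned without deep recursion:

```rust
use std::rc::Rc;

// Hypothetical node type: a long singly linked chain of Rcs.
struct Node {
    value: u64,
    next: Option<Rc<Node>>,
}

// Clone the chain iteratively, so a million-node list can't overflow the call stack.
fn deep_clone_list(head: &Rc<Node>) -> Rc<Node> {
    // Walk the chain front to back, recording the values...
    let mut values = vec![head.value];
    let mut cur = &head.next;
    while let Some(node) = cur {
        values.push(node.value);
        cur = &node.next;
    }
    // ...then rebuild it back to front.
    let mut rebuilt: Option<Rc<Node>> = None;
    for value in values.into_iter().rev() {
        rebuilt = Some(Rc::new(Node { value, next: rebuilt }));
    }
    rebuilt.expect("the input chain has at least one node")
}
```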
There are multiple possible choices for this design, and it would probably be valuable to have a (non-`std`) crate that provides generally useful defaults. If there's any route to inclusion in `std`, it would definitely start with someone building that.
By writing the trait bound `T: DeepClone` I was trying to say that if we don't implement the `DeepClone` trait for problematic types like `RefCell`, that will mostly solve the problem. We don't need to implement DeepClone for every type that can somehow be deep copied; we just need to make sure that it is not implemented for problematic types.
In case we want to implement cloning for a data structure with loops, I suppose we'll need to do a BFS or DFS, make a copy of every node we visit, and keep a `visited` flag inside every node to make sure we are not visiting a node multiple times.
I'm confused: if it isn't implemented for types with interior mutability, why do we need to clone the inside of an Rc at all?
We don't need to copy them.
It is really simple. `DeepClone` is a trait. When you have a type and you want to create deep/independent copies, if the type implements the trait and has a `deep_clone` function, you can safely use it and get deep copies.
Let's say I have an Rc. If I call its `deep_clone` and the compiler doesn't give an error, I know that I'm getting safe, independent copies. I may not know anything about interior mutability and all that; I just know that I'm getting what I need. Interestingly, it really doesn't copy anything, since my Rc is not mutable.
If it's not actually copying the inside, then DeepClone is a pretty confusing name for it. It would need to have a different name, like perhaps your suggestion of "IndependentClone" (I'm not sure that's the ideal name, but it's clearer than "Deep" or "Safe", which have other meanings.)
If `Rc` allows for shared ownership, why would a programmer ever need to deep clone one? He/she could just have passed the object by reference (or even by value), because that will be deep cloned anyway.
If I ever use an `Rc`, that's because I would like to avoid deep clones, even in the presence of interior mutability. If a deep clone is ever needed, I just clone the object inside the `Rc` after dereferencing.
Deep cloning in general could be more useful, but I don't think it would be a great addition to the std since in 99.9999999% of cases `Clone` is just enough for the job.
But how can you "feel" that you are not really getting a bit by bit duplicated data? You have no way to tell the difference as a user of the function.
Assume that I have an object which contains a string. If that string is a constant string like "Hello!", why should I duplicate that string in memory every time I make a copy of that object?
Edit: after more thinking I can understand why you say DeepClone can be confusing. The problem is that a blind deep copy of the data can produce an invalid copy in some circumstances.
I think I have to clarify the subject one more time. Currently we have a `Clone` trait defined in the std lib. Naturally, a clone means an independent copy of something, and most implementations do provide that independent/deep copy. However, this contract is broken in the official implementation of the Clone trait for reference-counting pointers (a.k.a. shared pointers in C++), and the implementation does not provide an independent clone of the object. This can be confusing and dangerous. As a solution, they have created a recommendation to use all problematic clone functions as an associated function (for example `Rc::clone()` and not the method `r.clone()`) to have some visual distinction. I don't believe that's a good fix, and I say a new trait should be added for correct/safe cloning, and this new trait must not be misused for implementing different logic. The old Clone trait can be used like before and becomes a trait for implementing custom copy semantics and shallow copies.
The only justification given in the book for the convention is that "when looking for performance problems in the code, we only need to consider the deep-copy clones and can disregard calls to `Rc::clone`", i.e. it just helps readability a little. It's not presented as a solution or fix to a problem with Rc's Clone impl, and I'd be surprised if that was the intent of the people who came up with it.
In addition to interior mutability, there is at least one other questionable (but not `unsafe`) way that a user could observe the difference: they could use the memory addresses of sub-objects as data to identify the sub-objects by.
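For example, `Rc::ptr_eq` makes that difference observable in entirely safe code:

```rust
use std::rc::Rc;

fn main() {
    let a = Rc::new(42_u32);
    let shallow = Rc::clone(&a);
    let deep = Rc::new(*a); // a genuinely new allocation

    // Observable in safe code: the shallow copy shares an address, the deep one doesn't.
    assert!(Rc::ptr_eq(&a, &shallow));
    assert!(!Rc::ptr_eq(&a, &deep));
}
```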
Even without that, the fact that it literally isn't a deep clone seems like enough of a reason to not name it DeepClone.
Also, having 2 traits "Clone" and "DeepClone" could mislead users by making them think Clone is a shallow clone, which it is not.
To expand on what Heliozoa said – the `Rc::clone` recommendation is not, and is not intended to be, a solution to this problem. It's only solving the problem where a reader might assume that a `clone` operation is expensive, by explicitly reminding the reader that this is an Rc clone, which is cheap.
`Rc::clone` is exactly a shallow copy.
I don't think so. Let's say I'm a developer who knows nothing about `Rc`. I'm reading some code, I see `r.clone()`, and by mistake I assume that it is a deep clone. Which one is more important?
- I assume mistakenly that this operation takes 5 nanoseconds longer than what it really does.
- I assume wrongly that an independent copy was created and likely will misinterpret the rest of the code.
For which one would you recommend an awkward visual distinction whose enforcement will require a lot of documentation and linting and pain and...?
Yes, but most of the time (any time it doesn't contain Rc, etc), it is a deep copy. A user familiar with, say, Python, might assume that all copies not labeled "deep" only go down one level.
Definitely not the one that does not achieve the purpose. A struct that contains several fields, one of which is Rc, has the same issue of non-independence, but there is no recommendation to use special syntax for that (and indeed it wouldn't be possible, because the reader of the code wouldn't recognize that "SomeStruct::clone" has the same special behavior as Rc::clone).
Also, you should trust that the Rust Book has accurately described its own intent.
...And has the same issue regarding performance. I assure you that cloning a struct containing three u64 fields and an Rc has exactly the same performance impact as cloning a single Rc. So if you want to make it clear that cloning that struct is not expensive, you have to apply the same recommendation to it.
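For instance (with a hypothetical `SomeStruct`), a derived `Clone` on a struct with an `Rc` field has exactly the same sharing behavior and the same cheap cost, with nothing at the call site hinting at `Rc`:

```rust
use std::cell::Cell;
use std::rc::Rc;

#[derive(Clone)]
struct SomeStruct {
    a: u64,
    b: u64,
    c: u64,
    shared: Rc<Cell<u32>>,
}

fn main() {
    let s1 = SomeStruct { a: 1, b: 2, c: 3, shared: Rc::new(Cell::new(0)) };
    let s2 = s1.clone(); // reads like any other clone; nothing says "Rc" here

    s2.shared.set(99);
    assert_eq!(s1.shared.get(), 99); // the "copies" still share state

    // Cost-wise this is three u64 copies plus a refcount bump,
    // i.e. essentially the same as cloning a bare Rc.
}
```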