Associativity, commutivity, etc. for standard operators

Well that would be a compiler bug, unless Associative was an unsafe trait. (Under the general overarching semantic rule of "safe code can't cause memory unsafety".)

This was Error::type_id, which is still unstable and doc(hidden) due to this, but was temporarily stable to implement, leading to unsound implementations being possible in safe code.

4 Likes

I'll admit it, I'm trying to create a counterexample in my head, but I haven't been able to. I'm hoping that what you're saying is correct, and it's not because we can't figure out a bad counterexample...

That's just a requirement of the language. Such bugs have arisen before. Sometimes it's just a simple patch to fix it, other times the language grows new features to address it.

3 Likes

This RFC says

An implementation that does not uphold this shall not result in undefined behavior; Clone is not an unsafe trait .

So it's not really Rust assuming anything I would say. This is more like the PartialEq contract.

1 Like

It does allow the x.clone() -> (*x) transformation. Whereas I don't think the compiler is allowed to use Eq to replace x == x with true.

Maybe I should use a different word for that?

I don't think it allows such a transformation, since the RFC explicitly says it is not UB to violate the property.

That transformation is not UB though, it's well defined, and it is already used in the stdlib, for example when specializing [T]::to_vec for Copy types.

1 Like

The difference being it is part of the implementation, not the compiler. The choice of transformation is done by core::vec::Vec and not part of the language semantics. That doesn't mean it's right. A more cautious approach would obviously be to only specialize on types that deeply derive Clone (i.e. akin to needs_drop) but still I see the interface of vec being consistent enough even with that specialization. And it's only that type, not arbitrary user types or any replacement for Vec that you use for that consistency.

I guess I have trouble seeing how one could write code that would be sound normally, but UB under that transformation. Because if Clone::clone(&x) -> *&x makes it unsound, then wouldn't vec![x].clone().pop().unwrap() also end up being unsound since that'd do the "same" transformation?

UB is a property of a program, not of a transformation, so I am not sure what you mean -- there is a category error in your post.

My claim concretely is that: it is not okay for the compiler to do x.clone()(*x). If the user calls a library function which uses specialization to do a copy instead of cloning, that is part of the contract between that library and its clients, so language semantics (and UB) are irrelevant. But if the compiler silently replaces explicit calls to a clone function by a mere copy, that means the optimized program has a different semantics than the unoptimized program, and that would be a compiler bug.

Similarly, a library can say that it expects PartialEq::eq and PartialOrd::partial_cmp to be coherent with each other, and can replace calls to one by the other in a library update or using specialization without this being considered a "breaking change" -- but that does not mean the compiler is allowed to replace calls to one by the other.

Soundness is a property of the safe API surface of a crate (and it roughly means "no safe client can cause UB even when invoking this crate in arbitrary ways"), so again I am not quite sure how to interpret this statement.

So let me try a concrete example: If I have a MyType: Copy, and its clone implementation prints to standard out, and I call that clone function, then the final program produced by the compiler must print the same thing to standard out. There is nothing that would give the compiler license to remove that print (as would be the case if the clone call would be replaced by a simple copy). Do we agree on this?

4 Likes

The RFC says this:

If T: Copy, x: T, and y: &T, then let x = y.clone(); is equivalent to let x = *y;.

From my perspective, that would give the compiler license to remove the print.

I think the RFC has two readings. I'm not sure which is the consensus of the active members of the language team:

  1. It's not a breaking change for a library to stop calling a clone impl on a type that impls Copy and instead use the copy impl (or vice versa). This was the original motivation for the RFC; Vec's Clone impl was changed to stop calling clone and start using Copy semantics for Copy types; any change to program behaviour because of this change to std is not considered a breaking change.
  2. The compiler is free to replace clone calls with memcpy for types that implement Copy, even if that would change program behaviour. This would be a stronger interpretation than the first reading, which could impact people writing unsafe code.

Personally I'm not convinced adopting the second reading is a good idea. But it wouldn't lead to UB in correctly written code, because a user who is relying on a call to clone() actually calling clone on a Copy type to uphold an invariant would have to have written some incorrect unsafe code under this reading (even if they control the type and its impls and the code where the clone call is).

The reason I think its probably not a good idea is that obviously it isn't great to have a list of special optimisations like this that the compiler is allowed to perform that users have to be informed of somewhere, even if the only reason they would run into them is that they wrote some pathologically bad code.

2 Likes

Here's my opinion:

When my code contains the line x.clone(), I absolutely expect that the clone method is called. When I write x.blorb(), I also expect that the blorb method is called.

When a function has the same behaviour as a memcpy, then the compiler can replace the function with a memcpy, but only then. The compiler is not allowed to change the behaviour.

The standard library's implementation can be modified in a way that changes behaviour of types that implement traits incorrectly. But I want to emphasize that this is a library change, not a compiler optimization. It has the potential to affect API stability, but not soundness or language semantics.

Clone is specified so that x.clone() is equivalent to *x for any Copy type that implements Clone correctly. The standard library is allowed to produce incorrect/unexpected results when a trait is implemented incorrectly. It is also allowed to make changes that are only backwards compatible iff the relevant traits are implemented correctly for your types. This might break a program that was previously working, but incorrect.

In an abstract sense only. I typically expect functions to be inlined where appropriate and I'd like to be able to write generic code with T : Clone without having to double specialize on Copy because of a performance gotcha. This was exactly the problem in Vec, in theory the compiler should be able to pierce the veil and realize that a (heavily misdirected) for loop of clones can be one big memcpy but in practice it was missing this and losing out on a basic optimization.

I think it's okay to make the performant thing the most ergonomic thing at the cost of making some wrong code even more wrong. There should probably be a warning added for impl Clone for AlreadyCopyable either way.

Yes. I'm talking about language semantics, not generated machine code. Semantically, an inlined function call is still a function call, even if no callq instruction is emitted.

1 Like

That contradicts the part of the RFC where it says there is no UB.

You cannot just change the behavior of the program "by fiat" -- you cannot just proclaim some axioms and expect them to make any sense. Instead the Rust Abstract Machine should specify what happens step-by-step, and nothing there would allow the compiler to remove this print, except if the program somehow has UB or a special case is added for calls to Clone::clone to be different from any other function call ("any call to Clone::clone, if the receiver type is Copy, may non-deterministically be skipped and a memcpy executed instead", or something weird like that). I am strongly opposed to the latter. Certainly an RFC that alters the Rust Abstract Machine needs to do so explicitly and with the intent of fundamentally changing the language, not just as a side-effect of something that looks entirely like a library RFC.

Yes, this is what I think the RFC is intended to do.

I would personally go so far as to say it is a bad idea. :slight_smile:

4 Likes

Is there currently a mathematically complete, unambiguous definition for this? Just to be clear, I'm not trying to be snarky, I've been wanting to see something like this for years, pretty much since I started learning Rust. I would love to see a definition like this as it would answer questions that I have pretty much constantly... It would also answer questions like above, explaining what transformations are legal, and which aren't.

1 Like

Congrats, you're asking the guy who got a PhD for writing one :slight_smile: (congrats again, @RalfJung!)

https://plv.mpi-sws.org/rustbelt/stacked-borrows/

This model isn't official, but is the working model implemented by miri. You can also check out wg-unsafe-code-guidelines for most of the open questions.

1 Like

Congrats on the Ph.D. @RalfJung! I knew you were working on it, but I didn't know you finished it, good job!