Redundant copy when moving a variable

Look at this code:

fn main() {
    struct Man {
        value: i16,
    }

    let m = Man { value: 10 };

    // m moves to n, but Rust takes up new memory to store Man
    // instead of reusing the old memory,
    // i.e. the address of m.value is not the same as the address of n.value.
    // That seems really redundant!
    let n = m;
}

So, why is the language designed this way?

There is no extra memory used even without optimizations: Compiler Explorer

As soon as you want to observe the address, the compiler is hindered in the optimizations it can do. Printing addresses in code is therefore a bad way to inspect what optimizations happen to a program.


Is there a way to view the address of something without inhibiting this specific optimization? This seems like a pretty large footgun.


You might be interested in reading through the discussion at and linked from:

In short, the obvious and simple semantics have the allocation for bindings occur when they're declared and deallocation happen at the end of scope. This results in the (stack) allocations existing at the same time, thus being required to have disjoint addresses. If you don't inspect the address, the compiler can get away with colocating them, but if you do, it can't.
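For illustration, here is a small self-contained sketch of that (the `observe_move` function and its name are mine, not from the linked discussion): once both addresses are observed, the simple scoped-allocation semantics force the two bindings into disjoint stack slots.

```rust
struct Man { value: i16 }

// Returns (address observed before the move, address observed after, value).
fn observe_move() -> (usize, usize, i16) {
    let m = Man { value: 10 };
    let addr_m = &m.value as *const i16 as usize;
    let n = m; // move: `m` is unusable in safe code from here on
    let addr_n = &n.value as *const i16 as usize;
    (addr_m, addr_n, n.value)
}

fn main() {
    let (addr_m, addr_n, value) = observe_move();
    // Under the scoped-allocation semantics, both allocations exist until
    // the end of scope, so observing both addresses pins them to distinct
    // slots; without the observations they could be colocated.
    println!("m was at {addr_m:#x}, n at {addr_n:#x}, value = {value}");
}
```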

It is possible that moving from a local place would deallocate it. But even if this is the case, telling LLVM to optimize based on that is not straightforward, as LLVM grew up biased towards using C++ semantics.


minor papercut¹ != large footgun²

¹most likely to affect explorative programming and not production use; tools to do this properly already exist
²having one's feet shot is a serious injury and potentially fatal, especially when using a large gun


Thank you, I really like that discussion.

My code is simple; what I worry about is that if struct Man takes up a lot of memory, the move (let n = m) performs a memory copy, which might slow the program down.

In fact, I think Rust doesn't need to do the copy at all, because it already stops users from using m after let n = m at compile time. Users can get around that with unsafe code, but in that case it's their duty to keep the code safe, not Rust's.
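That compile-time check looks like this (a minimal sketch, reusing the `Man` struct from my example):

```rust
struct Man { value: i16 }

fn main() {
    let m = Man { value: 10 };
    let n = m; // ownership moves to `n`
    // Uncommenting the next line fails to compile:
    // error[E0382]: borrow of moved value: `m`
    // println!("{}", m.value);
    println!("{}", n.value); // prints 10
}
```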

Although it's not really possible to put that much data on the stack anyway, I'm still worried about the memory copy. :joy:

In general, the compiler will elide the copy when it can. That said, eliding the copies is an optimization, and will not always fire. You should not worry about the memory copy unless you have an example case where the compiler should elide the memory copy, but did not (at which point, it can be worth posting here and asking people to help you investigate why the optimization did not fire).

It's also not worth worrying without a profiler showing you that the copies are a big deal in terms of final performance; CPUs are fast at copying data, and thus if it's rare, it's not a problem.

There used to be a site tracking the stack efficiency of Rust via a proxy statistic, but it seems to be down now, unfortunately. Generally, Rust/LLVM is decent at eliminating redundant copies of stack locals. It can sometimes struggle with large buffers of nontrivial contents, but it isn't something you typically need to worry about. Write code that does the right thing w.r.t. ownership and let the compiler do its job.


unless you have an example case where the compiler should elide the memory copy, but did not

I've played around on Compiler Explorer a bit because I was curious what exactly the output assembly does, and I was a bit surprised that the following (with -C opt-level=3) contains 2 calls to memcpy [1] (yes, they are cheap, but they are definitely avoidable here).

This does "inspect the address" because it gives a reference to do_something, so I understand why the compiler does not combine them. But since something like this is a somewhat common pattern (moves through function calls, functions that return Self to be chained, declaration in a block that returns (and thus moves) something), I was expecting this to not copy the data around.

I cannot think of a reason (besides "that's how LLVM does things due to C") why those references wouldn't be allowed to have the same address, especially since due to the move we're guaranteed to not have a reference to the original m.

// Context restored from the footnote and surrounding comments: `Man` is
// `[i16; 1024]` and `do_something` is opaque to the optimizer.
pub type Man = [i16; 1024];

#[inline(never)]
pub fn do_something(_p: *const i16) {}

pub struct Wrapper(pub Man);

pub fn main(m: Man) {
    // Pass in the address of what we got from the caller
    do_something(&m[0]);

    // Creates a copy, even with optimizations enabled
    let m = m;
    // Does not call memcpy here, as there was no move
    do_something(&m[0]);

    // Calls memcpy again, as it's technically a move
    let x = Wrapper(m);
    do_something(&x.0[0]);
}

Here (example 2) is a more minimal example that still contains a memcpy with -C opt-level=3

Here (example 3) is a slightly more realistic one. inner_fn could for example be a builder pattern that modifies m (could be self).

I wouldn't have expected any of these examples to contain a call to memcpy. Maybe without optimizations but definitely not with optimizations enabled.

Am I missing something here or is that a fundamental limitation from the way things are passed to LLVM?

  1. I'm using a [i16; 1024] to make the copy easier to find in the assembly. ↩︎


It looks like an unimplemented optimization to me. When I look at the LLVM IR, I don't see a reason for ⚙ D153453 [MemCpyOpt] implement single BB stack-move optimization which unify the static unescaped allocas to fire, since that's looking for a memcpy between two stack slots in the same basic block, but in your case, there's an input to the basic block that's being copied from.

Improving MemCpyOpt in LLVM would likely elide that allocation, too, since it would be able to see that the parameter m is not used after the memcpy.


Yeah, unfortunately it's not illegal for Rust code to do silly things, and thus LLVM is correct that in general it can't optimize certain things.

For example, do_something could, as far as LLVM knows (if it can't see the body), stash the *const i16 in an AtomicPtr, and display something different to the console if you call it with the same address as the last time, at which point it would be important that certain stack slots are not actually re-used.
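A sketch of that scenario (this `do_something` body is hypothetical, purely to show the kind of observable behavior an opaque callee is allowed to have):

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

static LAST: AtomicPtr<i16> = AtomicPtr::new(ptr::null_mut());

// Never dereferences the pointer; it only remembers the address and
// reports whether this call reused the previous call's address.
fn do_something(p: *const i16) -> bool {
    let prev = LAST.swap(p as *mut i16, Ordering::Relaxed);
    prev == p as *mut i16
}

fn main() {
    let a = 1i16;
    let b = 2i16;
    assert!(!do_something(&a)); // first call: previous address was null
    assert!(do_something(&a));  // same address as the last call
    assert!(!do_something(&b)); // `b` is a distinct live local
}
```

Because a callee's behavior may legitimately depend on the addresses it receives, the optimizer can't silently reuse a stack slot whose address has escaped.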

So, ironically, one way to make that copy not copy it is to do a copy on every call, which doesn't actually emit more copies into the output:

But there's a bunch of issues open for things like this, like Using ManuallyDrop causes allocas and memcpys that LLVM cannot remove · Issue #79914 · rust-lang/rust · GitHub

It'd be nice if it was better, certainly. And the latest LLVMs have gotten substantially better at it already.


True, Rust would have to use lifetime analysis to know that it's a pure function in terms of that variable/argument (though not necessarily other arguments) and does not keep such a reference, and then tell LLVM it can assume such a pointer copy will not happen (not sure if the latter is possible). As you said, the example you gave is valid for LLVM but not allowed in Rust.

I think that's one area where Rust has a (small) edge over C/C++, which do not have a borrow checker and thus cannot know the function doesn't store a pointer to it somewhere.

If you want to make sure to avoid copying, why not pass a reference to this data instead of moving it?
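For example (hypothetical names, assuming a large `Man` like the [i16; 1024] from the footnote):

```rust
struct Man { data: [i16; 1024] }

// Borrowing: only a pointer crosses the call boundary, so no memcpy of
// the 2 KiB payload can occur, optimized or not.
fn checksum(man: &Man) -> i32 {
    man.data.iter().map(|&v| i32::from(v)).sum()
}

fn main() {
    let m = Man { data: [1; 1024] };
    let s = checksum(&m); // no move
    assert_eq!(s, 1024);
    assert_eq!(m.data[0], 1); // `m` is still usable afterwards
}
```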


It doesn't even have to store the pointer. It can just remember the address. Lifetime analysis is not sufficient; analysis of the actual function body is required to derive that the pointer does not escape and the address is not observed.

This information could then be remembered in the signature and used during optimization of callers even without inlining, but it requires that analysis of the body.


True, it would be an observable difference in behavior if (for example) the function compares the addresses of arguments between invocations (thus preventing general optimization). I'm not quite sure: does Rust guarantee anything about the addresses (for example stored as a usize) after the end of the lifetime? At that point there is no guarantee about anything you'd read from that memory address (it could segfault), but what about the address itself? Consider the following:

// (bodies reconstructed from the description below)
fn main() {
    let a = T::new();
    let b = T::new();
    outer(a);
    outer_2(b);
}

fn outer(c: T) {
    do_stuff(&c);
}

fn outer_2(d: T) {
    // Random variable that isn't used and can be optimized away
    let x = 0u64;
    do_stuff(&d);
}

By the time outer_2 is called, a has been dropped (it could also have been moved somewhere else by outer, but let's ignore that for now). do_stuff could store the address somewhere, and as far as Rust is concerned that would be valid (as long as there is no memory access through that address). If the compiler removes x (to be fair, I'm not 100% sure it actually does), or for some reason changes the stack layout of outer_2 so that it differs from that of outer (for example when enabling optimizations and eliminating a redundant stack variable), or makes any other change that affects the stack layout, that changes whether b ends up at the same address a had. Therefore the result of any comparison between the addresses c and d point to is already completely up to the compiler/optimizer/LLVM (since a and b are moved into the functions).

The same goes the other way: is there anything (a guarantee given by the compiler) that prevents the Rust compiler from moving k around between calls to do_stuff in the following example? It is not pinned (Pin) and thus not guaranteed not to move. (Disclaimer: I don't know if the compiler is allowed to add moves when it thinks that's useful. Probably not, as this is effectively the same problem: an optimization or change with an observable difference in behavior.)

fn main() {
    // I'm using mutable references in this example, so it's clear that
    // this only applies when the lifetime "ends" when `do_stuff` exits
    // and `do_stuff` can only keep the address (without any guarantees
    // about the underlying memory).
    let mut k = T::new();
    do_stuff(&mut k);

    // Do a bunch of other stuff, allocations, function calls, whatever

    // Call it again
    do_stuff(&mut k);
}

I'd argue that relying on the address of something being different from or equal to a later reference, created after the end of the previous reference's lifetime, is fundamentally unstable and (likely) not useful except perhaps for some kind of heuristic. Future versions of the compiler could at any time change the stack layout or how functions are called, which would break any promises about the value of an address (not the value of the thing at that address) that might exist. Hence I conclude that there are no such guarantees (at least across compiler versions, and most likely across optimizer levels/settings).

So yes, when considering one single optimization step, using only lifetime analysis (instead of function body analysis) could result in a difference in observable behavior, but there are a lot of other things (especially across compiler versions) with a similar impact (basically any observation of stack memory addresses depends on this). Hence my question of what guarantees even exist about the address (again, not the memory location) after the end of the lifetime you got that address from.
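As a sketch of why such comparisons are unstable (hypothetical helper; nothing here is promised by the language):

```rust
// Returns the address a local happened to get in this call's frame.
fn addr_of_local() -> usize {
    let x = 0u8;
    &x as *const u8 as usize
}

fn main() {
    let a1 = addr_of_local();
    let a2 = addr_of_local();
    // Often equal in practice (the frame gets reused), but neither
    // equality nor inequality is promised across calls, optimization
    // levels, or compiler versions.
    println!("call 1: {a1:#x}, call 2: {a2:#x}");
}
```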

(again, it's quite possible that I'm missing something here or that this would be considered a breaking change due to it not being documented anywhere)

I wouldn't be surprised if nobody depends on an address (no memory access) after the end of a lifetime, due to this.

Long story short: I'm curious what you think about this and whether any such guarantees exist (e.g. across compiler versions).


This scenario was already mentioned.

The arguments about observing the address of the same live stack slot are basically the same as the arguments about observing different live stack slots.
