At the moment, it can be very hard to identify compiler-generated copy/move operations in the output code. You might see a sequence of load/store operations, or calls to memcpy, but you can't tell it's a copy/move of a given type without deep analysis. In particular, it's very difficult to work out how much CPU time is spent on moves/copies, and where they are happening.
I would love it if the compiler generated debug info for the operations it generates. This would identify the specific instructions involved in the copy/move, and what type is being moved. For example, it could generate calls to an artificial inline function called something like core::intrinsics::compiler_move::<T>.
Of course, if the move/copy is completely elided then nothing should be generated. It should probably also skip (or have the option to skip) very small moves/copies, especially for small Copy primitive types (I imagine the debug info would explode if every usize/u32/etc copy were marked).
So:
1. Is this something which does actually exist, and I've just missed it?
2. If not, is there an existing issue asking for it?
I'm poking around in the rustc internals right now, and I think I have some idea how to implement it (and I'm not seeing anything obviously already there for it, so I think my answer to 1 is "no").
I saw someone mention trying something like this a few weeks ago, maybe on zulip? I don't know if that was you, but otherwise it would perhaps be good to coordinate your efforts.
I think this sounds like a good idea. I'm generally in favour of anything that gives the developer more insight into the software unless it has unacceptable overhead.
Will this "fake" frame survive optimisation and LTO though?
I think a good heuristic here would be "mark any move/copy that touches memory". When the optimisation level is 1 or higher, these moves and copies of trivial small things are normally done entirely in registers and don't affect memory at all. (Meanwhile, if a small thing does get moved or copied from memory, it is nice to have it marked to make it easy to distinguish the move from a spill – the two are difficult to distinguish by looking at the generated assembly.)
You would probably have to turn off the feature at optimisation level 0, which stores pretty much everything in memory pretty much all the time, and thus materialises all the moves/copies. Marking them all would add a lot of overhead, and since the only benefit of optimisation level 0 is that it compiles quickly, it would probably be better off without that overhead.
This conceptually feels like it wants to be indicated in debug info as a code location, rather than as a fake stack frame (e.g. you set the source file location for the instructions doing the moves to core/intrinsics/compiler_move.rs) – debugging, profiling, etc. tools tend to work with that information already, and compilers are already used to annotating inlined code like that. But ideally the debug info would list both the synthesized file path that indicates that it's a move, and the actual location in the source code that caused the move, and I don't think existing debug info formats have good support for doing that.
Yeah, that would be nice, but it seems like it would be hard to implement: the decision to use memory or not would be the optimizing backend's, whereas I imagine the bulk of the implementation of this feature would be in the middle of the compiler. I think we'd just have to key off the size from layout.
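To make the "key off the size from layout" idea a bit more concrete, here's a rough user-level sketch. It is not rustc code: `should_annotate` and the 64-byte `ANNOTATE_THRESHOLD_BYTES` are names and values I made up for illustration, and `std::mem::size_of` only stands in for whatever layout query the compiler would really use.

```rust
use std::mem::size_of;

/// Hypothetical cut-off (an assumption, not a value from rustc):
/// moves/copies of types at or below this size would not be marked.
pub const ANNOTATE_THRESHOLD_BYTES: usize = 64;

/// Hypothetical helper: decide purely from the type's size whether a
/// move/copy of `T` would get debug info attached.
pub fn should_annotate<T>() -> bool {
    size_of::<T>() > ANNOTATE_THRESHOLD_BYTES
}
```

With a threshold like this, usize/u32 copies would be skipped and something like a `[u8; 4096]` buffer would be marked, regardless of what the backend later decides to keep in registers.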
I don't think I follow what you mean, but I suspect we're in agreement. I'd imagine the logical backtrace would be something like:
0: memcpy (or not, if it's just inline load/stores)
1: core::intrinsics::compiler_move::<Foo> (???:???) (inlined)
2: where_I_move_Foo (movefoo.rs:123)
Originally I had been thinking that compiler_move/compiler_copy would be entirely synthetic, but it would probably be easier to define real functions in core/src/intrinsics.rs as placeholders (they would never actually be called), so there's some real source file/line to reference.
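As a rough approximation of what such placeholders could look like, here is a sketch you can try by hand today. The signatures are my assumption (nothing like this exists in core as far as I know), and unlike the real proposal these stand-ins are actually called, relying on `#[inline(always)]` so that debug info may show them as an inlined frame around the move.

```rust
/// Hypothetical stand-in for the proposed placeholder: takes a value by
/// move and hands it straight back, so the move is attributed to this
/// function's source location when it is inlined.
#[inline(always)]
pub fn compiler_move<T>(value: T) -> T {
    value
}

/// Hypothetical stand-in for the copy variant, restricted to `Copy` types.
#[inline(always)]
pub fn compiler_copy<T: Copy>(value: &T) -> T {
    *value
}
```

Writing `let dst = compiler_move(src);` by hand is obviously not the goal; the point of the proposal is that the compiler would attribute its own moves this way without any source changes.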
It would be nice to attribute frame 1 to the right line in movefoo.rs, as there could be multiple copies of the same type in that function that starts at line 123. Otherwise it will be hard to tell in sampling profilers like perf what is going on.
I think you are right. I was mostly thinking about the large crowd of people (colleagues or otherwise...) who seem to think that (cycle-based) flamegraphs are the be-all and end-all of profiling, rather than a starting point (as is actually the case).
Those people will have a hard time telling which copy of the same type in a function is the issue.
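A small illustration of the ambiguity being discussed. The type `Foo` and the function `where_i_move_foo` are made-up names echoing the backtrace above; whether each marked line really turns into a memcpy depends on the optimiser, so treat the comments as "potential" moves only.

```rust
/// Hypothetical large type, big enough that moving it is likely a memcpy.
pub struct Foo {
    pub payload: [u8; 4096],
}

/// Two separate moves of the same type inside one function. If both were
/// attributed only to the function's first line, a flamegraph could not
/// say which of the two is the expensive one.
pub fn where_i_move_foo(a: Foo, b: Foo) -> (Box<Foo>, Vec<Foo>) {
    let boxed = Box::new(a); // first potential move of a Foo (into the heap)
    let mut list = Vec::with_capacity(1);
    list.push(b); // second potential move of a Foo (into the Vec's buffer)
    (boxed, list)
}
```

Per-line attribution would let a sampling profiler split the cost between the `Box::new` line and the `push` line instead of lumping both under one frame.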
It's implemented as a MIR transform. But instead of fiddling with filenames, it introduces a couple of new intrinsics compiler_move<T, const SIZE: usize>(...) (and compiler_copy) and makes the Operand::Move/Copy look like they've been inlined from there. So you can tell from a backtrace whether you're looking at a compiler-generated copy, and for what type and how big.
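To show the kind of information that ends up readable in a backtrace frame name, here is a user-level sketch. `frame_label` is a hypothetical helper, and the exact spelling of the symbol is my guess rather than what the transform actually emits; it only demonstrates that the type and its size can both be recovered from the generic arguments.

```rust
use std::any::type_name;
use std::mem::size_of;

/// Hypothetical helper: builds a label resembling the inlined frame name,
/// with the moved type and its size in bytes as generic arguments.
pub fn frame_label<T>() -> String {
    format!(
        "core::intrinsics::compiler_move::<{}, {}>",
        type_name::<T>(),
        size_of::<T>()
    )
}
```

For a `u64` this yields a label ending in `, 8>`, so a profiler that groups by frame name could bucket moves by type and by size without any extra tooling.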
I haven't tested this very much yet (I haven't actually tried profiling with it), but I built rustc with it, and it has minimal impact on debuginfo size.