How to see how the memcpy and alloca introduced by a move are optimized

I know that a move introduces a memcpy. For example,

pub fn loop_clone_string<'a>() -> Vec<String>{
    let mut vector_string = vec![];
    let mut origin = String::from("a");
    repeat_outlined(&mut origin);
    let copied = origin; // memcpy introduced without inlining
    push_outlined(&mut vector_string, copied);
    vector_string
}

#[inline(never)]
pub fn repeat_outlined(s: &mut String) {
    *s = s.repeat(42);
}
#[inline(never)]
pub fn push_outlined(v: &mut Vec<String>, a: String) {
    v.push(a);
}

is compiled to LLVM IR like

...
define void @example::loop_clone_string(ptr noalias nocapture noundef writeonly sret(%"alloc::vec::Vec<alloc::string::String>") dereferenceable(24) %_0) unnamed_addr #1 personality ptr @rust_eh_personality !dbg !284 {
start:
  %copied = alloca %"alloc::string::String", align 8
  %origin = alloca %"alloc::string::String", align 8
  %vector_string = alloca %"alloc::vec::Vec<alloc::string::String>", align 8
...
bb1:                                              ; preds = %bb7
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(24) %copied, ptr noundef nonnull align 8 dereferenceable(24) %origin, i64 24, i1 false), !dbg !366
...

by rustc -Copt-level=3 --emit=llvm-ir (see Compiler Explorer).

If the mutation before the assignment and the borrow after the assignment are inlined, the memcpy and alloca for copied are removed by rustc optimizations.
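To make the comparison concrete, here is a minimal sketch of the inlinable variant of the example above. The names loop_clone_string_inlined, repeat_inlinable and push_inlinable are hypothetical, and whether the memcpy and alloca actually disappear depends on the compiler version, so the IR should be compared with the same rustc -Copt-level=3 --emit=llvm-ir invocation.

```rust
// Hypothetical variant of the original example: the helpers are no longer
// marked #[inline(never)], so rustc/LLVM are free to inline them.
pub fn loop_clone_string_inlined() -> Vec<String> {
    let mut vector_string = vec![];
    let mut origin = String::from("a");
    repeat_inlinable(&mut origin);
    let copied = origin; // the same move that appeared as a memcpy before
    push_inlinable(&mut vector_string, copied);
    vector_string
}

// #[inline] is only a hint; the optimizer still decides whether to inline.
#[inline]
pub fn repeat_inlinable(s: &mut String) {
    *s = s.repeat(42);
}

#[inline]
pub fn push_inlinable(v: &mut Vec<String>, a: String) {
    v.push(a);
}
```

The behavior is the same as in the outlined version; only the inlining hints differ, which makes the two IR dumps directly comparable.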

I believe LLVM does not perform this optimization, because opt (trunk) could not optimize the inlined LLVM IR generated with opt-level=0: the Compiler Explorer link shows the O2 opt result, and a local O3 run also fails to remove the alloca and memcpy for copied.

Where are these optimizations done? Is there any way to track them? I appreciate any help you can provide :)

(Edit: removed redundant info and embedded the code for clarity.)

I also asked on the Rust forum: The way to see how memcpy and alloca introduced by move is optimized? - #3 by khei4 - help - The Rust Programming Language Forum


This might be of interest:

https://reviews.llvm.org/D153453


Thanks! Yes. FWIW, I'm working on removing those memcpys with what is called the stack-move optimization, which I took over from pcwalton's patch. (This patch is still too restrictive, but a coming patch will remove them. :) ) I'm also wondering whether it would be worth implementing this as a MIR-side optimization, if that's possible.


I've also been trying to tackle this from the Rust codegen side, though my latest stab at it needs more work: Avoid `memcpy` in codegen for more types, notably `Vec` by scottmcm · Pull Request #112733 · rust-lang/rust · GitHub

A big problem here is provenance, undef, and the like, so perhaps the thing that would be most useful to Rust here is the bytes type proposal. With it, we could move a String as b192 and thus allow it to be SSA register-ified, making it easier for LLVM to get rid of the allocas for small, known-size types like that.
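For reference, the 192 in b192 lines up with the 24-byte memcpy in the IR above, since 24 bytes is 192 bits. A small sketch of that arithmetic follows; the helper name string_size_in_bits is hypothetical, and it assumes String's current, not formally guaranteed, layout of three pointer-sized fields, which gives 192 bits only on a 64-bit target.

```rust
use std::mem::size_of;

// Hypothetical helper: size of `String` in bits on the current target.
// Assumption: `String` is currently three pointer-sized words
// (pointer, capacity, length); this layout is not a stable guarantee.
pub fn string_size_in_bits() -> usize {
    size_of::<String>() * 8
}
```

On a 64-bit target this is expected to return 192, matching the i64 24 length argument of the memcpy in the IR.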

