The way to see how memcpy and alloca introduced by move is optimized

khei4 · July 14, 2023, 9:43am

I know that a move can introduce a memcpy. For example, the following code:

```rust
pub fn loop_clone_string() -> Vec<String> {
    let mut vector_string = vec![];
    let mut origin = String::from("a");
    repeat_outlined(&mut origin);
    let copied = origin; // memcpy introduced without inlining
    push_outlined(&mut vector_string, copied);
    vector_string
}

#[inline(never)]
pub fn repeat_outlined(s: &mut String) {
    *s = s.repeat(42);
}

#[inline(never)]
pub fn push_outlined(v: &mut Vec<String>, a: String) {
    v.push(a);
}
```

is compiled by `rustc -Copt-level=3 --emit=llvm-ir` (see Compiler Explorer) to IR like the following:

```llvm
...
define void @example::loop_clone_string(ptr noalias nocapture noundef writeonly sret(%"alloc::vec::Vec<alloc::string::String>") dereferenceable(24) %_0) unnamed_addr #1 personality ptr @rust_eh_personality !dbg !284 {
start:
  %copied = alloca %"alloc::string::String", align 8
  %origin = alloca %"alloc::string::String", align 8
  %vector_string = alloca %"alloc::vec::Vec<alloc::string::String>", align 8
...
bb1:                                              ; preds = %bb7
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(24) %copied, ptr noundef nonnull align 8 dereferenceable(24) %origin, i64 24, i1 false), !dbg !366
...
```
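Why the memcpy is exactly 24 bytes: moving a `String` copies only its header (heap pointer, capacity, length), never the heap buffer. The sketch below checks that the buffer address survives the move. The helper name `move_keeps_heap_buffer` is made up for illustration, and the 24-byte figure assumes a 64-bit target and reflects the current std layout rather than a documented guarantee.

```rust
use std::mem::size_of;

/// Hypothetical helper: moves a `String` and reports whether its heap
/// buffer stayed where it was (i.e. only the header was copied).
fn move_keeps_heap_buffer() -> bool {
    let origin = "a".repeat(42);
    let before = origin.as_ptr(); // address of the heap buffer
    let copied = origin; // the move: a bitwise copy of the header only
    copied.as_ptr() == before && copied.len() == 42
}

fn main() {
    assert!(move_keeps_heap_buffer());
    // On 64-bit targets the header (pointer, capacity, length) is
    // 3 * 8 = 24 bytes in current std, matching `i64 24` in the memcpy.
    // This is observed layout, not a documented guarantee.
    if cfg!(target_pointer_width = "64") {
        assert_eq!(size_of::<String>(), 24);
    }
    println!("String header: {} bytes", size_of::<String>());
}
```

So the `%copied` alloca is a second 24-byte stack slot for the same header, which is exactly what an optimizer would want to merge away.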

If the mutation before the assignment and the borrow after the assignment are inlined, the memcpy and alloca for `copied` are removed by rustc's optimizations.
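For side-by-side comparison in Compiler Explorer, here is a hypothetical variant of the example with the two helpers forced inline. The `_inlined` names and the use of `#[inline(always)]` are assumptions for illustration; this sketch makes no claim about the exact IR any particular rustc version emits, only that it is the inlined shape described above.

```rust
// Same logic as `loop_clone_string`, but the helpers are marked
// `#[inline(always)]` instead of `#[inline(never)]`.
pub fn loop_clone_string_inlined() -> Vec<String> {
    let mut vector_string = vec![];
    let mut origin = String::from("a");
    repeat_inlined(&mut origin);
    let copied = origin; // same move as in the original example
    push_inlined(&mut vector_string, copied);
    vector_string
}

#[inline(always)]
pub fn repeat_inlined(s: &mut String) {
    *s = s.repeat(42);
}

#[inline(always)]
pub fn push_inlined(v: &mut Vec<String>, a: String) {
    v.push(a);
}

fn main() {
    let v = loop_clone_string_inlined();
    assert_eq!(v.len(), 1);
    assert_eq!(v[0], "a".repeat(42));
}
```

Compiling both versions with `-Copt-level=3 --emit=llvm-ir` and diffing the allocas for `copied` is one way to see where the copy disappears.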

I believe LLVM alone does not perform this optimization: when I take the LLVM IR for the inlined version emitted at opt-level=0 and run it through `opt` (trunk), the alloca and memcpy for `copied` remain. The Compiler Explorer link shows the `-O2` result from `opt`, and a local `-O3` run also fails to remove them.

Where are these optimizations done? Is there any way to track this optimization? I appreciate any help you can provide.

(Edit: removed redundant info and embedded the code for clarity.)

I also asked on the Rust users forum: "The way to see how memcpy and alloca introduced by move is optimized?"

djc · July 14, 2023, 10:13am

This might be of interest:

https://reviews.llvm.org/D153453

khei4 · July 14, 2023, 10:14am

Thanks! Yes. FWIW, I'm working on removing those memcpys with what is called the stack-move optimization, which I took over from pcwalton's patch (this patch is still too restrictive, but an upcoming patch will remove them :) ). I'm also wondering whether it would be worth implementing this as a MIR-side optimization, if that's possible.
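To make the stack-move idea concrete, here is a toy model on a made-up, single-basic-block IR. The `Inst` enum, `stack_move`, and the conditions it checks are hypothetical simplifications for illustration, not the logic of the MemCpyOpt patch: when a memcpy copies the whole of one alloca into another of the same size, the destination is not touched before the copy, and the source is not touched after it, the two slots can be merged, deleting both the memcpy and the destination alloca.

```rust
/// A made-up, single-basic-block IR: just enough to express the pattern.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Inst {
    Alloca { slot: usize, size: u64 },           // stack slot of `size` bytes
    Memcpy { dst: usize, src: usize, len: u64 }, // copy `len` bytes src -> dst
    Use { slot: usize },                         // any other read or write of a slot
}

/// Does `inst` read or write `slot`? (An alloca itself is not an access.)
fn accesses(inst: &Inst, slot: usize) -> bool {
    match *inst {
        Inst::Alloca { .. } => false,
        Inst::Memcpy { dst, src, .. } => dst == slot || src == slot,
        Inst::Use { slot: s } => s == slot,
    }
}

fn alloca_size(insts: &[Inst], slot: usize) -> Option<u64> {
    insts.iter().find_map(|i| match *i {
        Inst::Alloca { slot: s, size } if s == slot => Some(size),
        _ => None,
    })
}

/// Finds a full-size memcpy whose destination is untouched before it and
/// whose source is untouched after it (a conservative, simplified check).
fn find_candidate(insts: &[Inst]) -> Option<(usize, usize, usize)> {
    for (idx, inst) in insts.iter().enumerate() {
        let Inst::Memcpy { dst, src, len } = *inst else { continue };
        let full_copy = dst != src
            && alloca_size(insts, dst) == Some(len)
            && alloca_size(insts, src) == Some(len);
        let dst_fresh = !insts[..idx].iter().any(|i| accesses(i, dst));
        let src_dead = !insts[idx + 1..].iter().any(|i| accesses(i, src));
        if full_copy && dst_fresh && src_dead {
            return Some((idx, dst, src));
        }
    }
    None
}

/// Merges `dst` into `src`: deletes the memcpy and `dst`'s alloca, then
/// rewrites remaining uses of `dst` to `src`. Returns true if it fired.
fn stack_move(insts: &mut Vec<Inst>) -> bool {
    let Some((idx, dst, src)) = find_candidate(insts) else { return false };
    insts.remove(idx);
    insts.retain(|i| !matches!(*i, Inst::Alloca { slot, .. } if slot == dst));
    for inst in insts.iter_mut() {
        match inst {
            Inst::Memcpy { dst: d, src: s, .. } => {
                if *d == dst { *d = src; }
                if *s == dst { *s = src; }
            }
            Inst::Use { slot } if *slot == dst => *slot = src,
            _ => {}
        }
    }
    true
}

fn main() {
    // Mirrors the thread's example: slot 0 = `origin`, slot 1 = `copied`.
    let mut f = vec![
        Inst::Alloca { slot: 1, size: 24 },
        Inst::Alloca { slot: 0, size: 24 },
        Inst::Use { slot: 0 },                    // repeat_outlined(&mut origin)
        Inst::Memcpy { dst: 1, src: 0, len: 24 }, // let copied = origin;
        Inst::Use { slot: 1 },                    // push_outlined(.., copied)
    ];
    assert!(stack_move(&mut f));
    assert_eq!(
        f,
        vec![
            Inst::Alloca { slot: 0, size: 24 },
            Inst::Use { slot: 0 },
            Inst::Use { slot: 0 },
        ]
    );
}
```

This toy only approximates the liveness reasoning; a real implementation would also need, for example, to prove that neither alloca escapes and to handle more than one basic block.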

scottmcm · July 14, 2023, 5:57pm

I've also been trying to tackle this from the Rust codegen side, though my latest stab at it needs more work: "Avoid `memcpy` in codegen for more types, notably `Vec`" (rust-lang/rust pull request #112733).

A big problem here is provenance, undef, and the like, so perhaps the most useful thing for Rust here would be the bytes type proposal: we could then move a `String` as a `b192` and thus allow it to be promoted to SSA registers, making it easier for LLVM to get rid of the allocas for small, known-size types like that.

system · October 12, 2023, 5:57pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
