Can compiler’s optimizer eliminate trivial heap allocation?

findepi · September 14, 2024, 2:31pm

( cross posting from Can compiler's optimizer eliminate trivial heap allocation? - The Rust Programming Language Forum per advice there )

I have two logically equivalent functions that take a string and simply return it. This is a problem abstracted from a data processing library, so the result is returned via an output writable parameter (here &mut String for simplicity).

pub fn append(to: &mut String, a: &str) {
    to.push_str(a);
}

pub fn allocate_append(to: &mut String, a: &str) {
    to.push_str(&a.to_string());
}

Using https://godbolt.org/ (with -O compiler flag) i can see that the string allocation in the allocate_append is not being eliminated, so the function has one more memcpy call and also additional calls to __rust_alloc and __rust_dealloc.

Question: under what circumstances can the compiler eliminate heap allocation? if never, will it be possible in the future, or are there fundamental limitations preventing this?

Note: I wrote a similar code C++ code and run it thru godbolt with gcc -O9. Same behavior.

Note 2: this is a stripped down example. In my original problem I have a function with String return type and the caller writes that string to output buffer. String return type is very convenient, but to eliminate allocation i need to resort to passing an "out" param like &mut Write.

the8472 · September 14, 2024, 6:05pm

findepi · September 14, 2024, 6:27pm

Thanks the8472!

Your example clearly shows function-local allocations can be eliminated, thanks! I can see this working when your example is modified and Vec construction happens in a separate function, so it's possible for return types as well.

I was unable to get the allocation to be eliminated in any non-constant scenario though -- as in my original example, &a.to_string() could be replaced with just a.

Do you maybe know if such optimizations will be possible in the future? or is there a fundamental reason for which they are prohibited?

scottmcm · September 14, 2024, 6:36pm

That's technically an observable difference, so is probably not allowed.

Suppose you had let a = "my literal";, then the address of the elided allocation is different from that of the string literal. And for all the compiler knows, you might have behaviour that depends on the address being different -- even though you as a human know that you almost certainly don't.

So if you never use the pointer's value and just read through it, the compiler is allowed to elide the annotation, but unless the compiler can tell you certainly don't use an address, it's (unfortunately) stuck preserving address identity.

findepi · September 14, 2024, 6:52pm

Thanks @scottmcm ! In this example, the a and &a.to_string() are used as an argument to String::push_str. This function behavior depends on string content, but not its address. I think the compiler sees-through push_str (the capacity checks are inlined). What would it take to make it realize pointer address doesn't matter?

the8472 · September 14, 2024, 7:58pm

Shouldn't nocapture handle this? Afaict it does know that push_str's argument is inferred to be nocapture.

Ddystopia · September 15, 2024, 3:18pm

Vec's FromIterator guarantees to reuse allocations sometimes:

the8472 · September 15, 2024, 3:40pm

Not a guarantee. The documentation clearly says

In general Vec does not guarantee any particular growth or allocation strategy. That also applies to this trait impl.

Note: This section covers implementation details and is therefore exempt from stability guarantees.

Vec may use any or none of the following strategies, depending on the supplied iterator:

and it intentionally doesn't say when exactly that optimization is applied because, well, it's an optimization.

scottmcm · September 18, 2024, 3:38pm

As I understand it that's about https://llvm.org/docs/LangRef.html#pointercapture, which isn't enough to say you're not comparing it to, say, the address of a static.

But yes, it'd be nice if there were a "address isn't important" LLVM thing that allowed more optimizations. Maybe there is one? I'm not sure.

(In my hypothetical post-Rust ideal language, types would have to opt-in to meaningful addresses, and thus anything using the actual memory location of something would be usually disallowed, to avoid needing to worry about these things.)

the8472 · September 18, 2024, 4:01pm

I understand condition condition 3 as precluding the comparison to a static

The call’s behavior depends on any bit of the pointer carrying information.

The IR example even compares to a global and branches on that.

scottmcm · September 18, 2024, 4:05pm

I'd just read the

if it makes a copy of any part of the pointer that outlives the call

part and stopped there, because a ptreq inside a the method doesn't sound like "outlives" at all.

So I don't know what condition 3 means exactly -- whether it's just talking about something like keeping one bit in the sense of trunc i1 and saving that, or also about things that don't outlive.

the8472 · September 18, 2024, 5:08pm

Look at the example IR, it doesn't do any bit manipulation on the pointer itself. Just compares it to another value. It seems like if you could so much as infer a single bit of the pointer, even if the inference is conditional on other things, then this would violate nocapture. Otherwise this section makes no sense. So I don't think that's the optimization blocker here.

Vorpal · September 18, 2024, 5:59pm

It seems to me that condition 3 precludes pointer tagging schemes. Of course you can infer bits of (non-tagged) pointers by knowing the alignment of the pointee. I know the lowest few bits of any pointer to a u64 will be zero. If I didn't then I would have to assume that all loads and stores would b have to done unaligned.

So not sure how to square such knowledge with the LLVM manual (if we interpret this as "infer").

(I might also know that user space pointers on my architecture always have the upper N bits as zero, or that my embedded system only has things in its address space below 0xeffffff or such, but those seem even less likely to be relevant)

the8472 · September 18, 2024, 7:13pm

You can erase the tags by masking, this does not leak bits because a tagged and untagged pointer will look the same. Only when you're trying to extract tag bits you're looking at the pointer, at which point I assume it would no longer be allowed to be nocapture, which seems fine to me. Plus you can't do pointer tagging on align pointers (which we use for references) either.

"infer" was my not-sufficiently-precise word choice.

To refine it further: If runtime behavior that is not UB can be used as an (possibly partial) extractor for address bits of a pointer then it can't be nocapture.

Reading from non-allocated addresses (such as high bits being set) would be UB and is therefore not observable. Loading from unaligned pointers where the load is annotated align would be UB and therefore not observable, etc. etc.

Topic		Replies	Views
Why rustc 1.71 no longer eliminates Box allocations? compiler	12	750	February 13, 2025
The way to see how memcpy and alloca introduced by move is optimized compiler	4	950	October 12, 2023
Small string optimization: remove as_mut_vec libs	71	13877	March 25, 2019
Short String optimization	30	13108	March 25, 2019
Copy elision & RVO optimization compiler	12	2499	November 26, 2022

Can compiler’s optimizer eliminate trivial heap allocation?

Related topics