Can compiler’s optimizer eliminate trivial heap allocation?

( cross posting from Can compiler's optimizer eliminate trivial heap allocation? - The Rust Programming Language Forum per advice there )

I have two logically equivalent functions that take a string and simply return it. This is a problem abstracted from a data processing library, so the result is returned via an output writable parameter (here &mut String for simplicity).

pub fn append(to: &mut String, a: &str) {
    to.push_str(a);
}

pub fn allocate_append(to: &mut String, a: &str) {
    to.push_str(&a.to_string());
}

Using https://godbolt.org/ (with -O compiler flag) i can see that the string allocation in the allocate_append is not being eliminated, so the function has one more memcpy call and also additional calls to __rust_alloc and __rust_dealloc.

Question: under what circumstances can the compiler eliminate heap allocation? if never, will it be possible in the future, or are there fundamental limitations preventing this?

Note: I wrote a similar code C++ code and run it thru godbolt with gcc -O9. Same behavior.

Note 2: this is a stripped down example. In my original problem I have a function with String return type and the caller writes that string to output buffer. String return type is very convenient, but to eliminate allocation i need to resort to passing an "out" param like &mut Write.

1 Like

Thanks the8472!

Your example clearly shows function-local allocations can be eliminated, thanks! I can see this working when your example is modified and Vec construction happens in a separate function, so it's possible for return types as well.

I was unable to get the allocation to be eliminated in any non-constant scenario though -- as in my original example, &a.to_string() could be replaced with just a.

Do you maybe know if such optimizations will be possible in the future? or is there a fundamental reason for which they are prohibited?

That's technically an observable difference, so is probably not allowed.

Suppose you had let a = "my literal";, then the address of the elided allocation is different from that of the string literal. And for all the compiler knows, you might have behaviour that depends on the address being different -- even though you as a human know that you almost certainly don't.

So if you never use the pointer's value and just read through it, the compiler is allowed to elide the annotation, but unless the compiler can tell you certainly don't use an address, it's (unfortunately) stuck preserving address identity.

1 Like

Thanks @scottmcm ! In this example, the a and &a.to_string() are used as an argument to String::push_str. This function behavior depends on string content, but not its address. I think the compiler sees-through push_str (the capacity checks are inlined). What would it take to make it realize pointer address doesn't matter?

Shouldn't nocapture handle this? Afaict it does know that push_str's argument is inferred to be nocapture.

2 Likes

Vec's FromIterator guarantees to reuse allocations sometimes:

Not a guarantee. The documentation clearly says

In general Vec does not guarantee any particular growth or allocation strategy. That also applies to this trait impl.

Note: This section covers implementation details and is therefore exempt from stability guarantees.

Vec may use any or none of the following strategies, depending on the supplied iterator:

and it intentionally doesn't say when exactly that optimization is applied because, well, it's an optimization.

1 Like

As I understand it that's about https://llvm.org/docs/LangRef.html#pointercapture, which isn't enough to say you're not comparing it to, say, the address of a static.

But yes, it'd be nice if there were a "address isn't important" LLVM thing that allowed more optimizations. Maybe there is one? I'm not sure.

(In my hypothetical post-Rust ideal language, types would have to opt-in to meaningful addresses, and thus anything using the actual memory location of something would be usually disallowed, to avoid needing to worry about these things.)

I understand condition condition 3 as precluding the comparison to a static

The call’s behavior depends on any bit of the pointer carrying information.

The IR example even compares to a global and branches on that.

I'd just read the

if it makes a copy of any part of the pointer that outlives the call

part and stopped there, because a ptreq inside a the method doesn't sound like "outlives" at all.

So I don't know what condition 3 means exactly -- whether it's just talking about something like keeping one bit in the sense of trunc i1 and saving that, or also about things that don't outlive.

Look at the example IR, it doesn't do any bit manipulation on the pointer itself. Just compares it to another value. It seems like if you could so much as infer a single bit of the pointer, even if the inference is conditional on other things, then this would violate nocapture. Otherwise this section makes no sense. So I don't think that's the optimization blocker here.

1 Like

It seems to me that condition 3 precludes pointer tagging schemes. Of course you can infer bits of (non-tagged) pointers by knowing the alignment of the pointee. I know the lowest few bits of any pointer to a u64 will be zero. If I didn't then I would have to assume that all loads and stores would b have to done unaligned.

So not sure how to square such knowledge with the LLVM manual (if we interpret this as "infer").

(I might also know that user space pointers on my architecture always have the upper N bits as zero, or that my embedded system only has things in its address space below 0xeffffff or such, but those seem even less likely to be relevant)

You can erase the tags by masking, this does not leak bits because a tagged and untagged pointer will look the same. Only when you're trying to extract tag bits you're looking at the pointer, at which point I assume it would no longer be allowed to be nocapture, which seems fine to me. Plus you can't do pointer tagging on align pointers (which we use for references) either.

"infer" was my not-sufficiently-precise word choice.

To refine it further: If runtime behavior that is not UB can be used as an (possibly partial) extractor for address bits of a pointer then it can't be nocapture.

Reading from non-allocated addresses (such as high bits being set) would be UB and is therefore not observable. Loading from unaligned pointers where the load is annotated align would be UB and therefore not observable, etc. etc.