"The optimizer may further assume that allocation is infallible"

The documentation for alloc::alloc::GlobalAlloc says (emphasis added):

You must not rely on allocations actually happening, even if there are explicit heap allocations in the source. The optimizer may detect unused allocations that it can either eliminate entirely or move to the stack and thus never invoke the allocator. The optimizer may further assume that allocation is infallible, so code that used to fail due to allocator failures may now suddenly work because the optimizer worked around the need for an allocation.

Given that many functions in the standard library attempt to handle allocation failure (by panicking or aborting), and that there has been interest for years in letting Rust users handle allocation failure in less disruptive ways, I think the permission the emphasized phrase gives to the optimizer is much broader than intended: if the optimizer actually used that permission, all the standard library's checks for allocation failure would be useless.
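For concreteness, here is a sketch (the function name is made up for illustration) of the fallible-allocation pattern the standard library exposes. If the optimizer could truly assume allocation is infallible, the error path that `?` propagates here would be dead code in its view:

```rust
use std::collections::TryReserveError;

// Illustrative sketch of the fallible-allocation API: the caller is meant
// to be able to observe and recover from allocation failure.
fn grow_fallibly(v: &mut Vec<u8>, extra: usize) -> Result<(), TryReserveError> {
    v.try_reserve(extra)?; // may report allocation failure to the caller
    // Capacity is now reserved, so this extend cannot allocate.
    v.extend(std::iter::repeat(0).take(extra));
    Ok(())
}
```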

I considered submitting a patch to change this phrase, but I don't know nearly enough about Rust at this level of detail to know what to change it to, so I'm opening this topic to propose that it be changed and to leave the choice of phrasing to more knowledgeable people.

Maybe "The optimizer is allowed to eliminate an allocation even if the allocation might, or would, have failed. There is no guarantee that code that fails with an allocator error in one version of Rust will continue to fail in future versions of Rust; instead, it may cease failing because the optimizer worked around the need for the allocation."?


First of all, this assumption is specific to the global allocator, and does not apply to custom Allocators; the docs should probably be clear about that.

Second, the docs as written seem to imply that the optimizer can insert new allocations, or make them larger, which would be incorrect (I think?)

There is some discussion about this on the unsafe code guidelines repo:


As I understand the post above: the issue isn't that the optimiser may work around allocations, it is the statement that it may assume infallibility, which would seem to imply that branches checking for failure could be optimised away. This is clearly problematic if true, and it should be clarified if false.


I believe the intent behind that rule is that otherwise the optimizer would not be allowed to remove allocations as that would turn an abort for a failed allocation into a nop.


Can the optimizer replace an allocation with a realloc? Or otherwise reduce the size of memory being allocated?

Dumb example to show what I mean:

pub fn dumb<T: Copy>(s1: Vec<T>, s2: &[T]) -> Option<Vec<T>> {
    assert!(s1.len() == 128);
    assert!(s2.len() == 128);
    
    let mut v = Vec::new();
    v.try_reserve(256).ok()?;
    
    v.extend(s1);
    v.extend(s2);
    
    Some(v)
}

It's clear that the programmer's intent is that there is a single allocation for v, of 256 items in size. But it wouldn't be unreasonable for the optimizer to exploit the fact that we own s1 and it's going to be deallocated at the end of this function to convert the allocation of 256 items into reallocating s1 from 128 to 256 items in size, potentially reducing the memory demand.
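A hand-written version of that transformation (the function name is hypothetical) makes the idea concrete: grow s1's existing 128-item allocation, which the allocator may do in place via realloc, instead of creating a fresh 256-item one:

```rust
// Hypothetical source-level version of the transformation described above:
// reuse s1's existing allocation rather than allocating a fresh 256-item
// buffer, so the allocator sees a realloc (possibly in place) instead.
pub fn dumb_by_hand<T: Copy>(mut s1: Vec<T>, s2: &[T]) -> Option<Vec<T>> {
    assert!(s1.len() == 128);
    assert!(s2.len() == 128);

    s1.try_reserve(128).ok()?; // grow capacity from 128 to at least 256 items
    s1.extend_from_slice(s2);

    Some(s1)
}
```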

This also (if allowed) leads to a possible case where the optimizer might make an allocation larger, via merging of allocations:

pub fn calls_dumb<T: Copy>(s2: &[T]) -> Option<Vec<T>> {
    assert!(s2.len() == 128);
    
    let mut v = Vec::new();
    v.try_reserve(128).ok()?;
    
    v.resize(128, s2[0]);

    dumb(v, s2)
}

Here, calls_dumb allocates 128 items worth of space, and then fills them in; it then calls dumb, which allocates 256 items worth of space. The optimizer could, in theory, inline dumb into calls_dumb, then note that the following code has exactly the same observable behaviour and optimize calls_dumb into optimized:

pub fn optimized<T: Copy>(s2: &[T]) -> Option<Vec<T>> {
    assert!(s2.len() == 128);
    
    let mut v = Vec::new();
    v.try_reserve(256).ok()?;
    
    v.resize(128, s2[0]);

    v.extend(s2);

    Some(v)
}

I could well be missing something, mind, but I would have thought that this is legitimate behaviour for the optimizer, even though the allocator can no longer observe my allocation for 128 items.


Your final example is valid even if only elision is permitted, by just eliding every allocation except for the returned one.

Turning alloc+realloc into just alloc is an optimization that makes sense but isn't permitted by most of the draft opsem specs I've seen, and doesn't seem to happen in practice.

Exactly what's permitted depends on the exact opsem specification, which hasn't been decided yet. The most aggressive spec — roughly that the runtime may introduce whatever sound calls into the #[global_allocator] at any point and Global is served by the runtime arbitrarily — would permit merging alloc+realloc to just alloc, but is such a weak constraint that it's not useful to programs (e.g. you can't guarantee any code is actually alloc free, such as, say, the allocator itself). The strong guarantee — roughly that each allocated object is non-deterministically handled by the #[global_allocator] or a "magic" allocation, maybe with a clause forcing any "realized" allocation to not be "magic" — would forbid that merging but permits the elision that we already do.

If you have any proposals for how we could define the global replaceable allocator's behavior, we'd love to hear it. But as demonstrated here, trying to define it in terms of allowed optimizations doesn't really work, and we would strongly prefer an operational definition that can then be optimized for under as-if observability. Merging realloc is actually a good stress test for potential opsem, presuming we want to permit it, since it requires the alloc to be temporarily "magic" but "real" once it escapes analysis.

(I'm a member of T-opsem and contributor to WG-alloc. This is my understanding of the domain but not an official stance of the team.)


It would also be nice to be able to replace dealloc+alloc with reusing the allocation, since we have the size of the allocation in the deallocation path and thus could feasibly do that: Could the compiler automatically re-use `Box` allocations as an optimization? · Issue #93707 · rust-lang/rust · GitHub
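The pattern the linked issue is about can be sketched like this (the function name is made up for illustration):

```rust
// A Box is dropped and a new one of the same layout is allocated right
// after. In principle the compiler could reuse the freed block instead of
// emitting a dealloc followed by an alloc.
fn rebox(b: Box<[u8; 64]>) -> Box<[u8; 64]> {
    drop(b);             // dealloc of a 64-byte block, size known statically
    Box::new([1u8; 64])  // alloc of a 64-byte block: a candidate for reuse
}
```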

And if the compiler were required to treat that free or alloc as an observable side effect, that optimization wouldn't be possible.


The (I believe) primary/canonical discussion thread for custom allocators' opsem (as posted above by Jules-Bertholet) is What about: custom allocators · Issue #442 · rust-lang/unsafe-code-guidelines · GitHub

I just noted the alloc+realloc->alloc and dealloc+alloc->(nothing) optimizations there, and proposed a new potential opsem definition. For the stable interface, it'd be roughly

Each call to alloc, alloc_zeroed, realloc, or dealloc non-deterministically performs zero or one arbitrary sound (per the trait definition) calls to a GlobalAlloc method of the annotated #[global_allocator] static. [NOTE: This call will often be pass-through, but it doesn't need to be. For example, a source call to realloc may correspond to a call to underlying alloc instead, if a prior call to alloc was optimized out. Similarly, source allocations may never be observed by the #[global_allocator], and allocations may have their deallocation deferred arbitrarily long after their source dealloc call. The purpose of this non-determinism is to enable optimization and elimination of allocation events as if they were pure and unobservable effects, even when the underlying implementation has observable effects.]

I think this opsem is reasonable, though it does mean that, in addition to preventing #[global_allocator] from providing extra guarantees, it also can't impose any additional requirements (ones that would require unsafe global coordination), because the optimizer-driven decoupling of calls could break those requirements even if the source never does. Such requirements would be extremely ill-advised, but T-opsem is mostly concerned with whether code is valid (doesn't cause UB), not with whether it is sound, nor even whether it's a non-terrible idea.


zero or one

Is it intentional that you don't allow realloc = free + alloc?


This is not clear at all. There's some desire to also optimize custom allocators, though this would probably be opt-in.


One reason for this sentence in the docs is that code that allocates more bytes than exist in the address space can actually still seem to succeed. This is clearly wild, but sadly it is how LLVM works, so we can't just ignore it. Weird unsafe code is easier to write in C, so here is an example.


Is it intentional that you don't allow realloc = free + alloc?

realloc is not, actually, free + alloc. It's alloc + memcpy + free.

While that may seem pedantic, there's actually a very practical difference. If I have an allocator with 2MB of memory to carve out, I can possibly realloc a 1.5MB allocation into a 2MB one, but I can't alloc 2MB upfront whilst the 1.5MB allocation is still alive.
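That argument can be modeled with a toy fixed-size arena (the numbers and function names are made up for illustration):

```rust
// Toy model: a 2 MiB arena with one live 1.5 MiB allocation.
const ARENA: usize = 2 * 1024 * 1024;
const LIVE: usize = 1024 * 1024 + 512 * 1024; // 1.5 MiB still allocated

// A fresh alloc must coexist with the live block, so 2 MiB cannot fit...
fn fresh_alloc_fits(size: usize) -> bool {
    LIVE + size <= ARENA
}

// ...but growing the live block in place only needs the new size to fit.
fn grow_in_place_fits(new_size: usize) -> bool {
    new_size <= ARENA
}
```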


Sure, but, realloc + memset(0) could still be converted into free + alloc.

If I understand correctly, LLVM just assumes that malloc never returns 0, which is pretty insane. Is my understanding correct? Are there examples where one can meaningfully check the return value of malloc, or does LLVM just make it impossible to properly handle allocation failure?

I think it's a bit more nuanced, as can be seen here: Compiler Explorer

IIUC, LLVM can assume that allocation is infallible only if it can prove that the returned pointer is not observed anywhere else. In the third function it properly tests the returned pointer (test rax, rax), instead of simply replacing the is_null() check with false (as done in f2 by xor eax, eax).


No that's not true.

Indeed. It basically says "well this is an allocation nobody ever looks at, so I'll just artificially make it succeed and provide fake memory that does not exist, which is okay since nobody uses that memory anyway". Which is almost a sound argument except for the issue that the address space is finite and LLVM can make it seem bigger than it actually is...


Why could it be a problem in practice? Operating systems already routinely lie about available physical memory, which arguably is much worse.


Do they? What do you mean?

I mean stuff like overcommit and swap.


And memory compression.

IIUC, my_vec.sort() can cause an allocation failure on macOS because the memory is compressed: a re-ordering can change the compression ratio, which may require more memory and thus lead to an allocation failure!


Note that my_vec.sort() already allocates scratch space for mergesort, so this is not the best example. .sort_unstable() doesn't allocate, though, so that might make a better example.
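For reference, the allocation-free variant (the wrapper name is just for illustration):

```rust
// sort_unstable sorts in place with no heap allocation, unlike slice::sort,
// which allocates scratch space for its merge sort.
fn sort_without_alloc(v: &mut [i32]) {
    v.sort_unstable();
}
```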