GlobalAlloc::alloc() for zero sized layouts

GlobalAlloc::alloc() has a (surprising to some) safety precondition that the layout must have non-zero size. This means that writing code polymorphic over the layout size requires branching on whether the layout size is zero before allocating with that layout. If this required branching is forgotten, the code has undefined behavior for layouts with zero size.
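A minimal sketch of the branching this implies. It assumes a generic helper that copies a slice into memory from any GlobalAlloc; copy_to_heap and free_heap are hypothetical names, not std APIs. Because both T and the length are generic, the layout can be zero-sized, so the allocation path and the deallocation path both need the check:

```rust
use std::alloc::{GlobalAlloc, Layout};
use std::ptr::{self, NonNull};

/// Copies `src` into fresh memory from `a`. Generic over `T` and the length,
/// so the layout can be zero-sized (empty slice, or zero-sized `T`).
/// Returns `None` if the layout overflows or the allocator fails.
pub fn copy_to_heap<T: Copy, A: GlobalAlloc>(a: &A, src: &[T]) -> Option<NonNull<T>> {
    let layout = Layout::array::<T>(src.len()).ok()?;
    let dst: NonNull<T> = if layout.size() == 0 {
        // Required branch: `GlobalAlloc::alloc` must never see a zero-size layout.
        NonNull::dangling()
    } else {
        // SAFETY: `layout.size() != 0` was checked just above.
        NonNull::new(unsafe { a.alloc(layout) })?.cast::<T>()
    };
    // SAFETY: `dst` is valid for `src.len()` writes of `T` (zero bytes in the
    // dangling case) and is a fresh region, so it cannot overlap `src`.
    unsafe { ptr::copy_nonoverlapping(src.as_ptr(), dst.as_ptr(), src.len()) };
    Some(dst)
}

/// Frees memory from `copy_to_heap`. The same zero-size branch is needed here.
///
/// # Safety
/// `p` must come from `copy_to_heap(a, src)` with the same `a` and `len == src.len()`.
pub unsafe fn free_heap<T, A: GlobalAlloc>(a: &A, p: NonNull<T>, len: usize) {
    let layout = Layout::array::<T>(len).expect("same layout as at allocation");
    if layout.size() != 0 {
        // SAFETY: a non-zero layout means `p` came from `a.alloc(layout)`.
        unsafe { a.dealloc(p.as_ptr().cast::<u8>(), layout) }
    }
}
```

Dropping either `if` still compiles. It only turns into UB at runtime, for an empty slice or a zero-sized T.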

In a world where GlobalAlloc::alloc() were defined on zero-size layouts to return a dangling pointer (non-null, well-aligned), generic code could simply be parametric over the layout size, with no branches. Some additional consequences would be:

  • Fewer chances of undefined behavior, by removing a surprising safety precondition.
  • Smaller code size, by factoring the zero-size branch into the allocator (if the allocator needs such special logic at all) instead of duplicating it in every polymorphic caller.
  • Worse performance, by forcing the allocator to handle zero-size layouts even when it is never called with them (assuming the allocator needs special logic to support zero-size layouts).

My questions to the forum:

  • Is allocation (and thus deallocation) the only operation that is not parametric over the layout size? (AFAICT reading and writing zero bytes on a dangling pointer is well-defined.) If the answer is no, then the hypothetical change above would not be enough to achieve its goal.
  • Are there other negative consequences to the hypothetical change above (besides worse performance and that it's a breaking change for allocators)?
  • What was the rationale behind requiring non-zero size layouts? Was it not clear how to specify the behavior in such cases at the time?
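A small illustration for the first question (an example, not a proof): zero-byte copies, fills and slice constructions through NonNull::dangling(), which is non-null and aligned for its type. zero_length_ops is a hypothetical name:

```rust
use std::ptr::{self, NonNull};

/// Performs several zero-length operations through dangling (non-null,
/// well-aligned) pointers. None of them touches any byte of memory.
pub fn zero_length_ops() -> usize {
    let src: NonNull<u64> = NonNull::dangling();
    let dst: NonNull<u64> = NonNull::dangling();
    unsafe {
        // Copying zero elements reads and writes nothing.
        ptr::copy_nonoverlapping(src.as_ptr(), dst.as_ptr(), 0);
        // Filling zero elements writes nothing.
        ptr::write_bytes(dst.as_ptr(), 0xAB, 0);
        // A zero-length slice may be built from a dangling, aligned, non-null pointer.
        let s: &[u64] = std::slice::from_raw_parts(src.as_ptr(), 0);
        s.len()
    }
}
```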

That's simple: system allocation APIs like malloc don't give size zero portable semantics (malloc(0) may return either null or a unique pointer), and GlobalAlloc inherits the limitation from there.

The Allocator trait currently allows allocating with a zero sized layout, and allocation is safe due to that. But this also means that "zero sized" allocations may actually do some allocation work (e.g. to track the allocation), and thus we've found that collections generally still want to avoid calling into the allocator for the zero sized case to ensure ZST allocation still remains a no-op.
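One way to observe the "ZST allocation stays a no-op" behaviour on stable Rust is a hypothetical counting wrapper around System. CountingAlloc, ALLOC_CALLS and alloc_calls_for_vec are illustrative names. Vec does not allocate storage for zero-sized element types, so the counter should not move for Vec<()>, while it does for Vec<u8>:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical instrumentation: forwards to `System` and counts `alloc` calls.
pub struct CountingAlloc;

pub static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        // SAFETY: the caller's guarantees are forwarded unchanged.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: the caller's guarantees are forwarded unchanged.
        unsafe { System.dealloc(ptr, layout) }
    }
    // The default `realloc` calls `self.alloc`, so growth is counted too.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Counts `alloc` calls made while pushing `n` clones of `value` into a `Vec`.
pub fn alloc_calls_for_vec<T: Clone>(value: T, n: usize) -> usize {
    let before = ALLOC_CALLS.load(Ordering::Relaxed);
    let mut v = Vec::new();
    for _ in 0..n {
        v.push(value.clone());
    }
    let _ = black_box(&v); // discourage the optimizer from eliding the allocation
    drop(v);
    ALLOC_CALLS.load(Ordering::Relaxed) - before
}
```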


I'm still missing part of the rationale, unless you mean the rationale was "copy/paste the specification of historical allocators without reconsidering it". Otherwise, it would have been possible to keep leveraging those allocators behind a better specification, using a thin inline wrapper:

if layout.size() == 0 { ptr::without_provenance_mut(layout.align()) } else { malloc(layout.size()) }

That's nice. I couldn't find whether this is meant to eventually become the global allocator API, or only the API for custom allocators. If it's the former, then we can close this thread.

This reasoning seems somewhat flawed to me:

  • The "may" in the first sentence is quite important. The end user (the one defining the global allocator) can choose an allocator that doesn't track zero-sized allocations in production builds (that should probably be the default) and one that does in debug builds, for example, or whatever else fits their use case (collections can't even have an opinion about that).
  • If some collections care about no-op ZST allocations, they can simply not call the allocator (which is what they do right now anyway). They don't need to decide what the allocator API should look like and force their opinion on the rest of the language, particularly since it doesn't even matter to them whether the API accepts zero-size layouts.
  • Why not specify "zero sized" allocation to return a "stateless" dangling pointer (as I suggested in the OP)? Most implementations would probably call ptr::dangling(), but polymorphic ones could simply return the first address of the first "slab"/"block"/"region" with the proper alignment from which they allocate, if that fits their logic for non-zero-sized allocations as well, thus without needing any branching or any state. (That dangling pointer can't end up in the middle of a future allocation as long as the allocator always keeps one region for each supported alignment and produces dangling pointers from those.)
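A rough sketch of the "no extra branch" part of this argument. It uses a hypothetical bump arena (Bump is an illustrative name) rather than the per-alignment regions described above. The same align-up-then-advance arithmetic naturally yields an aligned pointer for size 0. Unlike the proposed scheme, the zero-size pointer here may coincide with the start of the next allocation, and nothing is ever freed individually:

```rust
use std::alloc::Layout;
use std::cell::Cell;

/// Hypothetical bump arena over one heap buffer; individual allocations are
/// never freed, the whole buffer is released when the arena is dropped.
pub struct Bump {
    base: *mut u8,     // start of the owned buffer (see `Drop`)
    cap: usize,        // buffer length in bytes
    used: Cell<usize>, // offset of the first free byte
}

impl Bump {
    pub fn new(cap: usize) -> Self {
        let buf: Box<[u8]> = vec![0u8; cap].into_boxed_slice();
        Bump { base: Box::into_raw(buf).cast::<u8>(), cap, used: Cell::new(0) }
    }

    /// One code path for every size, zero included: round the cursor up to
    /// the alignment, then advance it by `layout.size()` (a no-op for size 0).
    pub fn alloc(&self, layout: Layout) -> Option<*mut u8> {
        let base = self.base as usize;
        let start = base + self.used.get();
        let mask = layout.align() - 1;
        let aligned = start.checked_add(mask)? & !mask;
        let end = aligned.checked_add(layout.size())?;
        if end > base + self.cap {
            return None; // out of space
        }
        self.used.set(end - base);
        // Offset from `self.base` so the result keeps the buffer's provenance.
        Some(self.base.wrapping_add(aligned - base))
    }

    pub fn used(&self) -> usize {
        self.used.get()
    }
}

impl Drop for Bump {
    fn drop(&mut self) {
        // SAFETY: `base`/`cap` came from `Box::into_raw` on a `Box<[u8]>` of length `cap`.
        unsafe { drop(Box::from_raw(std::ptr::slice_from_raw_parts_mut(self.base, self.cap))) };
    }
}
```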

The second part of the rationale, I suppose, is that LLVM optimizations for allocation functions (e.g. allocation elision) assume a malloc or aligned_malloc shape. Using something bespoke means abandoning any optimization of the dynamic allocation calls, at least with the current stack. LLVM might handle __rust_alloc(0, 1) "correctly," or it might not; the LangRef is unfortunately light on the semantics of the allockind attributes beyond that the annotated functions "behave like malloc."

A tertiary rationale is that it's slightly safer to ensure a straightforward hookup between a malloc shaped implementation and GlobalAlloc is sound than to require an extra wrapper on that side of the bridge.

That's the performance bit — because global allocation is always dynamically dispatched, there's no "inline" available (to the caller). Allocation is distressingly hot, so even that little bit of additional overhead needs justification.

For the layout size ≤ isize::MAX rule, it was decided eventually that having every caller responsible for upholding that requirement was intractable, and Layout started enforcing it on our behalf. On the other hand, though, the non-zero requirement for allocation is well established for manual allocation in C/C++ land and isn't particularly onerous — especially since many collections want/need to handle the zero sized case specially anyway.
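The split described here can be checked directly against Layout::from_size_align. It rejects a size whose rounded-up value exceeds isize::MAX but accepts size 0, so the non-zero rule remains an obligation of GlobalAlloc callers. layout_ok is just a hypothetical convenience wrapper:

```rust
use std::alloc::Layout;

/// Hypothetical convenience: does `Layout` accept this size/alignment pair?
/// `from_size_align` requires `align` to be a non-zero power of two and the size,
/// rounded up to a multiple of `align`, to be at most `isize::MAX`.
/// It does not require the size to be non-zero.
pub fn layout_ok(size: usize, align: usize) -> bool {
    Layout::from_size_align(size, align).is_ok()
}
```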

That's basically what impl Allocator for Global does. Having some alloc::DangleEmpty<A> to wrap allocators and guarantee ptr::dangling semantics for zero-sized allocation is likely a good idea, though.
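A possible shape for that DangleEmpty<A> idea is sketched below. It uses inherent methods over any GlobalAlloc so it runs on stable, and it is a hypothetical type, not an existing std item. Assuming the zero-size rule is the only caller-side precondition of GlobalAlloc::alloc (it is the one its documentation lists), the wrapper's alloc can be a safe fn:

```rust
use std::alloc::{GlobalAlloc, Layout};

/// Hypothetical wrapper: zero-size layouts get a dangling, well-aligned pointer
/// and are never forwarded to the wrapped allocator.
pub struct DangleEmpty<A>(pub A);

impl<A: GlobalAlloc> DangleEmpty<A> {
    /// Like `GlobalAlloc::alloc`, but defined for `layout.size() == 0`.
    /// Returns null on allocation failure, as `GlobalAlloc::alloc` does.
    pub fn alloc(&self, layout: Layout) -> *mut u8 {
        if layout.size() == 0 {
            // Non-null and aligned to `layout.align()`; must never be dereferenced.
            layout.align() as *mut u8
        } else {
            // SAFETY: the size is non-zero.
            unsafe { self.0.alloc(layout) }
        }
    }

    /// # Safety
    /// `ptr` must have been returned by `self.alloc(layout)` with the same layout.
    pub unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        if layout.size() != 0 {
            // SAFETY: non-zero layout, so `ptr` came from the wrapped allocator.
            unsafe { self.0.dealloc(ptr, layout) }
        }
    }
}
```

Callers of this wrapper can be parametric over the layout size. The branch now lives in one place, and every call pays for it, which is the trade-off discussed in this thread.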

Because it's not as simple as you might initially think, e.g. handling realloc(p, 0), which C23 finally caved and made into UB because of unresolvable existing divergence.
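To make the realloc(p, 0) point concrete: GlobalAlloc::realloc requires new_size to be greater than zero, so a wrapper that wants to accept zero has to pick a policy explicitly. The hypothetical resize below frees on shrink-to-zero and returns a dangling, aligned pointer. That is one choice among several, and this is exactly the kind of decision on which C implementations diverged:

```rust
use std::alloc::{GlobalAlloc, Layout};
use std::ptr::NonNull;

/// Resizes a block from `a` to `new_size` bytes, keeping `old.align()`.
/// Returns `None` on failure (in which case the old block is still live).
///
/// # Safety
/// If `old.size() != 0`, `ptr` must be currently allocated by `a` with layout `old`.
/// `new_size` rounded up to `old.align()` must not exceed `isize::MAX`.
pub unsafe fn resize<A: GlobalAlloc>(
    a: &A,
    ptr: NonNull<u8>,
    old: Layout,
    new_size: usize,
) -> Option<NonNull<u8>> {
    match (old.size(), new_size) {
        // Nothing allocated before or after: keep the dangling pointer.
        (0, 0) => Some(ptr),
        // Growing from nothing is a fresh allocation.
        (0, _) => {
            let layout = Layout::from_size_align(new_size, old.align()).ok()?;
            // SAFETY: `new_size != 0` in this arm.
            NonNull::new(unsafe { a.alloc(layout) })
        }
        // Shrinking to nothing: this sketch's policy is "free, then dangle".
        (_, 0) => {
            // SAFETY: `ptr` was allocated by `a` with `old` (caller contract).
            unsafe { a.dealloc(ptr.as_ptr(), old) };
            NonNull::new(old.align() as *mut u8)
        }
        // Both non-zero: exactly the case `GlobalAlloc::realloc` accepts.
        // SAFETY: caller contract plus `new_size > 0`.
        (_, _) => NonNull::new(unsafe { a.realloc(ptr.as_ptr(), old, new_size) }),
    }
}
```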

Collections are the primary consumer of the allocation API, so their needs do directly inform its shape. And what we're finding is that most collections more involved than a naive Box already want to treat zero-sized allocation specially.

Because if it's guaranteed to that degree, then it should be handled on the caller side of the bridge, not on the impl GlobalAlloc side. And handling zero-sized allocation on both sides of the global, general-purpose dynamic allocation API is needlessly non-zero-cost.

The “if” is doing a lot of heavy lifting there. Most allocators allocate some control block for each allocation to cope with the malloc shape, which doesn't provide size/align information on deallocation, so aliasing zero-sized allocations onto the same control block is potentially problematic, especially since synchronization is involved.
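A sketch of why malloc-shaped allocators need per-allocation metadata. It assumes a simple size header in front of each block; header_malloc and header_free are hypothetical functions layered on System. Since free receives only the pointer, every live pointer must carry its own header. A single shared dangling pointer for zero-size requests would leave free with no header to read, so zero sizes need either a real allocation or a special case (here: null, mirroring the non-zero rule):

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::ptr;

/// Header size in bytes; also the payload alignment this sketch guarantees.
const HEADER: usize = 16;

/// malloc-shaped: only a size goes in, and `header_free` gets only the pointer back.
/// Returns null on failure or when `size == 0`.
pub fn header_malloc(size: usize) -> *mut u8 {
    if size == 0 {
        return ptr::null_mut();
    }
    let Some(total) = size.checked_add(HEADER) else { return ptr::null_mut() };
    let Ok(layout) = Layout::from_size_align(total, HEADER) else { return ptr::null_mut() };
    // SAFETY: `total > 0`.
    let base = unsafe { System.alloc(layout) };
    if base.is_null() {
        return base;
    }
    unsafe {
        // Per-allocation metadata: remember the size so `header_free` can rebuild the Layout.
        base.cast::<usize>().write(size);
        base.add(HEADER)
    }
}

/// # Safety
/// `p` must be null or a live pointer returned by `header_malloc`.
pub unsafe fn header_free(p: *mut u8) {
    if p.is_null() {
        return;
    }
    unsafe {
        let base = p.sub(HEADER);
        let size = base.cast::<usize>().read();
        // The same size/align pair already passed `from_size_align` in `header_malloc`.
        System.dealloc(base, Layout::from_size_align_unchecked(size + HEADER, HEADER));
    }
}
```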

Ultimately, it's just a lot simpler to forbid zero-sized allocations for the global allocation bridge. The nicer Allocator API can (and I still think should) allow for zero-sized allocation, and we can (and I think should) allow for #[global_allocator] to use impl Allocator without explicit impl GlobalAlloc (either via default impl or specialization-like magic).


Thanks! That's exactly what I was missing. This makes much more sense now.

Sorry, but this is still flawed. The proof is that collections don't use GlobalAlloc anymore; they use Allocator, and thus rely by default on impl Allocator for Global, which is the thin wrapper. They are fine with Allocator because it doesn't matter to them that Allocator accepts more layouts: they simply won't call it with zero-size layouts (and I guess inlining works now, because it's static dispatch, so they don't pay the cost of the thin wrapper). However, this is irrelevant to the discussion, so we can simply disagree on this particular point.

To my current understanding (which might still be wrong), the answer to this thread would be:

  • GlobalAlloc was designed based on LLVM optimization capabilities (like many other design decisions in Rust).
  • Allocator is slowly replacing it, but not because of its better API. Collections adopt this new API to provide finer control over the allocator to the user.
  • Library authors should write code parametric over Allocator instead of calling directly into GlobalAlloc. This will reduce their chances of introducing UB while giving their users finer control. (If they want to optimize their handling of ZSTs with an additional branch, they can, but they don't have to if they prefer robust code through simpler invariants.)

I'll note that anything that wants a const fn new() for zero-size allocation (be that Vec::new(), a hypothetical Box::<[T]>::empty(), or whatever) de facto needs to do that anyway, because you can't call the allocator in const. And even if an allocator supports zero-size allocation, you can't deallocate a pointer that didn't come from the allocator in the first place, so you need the check on the deallocation path too.
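A minimal sketch of that situation, assuming a hypothetical RawBuf<T> that manages memory only (it never drops elements). new() is const because it never touches the allocator, and as a result Drop must also skip the allocator whenever the layout is zero-sized:

```rust
use std::alloc::{self, Layout};
use std::marker::PhantomData;
use std::ptr::NonNull;

/// Hypothetical owned buffer for `cap` values of `T` (memory only, no element drops).
pub struct RawBuf<T> {
    ptr: NonNull<T>,
    cap: usize,
    _owns: PhantomData<T>,
}

impl<T> RawBuf<T> {
    /// No allocator call, so this can be `const`: dangling pointer, zero capacity.
    pub const fn new() -> Self {
        RawBuf { ptr: NonNull::dangling(), cap: 0, _owns: PhantomData }
    }

    /// Allocates room for `cap` values; nothing is allocated if the layout is
    /// zero-sized. Calls `handle_alloc_error` if the allocator fails.
    pub fn with_capacity(cap: usize) -> Self {
        let layout = Layout::array::<T>(cap).expect("capacity overflow");
        if layout.size() == 0 {
            return RawBuf { ptr: NonNull::dangling(), cap, _owns: PhantomData };
        }
        // SAFETY: `layout.size() != 0`.
        let raw = unsafe { alloc::alloc(layout) };
        let ptr = NonNull::new(raw.cast::<T>()).unwrap_or_else(|| alloc::handle_alloc_error(layout));
        RawBuf { ptr, cap, _owns: PhantomData }
    }

    pub fn capacity(&self) -> usize {
        self.cap
    }

    pub fn as_ptr(&self) -> *mut T {
        self.ptr.as_ptr()
    }
}

impl<T> Drop for RawBuf<T> {
    fn drop(&mut self) {
        let layout = Layout::array::<T>(self.cap).expect("validated at construction");
        // A pointer from `new()` never came from the allocator, so the
        // deallocation path needs the zero-size check as well.
        if layout.size() != 0 {
            // SAFETY: a non-zero layout means `with_capacity` allocated with this layout.
            unsafe { alloc::dealloc(self.ptr.as_ptr().cast::<u8>(), layout) }
        }
    }
}
```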

TBH it'd be kinda nice to rethink Layout for the future allocator API. Maybe just pass NonZeroSize (i.e. a usize in 1..=isize::MAX) and ptr::Alignment separately, even...

I'm not fully convinced, tbh; the semantic benefit that Layout provides is still meaningful: it enforces that the rounded-up size fits in isize, and it carries the layout manipulation methods. And Layout's scalar pair ABI means there's no ABI difference between passing size and align separately or as one Layout, so it's zero cost, even if Layout's API deals in usize instead of the more constrained size/align integral types.

Plus, I still think Allocator allowing callers to ask for zero-sized allocations is the correct call, even if the common case is wanting to avoid them, and even though I also firmly believe that GlobalAlloc forbidding them was the correct choice for global dynamic allocation dispatch.

But we could make NonZero<Layout> "just work," technically, if we wanted to.

The storage API sketches have all used Layout, and minimizing diff from Allocator as much as possible is I think the right call for it, to focus on handles and what they mean, instead of other extraneous details.

The biggest question about Allocator is, tbqh, the NonNull<[u8]> return type. While we get real benefit today from passing size/align to deallocation, slack space on allocation doesn't seem like something that allocators want to provide or that collections want to take advantage of in the default allocation path. Since GlobalAlloc doesn't expose the slack, it's better to ignore the returned size and use the one you already had to pass to request the allocation.

Just as an addendum, under the storage API, this does still require some amount of cooperation from the storage; there needs to be a const way to create a placeholder handle, which is different from a raw pointer.

The manipulation methods I agree are useful.

I'm less convinced by the round-up, since NonZeroSize + Alignment can't overflow a usize, so much of the benefit of the rounded-up invariant (the main reason it's one thing at all) disappeared once the isize::MAX limit started to exist.
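The arithmetic behind the "can't overflow a usize" claim can be sketched with a hypothetical NonZeroSize newtype, with a plain power-of-two usize standing in for ptr::Alignment to keep the sketch on stable Rust. On an N-bit target, size ≤ isize::MAX = 2^(N-1) - 1 and align ≤ 2^(N-1), so size + align - 1 ≤ 2^N - 2 < usize::MAX and the round-up never wraps. The result can still exceed isize::MAX, which is the part Layout checks:

```rust
use std::num::NonZeroUsize;

/// Hypothetical `NonZeroSize`: a byte count in `1..=isize::MAX` (not a std type).
#[derive(Clone, Copy, Debug)]
pub struct NonZeroSize(NonZeroUsize);

impl NonZeroSize {
    pub fn new(n: usize) -> Option<Self> {
        if n <= isize::MAX as usize {
            NonZeroUsize::new(n).map(NonZeroSize)
        } else {
            None
        }
    }

    pub fn get(self) -> usize {
        self.0.get()
    }
}

/// Rounds `size` up to a multiple of `align` (which must be a power of two).
/// Plain `+` is used on purpose: given the bounds above it cannot overflow,
/// and in debug builds an overflow would panic rather than wrap.
pub fn padded_size(size: NonZeroSize, align: usize) -> usize {
    assert!(align.is_power_of_two());
    (size.get() + (align - 1)) & !(align - 1)
}
```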

And I feel like a realloc equivalent that doesn't need to handle "oops, they asked for a different alignment this time" would be a good thing to have, and that would thus not take two Layouts anyway.