Questioning usefulness of alignment in OOM errors and handlers

Out-of-memory handling in alloc customarily passes Layout to functions like handle_alloc_error. Layout is also embedded in TryReserveError's internals.

At first glance it seems quite sensible: Layout is what allocators are given, so OOM handlers get exactly the same information as the allocator.

However, Layout is just size + align, and I don't see any serious use-case for having align.

  • It's not possible to use the align field to report an invalid alignment: by the contract of Layout, it has to be valid. Invalid align values never reach the memory allocator or the OOM handling machinery. Invalid alignment goes through different paths and causes panic!("capacity overflow") or a TryReserveError with the (not yet stabilized) CapacityOverflow kind. When capacity overflow happens, you don't get any details at all.

  • The Allocator API takes a Layout, so any direct users of allocators already know what layout they're using. The allocator has no need to report it back, and couldn't even if it wanted, because AllocError is a unit struct (which I think is good, because it keeps Result of allocators as small as possible).

  • As far as I can tell, all currently valid alignment values are pretty boring and predictable. Collections always use mem::align_of::<T>. Rc/Arc only raise it to at least the alignment of usize. Users can easily replicate the align value if they really want to. I can imagine some more advanced collections aligning data to cache lines or page size, but even then, what's the use case for reporting that to the OOM handler in particular?
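The first bullet can be checked on a stable toolchain. This minimal sketch (the helper `layout_rejected` is ours, purely for illustration) shows that bad alignments and overflowing sizes are rejected while the `Layout` is being built, and that an absurd `Vec::try_reserve` fails in Vec's own capacity arithmetic without calling the allocator, so the OOM path never sees an invalid alignment.

```rust
use std::alloc::Layout;
use std::mem;

/// Returns true if `Layout::from_size_align` refuses this size/align pair.
fn layout_rejected(size: usize, align: usize) -> bool {
    Layout::from_size_align(size, align).is_err()
}

fn main() {
    // Alignment must be a non-zero power of two.
    assert!(layout_rejected(64, 0));
    assert!(layout_rejected(64, 3));
    // The size, rounded up to the alignment, must not exceed isize::MAX.
    assert!(layout_rejected(usize::MAX, 8));
    assert!(!layout_rejected(64, 8));
    // The alignment a collection uses is simply the element type's alignment.
    assert_eq!(Layout::new::<u64>().align(), mem::align_of::<u64>());

    // An absurd request fails in Vec's own capacity arithmetic
    // (capacity overflow), before any allocator or OOM handler is involved.
    let mut v: Vec<u64> = Vec::new();
    match v.try_reserve(usize::MAX) {
        Ok(()) => unreachable!("usize::MAX u64 elements cannot fit"),
        Err(e) => println!("try_reserve failed: {e}"),
    }
}
```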

Users may want to track how much memory is wasted due to alignment and other reasons, but the OOM handler is unsuitable for that: it only gets a single data point, and only in exceptional circumstances. In practice, all metadata about memory usage and efficiency is better collected by wrapping the allocator, which knows the Layout and its own internals, and can expose a custom interface for reporting as much info as desired (which jemalloc and cap already do).
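A minimal sketch of the "wrap the allocator" approach, assuming a simple set of counters (the `Counting` type and its fields are hypothetical): every request passes through the wrapper with its full `Layout`, so statistics are collected continuously instead of only at the moment of failure.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper around the system allocator that keeps statistics.
struct Counting {
    allocations: AtomicUsize,
    bytes_requested: AtomicUsize,
    failures: AtomicUsize,
}

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // SAFETY: forwarded from GlobalAlloc's contract.
        let ptr = unsafe { System.alloc(layout) };
        if ptr.is_null() {
            self.failures.fetch_add(1, Ordering::Relaxed);
        } else {
            self.allocations.fetch_add(1, Ordering::Relaxed);
            self.bytes_requested.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: forwarded from GlobalAlloc's contract.
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static ALLOC: Counting = Counting {
    allocations: AtomicUsize::new(0),
    bytes_requested: AtomicUsize::new(0),
    failures: AtomicUsize::new(0),
};

fn main() {
    let before = ALLOC.bytes_requested.load(Ordering::Relaxed);
    // black_box keeps the optimizer from eliding the allocation.
    let v: Vec<u8> = black_box(Vec::with_capacity(4096));
    let after = ALLOC.bytes_requested.load(Ordering::Relaxed);
    println!("{} bytes requested for capacity {}", after - before, v.capacity());
    println!("failures so far: {}", ALLOC.failures.load(Ordering::Relaxed));
}
```

The counters use `Relaxed` ordering because they are independent statistics, not synchronization.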

There's an open question whether custom allocators should be returning custom errors, but Layout.align certainly isn't it. Layout is stable, not parametrized by an allocator, and extra info would go into its own separate type.

Hashbrown's TryReserveError exposes the Layout. I've searched GitHub for uses of it and couldn't find any uses of align. It's common to just map it to the application's own Error::OOM.

There's the unstable set_alloc_error_hook. I've searched GitHub for its uses. One printed {layout:?}; all the more specific uses only looked at layout.size() or didn't touch the layout at all. Gecko's OOM handler only takes the size. The hook is scheduled for removal, to be replaced by oom=panic.


Therefore, I suggest reporting just the size on allocation failures in try_reserve and in the OOM hook/panic. I think even the size isn't strictly necessary, but it can be useful to know whether an allocation failure happened on an unrealistically huge, bigger-than-RAM allocation, which suggests a bug in the program (e.g. a negative size) rather than an actual OOM situation.

This would allow TryReserveError to be half the size. Internally it could be a NonZeroUsize (or a custom usize niche), with values > isize::MAX meaning capacity overflow. It'd be cheap to construct (from saturating arithmetic), and it'd be small in Result<(), TryReserveError>. It's currently an opaque type, so its internals can be changed. There's a proposed AllocErrorPanicPayload that exposes Layout, but that could easily be changed to return only the size.
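A sketch of the size-only error described above. `SizeOnlyError` is a hypothetical type, not part of std: it stores the requested byte count in a `NonZeroUsize`, computes it with saturating arithmetic, and treats anything above `isize::MAX` as capacity overflow. The niche keeps `Result<(), SizeOnlyError>` a single word, and a `-1` from user input, cast to `usize`, lands in the overflow range with its size still reported.

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;

/// Hypothetical size-only error (not a std type). Values above
/// isize::MAX mean "capacity overflow".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SizeOnlyError(NonZeroUsize);

impl SizeOnlyError {
    /// Builds the error from an element count and element size using
    /// saturating arithmetic, so overflow needs no separate code path.
    pub fn for_elements(count: usize, elem_size: usize) -> Self {
        let bytes = count.saturating_mul(elem_size).max(1);
        SizeOnlyError(NonZeroUsize::new(bytes).unwrap())
    }

    pub fn requested_size(self) -> usize {
        self.0.get()
    }

    pub fn is_capacity_overflow(self) -> bool {
        self.0.get() > isize::MAX as usize
    }
}

fn main() {
    // The NonZeroUsize niche keeps Result<(), SizeOnlyError> one word wide.
    assert_eq!(size_of::<Result<(), SizeOnlyError>>(), size_of::<usize>());

    let oom = SizeOnlyError::for_elements(1 << 20, 8);
    assert_eq!(oom.requested_size(), 8 << 20);
    assert!(!oom.is_capacity_overflow());

    // "-1" from user input, cast to usize, saturates into the overflow
    // range but still reports a size.
    let n = "-1".parse::<isize>().unwrap() as usize;
    let bogus = SizeOnlyError::for_elements(n, 4);
    assert!(bogus.is_capacity_overflow());
    println!("requested {} bytes", bogus.requested_size());
}
```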


This is reasonable, since general purpose GlobalAlloc impls should be able to satisfy any alignment request. However, just to note, seeing the alignment in the error message would be useful for custom allocators which do fail when requesting an unsupported alignment.

For comparison, the C++ default std::bad_alloc::what (~AllocError::description) is typically just "bad alloc" or similar (doesn't report allocation size) and std::new_handler (~AllocErrorHook) is void(*)() (fn()).


In principle, the alignment information is useful. An allocator will likely keep different pools of memory per alignment class, and thus you can get an OOM due to alignment-based memory fragmentation, where there is sufficient memory for small but not large alignment types. Whether that information is useful to report to handle_alloc_error or embed in TryReserveError, I have no opinion. OOM in general is hard to handle gracefully.
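A toy model of per-alignment-class pools, to make the fragmentation argument concrete. `AlignClassPools` only does bookkeeping (no real memory), and its class boundaries (16 bytes, 4096 bytes) are arbitrary assumptions. Two requests of the same size get different outcomes purely because of their alignment.

```rust
use std::alloc::Layout;

/// Toy bookkeeping model (hypothetical, no real memory): one byte budget
/// per alignment class.
struct AlignClassPools {
    /// Remaining bytes for: align <= 16, align <= 4096, larger alignments.
    remaining: [usize; 3],
}

impl AlignClassPools {
    fn class_of(layout: Layout) -> usize {
        match layout.align() {
            0..=16 => 0,
            17..=4096 => 1,
            _ => 2,
        }
    }

    /// Takes `layout.size()` bytes from the matching pool, or returns the
    /// layout that could not be served.
    fn try_take(&mut self, layout: Layout) -> Result<(), Layout> {
        let pool = &mut self.remaining[Self::class_of(layout)];
        if *pool >= layout.size() {
            *pool -= layout.size();
            Ok(())
        } else {
            Err(layout)
        }
    }
}

fn main() {
    let mut pools = AlignClassPools { remaining: [1 << 20, 8192, 0] };
    let page = Layout::from_size_align(8192, 4096).unwrap();
    let plain = Layout::from_size_align(8192, 8).unwrap();
    pools.try_take(page).unwrap(); // uses up the page-aligned pool
    // Same size, different outcome: only the alignment differs.
    assert!(pools.try_take(plain).is_ok());
    assert!(pools.try_take(page).is_err());
    println!("remaining per class: {:?}", pools.remaining);
}
```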


If hooking the OOM handler makes it possible to retry the allocation (for example after freeing caches), the alignment would be useful.


Another case where alignment can be useful: if the overallocation scheme for large alignment is used, a large alignment means the actual allocation size is that much larger, which can matter if that extra slack puts you over some maximum allocation size.
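The arithmetic behind the over-allocation scheme, as a sketch: if an allocator satisfies a large alignment by allocating `size + align - 1` bytes and rounding the pointer up, the alignment alone can push a request past a size limit. The `max_alloc` limit and the helper names are hypothetical.

```rust
use std::alloc::Layout;

/// Worst-case bytes needed if a large alignment is met by over-allocating
/// `size + align - 1` bytes and rounding the pointer up. Returns None if
/// that overflows or exceeds `max_alloc` (a hypothetical per-call limit).
fn padded_request(layout: Layout, max_alloc: usize) -> Option<usize> {
    let padded = layout.size().checked_add(layout.align() - 1)?;
    (padded <= max_alloc).then_some(padded)
}

/// Rounds `addr` up to a multiple of `align`, which must be a power of two.
fn align_up(addr: usize, align: usize) -> usize {
    (addr + align - 1) & !(align - 1)
}

fn main() {
    let max_alloc = 1 << 20; // hypothetical 1 MiB limit
    let size = (1 << 20) - 4096;
    let normal = Layout::from_size_align(size, 8).unwrap();
    let big_align = Layout::from_size_align(size, 1 << 16).unwrap();

    // Same size; only the 64 KiB alignment pushes the request past the limit.
    assert_eq!(padded_request(normal, max_alloc), Some(size + 7));
    assert_eq!(padded_request(big_align, max_alloc), None);

    // Rounding a base address up wastes at most align - 1 bytes.
    assert_eq!(align_up(0x1003, 0x1000), 0x2000);
}
```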

I say keep the information in the panic payload, but drop alignment from TryReserveError (since T implies it). Seems reasonable enough to me.


The OOM handler has -> ! return type, so it doesn't have a sensible way to retry an allocation. The OOM handler also does not have a reference to the allocator instance that reported failure, nor the type that has been allocated.

With oom=panic the OOM handler can unwind, and then the caller may catch the panic and retry the allocation, but that's overcomplicated and abuses panics as C++-style exceptions. Retries are more straightforward when not using the OOM handler:

if vec.try_reserve(n).is_err() {
    // `ALLOC.flush_caches_for` is a hypothetical method of a custom allocator.
    if let Ok(layout) = Layout::array::<T>(n) {
        ALLOC.flush_caches_for(layout);
    }
    vec.try_reserve(n)?;
}

It's true that alignment can affect whether allocations succeed, but what useful thing could a general global error handler (or a catch_unwind caller) do with this bit of information?


Some alternative designs that could provide even more information:

  • The allocator instance can report more information "out of band", e.g. keep information about the few most recent allocation failures, which the OOM hook could query: let lots_of_info = ALLOC.last_error(). Or the custom allocator could write failures directly to the application's error log.

  • The Allocator trait could have an associated type for the allocator's AllocError type, which could include more info. The OOM panic payload could support downcasting to that custom error type. I imagine most allocators in release mode would keep that error as an empty unit type for performance, but they could switch to verbose errors in debug/testing builds.
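A sketch of the associated-error-type idea. `VerboseAllocator`, `Budgeted` and `BudgetExceeded` are hypothetical (this is not the unstable `std::alloc::Allocator` trait), and the "panic payload" is simulated with a plain `Box<dyn Any + Send>` rather than a real OOM panic. It shows an allocator choosing a detailed error type and a handler recovering it by downcasting.

```rust
use std::alloc::{self, Layout};
use std::any::Any;
use std::cell::Cell;
use std::fmt::Debug;
use std::ptr::NonNull;

/// Hypothetical trait, NOT the unstable `std::alloc::Allocator`:
/// each allocator chooses its own error type.
trait VerboseAllocator {
    type Error: Debug + Send + 'static;
    fn allocate(&self, layout: Layout) -> Result<NonNull<u8>, Self::Error>;
    /// # Safety
    /// `ptr` must have come from `allocate` with the same `layout`.
    unsafe fn deallocate(&self, ptr: NonNull<u8>, layout: Layout);
}

/// A detailed error, as a debug or testing build might choose.
/// (A release build could use a unit struct instead.)
#[derive(Debug, Clone, PartialEq, Eq)]
struct BudgetExceeded {
    requested: usize,
    remaining: usize,
}

/// Toy allocator with a fixed byte budget, backed by the global allocator.
struct Budgeted {
    remaining: Cell<usize>,
}

impl VerboseAllocator for Budgeted {
    type Error = BudgetExceeded;

    fn allocate(&self, layout: Layout) -> Result<NonNull<u8>, BudgetExceeded> {
        assert!(layout.size() != 0, "toy allocator: zero-sized layouts unsupported");
        let remaining = self.remaining.get();
        if layout.size() > remaining {
            return Err(BudgetExceeded { requested: layout.size(), remaining });
        }
        // SAFETY: the layout has a non-zero size (checked above).
        let ptr = unsafe { alloc::alloc(layout) };
        let Some(ptr) = NonNull::new(ptr) else { alloc::handle_alloc_error(layout) };
        self.remaining.set(remaining - layout.size());
        Ok(ptr)
    }

    unsafe fn deallocate(&self, ptr: NonNull<u8>, layout: Layout) {
        // SAFETY: forwarded from this method's own contract.
        unsafe { alloc::dealloc(ptr.as_ptr(), layout) };
        self.remaining.set(self.remaining.get() + layout.size());
    }
}

fn main() {
    let a = Budgeted { remaining: Cell::new(1024) };
    let layout = Layout::from_size_align(512, 8).unwrap();
    let p = a.allocate(layout).unwrap();

    let err = a.allocate(Layout::from_size_align(1000, 8).unwrap()).unwrap_err();
    println!("{err:?}"); // BudgetExceeded { requested: 1000, remaining: 512 }

    // A panic payload is a Box<dyn Any + Send>; a handler could downcast it
    // to the allocator's error type (simulated here without panicking).
    let payload: Box<dyn Any + Send> = Box::new(err.clone());
    assert_eq!(payload.downcast_ref::<BudgetExceeded>(), Some(&err));

    // SAFETY: `p` came from `a.allocate(layout)`.
    unsafe { a.deallocate(p, layout) };
}
```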

Mostly I was commenting that it's still potentially useful to the developer, so the error message shown by the hook including both size and alignment provides some context to narrow down potential causes.

For the case where the OOM hook knows the allocator in use, this is likely the best solution. Sidechannel error reporting is painful for common paths (e.g. errno/last_os_error) but actually somewhat ideal for edge case probably-fatal-anyway errors. The allocator is probably also the best equipped to resolve potentially recursive OOM.
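A sketch of out-of-band failure reporting, assuming a hypothetical `Bounded` global allocator that refuses any single allocation above an arbitrary 64 MiB cap and remembers the last failure. `last_error` plays the role of the `ALLOC.last_error()` call suggested earlier in the thread. Because `try_reserve_exact` returns an error instead of invoking the OOM handler, the program can query the allocator afterwards.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical bounded allocator: refuses any single allocation larger
/// than `limit` and remembers the most recent failure out of band.
struct Bounded {
    limit: usize,
    last_failed_size: AtomicUsize, // 0 means "no failure recorded"
    last_failed_align: AtomicUsize,
}

impl Bounded {
    /// The most recent failure as (size, align), if any. Concurrent
    /// failures may interleave; this is a diagnostic aid, not an audit log.
    fn last_error(&self) -> Option<(usize, usize)> {
        match self.last_failed_size.load(Ordering::Acquire) {
            0 => None,
            size => Some((size, self.last_failed_align.load(Ordering::Relaxed))),
        }
    }
}

unsafe impl GlobalAlloc for Bounded {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        if layout.size() > self.limit {
            self.last_failed_align.store(layout.align(), Ordering::Relaxed);
            self.last_failed_size.store(layout.size(), Ordering::Release);
            return std::ptr::null_mut();
        }
        // SAFETY: forwarded from GlobalAlloc's contract.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: forwarded from GlobalAlloc's contract.
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static ALLOC: Bounded = Bounded {
    limit: 64 << 20, // arbitrary 64 MiB per-allocation cap
    last_failed_size: AtomicUsize::new(0),
    last_failed_align: AtomicUsize::new(0),
};

fn main() {
    let mut v: Vec<u64> = Vec::new();
    // 16 Mi elements of 8 bytes = 128 MiB, above the cap.
    if v.try_reserve_exact(16 << 20).is_err() {
        // An error path (or OOM hook) can ask the allocator what happened.
        println!("last failure: {:?}", ALLOC.last_error());
    }
}
```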

I wonder if adding handle_alloc_error to the Allocator trait would make sense. In the presence of custom bounded allocators, knowing which allocator class errored is likely more useful than the layout of the allocation which failed for any allocator that doesn't have a global fallback.


Alignment is also useful for narrowing down the source of bugs. Either the allocation failure is from an absurd size alone, or it isn't. I'm imagining something like @afetisov's suggestion of memory pools grouped by alignment, or alternatively an absurd alignment that necessarily precipitated an absurd size. In both of these scenarios I'd want access to the alignment information.


In the parts of libstd/liballoc that use TryReserveError and OOM handlers, the alignment is always known at compile time, so you can't get a surprising value at run time. Checking for absurd alignment could be a Clippy lint (like the existing one for large futures).

I can see a custom allocator or memory pool running out of space due to fragmentation, but I think that's just yet another case of Layout not actually giving you this information, only serving as a weak post-mortem clue for a guess you can't verify. So I've left feedback for the allocators WG asking for allocators to be able to report the actual failure reason: "Controlling amount of information in AllocError and OOM panic payloads" (rust-lang/wg-allocators issue #121 on GitHub).


I haven't dug into the specifics, but I suspect the reason TryReserveError carries a Layout is specifically so that reserve can be essentially self.try_reserve().map_err(handle_alloc_error).unwrap(). For Vec the error could just be the target capacity, but for more interesting collections managing more than one allocation that isn't sufficient. If a collection owns allocations of different types, then even the alignment has to be kept for reserve to be implementable as handle_alloc_error on top of try_reserve.
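A sketch of why the error carries a `Layout`, using a hypothetical two-buffer collection `Columns<K, V>` and a hypothetical `ReserveError` (std's `TryReserveError` does not expose its layout on stable). The infallible `reserve` is written on top of `try_reserve`, and it can only call `handle_alloc_error` because the error remembers which buffer, and therefore which size and alignment, failed.

```rust
use std::alloc::{handle_alloc_error, Layout};

/// Hypothetical error (not std's TryReserveError) that keeps the Layout.
#[derive(Debug)]
enum ReserveError {
    CapacityOverflow,
    AllocError { layout: Layout },
}

/// Hypothetical collection owning two buffers of different element types.
struct Columns<K, V> {
    keys: Vec<K>,
    values: Vec<V>,
}

impl<K, V> Columns<K, V> {
    fn new() -> Self {
        Columns { keys: Vec::new(), values: Vec::new() }
    }

    fn try_reserve(&mut self, additional: usize) -> Result<(), ReserveError> {
        try_reserve_buf(&mut self.keys, additional)?;
        try_reserve_buf(&mut self.values, additional)
    }

    /// Infallible wrapper built on try_reserve.
    fn reserve(&mut self, additional: usize) {
        match self.try_reserve(additional) {
            Ok(()) => {}
            Err(ReserveError::CapacityOverflow) => panic!("capacity overflow"),
            Err(ReserveError::AllocError { layout }) => handle_alloc_error(layout),
        }
    }
}

/// Reserves room in one buffer, recording the Layout that was requested.
fn try_reserve_buf<T>(buf: &mut Vec<T>, additional: usize) -> Result<(), ReserveError> {
    let wanted = buf.len().checked_add(additional).ok_or(ReserveError::CapacityOverflow)?;
    // Which buffer failed determines both the size and the alignment.
    let layout = Layout::array::<T>(wanted).map_err(|_| ReserveError::CapacityOverflow)?;
    buf.try_reserve_exact(additional).map_err(|_| ReserveError::AllocError { layout })
}

fn main() {
    let mut c: Columns<u8, u64> = Columns::new();
    c.reserve(100);
    assert!(c.keys.capacity() >= 100 && c.values.capacity() >= 100);
    // Overflow is detected before any allocator is involved.
    assert!(matches!(c.try_reserve(usize::MAX), Err(ReserveError::CapacityOverflow)));
}
```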

Which makes me think that this might come down to making a call to stop using handle_alloc_error(Layout) and just have a nullary raise_oom_panic() -> !.


Having the allocation size in the panic/abort message is really useful to determine whether the allocation failed because the requested size is obviously bogus (e.g. trying to allocate a Vec with n elements, where n comes from user input and someone entered -1) or because there is simply not enough memory left on the system.


That is useful, but specifically in the case of -1 you will not get an OOM error. It's impossible to create a Layout larger than isize::MAX, so you get panic!("capacity overflow") instead, with no size information.

I think it would make sense to make TryReserveError carry a size, and maybe even use saturating arithmetic for the capacity calculation, so that capacity overflow is less special, and OOM hooks or panic payloads simply get usize that may potentially be > isize::MAX.

What about OS allocators? Those may be able to request that some memory be freed. I haven't looked deeply into how they work when written in Rust (e.g. RedoxOS), nor at what stage of the allocation such handling would happen (maybe before we even get to handle_alloc_error).

But such functions might very well need the full layout to know which memory pools to try to free resources in.

Also as others said, it is useful diagnostic information to the programmer to determine the cause of OOM (e.g. did a huge page aligned allocation fail, or was it a more normal one).


The Allocator API is not affected by the change I'm proposing. The allocation and deallocation functions still get the full Layout (although it'd be nice if Rust could specialize the System allocator to optimize that out, but that's an entirely different thing).

Yes, that's the recurring theme, but also note that Layout doesn't tell you any of that! It's only a weak clue that cannot be verified. So I think this indicates it's the wrong API: people want to know the cause of the failure, but the Layout doesn't describe the cause. The true cause could be in the AllocError returned by the allocator, or in some other API that reports the actual cause.


I don't think that is possible though. Even in a std environment on a traditional OS, you might need to call either malloc or posix_memalign, depending on the alignment. There are things like SIMD types, or types containing atomics whose alignment has been intentionally increased with a wrapper type for cache-line reasons. Or you might want page alignment for low-level trickery.
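For illustration, the kind of wrapper mentioned above: `CachePadded` is a hypothetical name here, and 64 bytes is a common but not universal cache-line size. A collection of such values requests 64-byte alignment from its allocator, which the System allocator may need to satisfy with an aligned-allocation call rather than plain `malloc`.

```rust
use std::alloc::Layout;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical wrapper raising alignment to 64 bytes, so neighbouring
/// values don't share a cache line.
#[repr(align(64))]
struct CachePadded<T>(T);

fn main() {
    let plain = Layout::new::<AtomicU64>();
    let padded = Layout::new::<CachePadded<AtomicU64>>();
    // The size is rounded up to a multiple of the alignment.
    assert_eq!((padded.size(), padded.align()), (64, 64));
    println!("plain: {plain:?}\npadded: {padded:?}");

    // A Vec of these asks its allocator for 64-byte-aligned memory.
    let counters: Vec<CachePadded<AtomicU64>> =
        (0..4).map(|_| CachePadded(AtomicU64::new(0))).collect();
    assert_eq!(counters.as_ptr() as usize % 64, 0);
    counters[1].0.fetch_add(1, Ordering::Relaxed);
    assert_eq!(counters[1].0.load(Ordering::Relaxed), 1);
}
```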

And in a kernel or embedded setting you definitely can't optimise it out, since the global allocator might be different. Which means you would need different code paths for this case and std, and it probably isn't worth the effort to do so.

I went and looked, and a Layout is just a size and an alignment, with the alignment stored as an enum of powers of two. So there should be plenty of niches for niche optimisation in a Layout as well. Seems pretty efficient already.
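The niche claim is easy to check. This sketch only asserts sizes that follow from `Layout` being a size plus a power-of-two alignment; the printed size of `Result<(), TryReserveError>` is deliberately not asserted, because its internals are unspecified.

```rust
use std::alloc::Layout;
use std::collections::TryReserveError;
use std::mem::size_of;

fn main() {
    // A Layout is two words: a size and an alignment.
    assert_eq!(size_of::<Layout>(), 2 * size_of::<usize>());
    // Invalid alignment values (zero, non-powers of two) form a niche,
    // so Option<Layout> needs no extra tag.
    assert_eq!(size_of::<Option<Layout>>(), size_of::<Layout>());
    // Not asserted: TryReserveError's internals are unspecified.
    println!(
        "Result<(), TryReserveError>: {} bytes",
        size_of::<Result<(), TryReserveError>>()
    );
}
```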

And it saves the underlying allocator from having to track the size of each allocation itself (free has to learn the size somehow), so it potentially permits a more memory-efficient allocator. Current allocators like System, jemalloc and mimalloc won't take advantage of this, of course, since they aren't Rust-specific. But I'm imagining a Rust-centric future, and there this is nice to have.
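A toy, single-threaded size-class cache illustrating the point about deallocation. `SizeClassCache`, its size classes, and the 16-byte alignment cap are all hypothetical choices. The relevant part is that `dealloc` finds the right free list from the caller-supplied `Layout`, so no per-block header is stored.

```rust
use std::alloc::{self, Layout};
use std::ptr::NonNull;

/// Size classes served by the toy cache, all with 16-byte alignment.
const CLASS_SIZES: [usize; 4] = [16, 64, 256, 1024];
const CLASS_ALIGN: usize = 16;

/// Picks the smallest class that fits, or None if the request is out of scope.
fn class_of(layout: Layout) -> Option<usize> {
    if layout.align() > CLASS_ALIGN {
        return None;
    }
    CLASS_SIZES.iter().position(|&s| layout.size() <= s)
}

fn class_layout(class: usize) -> Layout {
    Layout::from_size_align(CLASS_SIZES[class], CLASS_ALIGN).unwrap()
}

/// Hypothetical cache: freed blocks go on a per-class free list.
struct SizeClassCache {
    free: Vec<Vec<NonNull<u8>>>,
}

impl SizeClassCache {
    fn new() -> Self {
        SizeClassCache { free: vec![Vec::new(); CLASS_SIZES.len()] }
    }

    /// Returns a block for a non-zero-sized `layout`, reusing a cached one if possible.
    fn alloc(&mut self, layout: Layout) -> Option<NonNull<u8>> {
        assert!(layout.size() != 0, "zero-sized layouts unsupported");
        let class = class_of(layout)?;
        if let Some(block) = self.free[class].pop() {
            return Some(block);
        }
        // SAFETY: every class layout has a non-zero size.
        NonNull::new(unsafe { alloc::alloc(class_layout(class)) })
    }

    /// # Safety
    /// `ptr` must come from `self.alloc(layout)` with the same `layout`.
    unsafe fn dealloc(&mut self, ptr: NonNull<u8>, layout: Layout) {
        // No header: the caller-supplied Layout identifies the free list.
        let class = class_of(layout).expect("layout was accepted by alloc");
        self.free[class].push(ptr);
    }
}

impl Drop for SizeClassCache {
    fn drop(&mut self) {
        // Frees cached blocks only; blocks never returned are not tracked.
        for (class, list) in self.free.iter().enumerate() {
            for block in list {
                // SAFETY: each cached block was allocated with class_layout(class).
                unsafe { alloc::dealloc(block.as_ptr(), class_layout(class)) };
            }
        }
    }
}

fn main() {
    let mut cache = SizeClassCache::new();
    let a_layout = Layout::new::<[u32; 10]>(); // 40 bytes -> 64-byte class
    let a = cache.alloc(a_layout).unwrap();
    // SAFETY: `a` came from `cache.alloc(a_layout)`.
    unsafe { cache.dealloc(a, a_layout) };

    // A different type in the same class reuses the freed block.
    let b_layout = Layout::new::<[u8; 50]>();
    let b = cache.alloc(b_layout).unwrap();
    assert_eq!(a, b);
    // SAFETY: `b` came from `cache.alloc(b_layout)`.
    unsafe { cache.dealloc(b, b_layout) };
}
```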

(Tangent, related to this Rust-centric rather than C-centric future: I was a bit disappointed that Redox went with userspace Rust on top of relibc, rather than completely bypassing most of the legacy C mistakes for the std crate. This means that it can't take advantage of Layout in deallocation, and still has to deal with errno, non-reentrant functions, a thread-unsafe environment, etc.)


Relibc could still export new functions through libredox (redox-os/libredox on GitLab) that fix all those C issues, and libstd could then use them as long as that doesn't conflict with C code. So, for example, a new (de)allocation API would be possible, but only because std::alloc::System is not guaranteed to use malloc/free directly on Unix systems, and so mixing allocation through std::alloc::System with deallocation via free in C is already not allowed. If it were allowed, libstd couldn't make use of that new API unless mixing it with malloc/free was permitted.


Would it be possible to have Rust std on a theoretical platform without a libc? I guess Windows is kind of that, but it still has a C-centric API and ABI.

I'm wondering what Rust std on a no-legacy platform might look like. Of course such a platform might deviate in more ways (a tag- and query-based file system instead of a hierarchical one is a dream of mine, for example, and Plan 9 had a lot of different cool ideas...), which might make it hard to support standard APIs.

In a sense wasm is probably the closest we have to such a target currently.

(Oh also, there were some wild ideas in the past about how computers should work (I'm interested in computing history). Lisp machines is probably the most well known example, but there were many more. Then we mostly standardised on the boring and pedestrian Unix-like + Windows, with a handful of obscure legacy systems running in banks and similar still.)


The only ABI which Rust supports in a stable manner is extern "C". Your target could define "the C ABI" to be whatever scheme you want, but it's still limited to using C vocabulary.

So your target's libos would end up looking like a well-annotated C API (maybe plus unwinding for "you should've checked that" errors). You might be able to get away with some cleverer types with the expectation that they'll be translated to wrappers making their use easier, but the stable layer needs to exist with a C compatible shape.
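A sketch of what a Rust-friendly allocation interface could look like while staying in C vocabulary. `os_alloc` and `os_dealloc` are hypothetical names, not libredox's actual API. They use the `extern "C"` ABI with plain integers and pointers, but unlike C's `free`, the deallocation call takes the size and alignment back.

```rust
use std::alloc::{self, Layout};

/// Hypothetical OS-level allocation entry point in C vocabulary.
/// Returns null for invalid or zero-sized requests, or on failure.
pub extern "C" fn os_alloc(size: usize, align: usize) -> *mut u8 {
    match Layout::from_size_align(size, align) {
        // SAFETY: the layout is valid and has a non-zero size.
        Ok(layout) if layout.size() != 0 => unsafe { alloc::alloc(layout) },
        _ => std::ptr::null_mut(),
    }
}

/// Unlike C's `free`, the caller passes the size and alignment back,
/// so the allocator does not need to record them per block.
///
/// # Safety
/// `ptr` must be null or come from `os_alloc(size, align)` with the same arguments.
pub unsafe extern "C" fn os_dealloc(ptr: *mut u8, size: usize, align: usize) {
    if ptr.is_null() {
        return;
    }
    // SAFETY: the caller guarantees these arguments match a successful os_alloc.
    unsafe { alloc::dealloc(ptr, Layout::from_size_align_unchecked(size, align)) };
}

fn main() {
    let p = os_alloc(256, 64);
    assert!(!p.is_null());
    assert_eq!(p as usize % 64, 0);
    assert!(os_alloc(256, 3).is_null()); // invalid alignment
    // SAFETY: `p` came from os_alloc(256, 64).
    unsafe { os_dealloc(p, 256, 64) };
}
```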
