Could we support unwinding from OOM, at least for collections?

I’m running x86_64, Ubuntu 14.04. malloc fails quite regularly for me. Rumours of overcommit seem greatly exaggerated.

SIGILL is easily generated by:

let b = vec![0; 30 * 1024 * 1024 * 1024];

While, ideally, Rust would expose fallible calls like malloc via Result<> types, those APIs look very far away.
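For concreteness, a minimal sketch of that Result-based shape, using Vec::try_reserve_exact (which only stabilized much later, in Rust 1.57):

```rust
use std::collections::TryReserveError;

// A Result-returning allocation path: try_reserve_exact reports failure
// instead of aborting, and the subsequent resize cannot reallocate because
// the capacity is already reserved.
fn make_big_buffer(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    buf.try_reserve_exact(len)?;
    buf.resize(len, 0);
    Ok(buf)
}
```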

I started porting an in-development C image processing server to Rust after hearing catch_unwind was stabilized. Then I discovered that the crowds were shouting for all OOM situations to abort instead of panic. I found this strange, as small mallocs will often succeed after large mallocs fail. Large mallocs are quite important to my use case, and my use case permits many strategies for backing off large malloc requests if they fail.

So while my current C code gracefully handles all allocation failures, I cannot seem to accomplish the same in Rust - and real problems arise quite quickly.


Like you, I believe that unwinding from OOM is an important use case for user-space. I also believe that there needs to be a library that exposes Result<> based APIs for allocation, for use in freestanding environments.

I think that aborting on OOM is not reasonable in general. It is reasonable in some situations, such as for client-side programs that should never, ever run out of memory, or for servers that can simply be restarted.

Note that not every language supports recovering from OOM. Perl doesn’t by default, and Go and GHC Haskell both treat OOM as fatal. Furthermore, parts of glibc exit the process on OOM!

As far as your image processing server is concerned, you can implement a custom allocator that panics when it runs out of memory. If you do that, expect to file bugs against the unsafe code in libstd for not being robust in such situations. Alternatively, you can keep track of all memory in use, and fail requests that will use too much memory. The user of the server can pass the amount of memory available as a parameter at server startup. Finally, note that you can write your application without the standard library – even #![no_std] Rust is still much, much better than C.
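A minimal sketch of the memory-tracking approach, assuming today's GlobalAlloc trait (which arrived later than this discussion); the name CappedAlloc and the 512 MB cap are made up for illustration, and on its own this only returns null on failure, so callers still need fallible APIs to observe it:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical wrapper allocator that refuses allocations past a fixed cap.
struct CappedAlloc {
    limit: usize,
    used: AtomicUsize,
}

unsafe impl GlobalAlloc for CappedAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let prev = self.used.fetch_add(layout.size(), Ordering::SeqCst);
        if prev + layout.size() > self.limit {
            self.used.fetch_sub(layout.size(), Ordering::SeqCst);
            return std::ptr::null_mut(); // report failure instead of aborting here
        }
        let ptr = System.alloc(layout);
        if ptr.is_null() {
            // The system allocator itself failed; undo the accounting.
            self.used.fetch_sub(layout.size(), Ordering::SeqCst);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout);
        self.used.fetch_sub(layout.size(), Ordering::SeqCst);
    }
}

#[global_allocator]
static ALLOCATOR: CappedAlloc = CappedAlloc {
    limit: 512 * 1024 * 1024, // the real limit would come from server configuration
    used: AtomicUsize::new(0),
};
```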

My suggestion, if you have time, is to write a library yourself that handles OOM as you desire. libstd is by no means magical: it is entirely ordinary Rust code (possibly with a small amount of C and assembler, though I am not sure if there is any). You most definitely can implement the parts you need yourself, and they WILL work (assuming you implemented them correctly). I promise you that you are not the only one wanting this – people writing Rust in kernel mode do too, so you should be able to get contributions. Try talking to the Redox project.

libstd does not use any C or assembler. However, it does use a large helping of unstable compiler black magic that you need a nightly version of rustc to do yourself.

But yes, definitely writing your own library to handle allocations and such with graceful allocation failures is a very feasible option. Even just switching from std's Vec to your own vector type can be enough.

It seems to me that it would be ideal for something like liballoc’s heap interface to be stabilized, so one could write data structures against it directly and handle OOM however one wishes, without resorting to defining a custom allocator or anything like that.
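For reference, a raw heap interface along these lines did eventually stabilize as std::alloc (in Rust 1.28). A rough sketch of writing against it directly, with the failure policy left to the caller; try_alloc_bytes is an illustrative name, not an existing API:

```rust
use std::alloc::{alloc, Layout};

// Illustrative only: allocate a raw byte buffer and report failure as None,
// leaving the caller free to retry, shrink the request, or unwind. The caller
// must later free the pointer with std::alloc::dealloc and the same layout.
fn try_alloc_bytes(len: usize) -> Option<*mut u8> {
    if len == 0 {
        return None; // the raw allocator does not accept zero-sized layouts
    }
    let layout = Layout::array::<u8>(len).ok()?;
    // SAFETY: the layout has non-zero size, checked above.
    let ptr = unsafe { alloc(layout) };
    if ptr.is_null() {
        None // allocation failed; the caller decides what that means
    } else {
        Some(ptr)
    }
}
```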

Even #![feature(panic_on_oom)] or #![feature(abort_on_oom)] as an API-less option to change the OOM behavior would be enough for me.

I realize I might (in theory) uncover libstd bugs, but I don’t expect a vector to be usable after an OOM situation; I’m likely rebooting the entire task. Trying to access data structures after an exception is a no-no in every other language; why would I try to do it in Rust? I know Rust wants to offer different guarantees, and is held to a different standard - but then, let’s tackle those specific cases by exercising malloc failure.

Here’s the tracking issue for this topic: https://github.com/rust-lang/rust/issues/27700


Panic during panic is already an issue Rust has to deal with (e.g. when Drop panics during unwinding), so panicking on OOM wouldn't introduce any new problem.

It seems fair if a second OOM during the OOM panic aborted the process, but I'd really like Rust to give the process at least a chance to recover.

I'm assuming unwinding doesn't have to do large allocations, so it would have a chance to work if the process caused the OOM with a large allocation (e.g. the process panicked when it tried to allocate 1GB, but unwinding just churns through 100KB). There are also strategies for making programs unwind with no allocations at all; IIRC, PostgreSQL pre-allocates the memory needed for error recovery.

Having function variants that return Result is not a great solution for me, because allocation in Rust is not very explicit, and I would have to be very careful to use the right variant everywhere in my code, e.g. remember that collect() allocates too.
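A small illustration of how easily allocation hides in ordinary-looking code (squares is just a made-up example):

```rust
// The collect() below grows a Vec behind the scenes; there is no
// Result-returning variant of it, so an OOM here is invisible at the call site.
fn squares(n: u64) -> Vec<u64> {
    (0..n).map(|i| i * i).collect()
}
```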

I see panic on OOM as very important for preventing DoS on servers. I don't want one bad request to take the whole server down: if I forget to put a sensible limit on some resource and a malicious user requests /?resource=9999999999999999, I'd prefer that one request to panic with OOM rather than have the server crash and interrupt all requests for all concurrent users.
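A sketch of that per-request isolation, assuming OOM unwound rather than aborted; Request, Response, and handle_request are hypothetical stand-ins:

```rust
use std::panic;

// Hypothetical request/response types, just to keep the sketch self-contained.
struct Request { resource: usize }
struct Response { status: u16, body: String }

fn handle_request(req: Request) -> Response {
    // Imagine this allocates proportionally to untrusted input; the premise of
    // the sketch is that an allocation failure here panics instead of aborting.
    let buf = vec![0u8; req.resource];
    Response { status: 200, body: format!("allocated {} bytes", buf.len()) }
}

// One bad request gets contained instead of taking the whole server down.
fn serve_one(req: Request) -> Response {
    panic::catch_unwind(move || handle_request(req))
        .unwrap_or(Response { status: 500, body: "request failed (possibly out of memory)".into() })
}

fn main() {
    let resp = serve_one(Request { resource: 1024 });
    println!("{} {}", resp.status, resp.body);
}
```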


Another important use case of panic on OOM is simply figuring out what went wrong. Currently, some allocation failures will cause the program to just immediately exit with no indication that anything even went wrong, apart from a nonzero exit status. It doesn’t even tell you that you ran out of memory, let alone where.


That’s perhaps my biggest problem with the current OOM situation: it’ll completely abort the entire process without so much as a peep, despite the majority of OOM situations actually being the user just overflowing the length they pass to a Vec. At the very least, on OOM we should try to print a backtrace.


I have since realized another reason why OOM recovery is crucial: libraries that use custom allocators that allocate memory by requesting it from the host program, to limit the memory that can be consumed by untrusted input. It may not be possible to bound this a priori (consider the case of a sandboxed but Turing-complete scripting language), so the host may impose a limit for security reasons. In such situations, an artificial OOM is not only possible but becomes a normal operating condition.


I’ve been told that writing a database also requires handling out of memory conditions, at least on operating systems which support deterministic memory allocation failures.

I know a product (written in C++) that is tested by running the application repeatedly with a system allocator that allows one more allocation to succeed on each successive run, until the application exits successfully. Basically, it exhaustively tests that the application doesn’t crash or behave poorly when any allocation fails. I found the rigor involved in this approach most refreshing.
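A rough Rust rendition of that technique (not the product's actual harness): a fault-injecting allocator with a per-run budget that the harness would bump by one on each successive run.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Budget of allocations allowed to succeed in this run; the harness would
// rerun the program with 0, 1, 2, ... until it completes successfully.
const BUDGET: usize = 1000;

// Fault-injecting allocator: every allocation after the budget is exhausted
// fails. (The check-then-decrement is not race-free; a single-threaded test
// run is assumed.)
struct FailAfter {
    remaining: AtomicUsize,
}

unsafe impl GlobalAlloc for FailAfter {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        if self.remaining.load(Ordering::SeqCst) == 0 {
            return std::ptr::null_mut(); // inject the failure
        }
        self.remaining.fetch_sub(1, Ordering::SeqCst);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout);
    }
}

#[global_allocator]
static FAULTY: FailAfter = FailAfter { remaining: AtomicUsize::new(BUDGET) };
```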


Does this product use C++ exceptions, or does it use return codes for everything?

Note that Linux supports deterministic allocation failures if one disables overcommit. The OOM killer will still be there, but it will only fire if the kernel runs out of memory it needs for some critical structure.

Exceptions. Using return codes to report memory failure in C++ isn’t really feasible. C++ has the three exception safety guarantees, which are always possible to provide for all functions. The weakest one is the basic guarantee: if an exception occurs while calling a function, all objects touched by that function are left in an unknown but destructible state, and the function isn’t allowed to leak memory or execute undefined behavior, etc. So, in the worst case, you have to unwind the stack until every object touched by the function that threw (and provided only the basic guarantee) has been destroyed, before returning to a known good state.

http://exceptionsafecode.com/ has multiple versions of a talk given by a friend of mine about exception safety in C++, if you’re unfamiliar.

In terms of Rust, I see no reason why a panic should result in the program being in an unknown state. Obviously, panics aren’t considered the appropriate mechanism, but they shouldn’t make the process unrecoverable. Though I’m not sure how one would even go about determining whether panics in current Rust violate this at all.
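One small way to probe this today: after a caught panic, data the panicking code touched should be in an unspecified but still valid, droppable state, which is essentially the basic guarantee. A sketch using AssertUnwindSafe just to keep the example short:

```rust
use std::panic::{self, AssertUnwindSafe};

fn main() {
    let mut v = vec![1, 2, 3];

    // The closure mutably borrows v, so it is not automatically UnwindSafe;
    // AssertUnwindSafe is a promise that we will treat v's contents as
    // unspecified (but valid) afterwards.
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        v.push(4);
        panic!("simulated failure mid-operation");
    }));
    assert!(result.is_err());

    // v's exact contents after the panic are not something to rely on, but the
    // vector itself is intact: it can be read, reused, and dropped cleanly.
    println!("v survived the panic: {:?}", v);
}
```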

I like the idea. I think Rust should use Result to report expected failures (I/O error, corrupted file, invalid JSON, and so on). panic!() should be used to report errors that most modules will just want to propagate up the stack, and will generally be caught (if at all) by a handler near toplevel that discards the result of whatever was being worked on at the time. Running out of memory certainly falls into the second category.

Am I correct in guessing that the C++ code had very few places where std::bad_alloc or any superclass thereof was caught?

Exception safe C++ code generally contains very few try/catch blocks, because most cleanup is done in destructors, using RAII.

I can’t remember ever seeing a catch block specific to std::bad_alloc.

I’ve never seen one in person either, so the only examples of C++ code handling bad_alloc that I’m aware of come from this cppcon talk:

Note that ideally we’d support fallible operations too, in case people want to explicitly handle failed allocations. But I think that’s a separate conversation.
