Notes + Interviews on Fallible Collection Allocations

I have another generics-based idea for solving the problem: change every method that can fail due to OOM to take a generic parameter bounded by an OomFailure trait, with a default that preserves the current behavior:

#![feature(default_type_parameter_fallback, associated_type_defaults)]
trait OomFailure<ReturnTy> {
    type ResultType = ReturnTy;
}

struct AbortOnOom;
impl<T> OomFailure<T> for AbortOnOom {}

struct OomResult;
struct OomError;
impl<T> OomFailure<T> for OomResult {
    type ResultType = Result<T, OomError>;
}

struct Vec<T>(T);

impl<T> Vec<T> {
    fn push<AF: OomFailure<()> = AbortOnOom>(&mut self, t: T) -> AF::ResultType {
        unimplemented!()
    }
}

fn foo<T>(mut v: Vec<T>, t: T, u: T, w: T) {
    v.push(t); // doesn't work yet due to https://github.com/rust-lang/rust/issues/36887#issuecomment-296787518 being unresolved
    v.push::<AbortOnOom>(u); // the above line should work just like this one
    v.push::<OomResult>(w); // warning: unused `std::result::Result` which must be used
}

I think the server aspect of this discussion is under-appreciated. Note that none of the people you interviewed are working on servers. Embedded is not quite the same.

I think there are at least three different types of servers with different memory uses:

  • Request-based (think web application): many concurrent requests with individually low memory use. Technically you might be able to isolate the effects of memory allocation failure between individual requests.
  • Classic RDBMS: a long-running process that uses as much memory as possible. A process should never die.
  • Big Data: a (relatively) short-running process that uses as much memory as possible. A process abort can be handled gracefully but should be avoided as it increases overall computation time.

I can put you in touch with a Spark developer if you’re interested.


The pushback on fallible allocations is incredibly bizarre to me.

Graceful handling is the status quo with C libraries, and Rust is supposed to be a systems programming language with a high level of control over memory. I was shocked to find Rust lacking here when I first started using it full-time, and a year and a half later the same flawed arguments against proper allocation handling keep resurfacing.

I’m questioning what Rust is even for, now.


There’s been some discussion of isolating per-request (or per-work-unit) failures by having allocation panic (via unwinding) and having threads catch the unwind. We should keep in mind that this only works for the thread-per-task model, not for an event-loop model (i.e., as used in async I/O) with worker threads. For that, you actually need normal fallible allocation that returns an error when the allocation can't be performed.
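
For concreteness, here is a minimal sketch of what such a Result-returning allocation path could look like; the AllocError and try_push names are purely illustrative, not an existing std API:

#[derive(Debug)]
pub struct AllocError;

pub struct Buffer {
    data: Vec<u8>,
}

impl Buffer {
    // Hypothetical fallible push: report OOM as an ordinary error
    // instead of aborting or unwinding the whole worker thread.
    pub fn try_push(&mut self, byte: u8) -> Result<(), AllocError> {
        // Stand-in for a real "reserve or fail" allocation call.
        if self.data.len() == self.data.capacity() {
            self.data.reserve(1); // imagine this returned Err(AllocError) on OOM
        }
        self.data.push(byte);
        Ok(())
    }
}

// An event-loop server can then fail a single request on OOM
// without affecting the other requests multiplexed on the same thread.
fn handle_request(buf: &mut Buffer, payload: &[u8]) -> Result<(), AllocError> {
    for &b in payload {
        buf.try_push(b)?;
    }
    Ok(())
}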


Depends on how expensive panic isolation is. Potentially something like tokio could wrap every single invocation in a Task and fail that Task as a whole if it panics. If you have something like a web server, it could isolate the request processing into a Task separate from the overall request handling, which would allow returning a 500 on allocation failure during the actual processing stage.
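
As a rough sketch of that isolation (the Request and Response types here are made up for illustration), the processing stage can be wrapped in std::panic::catch_unwind and a panic mapped to a 500:

use std::panic::{catch_unwind, AssertUnwindSafe};

struct Request;
struct Response(u16);

fn process(_req: &Request) -> Response {
    // ... request processing that might panic, e.g. on allocation failure ...
    Response(200)
}

// Only the processing stage is isolated; the surrounding request-handling
// machinery keeps running and turns the panic into an error response.
fn handle(req: Request) -> Response {
    match catch_unwind(AssertUnwindSafe(|| process(&req))) {
        Ok(resp) => resp,
        Err(_) => Response(500),
    }
}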

Unwinding is not a panacea. Depending on unwinding for continuity means you can't use std::sync::{Mutex, Once, RwLock} anywhere in your code because of std::sync::PoisonError.
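
For illustration, every lock site has to decide what to do with a poisoned lock; a common (and easy to forget) way to recover is to pull the guard back out of the PoisonError:

use std::sync::{Mutex, PoisonError};

fn increment(counter: &Mutex<u64>) {
    // If another thread panicked while holding the lock, `lock()` returns
    // Err(PoisonError); here we recover the guard and carry on regardless.
    let mut guard = counter.lock().unwrap_or_else(PoisonError::into_inner);
    *guard += 1;
}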

Can you elaborate on the problems? It seems like you just tack .catch_unwind() onto your futures and call it a day? (outside my area of expertise by a mile)

Sure. Two issues that I see: performance and the panic strategy. I benchmarked it, and the performance overhead of catch_unwind isn't bad, but it's definitely worse than just matching on a Result (I forget the exact numbers; this was about a week ago). Also, this assumes the panic strategy is unwind; it doesn't work with panic = abort.
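
For what it's worth, the two shapes being compared look roughly like this (no numbers implied, just the structural difference):

use std::panic::catch_unwind;

fn fallible() -> Result<u32, ()> {
    Ok(42)
}

fn may_panic() -> u32 {
    42
}

// Matching on a Result: the failure path is an ordinary branch.
fn via_result() -> u32 {
    fallible().unwrap_or(0)
}

// Catching an unwind: the failure path goes through the panic machinery,
// cheap when nothing panics but still more work than a plain branch,
// and unavailable under panic = abort.
fn via_catch_unwind() -> u32 {
    catch_unwind(|| may_panic()).unwrap_or(0)
}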

How is this different from using threads? I don’t see what’s unique about futures here.

I suppose the panic = abort vs. panic = unwind issue isn’t different; you’re right. On the performance issue: in the threaded model you have some outer loop that keeps acquiring work and doing it, and catch_unwind is called in a parent function of that loop, so the check is only performed when the thread quits normally or when there has been an unwinding panic. In an event-loop-based system, on the other hand, the unit of work you’d want to restart on a panic is a single future/async execution, so you need to catch at that granularity: a catch_unwind around each of those executions, checking whether you were panicking every time you were about to switch to a different unit of work.
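
A sketch of that difference in granularity (the worker-pool types are made up for illustration):

use std::panic::{catch_unwind, AssertUnwindSafe};

type Job = Box<dyn FnOnce() + Send>;

// Thread-per-task model: one catch_unwind around the whole worker loop,
// so the unwind check is only reached when the thread exits or panics.
fn worker_thread(next_job: impl Fn() -> Option<Job>) {
    let _ = catch_unwind(AssertUnwindSafe(|| {
        while let Some(job) = next_job() {
            job();
        }
    }));
}

// Event-loop model: the restartable unit is a single future/async execution,
// so each one has to be wrapped individually before yielding to the next.
fn run_one_task(task: impl FnOnce()) {
    let _ = catch_unwind(AssertUnwindSafe(task));
}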

Based on this discussion and some others I’ve had elsewhere, I’ve posted the following RFC: https://github.com/rust-lang/rfcs/pull/2116

It ain’t perfect, but to be blunt I’m exhausted and this problem is awful.


Could you elaborate? Poisoning actually helps when dealing with unwinding; if it weren't for poisoning, unwinding would be much more likely to introduce subtle bugs into programs. So it actually seems to me that especially when you do unwinding, you should use concurrency primitives that poison.

If you don't do unwinding, things will never be poisoned anyway.
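
A small demonstration of that point: if a thread unwinds while holding the lock, the next user sees the poison instead of silently observing a possibly half-updated value:

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let data = Arc::new(Mutex::new(vec![1, 2, 3]));

    let d = Arc::clone(&data);
    let _ = thread::spawn(move || {
        let mut v = d.lock().unwrap();
        v.push(4);
        panic!("unwound while holding the lock");
    })
    .join();

    // The lock is now poisoned, surfacing the fact that the protected data
    // may have been left in an inconsistent state.
    assert!(data.lock().is_err());
}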

I’ve responded to your question on GitHub to keep the discussion in one place.
