Here are my notes on adding fallible allocation to Rust’s collection types, including several interviews with stakeholders or people with experience in the problem space.
This document doesn’t propose a concrete solution, but instead lays out the options and my notes on them. I have a pretty strong bias going in, and it was reinforced by the interviews.
I’ll be prepping an RFC this week based on this feedback and the discussion here.
Background
Methods like push on Vec don’t provide any way for the caller to handle OOM, and this is problematic for several of our users. Notably, gecko devs will be forking our collections to add this functionality for Firefox 57 (November 2017). As such we should fast-track a design so that, at the very least, their fork matches what std ends up doing eventually.
Note: I will usually be using push as a stand-in for "all allocating APIs" for simplicity here.
- Today we always abort (call Alloc::oom())
  - Unless computing the allocation size overflows isize::MAX (yes isize), then it panics
  - Both of these are in the implementation of Vec, and not done by the allocator
- Moving to unwinding on OOM is problematic
  - unsafe code may be exception-unsafe on the basis of current strategy
  - almost everyone tells me that C++'s version of this (bad_alloc) is bad and disliked
  - libunwind allocates, making panic on OOM sketchy at best
- Alloc::oom says it can panic, so local or custom global allocators might unwind…?
  - Seems sketchy… should we update docs to forbid this? Was this discussed in the RFCs?
  - pnkfelix and sfackler seem to think it was only intended for local allocators
  - global allocators aren’t currently distinguished by the Alloc trait
  - this + catch_unwind is a “simple” way to get fallible allocation out of the existing APIs
  - still unstable APIs anyway, so won’t help gecko devs for 57
  - unwinding isn’t always supported by our stakeholders (no unwinding in gecko)
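The capacity-overflow case in the first bullet can be observed on stable Rust. This is only a small sketch, and it assumes the default panic=unwind setting; the helper name capacity_request_panics is made up for illustration. It does not exercise the real OOM path, which aborts and cannot be caught this way.

```rust
use std::panic;

/// Hypothetical helper: returns true if asking for a Vec<u64> with this
/// capacity panicked (assumes the program is built with panic=unwind).
fn capacity_request_panics(cap: usize) -> bool {
    panic::catch_unwind(|| {
        // For a huge `cap`, the byte size overflows isize::MAX and Vec
        // panics with "capacity overflow" instead of calling the allocator.
        let v: Vec<u64> = Vec::with_capacity(cap);
        v.capacity()
    })
    .is_err()
}
```

Calling capacity_request_panics(usize::MAX) should report a panic, while a small request such as capacity_request_panics(4) should not.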
For all considered proposals, I’ll be using a strawman error type, which I’ve only thought about a little:
enum AllocFailure { Exhausted, CapacityOverflow }
Note that the allocator API defines:
pub enum AllocErr {
Exhausted { request: Layout },
Unsupported { details: &'static str },
}
- I don’t think Unsupported should be exposed (it will instead be folded into Exhausted).
- I don’t think Layout should be exposed (it’s an implementation detail).
Most consumers will probably just do:
foo.try_push(x)?; // evaporate details
// or
if foo.try_push(x).is_err() { /* do something */ }
But some might be interested in reproducing Vec’s behaviour (including Vec itself):
match foo.try_push(x) {
    Ok(()) => {}
    Err(Exhausted) => Allocator::oom(),
    Err(CapacityOverflow) => panic!("capacity overflow"),
}
Felix notes Exhausted having requestedBytes: usize might be useful for debugging crashes: was it “real” oom or did we try to allocate something massive?
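To make the strawman concrete, here is a rough sketch of how the two error cases could be told apart. It is an illustration, not the proposed implementation: the free functions try_reserve_strawman, try_push and push_like_vec are hypothetical names, the sketch leans on Vec::try_reserve (which std only stabilized well after these notes were written), and process::abort stands in for the allocator's oom hook.

```rust
use std::mem::size_of;

/// Strawman error type from the notes above.
#[derive(Debug, PartialEq)]
enum AllocFailure {
    Exhausted,
    CapacityOverflow,
}

/// Hypothetical helper: make room for `additional` more elements without aborting.
fn try_reserve_strawman<T>(v: &mut Vec<T>, additional: usize) -> Result<(), AllocFailure> {
    // Mirror Vec's own check: the total byte size must fit in isize::MAX.
    let bytes = v
        .len()
        .checked_add(additional)
        .and_then(|n| n.checked_mul(size_of::<T>()));
    match bytes {
        Some(b) if b <= isize::MAX as usize => {}
        _ => return Err(AllocFailure::CapacityOverflow),
    }
    // Assumption: any failure left after the size check counts as exhaustion.
    v.try_reserve(additional).map_err(|_| AllocFailure::Exhausted)
}

/// Hypothetical `try_push`: hands the element back on failure.
fn try_push<T>(v: &mut Vec<T>, x: T) -> Result<(), (T, AllocFailure)> {
    match try_reserve_strawman(v, 1) {
        Ok(()) => {
            // Capacity is already reserved, so this push cannot allocate.
            v.push(x);
            Ok(())
        }
        Err(e) => Err((x, e)),
    }
}

/// Reproduces Vec's current behaviour on top of the fallible version.
fn push_like_vec<T>(v: &mut Vec<T>, x: T) {
    match try_push(v, x) {
        Ok(()) => {}
        // Stand-in for the allocator's oom handler, which aborts.
        Err((_, AllocFailure::Exhausted)) => std::process::abort(),
        Err((_, AllocFailure::CapacityOverflow)) => panic!("capacity overflow"),
    }
}
```

Returning the element alongside the error matches the Result<(), (T, AllocFailure)> shape used by the method-based contenders, so the caller keeps ownership of x when the push fails.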
Major contenders
- Types to distinguish fallibility
  - FallibleVec<T>, replaces push(T) with push(T) -> Result<(), (T, AllocFailure)>
    - doesn’t support generic use of Vec/FallibleVec
    - hard to do mixed usage of fallible and non-fallible
      - or at least, outside allocating code, fallibility loses relevance
  - Vec<T, F: Fallibility=Infallible>, makes push(T) -> F::Result<(), T>
    - requires generic associated types (stable late 2018, optimistically)
    - probably requires type defaults to be improved?
    - works with generics, but makes all of our signatures in rustdoc hellish
      - maybe needs “rustdoc lies and applies defaults” feature
- Methods to distinguish fallibility
  - Make mirrors of all methods: try_push(T) -> Result<(), (T, AllocFailure)>
    - works fine, but people aren’t happy about lots of methods
  - Only add try_reserve() -> Result<(), AllocFailure>
    - minimal impact
    - methods like extend/splice have unpredictable allocations
    - doesn’t work with portability lints (see below)
    - might be nice to have anyway?
  - Add some methods, but ignore niche ones
    - Weird, going to make people mad
- Middle ground: method to temporarily change type
  - as_fallible(&'a mut self) -> FallibleVec<'a, T>
    - can do it for one method: vec.as_fallible().push(x)
    - or for a whole scope: let mut vec = vec.as_fallible()
    - doesn’t enable generic use, weak for library interop
    - can be built on method style
    - note: this is different from type-based b/c a lifetime is involved
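As a rough illustration of the middle-ground option, the borrowed view might look something like the sketch below. Everything here is an assumption for the sake of the example: the AsFallible extension trait stands in for an inherent method on Vec, the error type is std's TryReserveError rather than the strawman AllocFailure, and the view is built on the method style via Vec::try_reserve (stabilized long after these notes).

```rust
use std::collections::TryReserveError;

/// Hypothetical borrowed view whose methods report allocation failure.
/// The lifetime ties it to the Vec it was created from.
struct FallibleVec<'a, T> {
    inner: &'a mut Vec<T>,
}

impl<'a, T> FallibleVec<'a, T> {
    /// Fallible push: returns the element together with the error on failure.
    fn push(&mut self, x: T) -> Result<(), (T, TryReserveError)> {
        match self.inner.try_reserve(1) {
            Ok(()) => {
                // Space is reserved, so this push does not allocate.
                self.inner.push(x);
                Ok(())
            }
            Err(e) => Err((x, e)),
        }
    }
}

/// Hypothetical extension trait standing in for `Vec::as_fallible`.
trait AsFallible<T> {
    fn as_fallible(&mut self) -> FallibleVec<'_, T>;
}

impl<T> AsFallible<T> for Vec<T> {
    fn as_fallible(&mut self) -> FallibleVec<'_, T> {
        FallibleVec { inner: self }
    }
}

fn demo() -> Vec<i32> {
    let mut vec: Vec<i32> = Vec::new();

    // One call at a time.
    let first = vec.as_fallible().push(1);
    assert!(first.is_ok());

    {
        // Or for a whole scope, shadowing the original binding.
        let mut vec = vec.as_fallible();
        assert!(vec.push(2).is_ok());
        assert!(vec.push(3).is_ok());
    }

    // The plain Vec is usable again once the view goes out of scope.
    vec
}
```

Because the view only borrows the Vec, code outside the scope keeps seeing an ordinary Vec<T>, which is why this option does not help with generic use or library interop.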
Possible augmentation: negative portability lints
In some sense “don’t use infallible allocation” is the same kind of constraint that kernel code has for “don’t use floats”. The latter is intended to be handled by negative portability lints, so we can do that too.
portability lints were spec’d here: https://github.com/rust-lang/rfcs/blob/master/text/1868-portability-lint.md
But the negative version (removing portability assumptions) was left as future work.
Strawman syntax: add maybe as a cfg selector, in analogy to ?Sized:
// In liballoc
impl<T> Vec<T> {
    // No need to mark push, implicitly #[cfg(std)] ?
    fn push(&mut self, elem: T) { ... }
    // Say try_push doesn't infallibly allocate -- forces verification of body
    #[cfg(maybe(infallible_allocation))]
    fn try_push(&mut self, elem: T) -> Result<(), AllocFailure> { ... }
}
// In your code
#![cfg(maybe(infallible_allocation))]
/* a bunch of functions/types that shouldn't use infallible allocation */
// or (equivalent)
#[cfg(maybe(infallible_allocation))]
mod allocation_sensitive_task;
// or (more granular)
#[cfg(maybe(infallible_allocation))]
fn process_task() {
/* will get warning if any function called isn't #[cfg(maybe(infallible_allocation))] */
}
Note this analysis is local, so if you call any undecorated function from a third-party library, you’ll get a warning. This is a bit annoying, but strictly correct insofar as long-term stability is concerned: they should publicly declare that they guarantee this. In this vein, adding a #[cfg(maybe)] to a public item isn’t a breaking change, but removing one is.
This will also require a ton of careful stdlib decorating (don’t want to promise things we shouldn’t).
Interviews
I interviewed several people with industry experience in this problem, only some of whom are stakeholders in Rust providing this API (noted for each interview).
Interview with Ehsan (Gecko dev; doesn’t use Rust for it):
Gecko has fallible allocation in its standard collection types. Distinction can be done at the type level or method level – there are factions that disagree on the right approach, and the issue doesn’t appear to be settled?
- Personally prefers methods
- Almost all allocations in gecko are infallible; crashing is simple and maintainable (especially with multi-process!)
- Will fallibly allocate for some key things to improve reliability. Notably when website can create allocation disproportionate in size to network traffic (image size is a few bytes).
- Doesn’t need to handle all fallible allocation in that region of code, or even on that buffer; happy to crash if the going gets tough.
  - In a quick search of gecko, [^1] couldn’t find any actual mixed use
    - Except a sketchy pattern [^2]
- Fallibility is a maintenance/safety hazard! Many untested branches.
  - In a quick search of gecko, I found a few cases that are written in a confusing way

(last two points are why methods are preferred)
[^1]: https://searchfox.org/mozilla-central/search?q=%5B%5En%5Dfallible%5C)&case=false&regexp=true&path==
[^2]:
// Fallibly reserve space
if (!aOutput.SetCapacity(aSource.Length(), fallible)) {
return false;
}
for (auto x : aSource) {
// Says fallible, but this is actually infallible; otherwise this is UB on OOM
*aOutput.AppendElement(fallible) = x;
}
In Rust this would probably just be output.try_extend(source)?, although FFI might make you write code like the above?
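For comparison, a reserve-then-append version of the pattern in [^2] could be sketched in Rust as follows. The function name try_extend_from_slice is hypothetical, and the sketch assumes Vec::try_reserve from present-day std rather than any API proposed in these notes. The point is that only the reservation can fail, so the copy that follows has no hidden failure branch.

```rust
use std::collections::TryReserveError;

/// Hypothetical fallible extend for slices: reserve once, then copy.
fn try_extend_from_slice<T: Clone>(
    output: &mut Vec<T>,
    source: &[T],
) -> Result<(), TryReserveError> {
    // The only step that can report allocation failure.
    output.try_reserve(source.len())?;
    // Capacity is already reserved, so this does not allocate.
    output.extend_from_slice(source);
    Ok(())
}
```

This only works because a slice knows its length up front; for arbitrary iterators the number of allocations is not predictable, which is the concern raised about extend/splice under the try_reserve-only option.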
Interview with Whitequark (embedded dev; uses Rust for it):
Three lines of defense against the specter of allocation:
- First: statically allocate; much harder to mess up.
- Second: Crash on oom! Usually hard abort (need to know how to recover anyway), but sometimes unwind (some Cortex-M devs)
  - unwinding isn’t commonly supported here, so unwinding won’t ever be a complete solution.
- Third: actually handle oom.
  - fail at a task/request granularity
  - all allocations for task are in a pool, so that on failure we free the whole pool; avoid fragmentation
  - all allocations in this region of code are handled fallibly, no mixing strategies
Likes try_push, but wants #[deny(infallible_allocation)]
If we do a typed approach, would prefer something generic for library compat.
Fallible allocation is a last resort, and devs are willing to put in work to use it properly.
Interview with Swgillespie (CLR dev, works on the GC; doesn’t use Rust for it)
Need collections for state in GC traces, e.g. stacks in graph traversal. If allocation fails, can try to shrink the stack and retry. OOMing while trying to GC is a bug.
- Uses global allocator (new with std::nothrow)
- Would use #[deny(infallible_allocations)]
- No preference on typed vs untyped.
- No need for being generic over fallibility (GCs are fairly concretely typed)
- No concern with interop with third-parties
- Lots of bugs from missing spots or failing to check results
Interview with nox (Servo dev; uses Rust for it)
Stylo needs it for Firefox 57, will be forking libstd collections until we provide these APIs.
Code like this which parses a list should be fallible: https://github.com/servo/servo/blob/de0ee6cebfcaad720cd3568b19d2992349c8825c/components/style_traits/values.rs#L251
Style sheet should just come out as “couldn’t parse”/“didn’t load” when this happens.
- Prefers methods to integrate into existing code where desired
- Moving to fallible allocation is likely to be incremental, as it’s a big job
- Controls all the relevant libraries
- Doesn’t care about generics
- Would like #[deny(infallible_allocations)], not super important though