Rc is unsafe [mostly on 32-bit targets] due to overflow

@Gankra: good point, we only need saturating arith when incrementing the counter: once the counter reaches the maximum, it’s impossible for safe code to bring it back down to zero. This means the decrementing code can stay unchanged (no need to special-case the “counter is saturated” case).

This also suggests a cheap fix for Arc: if I understand it correctly, it would be sufficient to replace

self.inner().strong.fetch_add(1, Relaxed);

with

if self.inner().strong.fetch_add(1, Relaxed) >= isize::MAX as usize {
    // poor man's racy saturating counter
    self.inner().strong.store(isize::MAX as usize, SeqCst);
    // sequential consistency is overkill,
    // but I don't trust myself to pick the correct ordering
}

Since each Rc takes more than 2 bytes of memory, keeping the counter saturated around isize::MAX should be as safe as saturating at usize::MAX. While there is a race that allows the counter to temporarily exceed isize::MAX, this should be unproblematic: we can’t hit an overflow, because that would require usize::MAX - isize::MAX threads, which can’t fit into memory. Losing some increment calls to the race is harmless; they don’t have any effect on a saturated counter anyway. Losing some decrement calls to the race is also harmless, because we know the count can never reach zero again.

As for overhead, at least on x86 the comparison could use the sign flag that already gets set by the lock xadd instruction, so something similar to this change would add only a single assembly instruction to the hot path. There’s a good chance this costs zero cycles if branch prediction works properly.


Exactly my suggestion. Also see http://shipilev.net/blog/2014/on-the-fence-with-dependencies/ for a thorough discussion of load/store atomicity - in short, 32-bit x86 systems usually don’t reorder these operations anyway.

Saturating math seems a bad solution due to the performance implications (the cost of it and the fact that it might prevent LLVM from eliding reference count operations).

This seems a very good reason to make mem::forget unsafe again.

Also, in general I think that leaking Rc cycles cannot be disallowed by the language, but the API should reserve the right (and should exercise it in debug mode) to report an error on task (for Rc) or program (for Arc) termination; correct programs would then not leak memory. (This is easily done by keeping a list of all Rc boxes and checking that it’s empty at termination.)

Note that we don’t need saturating math on 64-bit architectures, only on 32-bit (ARM and x86). On the former, triggering the unsafety would take a very long time; on the latter, a branch on the sign bit makes the workaround simple enough to implement.

It occurred to me that the fundamental problem with Rc isn’t cycles, per se, it’s that it provides shared ownership in a system heavily designed for a single-owner multiple-borrow model.

While we might be able to use this to design a safer yet more performant approximation to shared ownership, I suspect there isn’t any way to use this to fix the current problem at hand :confused:

I’m finding these proclamations that this means mem::forget should be unsafe quite frankly crazy. If you can do it in safe code, it’s safe. Full stop. There’s no “oh well this is kind of nasty, and it’s hard to do in safe code…”. If Rc can make cycles, then mem::forget is safe. These have been the semantics for over a year (though many of us forgot them because mem::forget was marked unsafe erroneously), and are the semantics we have stabilized on in 1.0.

And it’s all for a theoretical perf hit in one method that no one has benched? (thanks to @dgrunwald for bringing at least some numbers into this). Also this is very easy to fix now (do a checked or saturating add) with an eye to fixing any perf fallout later.


FWIW many people are of the opinion that footgunny things should be unsafe too, but that's debatable and should be decided elsewhere. (I believe the current consensus is that unsafe means unsafe and nothing else.)

Yeah, I find the saturating add method to be the best way to fix this now.

I could have misunderstood something, but isn't the key difference here that forgetting via Rc cycles actually leaks memory (in the sense that unreachable memory has not been freed), but mem::forget doesn't?

I think @pcwalton is right that some combination of the declared-safe abstractions in std too likely makes it possible to completely free memory without running Drop, so this shouldn't be a safety commitment that Rust makes. But I also think you're rather badly misrepresenting the other side of this.

Leak amplification is a thing you have to deal with when you start mixing mem::forget and collections. In particular the ever tricky Vec::drain guarded against forget by marking all of its memory uninitialized to start, so that if you leak it, all that happens is the dtors get leaked. As such it’s quite trivial to blow up an allocating leak into a much-less-allocating leak.

e.g.

let mut vec = vec![];
for i in 0..1000000 { vec.push(make_thing_to_leak()); }
forget_that_wastes_memory(vec.drain(..)); 
// wastes one Vec::Drain worth of memory, but we leaked 1000000 thing_to_leaks.

Note that you can re-use the same Vec over and over since it’s not freeing or leaking its actual buffer.


But if it wastes even 1 byte of memory, an OOM error will occur rather than Rc overflowing using the code in the first post. Doesn't your code still do that?


You can e.g. destroy thousands of Rc-s each time leaking a few bytes of memory.

Seems like it’s: efficient collections, efficient Rc, memory safety - pick two.

That only holds if one byte per Rc is wasted. The code I provided wastes about 1 byte per 100,000 Rc’s. This would overflow the Rc count after only ~40 KB of leaked memory (on 32-bit).

Here’s a complete version of @Gankra’s code that demonstrates it concretely:

#![feature(collections_drain, alloc)]
use std::rc::{self, Rc};
use std::cell::RefCell;

fn forget<T>(x: T) {
    struct Cycle<T> {
        _x: T,
        y: Option<Rc<RefCell<Cycle<T>>>>
    }
    let x = Rc::new(RefCell::new(Cycle {
        _x: x,
        y: None
    }));
    let y = x.clone();
    x.borrow_mut().y = Some(y);
}

const N: usize = 100_000;
const M: usize = 100_000;
fn main() {
    let ptr = Rc::new(());
    for _ in 0..N {
        let mut vec = vec![ptr.clone(); M];
        forget(vec.drain(..));
    }
    println!("{}", rc::strong_count(&ptr))
}

Output when running under GNU time:

10000000001
25.78user 13.45system 0:39.20elapsed 100%CPU (0avgtext+0avgdata 11316maxresident)k
0inputs+0outputs (0major+15545136minor)pagefaults 0swaps

i.e. it took ~11MB of memory to well and truly overflow a 32-bit integer (that’s 10 billion and one).


And it's all for a theoretical perf hit in one method that no one has benched?

I'd really like to benchmark it but I'd need to find the time to get my old broken atom netbook to work, and put a rustc on it (which unfortunately I cannot do during my commute or during a pause at work...)

OK, that clearly shows that the assumption made in Rc is wrong: available memory simply doesn’t limit the number of non-destructed instances. (no matter whether mem::forget is safe or not)

But the idea for an efficient fix to Rc seems to rely on a similar assumption: the available memory limits the number of live Rc instances (“live” in the sense that their destructor can still be called later). This is a necessary assumption if we want to use “a saturated counter can never again reach zero” to keep the decrement operation unchanged. But is it true? @stebalien’s “wider than memory” store (a type that allows moving arbitrary rust structs to disk and back) would violate this assumption.

With Rc, we would just need a test in the destructor to avoid decrementing the count if it’s usize::MAX. With Arc, avoid decrementing once the counter has reached the outer limit of isize::MAX.

So we still have a slight tradeoff between the performance of Rc/Arc and the capabilities of safe Rust ("ban wider-than-memory stores from working with arbitrary types like Arc"). In the interest of not adding completely arcane rules to the “how to write unsafe code” list, we should probably take the performance hit on decrementing.

Could you please elaborate on why saturating arithmetic (plus panicking on reaching the max) requires the number of Rc instances to be bounded by available memory? Do you mean Rc clones or disjoint Rc instances?

Also, the decrement operation needs the same change as the increment: if the loaded refcount has bit 31 set (note we’re talking about 32-bit systems exclusively), panic and be done.

I’m not sure if panicking is a good idea – it can cause quite a bit of code bloat (esp. if it’s the only source of panics in a function that has landing pads). It seems simpler to just keep going and to ensure memory safety by never deallocating the RcBox after the counter has reached the max. value.

I was talking about the case where incrementing is changed to saturating arithmetic, and decrementing is left unchanged. This would be cheaper than also changing decrementing, but is only safe if the number of “live” Rc values (clones) is bounded by available memory: because we can only decrement once per live Rc, the counter can never reach zero again, so there’s no need to keep the reference count stuck at the maximum value.

I’m assuming this is what @Gankra meant with “Yes, saturating arith would be safe […] To decrement refcount by k still requires k Rcs to exist simultaneously” – use saturating arith for incrementing, don’t panic, don’t change decrementing.

But if wider-than-memory stores of Rc are allowed, that argument falls apart and we’ll need to avoid decrementing when the counter is saturated.

I’m not familiar with the details of a wider-than-memory store. How does that work? Worth noting that Rc’s definitely can’t be validly serialized to disk or anything.

Wider-than-memory is incorrect; I should have said wider-than-address-space. Basically, “memory” here includes all system storage, including disk. While you can’t address more than 2^32 bytes on a 32-bit system, you can certainly store more than 2^32 bytes. So the problem is that physical storage can be (and usually is, on 32-bit systems) larger than addressable memory.

To implement a wider-than-address-space vector, you would simply allocate on-disk directly. To access an element, you just mmap the page containing that element. There’s no serialization here because you’re treating disk as memory.

If the mmapped page is interpreted as a vanilla Rc<T>, then the contents will still be a 32-bit pointer directly into the current process’s address space, which is unlikely to do what you want in any case. (In particular, there is usually no guarantee of reobtaining the same absolute chunk of address space, so it’s preferable to have a special pointer type that’s encoded as an offset from the address of the pointer itself.) I’m a fan of mmapped data structures, but you’d have to be pretty crazy to try to reuse existing container types in such a way that you’d run into an issue with this.