Rc is unsafe [mostly on 32-bit targets] due to overflow


#1

The reference counts in RcBox are of type usize, and creating a new Rc increments it with regular addition. In optimized code, where overflow checks are removed, it is possible to overflow this count, allowing a use-after-free.

In the old world where values supposedly could not be forgotten (without running the destructor) without actually leaking memory, it would be impossible for this to wrap around: doing so requires 2^X Rcs where X is the pointer width, ergo sizeof(Rc) * 2^X bytes, which obviously cannot be allocated in a X-bit address space, so allocations would start failing before the count overflowed.

But now we have safe mem::forget, and can do this:

use std::rc::Rc;
fn main() {
    let val = 123;
    let ref1: Rc<&u32> = Rc::new(&val);
    {
        let _ref2 = ref1.clone();

        for i in 0u64..4294967295 {
            if i % 10000000 == 0 {
                println!("{}", i);
            }
            std::mem::forget(ref1.clone());
        }
    } // drop ref2
    // attempt to reallocate the memory used for the &u32 as an integer
    let _reuse_plz: Vec<Rc<u32>> = (100u32..1000).map(|x| Rc::new(x)).collect();
    println!("{}", *ref1); // segfault!
}

As written, this only works on 32-bit targets: even with -C lto -O, the conditional print seems to force LLVM to actually count one by one (as opposed to by ten millions), which takes about 30 seconds on my VM. I estimate that counting to 2^64 at that rate would take about 3,800 years. If you remove the print, LLVM realizes it can skip the loop and segfaults immediately on both 32-bit and (changing the iteration count) 64-bit targets.

Since it takes so long to actually count to 2^64, on 64-bit targets this issue can only be exploited by code actively trying to trigger the LLVM optimization, as opposed to a user exploiting a buggy Rust application. In the 32-bit case, while the chance is slim, I think it’s not outside of the bounds of possibility that someone would write this pattern into their application by mistake, allowing it to be exploited (in much longer than 30 seconds though).


#2

Nice catch, although this probably could have just been an issue in Github.


#3

One option that seems acceptable to me is using saturating add and never decrementing if count == usize::MAX.


#4

Adding an overflow check to Rc::clone() will cause quite a bit of code bloat and might prevent LLVM from optimizing reference counts :frowning:

I assume this also affects Arc? Safely preventing the overflow / doing a saturating add would mean we’d need a CAS loop instead of an atomic add…

This might have non-trivial performance costs. Now I’m wishing that I hadn’t advocated a safe mem::forget…


#5

Just for the record, my suggestion was to make the reference counts u64 rather than usize, and using an asm! trick like black_box() does to prevent LLVM from coalescing multiple increases. In theory this should fix the problem with no/minimal performance impact on 64-bit.


#6

@comex: that’s a good idea for Rc (still sucks for 32-bit targets though).

But Arc is more difficult: I don’t think all 32-bit platforms support 64-bit atomics. Basic on a quick non-scientific test, a CAS loop is going to slow down Arc::clone() by a factor of 2…


#7

For the record: on ARM/ARM64, atomic adds can’t be done without a loop anyway, so among popular targets it’s only 32-bit x86 that would need a loop but doesn’t today. Which is a very important one, of course.


#8

I think having overflow checks on should be OK for Debug targets, but with release we should try for performance.

FWIW, the bug seems hard to trigger by accident.


#9

Hard to trigger by accident or not, memory unsafety using only safe code and the standard library is unacceptable. u64 overflow OTOH is effectively impossible (I’ve heard Google once overflowed a 64 bit integer, but that counter was certainly incremented in steps larger than one and/or shared among a bazillion servers). For the forseeable future, a machine overflowing a u64 would run so long that (a) we’d all be dead and (b) it’s much more likely to get an invalid pointer via cosmic rays long before the counter overflows.

That said, how to do the check efficiently would be an interesting puzzle. checked_add(1).unwrap() may be faster (certainly for decrementing, because that part doesn’t need to change) but it does mean crashing the thread rather than just leaking. I actually like that, because 4 billion references to something is certainly a bug. A more serious downside is that it won’t work for Arc because you can’t take down the other threads with you.

All things considered, just using a u64 on 32 bit platforms is probably good :+1: and any of the other solutions will be fine too. std::rc::Rc already isn’t the slimmest of smart pointers (it has machinery for supporting weak pointers, for example), it easily extends to Arc (just not as cheaply). The performance of refcount operations isn’t that important anyway, because unlike in many dynamic language runtimes and even carelessly written C++, a lot of code just passes borrowerd pointers around or moves the Rcs.


#10

For Arc you could set an upper limit of 2^31 for the ref-count, and then just do a separate check before or after updating the reference count. As long as you have hardware concurrency of less than 2^31 then it can’t possibly overflow.

Assuming LLVM code-gen is decent, it would even be able to compress the two atomic operations into one, since most architectures support a “fetch-and-add” atomic instruction.


#11

We could protect against this perhaps by using overflow to our advantage. Use a pair of integers, one is solely for incrementing and the other is solely for decrementing. When their difference reaches zero, deallocate.

(Not sure how well this would work)


#12

That would require 64 bits, and cache effects would probably cancel out any benefit over 64bit. Also we could still overflow.

So in a word: no.

What would work: Carve of a range off the 32 bit space that for arbitrary reasons we’ll call the outer limits. Let’s say everything from 0x80000000 (that is, bit 31 set). Add a check to .clone() and .drop() that panics if this range is reached. Note that this would just require a branch on negative on x86 that we could mark unlikely (see the RFC PR #1131.


#13

Maybe roll back the safe mem::forget?


#14

One can still create a safe mem::forget via Rc cycles, but I guess that case won’t be optimized so well (and is contrived anyway).

I and a couple of other had mentioned that we could keep forget unsafe in a backcompat manner, but I think the consensus was that safe functions should not be marked unsafe because they are a footgun.

I hope that decision is reevaluated. Though perhaps directly fixing the overflow issue should work too.


#15

It’s interesting that the problem isn’t safe mem::forget specifically, but a safe mem::forget that doesn’t itself allocate (keep the forgotten object around somewhere in memory). The assumption implicitly relied on here is not that you can’t safely forget, but that the amount of times you can do so is constrained by the available memory to store the forgotten objects. Notably, it’s not clear that either of the main alternative proposals would’ve prevented this: both of which were concerned with narrowing, but not eliminating, the class of types which may be safely leaked. (I.e., I’m not sure whether or not they explicitly had a safe forget<T: Leak> resp. forget<T: 'static> as part of their text, but it would be consistent with their designs, and would give rise to the same issue.)

So one weird but straightforward solution would be to actually just make the safe mem::forget move the forgotten object to the heap (and perhaps add a separate unsafe variation on forget which, like the current one, doesn’t). This is pretty weird and narrowly tailored to closing this particular gap in the soundness of Rc, but we already do a lot of things to accomodate Rc, and pragmatically, it might be preferable to put the burden of suboptimal performance not on Rc but on forget (which no one is really meant to use anyways - it’s just there for display as deterrent against assuming its impossibility). I suppose the language spec would then also change accordingly from “destructors are not guaranteed to run” to “objects are not guaranteed to be deallocated”, or something… (Speaking of which - would it also be possible to trigger this overflow via the destructors-not-run-while-panicking avenue?)

In case it’s not obvious, I’m not sure what I think of this solution.


#16

Basically, the new guarantee would be that rust will never deallocate something (make it unaddressable) before running its destructor (except when the program exits).


#17

Yes, the ideal solution is to introduce a new guarantee: safe rust code will either run the destructor or leak the memory when “forgetting” a value, so that the number of “live” values always fits into usize (except when sizeof(Value) == 0).

But mem::forget is being used by unsafe code that needs to avoid Drop being called (e.g. because ownership of the underlying resource is moved from Rust to C code). Just last week I deleted the unsafe block around one of those calls, since it was producing the “unnecessary unsafe block” warning.

@glaebhoerl: I don’t think silently breaking my code by introducing an artificial memory leak in mem::forget is the solution here – it’s strictly worse than breaking my code by marking mem::forget as unsafe and making me add back the unsafe block.

In fact, grepping for mem::forget in vec.rs is quite instructive in seeing the valid use-cases of mem::forget:

  • Forgetting self in order to suppress the Drop impl and transferring ownership (e.g. Vec::into_iter())
  • Forgetting a value after making an unsafe copy using ptr::read
  • Forgetting zero-sized values in Vec::push, since they can be re-created from thin air later.

We’d need to formulate a guarantee that types like Rc can rely on while still allowing all of those uses in unsafe code.

So I see two main solutions here:

  1. Introduce the new guarantee, and make mem::forget unsafe. This is a breaking change, but how much code would really break? I’d expect most callers to still have the unsafe block around mem::forget calls, since it hasn’t been safe for all that long. On the other hand, a new guarantee means it’s one more thing to check when reviewing unsafe code. Writing unsafe Rust code is already more difficult than writing C code due to the number of (often poorly-documented) invariants you need to uphold. How much more do we want to add to this burden?

  2. Make Rc check for overflows. The expected performance cost on 64-bit platforms is extremely low / often zero – we just have to inhibit some optimizations to prevent overflows from being possible. On 32-bit there’s going to be a real cost, though probably not as bad as I initially thought.

If this were pre-1.0, I’d prefer option 1. Not sure now… this is a safety hole, and we’ve reserved the right to make breaking changes in these cases… but on the other hand, a fix without breaking changes should be the preferred solution in post-1.0 world. I guess we need more data to decide properly: a) How much code is going to break if we make mem::forget unsafe again? b) How big is the impact of overflow checks, both on performance and code-size?


#18

How fascinating. And annoying.

I’m not sure that the right solution in retrospect would have been to keep forget unsafe, because I’m not sure that someone wouldn’t have just discovered some other way to leak without allocating. In other words saying that the safety of Rust depends on not running a value’s destructor implying reserving at least one byte of address space in perpetuity seems awfully shaky to me.

I think checking for overflow is probably a reasonable solution, at least for now. (Maybe you can use saturating arithmetic safely? Hmm.)


#19

Good point. All you have to do is make something unaddressable so a custom “wider than memory” (stores data on disk) vec could do it.


#20

Yes, saturating arith would be safe. All mem::forget does is artificially inflate ref count. Bad stuff only happens if you manage to artificially decrease ref count (which overflow does). To decrement refcount by k still requires k Rcs to exist simultaneously, which this code was already assuming was impossible for k = usize::max.