Having Rc<T> point into the current process's address space is exactly what I want. The object of type T would be stored on the heap, but the Rcs pointing to it would be stored on disk (in the wide vec). This can be useful where you have a couple of frequently accessed objects stored on the heap and a bunch of small objects (stored on disk) that want to keep references to these objects.
I’m not suggesting re-using Vec as-is (that wouldn’t be possible), just implementing a vector-like container. (Also, you’d have to be pretty crazy to forget 2^32 Rcs…)
Is it acceptable to just treat storing things in memmapped space as unsafe?
It certainly seems fraught with danger. This would allow Rc to not worry about it.
So the new leak rule would be: every leaked resource must consume at least one address? What about the leak amplification you noted?
Anyway, this wouldn’t spell the end for safe memmapped storage; it just means that all objects stored in a memmapped file must be addressable. Unfortunately, this also means that memmapped files can never be unmapped, because they might contain leaked resources (which must remain addressable).
In debug code, it should be a no-brainer – as was noted before, using up 31 bits of refcount is bad news, so we might as well give the coder a heads-up. In release / bench code, I'll concede that just leaking is probably good enough.
This hinges on the idea of somehow keeping LLVM from munging all our increments together. Has this already been implemented?
@Gankra I’m trying to decide how fundamental this problem is – this “amplification effect” that you achieved with drain, is that specific to the drain API or do you think you could achieve it with other (stable) vec APIs?
As far as I recall, it’s currently fairly unique to Vec::drain in that it:

- moves data out of the collection without consuming the collection
- via a proxy value
- and the leak doesn’t itself leak the collection’s allocation

However there may be other cases where a collection chooses to handle unwinding-safety like this: mark the collection as empty temporarily so that a panic just leaks all the elements.
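That behavior is observable directly in current Rust: forgetting the Drain proxy leaks the drained elements, but because drain truncates the vector up front, the collection and its allocation stay valid. A minimal sketch (the helper name is mine):

```rust
use std::mem;

// Drain a vector and forget the Drain proxy, returning the vector.
fn drain_and_forget() -> Vec<String> {
    let mut v = vec![String::from("a"), String::from("b"), String::from("c")];

    // Drain borrows `v` and moves elements out through a proxy value.
    let drain = v.drain(..);

    // Forgetting the proxy leaks the three Strings, but `drain` already
    // set the vector's length to zero, so `v` remains a valid, empty,
    // usable vector: the leak does not leak the allocation.
    mem::forget(drain);
    v
}

fn main() {
    let mut v = drain_and_forget();
    assert!(v.is_empty());
    // The allocation itself was not leaked: the vector is still usable.
    v.push(String::from("still usable"));
    assert_eq!(v.len(), 1);
}
```

Note the amplification: forgetting one small proxy value leaks an arbitrary number of elements, which is exactly the effect under discussion.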
Wrong, Vec::drain dates back to at least RFC 509 in December 2014 (I think the idea is even older). The implementation has changed to not be unsafe when the drain iterator is leaked, but the API didn’t change. The drain iterator’s destructor would free the remaining values in the vector, so even if leaking was unsafe, leaking the drain iterator would also leak the remaining values in the vector; only it wouldn’t free the objects that were moved out either, so it was completely unsafe in addition to amplifying leaks.
Actually, Vec::drain was designed unaware that leaks were possible in safe code (remember, those were always possible!), and that the API works nevertheless, albeit with a slightly different implementation, is a happy accident discovered during the leakpocalypse. The API dates back to a happier, “leak-free” time.
Specifically, I was trying to understand whether it would be possible to deprecate safe forget and add the condition that it must consume memory. It seems awfully subtle though and I second the general concern that we're forgetting things (no pun intended) that could permit leak amplification in other cases.
So I was thinking about performance costs, particularly for Arc, and I was wondering if this is a good idea:
The idea is not to use a single sentinel value for “wrapped around” but rather an entire range. The motivation is that then we don’t need a compare-and-swap; we can use a plain atomic increment and just check whether the previous value was in the range deemed “too high”. In this particular code, I used an isize and said that the ref count should never go negative (actually, we should never inc the ref when the previous value was already negative). Seems like 2^31 (or 2^63) is high enough, but you could use a usize and set the threshold wherever you like. Anyway, the idea is that even if we have a ton of CPUs executing in a very unfortunate scheduling, at least one of them is going to observe the increment start from an invalid state.
I haven’t thought super hard about this so possibly there is a very obvious flaw. Certainly the performance is very good when I measure it locally (no difference from a straight addition, basically).
I chose to have the code just abort the process. I’m generally against aborting (vs. panicking), but:

- this seems like a distributed invariant across many threads, so panicking one thread isn’t clearly enough
- aborting is cheaper because there’s no need for landing pads (right?)
It is probably possible to adapt the setup to leak the value instead, e.g. by setting some boolean flag to true (“count may have wrapped”).
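A minimal sketch of the scheme described above, assuming my hypothetical `RefCount` type (not the real Arc internals): a plain atomic increment followed by a range check on the previous value, with the whole negative range serving as the danger zone.

```rust
use std::process::abort;
use std::sync::atomic::{AtomicIsize, Ordering};

// Hypothetical reference count using the "danger zone" check.
pub struct RefCount {
    count: AtomicIsize,
}

impl RefCount {
    pub fn new() -> Self {
        RefCount { count: AtomicIsize::new(1) }
    }

    pub fn inc(&self) {
        // Plain atomic increment -- no compare-and-swap loop.
        let prev = self.count.fetch_add(1, Ordering::Relaxed);
        // If the previous value was already negative, some earlier
        // increment pushed the count past 2^63 (or 2^31) live references;
        // abort well before the count could ever wrap back to a small,
        // reusable value.
        if prev < 0 {
            abort();
        }
    }

    /// Returns true when this was the last reference.
    pub fn dec(&self) -> bool {
        self.count.fetch_sub(1, Ordering::Release) == 1
    }
}

fn main() {
    let rc = RefCount::new();
    rc.inc();
    assert!(!rc.dec());
    assert!(rc.dec());
}
```

The point of checking the previous value rather than the new one is that every thread whose increment starts from the invalid range will notice, so at least one observer is guaranteed even under adversarial scheduling.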
To clarify something: the reason to use a range, rather than just checking if you added one to USIZE_MAX, is to eliminate the “window of unsafety”. That is, you will abort well before the counter would ever actually overflow. I think you would need 2^31 processors to do otherwise…? In any case, it seems vanishingly unlikely (though of course I’d love for someone to point out the flaw in my thinking if there is one).
Ah, another case of leak amplification I just recalled: My hypothetical Mox type would leak its allocation if it ever was dropped, but it’s intended to be used with a pool allocator which can easily be cleared to “reclaim” all the memory. Generally local allocators are ripe for introducing massive leak amplification sites with bounded memory consumption.
Here are some benchmarking results for a simple case of spawning a bunch of threads, each of which do increments. (There is more of a slowdown than I saw at first, but it is still relatively small, certainly compared to CAS.)
“cas” is using compare-and-swap, “inc” is just a plain increment, and “check” is the “danger zone” variant I described.
UPDATE: @alexcrichton saw much better results (check and inc virtually identical) on his machine, so I imagine this will vary by machine somewhat.
Given that the situation occurs due to leaking references, leaking the value seems much more intuitive than an abort to me. One way to do this without introducing an extra flag might be to have both inc and dec set the count to the middle of the negative range (-1073741824 for 32-bit) if the previous value was negative. This gives headroom of 2^30, which is plenty.
The real limitation is address space: even if one had 2^31 (or 2^30) processors, there wouldn't be enough address space for them to all have a distinct Arc on which to operate.
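The leak-instead-of-abort variant could be sketched like this (a hypothetical `LeakyCount` type, not real std code): once the count enters the negative range, both inc and dec pin it to the middle of that range, so it can never reach zero again and the value is simply never freed.

```rust
use std::sync::atomic::{AtomicIsize, Ordering};

// Midpoint of the negative range (-1073741824 on 32-bit targets,
// matching the value quoted above).
const LEAKED: isize = isize::MIN / 2;

pub struct LeakyCount {
    count: AtomicIsize,
}

impl LeakyCount {
    pub fn new() -> Self {
        LeakyCount { count: AtomicIsize::new(1) }
    }

    pub fn inc(&self) {
        if self.count.fetch_add(1, Ordering::Relaxed) < 0 {
            // Pin the count deep in the negative range. The 2^62 (or 2^30)
            // headroom on either side of LEAKED means concurrent inc/dec
            // calls cannot move the count across zero before every racing
            // thread has re-pinned it here.
            self.count.store(LEAKED, Ordering::Relaxed);
        }
    }

    /// Returns true when this was the last reference.
    pub fn dec(&self) -> bool {
        let prev = self.count.fetch_sub(1, Ordering::Release);
        if prev < 0 {
            self.count.store(LEAKED, Ordering::Relaxed);
            return false; // leaked: never report "last reference dropped"
        }
        prev == 1
    }
}

fn main() {
    let c = LeakyCount::new();
    c.inc();
    assert!(!c.dec());
    assert!(c.dec());
}
```

The appeal is that no extra flag is needed: the count itself records the “may have wrapped” state.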
On another note, would this problem also apply to RefCell? It seems that one could take a shared borrow, take and forget enough additional ones to overflow the count, and then take a mutable borrow, yielding both a shared and a mutable borrow to the same memory. What about RwLock?
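The first step of that attack can be seen in miniature today: forgetting a Ref guard leaves RefCell’s borrow count permanently elevated, because the guard’s destructor (which decrements the count) never runs. Actually overflowing the count would take on the order of 2^N forgotten guards, but the mechanism is the same:

```rust
use std::cell::RefCell;
use std::mem;

fn main() {
    let cell = RefCell::new(0i32);

    // Take a shared borrow and forget the guard: the borrow count was
    // incremented by borrow() and is never decremented.
    mem::forget(cell.borrow());

    // Further shared borrows still work...
    assert!(cell.try_borrow().is_ok());
    // ...but a mutable borrow now fails forever.
    assert!(cell.try_borrow_mut().is_err());
}
```

To actually reach unsoundness one would still need the count to wrap all the way around to zero, which is the same “must consume at least one address per leak” question as with Rc.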