@Gankro: good point, we only need saturating arith when incrementing the counter: once the counter reaches the maximum, it’s impossible for safe code to bring it back down to zero. This means the decrementing code can stay unchanged (no need to special-case the “counter is saturated” case).
This also suggests a cheap fix for Arc: If I see it correctly, it would be sufficient to replace
self.inner().strong.fetch_add(1, Relaxed);
with
if self.inner().strong.fetch_add(1, Relaxed) >= isize::MAX as usize {
// poor man's racy saturating counter
self.inner().strong.store(isize::MAX as usize, SeqCst);
// sequential consistency is overkill,
// but I don't trust myself to pick the correct ordering
}
Since Rc takes more than 2 bytes of memory, keeping the counter saturated around isize::MAX should be as safe as saturating at usize::MAX. While there is a race that allows the counter to temporarily exceed isize::MAX, this should be unproblematic: we can’t hit an overflow because that would require usize::MAX - isize::MAX threads, which can’t fit into memory.
Losing some increment calls to race is harmless, they don’t have any effect on a saturated counter anyways. Losing some decrement calls to the race is also harmless, because we know the count can never reach zero again.
As for overhead, at least on x86 the comparison could use the sign flag that already gets set by the lock xadd instruction, so something similar to this change would only add a single assembler instruction to the hot path. There’s a good chance this costs zero cycles if branch prediction works properly.