So I was thinking about performance costs, particularly for Arc, and I was wondering if this is a good idea:
The idea is not to use a single sentinel value for “wrapped around” but rather an entire range. The motivation is that then we don’t need a compare-and-swap: we can use a plain atomic increment and just check whether the previous value was in the range of values deemed “too high”. In this particular code, I used an isize and said that the ref count should never go negative (more precisely, we should never increment the count when the previous value was already negative). Seems like 2^31 (or 2^63) is high enough, but you could use a usize and set the threshold wherever you like. Anyway, the idea is that even if we have a ton of CPUs incrementing under very unfortunate scheduling, at least one of them is going to observe its increment start from an invalid state before the count can wrap all the way around.
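To make the scheme concrete, here is a minimal sketch of what I mean (the names `RefCount` and `incref` are just illustrative, not taken from the real Arc implementation):

```rust
use std::process;
use std::sync::atomic::{AtomicIsize, Ordering};

// Sketch: the count lives in an AtomicIsize, and the entire negative
// range acts as the "too high / may have wrapped" sentinel region.
struct RefCount {
    count: AtomicIsize,
}

impl RefCount {
    fn new() -> Self {
        RefCount { count: AtomicIsize::new(1) }
    }

    fn incref(&self) {
        // A plain atomic increment -- no compare-and-swap loop needed.
        let old = self.count.fetch_add(1, Ordering::Relaxed);
        // If the previous value was already negative, the count has
        // overflowed past isize::MAX. Even with many threads
        // incrementing concurrently, at least one of them will observe
        // a negative previous value, because 2^63 further increments
        // would be needed to wrap back into the valid range.
        if old < 0 {
            process::abort();
        }
    }
}
```

On the happy path this is just a `fetch_add` plus a sign check on the old value, which is why it measures the same as a straight addition.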
I haven’t thought super hard about this so possibly there is a very obvious flaw. Certainly the performance is very good when I measure it locally (no difference from a straight addition, basically).
I chose to have the code just abort the process. I’m generally against aborting (vs panicking) but:
- this is a distributed invariant across many threads, so panicking a single thread isn’t clearly sufficient
- aborting is cheaper because there’s no need for landing pads (right?)
It is probably possible to adapt the setup to leak the value instead, e.g. by setting some boolean flag to true (“count may have wrapped”).
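Something like the following sketch, say (again, the names and the per-object flag are just one hypothetical way to set it up):

```rust
use std::sync::atomic::{AtomicBool, AtomicIsize, Ordering};

// Sketch of the "leak instead of abort" variant: on a suspected wrap,
// set a flag that the decrement path consults, so the value is leaked
// rather than freed with a possibly-wrong count.
struct RefCount {
    count: AtomicIsize,
    wrapped: AtomicBool,
}

impl RefCount {
    fn new() -> Self {
        RefCount {
            count: AtomicIsize::new(1),
            wrapped: AtomicBool::new(false),
        }
    }

    fn incref(&self) {
        // Previous value negative => count may have wrapped; record it
        // instead of aborting.
        if self.count.fetch_add(1, Ordering::Relaxed) < 0 {
            self.wrapped.store(true, Ordering::Relaxed);
        }
    }

    /// Returns true if the caller should free the value. Once a wrap
    /// has been suspected, we never report "free it" -- we leak.
    fn decref(&self) -> bool {
        self.count.fetch_sub(1, Ordering::Release) == 1
            && !self.wrapped.load(Ordering::Relaxed)
    }
}
```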