I just think it's unfair for a data structure that doesn't have droppable elements to run destructors for objects that belonged to completely different structures. Is this something you agree with?
Kinda, but not really. It's basically the premise of crossbeam, and users have already accepted an expensive set of deallocations on pin. Same with Arc - you accept that a drop might trigger a huge set of further drops. I don't think the solution is to give up and do something more expensive across the board. The epoch GC should definitely be incremental, drops should probably be treated differently from plain frees, and per-datastructure or per-type bags should be considered. But I really don't think that a refcount on every data reference is a good forced solution.
This is true. Pinning requires a SeqCst fence and refcounting requires AcqRel fences. A simple way to eliminate subsequent SeqCst fences incurred by pinning is to create a pin yourself beforehand (remember that pinning is reentrant).
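A std-only sketch of that reentrant-pin idea (names like `PIN_DEPTH`, `Guard`, and the fence counter are made up for illustration; crossbeam's real participant bookkeeping is more involved):

```rust
use std::cell::Cell;
use std::sync::atomic::{fence, AtomicUsize, Ordering};

// Counts SeqCst fences actually issued (for illustration only).
static SEQCST_FENCES: AtomicUsize = AtomicUsize::new(0);

thread_local! {
    // Per-thread pin depth, mimicking a crossbeam-style participant record.
    static PIN_DEPTH: Cell<usize> = Cell::new(0);
}

struct Guard;

// Reentrant pin: only the outermost pin pays the SeqCst fence;
// nested pins just bump a thread-local counter.
fn pin() -> Guard {
    PIN_DEPTH.with(|d| {
        if d.get() == 0 {
            fence(Ordering::SeqCst); // the expensive part
            SEQCST_FENCES.fetch_add(1, Ordering::Relaxed);
        }
        d.set(d.get() + 1);
    });
    Guard
}

impl Drop for Guard {
    fn drop(&mut self) {
        PIN_DEPTH.with(|d| d.set(d.get() - 1));
    }
}

fn main() {
    let outer = pin(); // pays the fence once
    for _ in 0..1000 {
        let _inner = pin(); // cheap: this thread is already pinned
    }
    assert_eq!(SEQCST_FENCES.load(Ordering::Relaxed), 1);
    drop(outer);
}
```

So a hot loop that pins per operation only pays the fence on the first pin, as long as an outer guard is held across the loop.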
On Intel, all RMW operations act as a SeqCst fence, and on Arm, an AcqRel RMW compiles to the same fences as a SeqCst one; POWER isn't much better. Even on aarch64, I suspect that the 'optimized' LL/SC operations carry the same fence cost on most implementations - some (admittedly very weak) measurements I have support this. Ultimately, an atomic refcount increment and decrement doesn't come cheap unless the cache line is already held in an exclusive state, in which case it's merely expensive rather than very expensive.
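For concreteness, here's roughly what those orderings buy you in an Arc-style refcount (a sketch, not crossbeam code; `Count` is a made-up name, and std's `Arc` really does use a Relaxed increment paired with a Release decrement plus an Acquire fence before the drop):

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};

// Minimal Arc-style strong count (sketch only).
struct Count(AtomicUsize);

impl Count {
    // Cloning an existing handle: Relaxed is enough, because the caller
    // already holds a reference that keeps the object alive.
    fn incr(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }

    // Returns true if this was the last reference, i.e. the caller must
    // now run the destructor. Release on the decrement plus an Acquire
    // fence makes every earlier use happen-before the drop.
    fn decr(&self) -> bool {
        if self.0.fetch_sub(1, Ordering::Release) == 1 {
            fence(Ordering::Acquire);
            return true;
        }
        false
    }
}
```

Even with the cheapest valid orderings, both operations are atomic RMWs on a shared cache line, which is where the cost above comes from.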
Can you wait for one week, starting from today, until I finish my skiplist? Iterators are based on refcounting and the overhead is really small.
Absolutely! I would be very happy if empirically the cost wasn't very big! With anything linked you have to wait for loads to complete anyways so maybe it doesn't matter too much. I'm very glad to see that a good skiplist is almost done since I need one for transactional datastructures and don't want to rewrite one.
Care to share them?
For latency control - my thoughts are essentially that if you accept that you may enter a GC run and do a lot of deallocations, you've already given up on any sort of tight latency. With that in mind, I think it's more fruitful to think about controlling when threads can and can't GC, how long they can GC for, how many elements they can process, etc. Segregated drop/non-drop lists could go a long way, especially in combination with per-datastructure drop bags.
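A rough std-only sketch of what segregated bags with a collection budget might look like (all names here - `Bags`, `retire`, `collect` - are hypothetical, not crossbeam API; `needs_drop` decides the routing at compile time):

```rust
use std::mem::needs_drop;

// Hypothetical two-bag scheme: values whose type has a destructor go
// into `drop_bag` (destructors run later, under a budget); trivially
// droppable values only need their memory freed.
struct Bags {
    drop_bag: Vec<Box<dyn FnOnce()>>,
    plain_frees: usize, // stand-in for a list of raw deallocations
}

impl Bags {
    fn new() -> Self {
        Bags { drop_bag: Vec::new(), plain_frees: 0 }
    }

    fn retire<T: 'static>(&mut self, value: T) {
        if needs_drop::<T>() {
            self.drop_bag.push(Box::new(move || drop(value)));
        } else {
            // No destructor to run: a real collector would just free
            // the allocation without executing any user code.
            self.plain_frees += 1;
        }
    }

    // Incremental collection: run at most `budget` destructors per step,
    // bounding the pause caused by any single GC run.
    fn collect(&mut self, budget: usize) {
        let n = budget.min(self.drop_bag.len());
        for destructor in self.drop_bag.drain(..n) {
            destructor();
        }
    }
}
```

The point of the split is exactly the complaint at the top of this thread: a structure holding only plain data never pays to run other structures' destructors, and the drop-side work can be metered.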
Memory usage control: similar situation to the above. You have to accept that in a concurrent setting you can't get exact control over how much memory is used. Refcounting can free memory more quickly, but still can't save you from a stuck thread holding references. With per-datastructure epochs, or multiple distinct EBR groups that a structure can register with, a lot could be done to prevent a stuck thread in some random location from blocking memory reclamation.
Maybe a better plan would be to find a scheme that lets eager destruction work on top of crossbeam without preventing one from submitting destructors to the garbage bags. Exposing this to users would increase complexity, but maybe it could be hidden by default, so that only users willing to sacrifice throughput for tighter drop latency opt into it. Sounds tricky though.