Reference-counting garbage collection can be quite efficient

We can already write

pub fn foo(x: Box<i32>) -> Box<i32> {
    let mut x = x;
    let r = &mut *x;
    *r += 1;
    x
}

Yeah, I know, it's a bit more work, but it works and reuses the allocation.


Or if you want to be succinct

pub fn foo(mut x: Box<i32>) -> Box<i32> {
    *x += 1;
    x
}

It would help greatly if this were also possible for distinct types, because that is really not possible right now without unsafe:

pub fn foo(x: Box<i32>) -> Box<u32> {
    Box::new(*x as u32)
}

Isn't this the kind of thing that we could potentially get from LLVM, "for free"?

This would be a use case for something like Haskell's Coercible - which I also want for a bunch of other reasons.

I knew someone who was working on a Rust system using auto traits to implement something like that - a function that always monomorphizes to mem::transmute, but has trait bounds so it only compiles in cases where the conversion is known to be safe, starting with simple things like u32 to i32, but including much more sophisticated things like &Box<[u32]> to &&[[u8; 4]]. Unfortunately, I don't think she's still working on it, I don't have contact with her, and it doesn't seem to be anywhere online :frowning:
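For illustration, the core of such a scheme might look like the sketch below. All names here are hypothetical; the unsafe impls stand in for the layout-compatibility proofs that the system described above would discharge through trait inference:

```rust
use std::mem;

// Marker trait: implementing it asserts that Self and Dst have
// identical size and a valid bit-pattern correspondence.
unsafe trait TransmutableInto<Dst> {}

// Two of the "simple" cases mentioned above.
unsafe impl TransmutableInto<i32> for u32 {}
unsafe impl TransmutableInto<u32> for i32 {}

// Safe wrapper: only compiles when the trait bound holds, and then
// monomorphizes to (essentially) a transmute.
fn coerce<Src, Dst>(src: Src) -> Dst
where
    Src: TransmutableInto<Dst>,
{
    unsafe {
        // Sound only because the impls above promise identical layout.
        let dst = mem::transmute_copy(&src);
        mem::forget(src);
        dst
    }
}
```

Calling `coerce::<u32, u64>` would simply fail to compile, since no impl provides the required evidence.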

Not sure which person you mean, but I hacked together a proof-of-concept for something similar a while ago. There's no inference with this version, though: you have to build up the chain of conversions yourself.

(Note that the linked implementation doesn't get variance quite right, so I recommend against using it in its current form.)

(It also invokes undefined behavior left and right, since you're not allowed to e.g. transmute Vec<u32> to Vec<i32> directly even though they have the same representation.)
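For reference, the sanctioned way to do that particular conversion today is to take the Vec apart and rebuild it through raw parts, as the Vec::from_raw_parts documentation describes (the helper name is mine):

```rust
use std::mem::ManuallyDrop;

// Sound alternative to transmuting the Vec itself: decompose it and
// reassemble the allocation under the new element type.
fn vec_u32_to_i32(v: Vec<u32>) -> Vec<i32> {
    // Prevent the original Vec from freeing the buffer.
    let mut v = ManuallyDrop::new(v);
    let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
    // u32 and i32 have identical size and alignment, so the same
    // allocation can back a Vec<i32>.
    unsafe { Vec::from_raw_parts(ptr as *mut i32, len, cap) }
}
```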

As I recall, the system I saw did do variance and inference right. It might have even been theoretically complete, but it had a few issues because it was making rustc figure out the chains through trait inference – which made it have long compile times and sometimes not manage to find that a moderately complex conversion was valid.

Your system for explicitly specifying the sub-conversions looks pretty neat. If I find some free time and energy, I might think about a way to include variance (and some other issues) into it. (EDIT: this is very unlikely because I know this project could turn into an endless rabbit hole for me)

May I ask why the code uses instances of the Transmute ZST rather than having it be an uninhabited type? Is it just to make the syntax easier?

I use this pattern all the time btw. So convenient. (Not only with boxes, but actually any owning container.)

That is a different operation though: + 1 is like fn(i32) -> i32, while += 1 is like fn(&mut i32).

Maybe a better example is negation, for which there is no &mut method.

pub fn foo(mut x: Box<i32>) -> Box<i32> {
    *x = -*x;
    x
}

The main point still holds though. :slight_smile:

Would it be possible to implement a reuse method like this:

impl<T> Box<T> {
    fn reuse<N>(self, new: N) -> Box<N>;
}

except that it ensures that T and N have the same size?

We could do this right now, with no change to the compiler

impl<T: ?Sized> Box<T> {
    fn try_reuse<N>(self, new: N) -> Result<Box<N>, Self> {
        self.try_reuse_with(|| new)
    }

    fn try_reuse_with<N, F>(self, new: F) -> Result<Box<N>, Self>
    where F: FnOnce() -> N {
        use std::alloc::Layout;
        let layout = Layout::for_value(&*self);

        if layout == Layout::new::<N>() {
            unsafe {
                let ptr = Box::into_raw(self);
                // drop the old value before reusing its storage
                std::ptr::drop_in_place(ptr);
                let ptr = ptr as *mut N;
                ptr.write(new());
                Ok(Box::from_raw(ptr))
            }
        } else {
            Err(self)
        }
    }
}

Another possibility:

fn reuse<N>(self, new: N) -> Box<N> {
    // `new` moves into the closure, so thread it through an Option
    // to get it back out when the layouts don't match.
    let mut new = Some(new);
    match self.try_reuse_with(|| new.take().unwrap()) {
        Ok(reused) => reused,
        // on failure the closure was never called, so `new` is still here
        Err(_old) => Box::new(new.take().unwrap()),
    }
}
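One wrinkle: inherent impls on Box are only allowed inside std, so in practice this would need an extension trait. A standalone free-function sketch that is runnable today (the function name is mine):

```rust
use std::alloc::Layout;

// Reuse `boxed`'s allocation for a value of type `N` when the layouts
// match; otherwise fall back to a fresh allocation.
fn reuse_box<T, N>(boxed: Box<T>, new: N) -> Box<N> {
    if Layout::new::<T>() == Layout::new::<N>() {
        unsafe {
            let ptr = Box::into_raw(boxed);
            // drop the old value, then overwrite its storage
            std::ptr::drop_in_place(ptr);
            let ptr = ptr as *mut N;
            ptr.write(new);
            // Safe to rebuild: the layouts are equal, so deallocation
            // will use the same layout the buffer was allocated with.
            Box::from_raw(ptr)
        }
    } else {
        Box::new(new)
    }
}
```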

I thought they might have cited

It appears all their benchmarks are single threaded. What really worries me about refcounting performance is when multicore workloads that are logically read-only, like traversing a complex read-only data structure simultaneously from multiple cores, turn into read-write workloads, and now you have lots of CREW (concurrent-read, exclusive-write) cache misses that you wouldn't have without refcounting.


Hi, one of the authors of the paper here.

Note that you don't really need our hypothesis in this case because Rust statically knows that x will be deallocated here; we need the hypothesis because we do not have a linear type system to give us static guarantees about memory management. So we basically need to guess and check at runtime. A closer example in Rust would be a function that takes and returns an Rc/Arc and tries to reuse the allocation by use of try_unwrap (but don't forget to drop nested Rcs early like in the map example in the paper!).
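A sketch of that Rust analogue, checking at runtime whether the count permits reuse (this version uses Rc::get_mut to detect uniqueness and mutate in place; Rc::try_unwrap works similarly by moving the value out):

```rust
use std::rc::Rc;

// Guess-and-check at runtime: reuse the Rc's cell when we are the
// sole owner, otherwise fall back to a fresh allocation.
fn incr(mut rc: Rc<i32>) -> Rc<i32> {
    if let Some(n) = Rc::get_mut(&mut rc) {
        *n += 1; // unique: mutate in place, reusing the allocation
        rc
    } else {
        Rc::new(*rc + 1) // shared: must allocate a new cell
    }
}
```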

Yeah, arXiv was not kind to our LaTeX sources. It should be better now.


Thanks for the link, I wasn't aware of this paper before. The analysis section looks very nice and thorough. The proposed delayed refcounting optimizations do not seem to be compatible with ours, which require exact reference counts.

I would hope that for your use case, most of the traversal is done using borrowed references that do not change the reference counter. But since our (heuristical) borrowing is much more restrictive than say Rust's, there is certainly room for improvement in that regard.

Keeping the allocation alive without the object sounds exactly like keeping a Weak before try_unwrap. On that topic, could one reuse a Weak that has no current strong owners?

impl<T> Weak<T> {
    /// Reuse the allocation if there are no strong references to it.
    pub fn try_init(&self, val: T) -> Result<Rc<T>, T>;
    // (or `Option<Rc<T>>`, but `Result` avoids the drop of `T`)
}

If reference-counting garbage collection is good, then why doesn't C# use it? I am sure they are doing their best to make the language as fast as possible.

Is that an argument from authority?


It kind of is an argument-from-authority, but in any case, I think the question can be pretty easily answered.

The problem is that it's not as simple as "RC is good or bad". The OP said that it can be quite efficient, not that it necessarily is efficient, or efficient in all use cases.

Well-written Rust and C++ use reference counting selectively, both by allowing you to allocate objects with no reference count at all, and by manually eliding unnecessary bumps even when the object itself does have a reference count. In C#, on the other hand, all class instances are implicitly garbage collected, so if C# were using reference counting, it would be ubiquitous reference counting. Taking the existing C# design verbatim but replacing the tracing GC with atomic RC would probably make it slower, since it would spend much more time messing around with the reference counts. Taking the existing Rust design but replacing Rc with a tracing garbage collector would probably also make it slower: it would no longer have deterministic destruction, it would have to be able to trace through all objects, and a bunch of weird optimization shenanigans would suddenly become illegal.

All optimization is optimization to a niche, and Rust and C# target different niches.
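As a tiny illustration of what "manually eliding unnecessary bumps" means in Rust (function names are mine):

```rust
use std::sync::Arc;

// Passing the Arc by value forces an atomic counter bump (the clone)
// at every call site, plus a decrement when it drops here...
fn sum_owned(data: Arc<Vec<u64>>) -> u64 {
    data.iter().sum()
}

// ...while a plain borrow touches no reference count at all; the
// callee just can't retain the data past the call.
fn sum_borrowed(data: &[u64]) -> u64 {
    data.iter().sum()
}
```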


This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.