Shifgrethor: garbage collection as a Rust library


#15

Thanks for the answers @withoutboats. But I’m not fully satisfied with the first answer. What I meant was: say I’m writing a library with the following API:

pub struct Library {
    // ...
}

impl Library {
    pub fn new() -> Library {
        // ...
    }

    pub fn do_something(&mut self) {
         // ...
    }
}

Is there a way to fill in those // ... so that Library can keep a graph as an internal data structure (constructed in new and read/updated in do_something), where the graph’s nodes are garbage-collected by Shifgrethor? The issue is that any roots you create with letroot! in new() die before the end of new(), you can’t store them. Unless you create a generator that borrows the roots across yield points and you store that generator, but I’m not sure if that is possible (or will be possible when self-referential generators are fully implemented).


#16

@withoutboats, I think the performance benefits of generational collectors come from being also copying/compacting:

  1. Generational hypotesys is that most ot the obects/allocations die young.
  2. Copying collectors are best suited for this case as the work for collecting a nursery is linear to the live objects/allocations (small fraction, according to 1), and the dead objects can be discarded with just a pointer switch (if you don’t have exceptions as pinning).

So for copying collector it may be beneficial to have 65-75% nursery size if your workload is generating a lot of transient garbadge. I think this is not going to be the common case in Rust as transient allocations can be statically allocated/deallocated (compared to Scala for example), so not subject of GC.


#17

In another post I discussed some ideas I came up with.

Basically, I think Rust need a mechanism to say “whenever you send this value to somewhere else, you have to send that value as well”. This guarantees, whenever we move the “weak reference”, the referee is also moved or at least visible for the compiler to generate adjust code.

Without this ability (I think it is fair to say that no existing languages provide such checking in compile time), we will need to use unsafe code to ensure the invariant above. Then we can use the Access trait to achieve this.

(My Access trait is similar to Index/IndexMut, but they are different: it allows to return something that is not a reference. So you can return something like Option<&T> or Result<&T,E>. Also, its self is the index, not the container)


#18

I’m curious if we could have pointers into a single “world-stopping” GC, i.e. some kind of sync Agc<T> pointer. Naively, I imagine each root having mutex that locks whenever any of its Agc<T> s are being read or written, and which also locks during collection of that pointer. This hardly sounds performant, though, and unmanaged OS threads aren’t the sort of thing you can really “stop”…


#19

I’m making a note to explore this code while it’s still got a naive implementation, so that I can understand it. Thank you for writing it.

Learning Rust has opened up the world of systems programming that has always been opaque to me, and projects like this are part of the reason why.

I started my career as a web application programmer, working mostly on the frontend. Like a salmon swimming upstream to spawn, I have been gradually migrating down the stack towards metal for 10 or 11 years now. But I’d been stuck, for a while.

The problem I’ve always had with reading the code under the hood is that most libraries and languages typically have a FFI to core lib written in arcane C or C++. I don’t use C and C++ on a daily basis, and although I know those languages at an intermediate level, the preprocessor and template shenanigans that are used in production libraries have always evaded me. I can read through C++ application code, but as soon as I try to read Boost source code to demystify the magic, my head starts continually exploding until I have to look away.

Not so with Rust! I frequently and without undue hardship can read the implementation source for major, production-quality subsystems, like tokio, std, rayon, and crossbeam.

When I started learning Rust, I was expecting to get certain benefits. Beyond the headline “memory safety without GC”, there’s the rest of the superb type system, algebraic data types + destructuring, and emphasis on cost-free abstraction. All those promised benefits completely panned out. But these cost-free abstractions synergized to provide an unexpected benefit: production-quality crates are written in the same dialect of the same language that I’m writing. The result was a demystification of the programs that I write. The demystification revealed the beautiful hidden world of systems programming.

Thanks for all of your code, and, by the way, all of your fantastic posts, which have been a supreme education for me on CS fundamentals. Whenever I see a new blog post, RFC, or even github comment from you, I know I’m in for an education. It’s difficult to overstate my gratitude in words.


Testimonials page for Rust?
#20

I’ve posted the second post in the series, which provides some technical details about how managed heap is represented as background for the rest of the series.


I want to express how enormously well-received this comment is by me, and not only for the obvious reason that it is very high praise. I think it’s not widely enough known that my formal education was in the humanities, not computer science or even a related technical field. I found Rust (back in 2014) very early in my experience with computer programming, and I had exactly the experience you describe: the Rust language and the Rust community rapidly accelerated my education.

For me it was not only the systems programming problems that Rust tackles and makes accessible, but also the particular intersection of different expertises and goals - the systems problems, the type theoretical approach, the commitment to an accessible user experience - that made Rust a fulcrum of learning for me. Even as my knowledge grows, there has always been a more informed person - often a world class expert - available to answer my questions.

I’ve been very fortunate over these past 4 years to have had the opportunity to become more and more involved in Rust, to the point that now it is my full time employment. It makes me so glad to hear that my work has had for others the impact that the work of others before had on me.


#21

From one humanities major to another, I salute you! I’m amazed that you’ve been at this for such a short time, as well.

Getting back to the GC, though. Would a tricolor mark-and-sweep algorithm be implementable under the constraints that shifgrethor’s API imposes? Like a naive mark-and-sweep, it is a precise, tracing collector.

In practice, it achieves excellent latency even for large heap sizes, and it never moves a pointer, so the whole pinning problem just drops out. The tradeoff of “extra work, but concurrently on another core” is an increasingly good one for modern hardware.


#22

Nothing about that algorithm prevents it from being used in shifgrethor. Its designed to be run concurrently with program operations (as opposed to “stop the world” GCs which run no user code during collection, like shifgrethor’s current operation). Concurrent GCs in general can introduce some tricky problems, but not specific to that algorithm.


#23

Post #3: Rooting GC’d objects in Rust. This provides a brief look at well-known rooting strategies used by other languages, and then goes into detail on how rooting is implemented in shifgrethor, including the somewhat novel concept of the 'root lifetime (which will come up in later posts as well).


#24

Looking at this, it occurred to me: what if we just stopped tracking roots in the stack at all? Instead of this doubly linked list of roots, we could actually use a proper global stack.

I just started experimenting with exactly that today :smiley: Nice to see that i was on the right path.


#25

I hope I’m not jumping the gun by posting this before @withoutboats, but I see the 4th post is up now!

Shifgrethor IV: Tracing

Another great post in this amazing series. :slight_smile: After reading the previous posts on GC in Rust leading up to this project, it really feels like there’s something special here.


#26

Thanks, I was just coming here to post it! :smile:


#27

The Rust community has written some amazing series, but I’ve never been as excited as I am about this one. :slight_smile: Ever since the first post came out, I spend half of my days checking to see if there’s a new post to read or just rereading the existing posts and poring over the source code.

Thanks so much for the effort you’re putting into this!


#28

The posts are great, but is shifgrethor ready to be published to crates.io some time soon?


#29

Again, interesting read. Thanks for doing the work and writing about it!


#30

I’m curious about how Shifgrethor’s interior mutability rules work, in comparison to something like a List<T> in e.g. C#. My knowledge of GCs and Rust is limited, so please bear with me. :slight_smile:

In C#, I could easily have a List<T> that allows me to add/remove other GC items inside it. This is a far different situation to Rust of course, since ownership/mutability isn’t in play.

However, if I wanted something similar in Rust with Shifgrethor, how would this look? Is this even a valid use case? I’m wondering if there could be some kind of “updatable” GC pointer that allows this, that could be put inside the Vec. Is there a better way to achieve something similar?


#31

This is the subject of a future post in the series, but in brief: the tricky part of interior mutability is making sure that you can’t unroot a Gc pointer while you’re mutating it (for example, by swapping it onto the stack, where its no longer being traced). Probably collections like the one you’ve described will be best handled by custom collection types that have APIs designed to make sure you never get access to an unrooted (or unrootable) gc pointer.


#32

Not sure! It’s pretty much done as far as its current iteration goes, but its current iteration is to support this blog series describing the research into what Rust can express, and not to be generally useful. Once the blog series concludes, I’d be interested in sharing ownership with anyone who thinks shifgrethor has the potential to be used for real programs & helping to support them in trying to improve its performance & ergonomics.


#33

Ok, cool. I think I understand… let me see if I have this correct. :slight_smile:

An updatable pointer could work then, if you were forced to create a new root whenever you use it? Unlike GcStore which, IIUC, is just a simple transmute into a Gc as long as you access it through some other Gc. That would then let you swap it out onto the stack like you mentioned, since it has been properly rooted. You just have to pay the expense of rooting, vs the free transmute.

I’m anxious to see what’s next once the blog series concludes as well. I don’t have the expertise to drive design, and I don’t have a use case… but I’d love to work on this. :slight_smile:


#34

In general, growing collections is not a problem: for example, you’ll trace every element of a List, so adding new members to the list is no problem, because they’ll also be traced. The problem is that when you take an element from the list, you have to make sure that it is still rooted wherever you put it.