Motivation for this post: @zwarich asked a good question here, about potential API’s for having allocators that need not be tightly integrated with the garbage collector:
https://github.com/rust-lang/rfcs/pull/244#issuecomment-56929140
I attempted to provide a complete answer there. But there are some extra details that I think are relevant to the topic, yet I did not want to make my response comment too large.
Therefore, I thought I would put my more complete thoughts here, and hopefully they might be revised over time as our vision for GC becomes more complete.
Conceptually, I think of the memory in a garbage-collected runtime as being partitioned into three or four kinds (I’ll explain the “or four”).
- The first is GC-managed blocks of memory; they are allocated by the collector and it is the responsibility of the collector to trace them (and reclaim them when possible).
- The second are non-GC-managed blocks of memory that contain no pointers into the GC-heap; these are manually managed (explicit free’s or other reclaiming scheme) and need no integration with the collector.
- The third is the stack, which, with the program registers, makes up the obvious root set used in standard collection schemes.
- The fourth is non-GC-managed memory that can contain pointers into the GC-heap; these are also manually managed, but still require GC integration (especially if you want to trace such things precisely)
(If you squint your eyes. the third case could be considered just a special case of the fourth. But for a moment, let us treat them as distinct cases.)
The question is: What do you do about that fourth kind? Even assuming that you have stack maps that tell you which stack slots contain GC ptrs (and perhaps even what their precise types are, so that you can derive type-specific tracing methods), those stack maps will not include the fourth kind of memory.
-
So, do you try to find paths from the roots (the so-called “third kind” above) to blocks in the fourth kind? Note that the reason to do this is not because one thinks that the blocks may have become unreachable, but rather because one can derive from such paths what the precise type to use when tracing the block of memory.
-
Or do you register each such block itself as a root (and then either scan it conservatively, or include with the registration a type, or a description of its layout, or a tracing hook)?
I think the strategy of trying to find a path from a root in the application code to a native block is perilous; to me, part of the point of being able to allocate native memory is precisely so I can hand it off to some native library (perhaps temporarily), and not have to deal with in my own application code.
(Of course, one could argue that the list of registered blocks is itself acting as a very direct path from a root (the registry itself) to the block. But it still seems a lot simpler than trying to force such a path to be otherwise derivable.)