RFRFC: std::thread::at_start(callback)

This is a Request For an RFC (thus "RFRFC"), for a standard library function that allows one to register a procedure that is invoked for every subsequent thread that is spawned.

(One might think this the dual to thread::at_exit, but that's not quite true ... at_exit only invokes its callbacks when the main thread exits. This is meant to run when any thread starts.)

The immediate use case I have in mind is for the Boehm-Demers-Weiser (BDW) conservative GC, which strongly encourages a configuration where one registers each thread that may call into the GC. See also https://github.com/swgillespie/boehm_gc_allocator/issues/2

  • (I can register each thread I create manually, which is what I plan to do in the short term. But an interesting aspect of the BDW collector is that it should allow Gc<T> to implement Send if T itself is Sync, I think. Historically Rust has deliberately not attempted to support this, because past GCs were meant to be isolated to their own thread, but if BDW can do it, then it might be nice to provide a way to support it. That definitely requires registration of the threads via std::thread::at_start.)

Here is my rough draft of its signature and documentation:

```rust
/// Enqueues a procedure to run when a thread is started.
///
/// Returns `Ok` if the handler was successfully registered, meaning
/// that the closure will be run sometime after each subsequent thread
/// is constructed, but before it starts running its associated main
/// routine.  Returns `Err` to indicate that the closure could not be
/// registered, meaning that it is not scheduled to be run.
///
/// FIXME: Should we also provide some way to unregister a function?
#[unstable(feature = "TODO_unnamed_feature", reason = "recent API addition", issue="99999999")]
pub fn at_start<F: Fn() + Send + 'static>(f: F) -> ::result::Result<(), ()> {
    sys_common::at_thread_start(f)
}
```

You can see my prototype implementation here:

https://github.com/pnkfelix/rust/commit/bfff1821c26c3900b0abab531ff1b5f59260fcd3


Anyway, I am largely posting this here to find out whether other people have example use cases for such a function. If there is demand for it, that would encourage me to actually jump into writing an RFC.
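Until something like the draft above exists in std, its behaviour can be approximated in a library, with the important limitation that only threads started through the library's own wrapper run the callbacks. A minimal sketch, assuming hypothetical `at_start` and `spawn` functions in user code (neither is a std API). Unlike the draft, it requires `Sync` as well as `Send` on the callback, since one callback may run on several new threads at the same time:

```rust
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

// Hypothetical library-level stand-in for the proposed `std::thread::at_start`.
// Callbacks are stored as `Arc`s so the list can be snapshotted and the lock
// released before any callback runs (a callback may itself spawn threads).
type Hook = Arc<dyn Fn() + Send + Sync + 'static>;

static HOOKS: Mutex<Vec<Hook>> = Mutex::new(Vec::new());

/// Registers `f` to run on every thread subsequently spawned via `spawn`.
/// Returns `Err(())` if the registry lock is poisoned.
pub fn at_start<F: Fn() + Send + Sync + 'static>(f: F) -> Result<(), ()> {
    let mut hooks = HOOKS.lock().map_err(|_| ())?;
    hooks.push(Arc::new(f));
    Ok(())
}

/// Wrapper around `std::thread::spawn` that runs the registered hooks on the
/// new thread before the user's closure.
pub fn spawn<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    // Snapshot on the parent: only hooks registered before this call apply.
    let hooks: Vec<Hook> = HOOKS.lock().map(|h| h.to_vec()).unwrap_or_default();
    thread::spawn(move || {
        for hook in &hooks {
            hook();
        }
        f()
    })
}
```

Threads created with plain `std::thread::spawn`, or from C, never see these callbacks, which is exactly the gap a std-level `at_start` would close for Rust-spawned threads.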

I’m a bit reluctant, because this would be a feature that Rust adds over the operating system. Kind of like a new runtime.

I can see situations where Rust code is executed in threads not created by thread::spawn (for example, when using a Rust library from C, or inside a callback), in which case the functions registered with at_start won't have been executed.

Spawning a Rust thread already does strictly more than pthread_create or the like, doesn't it? Even today, can't threads running Rust code that weren't created by thread::spawn run without the stack overflow handler in place? I understand wanting to make Rust threads as “pure” as possible, but we already have the hazard of stack overflow segfaults when running Rust on a thread not owned by Rust.

The main problem encountered when integrating Rust into the BDWGC, as @pnkfelix noted, is that BDWGC expects every thread that will be allocating to be registered with the collector. Since things like the test harness spawn threads and immediately allocate without ever calling into user code, there’s no way to inform the collector that we’re going to be allocating on a thread until we’ve already allocated. The BDWGC aborts if a collection is triggered on a thread that hasn’t been registered, and any allocation can trigger it.

I don’t usually participate in RFC-like discussions, so my knowledge of Rust internals is limited, but I think it would be useful for runtime-like library features (in particular, GC-aware allocators or virtual machines) to be able to keep track of which Rust threads are currently running. As far as I can tell, there’s no way to do this on Linux. A mechanism like this would allow an allocator to require the registration of all spawned threads in a way that’s completely transparent to the user of the allocator.

@swgillespie maybe we need to interpose pthread_create or similar.

I don’t even think that is too bad, as I’d rather have something more static, like a lang item, anyway. That would also address @tomaka's concerns. The only issue is that it will be a pain to implement nicely in the build system.

@Ericson2314 Yeah, that’s what the libgc C/C++ API does - it #defines pthread_create to point to a function that creates the thread and registers it with the collector.

That’s an interesting idea - maybe we could have something like our allocator API, like __rust_create_thread? Could be interesting.
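Short of redirecting pthread_create itself, the closest Rust analogue of that libgc redirect is a spawn wrapper that registers the new thread before any user code runs and unregisters it when the thread finishes. A sketch, assuming hypothetical `register_current_thread`/`unregister_current_thread` functions that stand in for the collector's real entry points (in BDW's C API those would be `GC_register_my_thread` and `GC_unregister_my_thread`, which are not bound here). The drop guard makes unregistration happen even if the closure panics:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread::{self, JoinHandle};

// Hypothetical stand-ins for a collector's registration calls; here they
// only count how many threads are currently registered.
static REGISTERED: AtomicUsize = AtomicUsize::new(0);

fn register_current_thread() {
    REGISTERED.fetch_add(1, Ordering::SeqCst);
}

fn unregister_current_thread() {
    REGISTERED.fetch_sub(1, Ordering::SeqCst);
}

// Unregisters the thread when dropped, including during a panic unwind.
struct Registration;

impl Drop for Registration {
    fn drop(&mut self) {
        unregister_current_thread();
    }
}

/// Hypothetical GC-aware replacement for `std::thread::spawn`.
pub fn gc_spawn<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::spawn(move || {
        register_current_thread();
        let _guard = Registration;
        f()
    })
}

pub fn registered_threads() -> usize {
    REGISTERED.load(Ordering::SeqCst)
}
```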

This could possibly return a closure that unregisters the function, in the success case.
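A sketch of that variant, assuming the registry keys each callback by a numeric id; `at_start` and `run_start_hooks` are hypothetical names, and the returned closure removes exactly the callback it was created for:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

type Hook = Arc<dyn Fn() + Send + Sync + 'static>;

// Registry of (id, callback) pairs; ids are never reused.
static HOOKS: Mutex<Vec<(u64, Hook)>> = Mutex::new(Vec::new());
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

/// Registers `f`; on success returns a closure that unregisters it.
pub fn at_start<F>(f: F) -> Result<impl FnOnce() + Send + 'static, ()>
where
    F: Fn() + Send + Sync + 'static,
{
    let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
    HOOKS.lock().map_err(|_| ())?.push((id, Arc::new(f)));
    Ok(move || {
        if let Ok(mut hooks) = HOOKS.lock() {
            hooks.retain(|(hook_id, _)| *hook_id != id);
        }
    })
}

/// What a thread-spawning wrapper would call on each new thread.
pub fn run_start_hooks() {
    // Clone the callbacks out so the lock is not held while they run.
    let hooks: Vec<Hook> = match HOOKS.lock() {
        Ok(h) => h.iter().map(|(_, hook)| Arc::clone(hook)).collect(),
        Err(_) => return,
    };
    for hook in hooks {
        hook();
    }
}
```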

Is there a problem with writing a wrapper that registers the thread on the first call to the allocation function?


> Is there a problem with writing a wrapper that registers the thread on the first call to the allocation function?

That’s what I’d do. The GC crate should offer its own spawn function. I’m fairly sure the type system can express a special Send-like trait, to make sure that you don’t send GC-able types to normally spawned threads. Here’s roughly how it could work:

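```rust
#![feature(negative_impls)] // nightly-only: needed for `impl !Send`

use std::thread::JoinHandle;

struct Gc<T>(*const T); // placeholder for the GC pointer type

// Marker for types that may be sent to GC-registered threads.
unsafe trait GSend {}

impl<T> !Send for Gc<T> {}
unsafe impl<T> GSend for T where T: Send {}
unsafe impl<T> GSend for Gc<T> {}

// Wrapper that lets a `GSend` closure cross into `std::thread::spawn`.
struct Hack<F>(F);

unsafe impl<F: GSend + 'static> Send for Hack<F> {}

fn spawn<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + GSend + 'static,
    T: Send + 'static,
{
    let hack = Hack(f);
    // init gc here
    std::thread::spawn(move || {
        let hack = hack; // capture the whole `Hack`, not just its field
        (hack.0)()
    })
}
```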

~~Works in the [Playground](http://is.gd/ZaMGAl)~~

UPDATE: This doesn't work with the regular std threading tools, such as channels or mutexes... :( (see http://is.gd/YYz4xW for the attempt)

I think this is why @pnkfelix mentioned making Gc<T> sendable -- in that case, you might wind up with GC'd data even if you yourself did not do any allocation.

This seems fairly analogous to atexit. Of course, there is the obvious downside that threads spawned via some other means (e.g., direct calls to pthread_create) will not (and cannot) be automatically registered.

> Is there a problem with writing a wrapper that registers the thread on the first call to the allocation function?

Off the top of my head, this would require some facility for introspecting the address of the cold end of the thread's stack. (After all, the first allocation call may occur somewhere deep in the thread's activity.)

We do not currently offer such a facility; we might be able to add it. (@alexcrichton warned me that not all platforms provide the primitives we would need to do this accurately for all targets, but for such targets, the system thread_start code could initialize a thread-local with a conservative approximation.)

Registering the thread on startup allows us to side-step adding such introspection facility. (Having said that, I myself am not opposed to adding such introspection.)
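The thread-local fallback mentioned above (thread start code recording an approximation of the stack's cold end) might look roughly like the following sketch. The names `spawn_recording_stack_base` and `stack_base` are hypothetical, and the address of a local in the thread's outermost Rust frame is only an approximation: the true stack base lies somewhat beyond it (at higher addresses on the common downward-growing stacks), which a real implementation would have to account for:

```rust
use std::cell::Cell;
use std::thread::{self, JoinHandle};

thread_local! {
    // Approximate cold end of this thread's stack; 0 means "not recorded".
    static STACK_BASE: Cell<usize> = Cell::new(0);
}

/// Hypothetical spawn wrapper that records a stack-base approximation
/// before running the user's closure.
pub fn spawn_recording_stack_base<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::spawn(move || {
        // Address of a local in the outermost Rust frame of the new thread.
        let marker = 0u8;
        STACK_BASE.with(|base| base.set(&marker as *const u8 as usize));
        f()
    })
}

/// Returns the recorded approximation, or `None` on threads that were not
/// started through the wrapper (e.g. threads created from C).
pub fn stack_base() -> Option<usize> {
    match STACK_BASE.with(|base| base.get()) {
        0 => None,
        addr => Some(addr),
    }
}
```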


Update: (And of course there is the issue that you have to give up impl<T:Send+Sync> Send for Gc<T> if you do this, as @nikomatsakis already reminded us above.)

Is there any downside to a thread having the GC framework enabled if it doesn’t own or reference any GC-able types? (I mean while the thread is running, apart from the additional cost when the thread is spawned.)

Well, for BDW, every registered thread will get paused by the collector when it does a stop-world for collection.

From what I read, it was precisely this drawback that led the people behind BDW away from automatically inferring the set of threads and toward requiring each thread to register itself. This way, threads that are not doing GC things do not get interrupted by the GC.


(Of course, this argument can be used against using the std::thread::at_start API to auto-register threads, since such a system would then fall victim to the above problem. But I am willing to say "there are trade-offs -- let the end-developer decide what they want." Also, it would be interesting to figure out whether the std::thread::at_start API can be generalized so that each callback is only run for the tree of threads spawned from the original parent that registered it. That would allow for some amount of modular reasoning.)
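One way to get that per-tree behaviour is to keep the callbacks in a thread-local and have the spawn wrapper copy the parent's list into each child, so a callback applies only to descendants of the thread that registered it. A sketch with hypothetical `at_start_for_descendants` and `spawn_inheriting` functions (as before, threads spawned any other way are not covered):

```rust
use std::cell::RefCell;
use std::sync::Arc;
use std::thread::{self, JoinHandle};

type Hook = Arc<dyn Fn() + Send + Sync + 'static>;

thread_local! {
    // Hooks applied to threads spawned (via `spawn_inheriting`) from this thread.
    static TREE_HOOKS: RefCell<Vec<Hook>> = RefCell::new(Vec::new());
}

/// Registers `f` for every thread later spawned from the current thread,
/// and transitively from those threads.
pub fn at_start_for_descendants<F: Fn() + Send + Sync + 'static>(f: F) {
    TREE_HOOKS.with(|hooks| hooks.borrow_mut().push(Arc::new(f)));
}

pub fn spawn_inheriting<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    // Snapshot the parent's hooks; the child inherits them.
    let inherited: Vec<Hook> = TREE_HOOKS.with(|hooks| hooks.borrow().to_vec());
    thread::spawn(move || {
        for hook in &inherited {
            hook();
        }
        // Make the same hooks apply to this thread's own children.
        TREE_HOOKS.with(|hooks| *hooks.borrow_mut() = inherited);
        f()
    })
}
```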

So there are four alternatives:

  1. at_start callback (slowdown for all threads)
  2. rewrite every sync primitive for the GC (let's not do that xD)
  3. generic Sync + Send traits so the sync primitives can be reused (not even sure that would work)
  4. check a lazy thread-local on every alloc/dealloc call of the GC crate (slows down every single allocation; not sure by how much)
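Option 4 (which is also what the earlier "register on first allocation" suggestion amounts to) can be sketched with a lazily initialised thread-local flag checked on each call. Here `gc_alloc` and `register_with_collector` are hypothetical stand-ins; a real allocator would forward to the collector, and, as pointed out in the thread, registration at that point would still need the thread's stack base:

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical: counts how many threads have registered with the collector.
static REGISTRATIONS: AtomicUsize = AtomicUsize::new(0);

thread_local! {
    static REGISTERED: Cell<bool> = Cell::new(false);
}

fn register_with_collector() {
    // A real implementation would call into the GC here (and would need the
    // thread's stack base, which this sketch does not provide).
    REGISTRATIONS.fetch_add(1, Ordering::SeqCst);
}

/// Hypothetical allocation entry point: registers the calling thread on
/// first use, then allocates. The per-call cost is one thread-local check.
pub fn gc_alloc<T>(value: T) -> Box<T> {
    REGISTERED.with(|registered| {
        if !registered.get() {
            register_with_collector();
            registered.set(true);
        }
    });
    Box::new(value) // stand-in for the collector's allocation routine
}

pub fn registrations() -> usize {
    REGISTRATIONS.load(Ordering::SeqCst)
}
```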

You could still send a Gc<T> upwards in the tree until you are past the at_start setting.

I don't quite understand what you are saying here: there is no frame above the one established for the thread's start.

(The at_start call itself is on the main thread, not the other threads that are spawned off of it. One does need to register the main thread itself at the very beginning of the program run.)

Update: Oh, oh, you're referring to the tree of threads, not the tree of procedure calls. Never mind, I understand now. Yes, the thread-tree use case for at_start is probably not useful for a sendable Gc<T> -- it's more something I was considering for other use cases...

By the way, a "call into the GC" in this context includes allocation of roots, which, in BDW as I imagine it would work in Rust, means all allocations.

(That is, even if you never allocate an object on the GC heap itself, BDW will still complain if you allocate other state on an unregistered thread. This is probably a useful safeguard for the library, but it does complicate certain coding patterns.)

I think @nikomatsakis's point above against this still holds, no? Did I miss some counter-argument about this approach missing potential roots (sent to a thread that itself does no allocation)? Or am I being silly in assuming such things exist?

Well, it's "sent to a thread that has not YET done allocation", which seems like something that will clearly exist.

No, I misunderstood.

Also, it's not a question of missing roots. Since the allocator was replaced completely, even for normal Rust allocations (still manually managed, but known to the GC), all of this is moot anyway. I'd wager your test fails with the same error if you simply allocate a Box.

If you use the Gc crate, all threads need to register with the Gc, OR there needs to be a way to use different allocators and make sure Gc-objects can't ever be put into an object allocated by a different allocator.

If you can create an instance of the following object with the normal allocator instead of the Gc allocator

```rust
struct EvilType {
    blub: Option<Gc<i32>>,
}
```

then you could send this object to a GC thread, that thread could store a Gc in it, and, since the GC does not scan memory owned by the normal allocator, it would not see that pointer: you quickly get a dangling reference once the GC collects the object you are pointing to.

All that does is show debug output, so I would say it doesn't count. It's a debugging feature that can be completely replaced by a small gdb script, not a fundamental part of the language or runtime.