Idea: "Ambient data" / "Current execution context" (was: Provide a thread-local scope with hooks for `std::thread::spawn`)

Sometimes it’s necessary to implement a form of “implicit context” to pass data that is too cumbersome to pass explicitly everywhere. E.g. in slog there’s slog-scope, which allows setting a “current logger” for the duration of a function call. The called code can then retrieve and use that logger even deep down the call stack. It’s not really a global variable, though it is implicitly passed around.

Internally this is just implemented as a thread-local Vec of objects: https://docs.rs/slog-scope/4.0.1/src/slog_scope/lib.rs.html#102

The problem with this approach is that std::thread::spawn loses this information. Any thread spawned inside such scoped code will execute with an empty logger.
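To make the problem concrete, here is a minimal sketch of the pattern. It is not slog-scope's actual code: `LOGGER_STACK`, `with_logger`, `current_logger` and `demo` are hypothetical names, and a plain `String` stands in for a logger.

```rust
use std::cell::RefCell;
use std::thread;

thread_local! {
    // Hypothetical stand-in for a per-thread stack of "current loggers".
    static LOGGER_STACK: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

/// Returns the innermost logger set on the calling thread, if any.
fn current_logger() -> Option<String> {
    LOGGER_STACK.with(|s| s.borrow().last().cloned())
}

/// Runs `f` with `name` as the current logger, then pops it again.
/// (A real implementation would pop in a drop guard to survive panics.)
fn with_logger<R>(name: &str, f: impl FnOnce() -> R) -> R {
    LOGGER_STACK.with(|s| s.borrow_mut().push(name.to_string()));
    let result = f();
    LOGGER_STACK.with(|s| s.borrow_mut().pop());
    result
}

/// Returns what the scoped thread and a freshly spawned thread each observe.
fn demo() -> (Option<String>, Option<String>) {
    with_logger("request-42", || {
        let in_parent = current_logger();
        // The new thread starts with its own, empty LOGGER_STACK.
        let in_child = thread::spawn(current_logger).join().unwrap();
        (in_parent, in_child)
    })
}
```

Calling `demo()` shows the parent seeing its logger while the spawned thread sees none.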

So it makes me think: it would be great if it were possible to register some form of hook that gets executed when std::thread::spawn is called. That would allow slog-scope to initialize the thread-local data of the new thread so that it looks the same as it did in the original thread at the point of the spawn.

I guess this would be implemented quite similarly. The Rust stdlib would have a thread-local variable holding a Vec of hooks. When std::thread::spawn is called, it would call all these hooks, so that any code implementing its own “implicits” can store and restore the implicit context.

It seems to me the API would actually have to consist of a pair of hooks - one preparing the data from the current thread-local variables, right before spawn, and a second one that would get that data passed as an argument, right after spawn so it can store it in the thread-local variable of the new thread.
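A rough user-space sketch of such a hook pair might look like the following. Nothing here exists in std: `CaptureHook`, `Restore`, `logger_hook` and `spawn_with_hooks` are made-up names, and the hooks are passed explicitly instead of being looked up in a registry, just to keep the example short.

```rust
use std::cell::RefCell;
use std::thread::{self, JoinHandle};

thread_local! {
    // Hypothetical implicit context: the name of the current logger.
    static CURRENT_LOGGER: RefCell<Option<String>> = RefCell::new(None);
}

/// Second half of the pair: runs on the new thread, right after spawn.
type Restore = Box<dyn FnOnce() + Send + 'static>;
/// First half of the pair: runs on the parent thread, right before spawn.
type CaptureHook = fn() -> Restore;

/// Snapshots the parent's logger and returns a closure that installs it.
fn logger_hook() -> Restore {
    let snapshot = CURRENT_LOGGER.with(|l| l.borrow().clone());
    Box::new(move || CURRENT_LOGGER.with(|l| *l.borrow_mut() = snapshot))
}

/// Wrapper around `thread::spawn` that runs every hook pair around the spawn.
fn spawn_with_hooks<F, T>(hooks: &[CaptureHook], f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    // Capture on the parent thread.
    let restores: Vec<Restore> = hooks.iter().map(|hook| hook()).collect();
    thread::spawn(move || {
        // Restore on the child thread before running user code.
        for restore in restores {
            restore();
        }
        f()
    })
}

/// Returns the logger name observed inside the spawned thread.
fn demo() -> Option<String> {
    CURRENT_LOGGER.with(|l| *l.borrow_mut() = Some("request-42".to_string()));
    spawn_with_hooks(&[logger_hook as CaptureHook], || {
        CURRENT_LOGGER.with(|l| l.borrow().clone())
    })
    .join()
    .unwrap()
}
```

Returning the restore closure from the capture hook keeps the captured data's type private to whoever registered the hook, so the spawner does not need to know what is being carried over.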

I’ve definitely wanted this, but I’m not sure how I feel about it being connected to std::thread::spawn per se. It’s a tricky question. It’d be nice if e.g. rayon were able to preserve those “thread-local” values, and perhaps Future tasks and similar things as well. Basically there is a need for a “current task” abstraction that various runtimes can re-use, I would say. But I don’t know just what it should look like.


In a way I would worry the least about std::thread::spawn – in part because if you can experiment in user space a bit, it makes it easier to find something that seems to work, and encourage various libraries to adopt it, and perhaps eventually it will find its way into libstd.

This kind of feature could be useful for libtest capturing output:

Hmmm... Now that you mention it, I see that the way I was thinking about it might be insufficiently general.

I was thinking about only one way in which code can switch context: starting execution on another thread via std::thread::spawn. But it can also get there in other ways, e.g. by being sent through a channel to a worker. And there are potentially other cases where the "current task" needs to be explicitly preserved and taken care of.

We could think about some thread-local "task context" abstraction, where any library could put its own information related to the current "task context", and well-behaved libraries would store and restore it. I'm thinking of something like an AnyMap, maybe?

E.g. rayon would capture the current "task context" and send it along with the work to the worker thread, and the worker thread would apply this context before executing each unit of work.

std::thread::spawn could store and restore such context automatically too. Maybe with some form of opting-out of it when necessary.
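As a sketch of what I mean (all names here are hypothetical, and a `HashMap` keyed by `TypeId` stands in for an AnyMap): the context is a type-keyed map, and a channel-based worker plays the role of a runtime that captures the context when work is submitted and applies it before running the work.

```rust
use std::any::{Any, TypeId};
use std::cell::RefCell;
use std::collections::HashMap;
use std::sync::{mpsc, Arc};
use std::thread;

/// Hypothetical "task context": at most one shared value per type.
#[derive(Clone, Default)]
struct TaskContext {
    slots: HashMap<TypeId, Arc<dyn Any + Send + Sync>>,
}

thread_local! {
    static CURRENT: RefCell<TaskContext> = RefCell::new(TaskContext::default());
}

/// Stores `value` in the calling thread's current context.
fn set<T: Any + Send + Sync>(value: T) {
    CURRENT.with(|c| {
        c.borrow_mut().slots.insert(TypeId::of::<T>(), Arc::new(value));
    });
}

/// Looks up the value of type `T` in the current context, if present.
fn get<T: Any + Send + Sync>() -> Option<Arc<T>> {
    CURRENT
        .with(|c| c.borrow().slots.get(&TypeId::of::<T>()).cloned())
        .and_then(|value| value.downcast::<T>().ok())
}

/// What a well-behaved library calls when work is submitted.
fn capture() -> TaskContext {
    CURRENT.with(|c| c.borrow().clone())
}

/// What a worker calls around each unit of work.
fn run_with<R>(ctx: TaskContext, f: impl FnOnce() -> R) -> R {
    let previous = CURRENT.with(|c| c.replace(ctx));
    let result = f();
    CURRENT.with(|c| c.replace(previous));
    result
}

struct RequestId(u32);
type Job = Box<dyn FnOnce() -> Option<u32> + Send>;

/// Sends a job plus the captured context to a worker over a channel.
fn demo() -> Option<u32> {
    let (tx, rx) = mpsc::channel::<(TaskContext, Job)>();
    let worker = thread::spawn(move || {
        let (ctx, job) = rx.recv().unwrap();
        run_with(ctx, job)
    });
    set(RequestId(7));
    let job: Job = Box::new(|| get::<RequestId>().map(|id| id.0));
    tx.send((capture(), job)).unwrap();
    worker.join().unwrap()
}
```

The worker thread was spawned before the value was set, so it only sees `RequestId` because the context travelled with the job. Opting out would just mean submitting `TaskContext::default()` instead of `capture()`.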

Futures had a concept of a current task, with task-local data, just as you’re describing. The task can be moved to different threads. However, the proposed changes to Future for libstd removed the task::current() ability (now, it’s just a “waker”, and must be passed as an argument, you cannot ask for the current one like you can a thread), and task-local data.

Maybe I’m stating the obvious here, but I think this construct has been called an inheritable thread-local variable in another context.

IIRC @mitsuhiko was tweeting about this subject recently and published https://github.com/mitsuhiko/rust-execution-context.

Since I have already been name-dropped here: I have worked quite a bit (for more than two years now) on figuring out how best to solve this in different languages, because we need such things at Sentry to drive our product.

The execution context that was linked is based on the .NET one, and it’s called “ambient data” there. It’s effectively a copy-on-write structure that forks any time someone overrides a flow-local variable (called an async local in .NET). Python now has something similar in 3.7, which internally is also a COW structure, but it does not “fork” in the traditional sense and instead requires users to invoke copy_context.
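To illustrate just the copy-on-write part, here is a small sketch. It is neither the .NET implementation nor the API of the linked crate: `ExecutionContext`, `fork`, `set` and `get` are hypothetical names, and string keys stand in for flow-local variables.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical execution context: an immutable map behind an `Arc`.
#[derive(Clone, Default)]
struct ExecutionContext {
    locals: Arc<HashMap<&'static str, String>>,
}

impl ExecutionContext {
    /// Forking is cheap: both contexts share the same map.
    fn fork(&self) -> Self {
        self.clone()
    }

    /// Overriding a flow-local copies the map only if it is shared.
    fn set(&mut self, key: &'static str, value: &str) {
        Arc::make_mut(&mut self.locals).insert(key, value.to_string());
    }

    fn get(&self, key: &str) -> Option<&str> {
        self.locals.get(key).map(String::as_str)
    }
}

/// Returns (parent's value, child's value, whether the map was shared
/// between parent and child before the child's write).
fn demo() -> (Option<String>, Option<String>, bool) {
    let mut parent = ExecutionContext::default();
    parent.set("user", "alice");
    let mut child = parent.fork();
    let shared_before_write = Arc::ptr_eq(&parent.locals, &child.locals);
    // This write triggers the copy; the parent's map is left untouched.
    child.set("user", "bob");
    (
        parent.get("user").map(str::to_string),
        child.get("user").map(str::to_string),
        shared_before_write,
    )
}
```

The override in the child is invisible to the parent, which is the property that lets a context flow into spawned work without later writes leaking back.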

I think the .NET one has the best design based on using it but it does come with the downside that one needs to generally be mindful in using it.


My old friend Nathaniel wrote an essay a few months ago on “structured concurrency” that seems relevant. The key idea is what he calls a “nursery”, which is an object with scoped lifetime, that owns some concurrent tasks (of whatever stripe) and doesn’t allow control to leave its scope until all the concurrent tasks are complete. It also gives you a place to store state shared among the concurrent tasks.
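In Rust terms, `std::thread::scope` behaves a lot like such a nursery for threads. The sketch below is only an approximation of the idea, not the API from the essay; `demo`, `logger_name` and `seen` are illustrative names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Spawns four scoped tasks that all borrow the same state, and returns
/// how many of them observed the expected logger name.
fn demo() -> usize {
    // State owned by the enclosing scope and shared among the tasks.
    let logger_name = String::from("request-42");
    let seen = AtomicUsize::new(0);

    thread::scope(|nursery| {
        for _ in 0..4 {
            nursery.spawn(|| {
                // Tasks borrow the shared state; nothing is moved or cloned.
                if logger_name == "request-42" {
                    seen.fetch_add(1, Ordering::SeqCst);
                }
            });
        }
        // Control cannot leave `thread::scope` until every task has finished.
    });

    seen.load(Ordering::SeqCst)
}
```

Because the scope joins all of its threads before returning, the borrowed state is guaranteed to outlive every task, which is what makes it a natural place to keep the shared context.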
