Thread lifetime for TLS

I'm going to guess this has come up before but I don't immediately see why this wouldn't work.

TLS data in Rust is a little painful to access: you have to go through LocalKey::with, which introduces a closure and a level of nesting around every access. This prevents you from writing idiomatic code that returns references to thread locals, even in cases where doing so would be safe.
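
For concreteness, here is the kind of nesting the current API forces; a small self-contained sketch using std's thread_local! and RefCell (the names are purely illustrative):

use std::cell::RefCell;

thread_local! {
    static COUNTER: RefCell<Vec<u32>> = RefCell::new(Vec::new());
}

fn record(x: u32) {
    // Every access goes through `with` and a closure; you cannot
    // return a reference to the Vec out of this function.
    COUNTER.with(|c| c.borrow_mut().push(x));
}

fn main() {
    record(1);
    record(2);
    COUNTER.with(|c| println!("{:?}", c.borrow()));
}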

Why not have a 'thread lifetime, which would automatically be associated with references taken of a thread local variable? As far as I can tell the semantics wouldn't be too complicated: 'thread would be longer than every lifetime except 'static. Because thread spawning requires 'static, there would be no danger of thread local data being accessed across threads. But the compiler would still be able to understand that thread locals will live longer than a stack variable, and would be safe to return from a function and access without a with method.

Thoughts?

This isn't true, because of things like crossbeam's scoped threads.

Requiring 'static to cross thread boundaries isn't a language invariant; it's just a detail of std::thread::spawn creating an unbounded thread. The invariant for crossing thread boundaries is Send. Data in TLS is necessarily not Sync, so the theoretical best you could expose is &'static ThreadLocal<T> (from which getting &T would be unsafe unless &T: !Send).
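
To illustrate that Send, not 'static, is the real boundary, here is a sketch with scoped threads (using std::thread::scope for brevity; crossbeam's scoped threads, mentioned above, behave the same way for this purpose). The spawned closure borrows non-'static data from the parent stack frame:

fn main() {
    let data = vec![1, 2, 3];

    // `data` is a stack-local, non-'static value, yet the spawned
    // thread may borrow it: the scope guarantees the thread is joined
    // before `data` goes away. What the closure must be is Send,
    // not 'static.
    std::thread::scope(|s| {
        s.spawn(|| {
            println!("sum = {}", data.iter().sum::<i32>());
        });
    });
}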

Maybe there are two problems...

thread_local! under the hood creates variables that are declared static but have #[thread_local] on them. My understanding is that the problem with giving users direct access to this underlying variable is that its lifetime would be 'static, because the variable is declared static, which is just incorrect because the data won't live until the end of the process. So the first "fix" is making a reference taken to a thread local have the 'thread lifetime.

We don't want scoped threads to be able to deref a TLS reference, because they would see a different copy with different data. You're right that they probably only require Send, so we need references to thread locals to be !Send. Currently I believe the rule is that &T is Send if T is Sync. If thread locals could only be applied to non-Sync types, then I think everything is fine -- because I assume scoped threads still somehow prevent you from handing them !Send values -- although I can't tell from reading the docs. I don't see anything there that indicates values captured by the closure given to the thread must be Send, so I'm not sure how this works.

However, I also see nothing currently enforcing that TLS can only be used with !Sync types, even though using it on a Sync type would be odd. LocalKey<T> appears to only require T: 'static.
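
For instance, as far as I can tell nothing rejects this, even though AtomicU32 is Sync (a small standalone example, not tied to the proposal):

use std::sync::atomic::{AtomicU32, Ordering};

thread_local! {
    // AtomicU32 is Sync, yet nothing stops us from putting it in TLS.
    static LOCAL_COUNTER: AtomicU32 = AtomicU32::new(0);
}

fn main() {
    LOCAL_COUNTER.with(|c| c.fetch_add(1, Ordering::Relaxed));
    LOCAL_COUNTER.with(|c| println!("{}", c.load(Ordering::Relaxed)));
}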

Note that you cannot get a &'static reference to a #[thread_local] static. It is treated like a local variable in this case.

Ah, except that there's a bug around #[thread_local] static mut: https://github.com/rust-lang/rust/issues/54366

I was remembering this from local.rs:

// It's not valid for a true static to reference a #[thread_local] static,
// so we get around that by exposing an accessor through a layer of function
// indirection (this thunk).

Maybe that predates the idea of making references to #[thread_local] statics not be 'static? Or is this only here because of the bug you linked?

The hard bit about thread-locals is destructors. What with achieves is lazy re-initialization of thread-locals.

In other words, what's the semantics of the following program?

// Needs nightly: the #[thread_local] attribute is unstable.
#![feature(thread_local)]

#[thread_local]
static A: S = S(0);

#[thread_local]
static B: S = S(1);

struct S(u32);
impl Drop for S {
  fn drop(&mut self) {
    println!("{:?}", (A.0, B.0));
  }
}

fn main() {
  std::thread::spawn(|| {
    (A.0, B.0);
  }).join().unwrap();
}

It doesn't drop anything. There are two ways to implement thread local data.

The first is using statics marked with #[thread_local]. The initializers of those are stored in the .tdata section for ELF. Conceptually, when starting a thread the content of .tdata is copied to a location in memory specific to that thread. When the thread exits, that memory is freed. (In reality glibc lazily allocates the memory when it is first accessed, most of the time.)

The second is using TLS keys. These can be created using pthread_key_create. A TLS key is a value that can be freely shared between threads. When trying to access the content, the content specific to the current thread is returned. TLS keys can only contain a single pointer.

Neither way runs any destructor. Libstd uses platform-specific methods to run a function just before a thread exits. This function is responsible for dropping all thread local variables registered by thread_local! {}.
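
To make the second mechanism concrete, here is a minimal sketch of the raw pthread-key API through the libc crate (Unix only, and purely illustrative -- this is not how libstd is actually structured internally):

use libc::{c_void, pthread_getspecific, pthread_key_create, pthread_key_t, pthread_setspecific};

// The destructor registered with the key runs just before a thread that
// stored a non-null pointer exits (with the main-thread caveat quoted
// further down in this thread).
unsafe extern "C" fn dtor(ptr: *mut c_void) {
    drop(Box::from_raw(ptr as *mut u32));
}

fn main() {
    unsafe {
        let mut key: pthread_key_t = 0;
        assert_eq!(pthread_key_create(&mut key, Some(dtor)), 0);

        // A key holds exactly one pointer, and it is per-thread:
        // another thread reading the same key would get null here.
        let value = Box::into_raw(Box::new(42u32));
        assert_eq!(pthread_setspecific(key, value as *const c_void), 0);

        let got = pthread_getspecific(key) as *mut u32;
        println!("{}", *got);
    }
}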

Right, but that's not what we want for a user-exposed stable notion of "thread-locals". And from what I understand, only supporting non-dropping thread-locals would not be very attractive.

Once thread-locals support destructors, we inherently have the problem of running the destructors in some order, so that thread-locals destructed later may not access thread-locals destructed earlier. To make this sound, we fundamentally need something like the with method that can check, at run-time, if the thread-local being accessed has been already destructed.

So the reason that thread-locals are more restricted than regular statics has nothing to do with their thread-local nature, and everything to do with the fact that they have destructors that need to be run. statics, on the other hand, do not get destructed -- and that is a crucial part of why we can take regular references to them.

@jgarvin So I'd say the answer to your question is that a 'thread lifetime could only work for destructorless thread-locals, and that was not considered useful enough to be worth pursuing so far.

TBH, my feeling here is that nobody tried to pursue this design, rather than that we tried and decided it's not worth it. I think fast thread locals without drop are important for some high-performance use-cases.

For example, I believe that it's not really possible to implement a general purpose allocator with great performance in stable Rust due to this issue.

I don't see why supporting destructors by itself necessitates using a with method. RefCell lets you extract references conditional on a check succeeding, without requiring you to wrap all the code that uses the reference inside a with call.
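To spell out the kind of pattern I mean, with a plain RefCell and no TLS involved (names here are purely illustrative): the check happens once and you get a guard back, instead of entering a closure.

use std::cell::{Ref, RefCell};

// Returns a guard if the cell is not currently mutably borrowed.
fn read_if_free(cell: &RefCell<Vec<u32>>) -> Option<Ref<'_, Vec<u32>>> {
    cell.try_borrow().ok()
}

fn main() {
    let cell = RefCell::new(vec![1, 2, 3]);
    if let Some(guard) = read_if_free(&cell) {
        // No closure nesting; the guard derefs to the Vec.
        println!("len = {}", guard.len());
    }
}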

Another, more restrictive, option would be to make it so that types stored as thread locals simply can't contain references with the thread lifetime. That would require recursing deeply over the structure to make sure nothing anywhere is parameterized with the thread lifetime, but that's essentially what auto traits already do.

If you want to get really fancy, GCC has an extension for C++ that lets you annotate static data with a priority number so that you can exactly control the order of construction and destruction. Then you could make it so that the lifetime can have a number associated with it, so that it's safe for a thread local with lifetime X to hold references to things with lifetime >X.

So you are proposing a guard-based interface for TLS? Note that this won't give you 'static/'thread references either; the references you get out of a RefCell are always shorter-lived than the RefMut/Ref guard object.

There probably is a reason though that TLS uses closures instead of guards... but I don't know it off the top of my head (hopefully someone else can fill in).

Yeah, and auto traits are a huge pain for things like semver compatibility. Traversing the type structure is pretty much a no-go from that perspective.

But also, I don't see how this helps. Thread-local statics are globally named, so one thread-local's destructor can access another thread-local simply by name, without holding any reference. See this example posted above, where there are no references being stored in any type.

You'd also have to check that X's destructor does not access variables that are already destructed. Which is basically impossible to do as X's destructor can call arbitrary safe functions and they might access whatever.

Are you sure that Rust guarantees that thread local destructors won't cause thread local variables to get reinitialized? This comment in local.rs makes me think it's not guaranteed:

/// # Platform-specific behavior
///
/// Note that a "best effort" is made to ensure that destructors for types
/// stored in thread local storage are run, but not all platforms can guarantee
/// that destructors will be run for all types in thread local storage. For
/// example, there are a number of known caveats where destructors are not run:
///
/// 1. On Unix systems when pthread-based TLS is being used, destructors will
///    not be run for TLS values on the main thread when it exits. Note that the
///    application will exit immediately after the main thread exits as well.
/// 2. On all platforms it's possible for TLS to re-initialize other TLS slots
///    during destruction. Some platforms ensure that this cannot happen
///    infinitely by preventing re-initialization of any slot that has been
///    destroyed, but not all platforms have this guard. Those platforms that do
///    not guard typically have a synthetic limit after which point no more
///    destructors are run.

This makes it sound like Rust relies on safeguards that the platform itself may or may not provide?

So you are proposing a guard-based interface for TLS? Note that this won't give you 'static / 'thread references either; the references you get out of a RefCell are always shorter-lived than the RefMut / Ref guard object.

I'm trying to see if there is any possible way to make things more ergonomic 🙂 Something like the thread lifetime I originally proposed would still be better if it could be made to work, but even just getting rid of the closure wrapping would be a big improvement.

Yeah, and auto traits are a huge pain for things like semver compatibility. Traversing the type structure is pretty much a no-go from that perspective.

I'm not familiar with why they are a pain. It seems to me like it's always the case for semver compatibility that there are some things machines can check for us and some things that require human review. This would definitely be in the category of things that machines can check though ("Were there any public types in the old version of the crate that were Send/Sync/NotThreadLocalRef? Did that change?") so I don't see the pain point. How does this usually cause people grief?

Thread-local statics are globally named, so one thread-local's destructor can access another thread-local simply by name, without holding any reference. See this example posted above, where there are no references being stored in any type.

You're right, good point.

You'd also have to check that X's destructor does not access variables that are already destructed.

See my other reply -- I'm not sure Rust actually enforces this.

Maybe they get reinitialized, but how does that help? The way they get reinitialized is through the with function that you are proposing to get rid of.

That's fair. 🙂 I suggest digging up the original discussions from when the current TLS design was done; I am sure the alternative of using guards instead of a closure was considered and had to be dropped.

My hypothesis is that the point of the with method is to ensure that TLS dtors do not begin running while this method is being executed. A guard cannot guarantee this, since it can be stored in another thread-local variable and thus live across the point in time where the dtors start running, which would be unsound.

It enforces this dynamically. Quoting from the with documentation:

This function will panic!() if the key currently has its destructor running, and it may panic if the destructor has previously been run for this thread.

On some platforms, instead of enforcing "no access after dtor", it re-initializes and then later runs the dtor again, but that doesn't change anything fundamental (and arguably it is worse, since the thread-local value is reset to its initial state, forgetting all changes that happened before).
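
For completeness: LocalKey::try_with surfaces the "already destructed" case as a Result instead of a panic, so for example a Drop impl that wants to touch a thread-local can do something like this (purely illustrative):

use std::cell::Cell;

thread_local! {
    static FLAG: Cell<bool> = Cell::new(false);
    static LOGGER: Logger = Logger;
}

struct Logger;

impl Drop for Logger {
    fn drop(&mut self) {
        // If this drop runs during TLS teardown and FLAG has already
        // been destructed, `try_with` returns Err(AccessError)
        // instead of panicking the way `with` would.
        let _ = FLAG.try_with(|f| f.set(true));
    }
}

fn main() {
    std::thread::spawn(|| {
        FLAG.with(|f| f.set(false)); // touch FLAG so it is initialized
        LOGGER.with(|_| ());         // force LOGGER so its dtor runs at thread exit
    })
    .join()
    .unwrap();
}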

I am not sure if there is a writeup of this, but the short summary is that auto traits leak implementation details of a library that the library author might not have wanted to leak. There's a reason Copy needs to be implemented explicitly. We do have a few auto traits, but the bar for adding a new one is extremely high at this point -- the cost of each new auto trait is considered to be rather large (basically every library author needs to be aware of all of them and take defensive measures in their library to ensure they do not expose implementation details that were meant to remain unobservable), so the benefit needs to outweigh that cost.

This is really a whole separate topic, but why not propagate auto traits only up to the types at the crate boundary and then stop? Then crate authors would have to opt in only for their top-level public types.

That sounds like an interesting avenue to explore, but I suspect it will require quite a bit of design work to get right.

Are there any auto traits, existing or suggested, such that not propagating them would cause unsoundness or another kind of trouble?

I mean, right now literally everything would break if you stopped propagating Send/Sync, since suddenly each crate would have to explicitly state that it wants its types to be Send/Sync.^^

I expect many would also get it wrong. The traits are unsafe for a reason, and being automatic means users don't have to understand that aspect to start threading their code.