Per-type static variables (take 2)

Hi folks,

one of the features I miss the most in Rust is per-type global variables (one instance of a variable per type). Something similar to this C++:

template <typename T> void foo() {
  static T t; // each instantiation of `foo` gets its own copy of `t`
  ...
}

C++ has this feature and it is very convenient for certain high-performance use cases (various per-type counters and caches without the overhead of using a Mutex<HashMap<TypeId, V>>).
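For comparison, the Mutex<HashMap<TypeId, V>> workaround mentioned above could look roughly like the sketch below. The helper name `bump` and the `u64` counter are made up for illustration. Note that a `static` inside a generic Rust function is shared by all instantiations, which is why the map is needed at all.

```rust
use std::any::TypeId;
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Hypothetical helper: increments and returns a counter that is
/// separate for every type `T`.
pub fn bump<T: 'static>() -> u64 {
    // Unlike the C++ template case, this static is NOT duplicated per `T`:
    // every instantiation of `bump` sees the same map.
    static COUNTERS: OnceLock<Mutex<HashMap<TypeId, u64>>> = OnceLock::new();

    let mut map = COUNTERS
        .get_or_init(|| Mutex::new(HashMap::new()))
        .lock()
        .unwrap();

    // Every call pays for a lock and a hash lookup keyed by TypeId.
    let slot = map.entry(TypeId::of::<T>()).or_insert(0);
    *slot += 1;
    *slot
}
```

Calling `bump::<u8>()` twice and `bump::<String>()` once yields independent counts per type, at the cost of a lock and a hash lookup on every call.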

I proposed this idea a couple of months ago.

The feedback was that it won't work on Windows because of the way DLL linking works there (it does not deduplicate globals).

I don't know whether

  • the Windows DLL issue is a big deal
  • there is a good way to solve this riddle with DLL linking

However, there's a simple and relatively cheap workaround for this issue: for a static variable of type T, store not the T itself but a pointer *mut T, allocate memory for T with malloc, and cache a pointer to the allocated memory in that *mut T.

Or in pseudocode:

static t: T = ...;

let x = &t;

compiled to (pseudocode):

static t_pointer: *mut T = ...;

let x = {
    if t_pointer == null {
        t_pointer = allocate_static::<T>();
    }
    t_pointer
}
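A minimal runnable sketch of the lowering above, for one concrete type. Everything here is an assumption for illustration: `T` is fixed to `AtomicUsize`, `allocate_static` is a plain heap allocation standing in for the proposed function, and the compare-exchange is added so that two racing threads end up with the same pointer.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering};

// The cached pointer that replaces `static t: T`.
static T_POINTER: AtomicPtr<AtomicUsize> = AtomicPtr::new(ptr::null_mut());

// Stand-in for the proposed `allocate_static::<T>()`; here it simply
// allocates on the heap (an assumption, not the branch's implementation).
fn allocate_static() -> *mut AtomicUsize {
    Box::into_raw(Box::new(AtomicUsize::new(0)))
}

/// What `&t` would be lowered to.
pub fn get_t() -> &'static AtomicUsize {
    let mut p = T_POINTER.load(Ordering::Acquire);
    if p.is_null() {
        let fresh = allocate_static();
        match T_POINTER.compare_exchange(
            ptr::null_mut(),
            fresh,
            Ordering::AcqRel,
            Ordering::Acquire,
        ) {
            // We installed our allocation.
            Ok(_) => p = fresh,
            // Another thread won the race: free ours and use theirs.
            Err(winner) => {
                unsafe { drop(Box::from_raw(fresh)) };
                p = winner;
            }
        }
    }
    // The allocation is never freed once published, so `'static` is fine.
    unsafe { &*p }
}
```

After the first call the fast path is a single atomic load and a null check; every later `get_t()` returns the same address.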

Last time I came empty-handed; now I have a sketch implementation and examples.

Sketch: last commit in this branch. This commit exposes an alloc_static<K, T>() function to access per-type global memory.

Examples using this implementation:

What do you think?

Happy New Year!


Your examples have UB in them, for example:

// lazy_static.rs
pub fn lazy_static<T: Sync + 'static, F: FnOnce() -> T + 'static>(init: F) -> &'static T {
    // .. snip ..

    unsafe {
        let holder = &mut *::std::alloc::alloc_static::<F, LazyStaticHolder<T>>();
        // .. snip ..
    }
}

Here you are creating aliasing unique references due to a race condition (if two threads call lazy_static at the same time).

You should do this instead:

pub fn lazy_static<T: Sync + 'static, F: FnOnce() -> T>(init: F) -> &'static T {
    struct LazyStaticHolder<T> {
        init: AtomicU8,
        t: UnsafeCell<T>,
    }

    unsafe {
        let holder = &*::std::alloc::alloc_static::<F, LazyStaticHolder<T>>();
        let mut lock = holder.init.load(Ordering::SeqCst);
        loop {
            if lock == 2 {
                break
            }
            if let Err(next_lock) = holder.init.compare_exchange_weak(0, 1, Ordering::SeqCst, Ordering::SeqCst) {
                lock = next_lock;
            } else {
                ptr::write(holder.t.get(), init());
                holder.init.store(2, Ordering::SeqCst);
                break
            }
        }
        return &*holder.t.get();
    }
}

How does this solve the problem? Isn't the pointer still duplicated? When one copy is assigned a non-null value, won't the others still be null?

This is a feature I've missed myself (also coming from C++). I don't have the background to evaluate the specific solution you've developed, but I appreciate you taking the time to work on this!


AFAIU this is technically UB, but it won't lead to any problems with modern compilers, right? Thanks anyway, I applied your suggestions!

Well, technically right now Rust likely won't misoptimize it due to bugs in LLVM (so the uniqueness property of &mut T won't be exploited aggressively), but this will almost certainly change in the future.


The intrinsic alone doesn't solve the problem.

The library function std::alloc::alloc_static_impl manages one global map from TypeId to T (this function is not generic; it is instantiated only once, in std).

The intrinsic provides a pointer to a T* (not T), which is a cache for the memory returned by std::alloc::alloc_static_impl.

Different crates may have their own copies of the T* (not T), but all of those pointers point to the same memory area; this logic is implemented by the library function std::alloc::alloc_static.

When one copy is assigned a non-null value, won't the others still be null?

Technically, yes. However, the field (T*) is not accessed directly but through std::alloc::alloc_static, which will initialize the second copy to the same value as the first copy.
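To make the two layers concrete, here is a rough sketch under stated assumptions. The function names mirror the ones above, but the signatures, the `usize`-encoded addresses, the fixed `AtomicU32` payload and the two `CACHE_IN_CRATE_*` statics (which simulate the duplicated T* in two crates) are all invented for illustration and are not taken from the branch.

```rust
use std::any::TypeId;
use std::collections::HashMap;
use std::ptr;
use std::sync::atomic::{AtomicPtr, AtomicU32, Ordering};
use std::sync::{Mutex, OnceLock};

// Non-generic slow path: one process-wide map from key to address.
// Addresses are stored as usize because raw pointers are not Send.
fn alloc_static_impl(key: TypeId, make: fn() -> usize) -> usize {
    static MAP: OnceLock<Mutex<HashMap<TypeId, usize>>> = OnceLock::new();
    let mut map = MAP
        .get_or_init(|| Mutex::new(HashMap::new()))
        .lock()
        .unwrap();
    // Allocate only if nobody has registered this key yet.
    *map.entry(key).or_insert_with(make)
}

fn new_counter() -> usize {
    Box::into_raw(Box::new(AtomicU32::new(0))) as usize
}

// Fast path: look at this crate's cached pointer first and fall back
// to the shared map only when the cache is still null.
pub fn alloc_static(cache: &AtomicPtr<AtomicU32>) -> &'static AtomicU32 {
    let mut p = cache.load(Ordering::Acquire);
    if p.is_null() {
        p = alloc_static_impl(TypeId::of::<AtomicU32>(), new_counter) as *mut AtomicU32;
        cache.store(p, Ordering::Release);
    }
    // The allocation is leaked on purpose, so it lives for the whole program.
    unsafe { &*p }
}

// Two caches simulating the copies of the pointer in two crates or DLLs.
pub static CACHE_IN_CRATE_A: AtomicPtr<AtomicU32> = AtomicPtr::new(ptr::null_mut());
pub static CACHE_IN_CRATE_B: AtomicPtr<AtomicU32> = AtomicPtr::new(ptr::null_mut());
```

Going through `CACHE_IN_CRATE_A` first allocates the memory and fills that cache; going through `CACHE_IN_CRATE_B` afterwards finds the entry in the map and caches the same address, so both "crates" observe the same value.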

@stepancheg You can indeed use the method linked in your post to implement generic statics which work in the presence of shared libraries. It should be done with a lock-free linked list per generic static, so that no allocations or platform support are required. It would come with the limitation that constant expressions cannot borrow generic statics, since the canonical address wouldn't be known until run-time, but that's not very restrictive.

Sorry, I did not get it. How would different instances of a variable find each other without a global heap-allocated map? Can you elaborate, please?

Say you have a generic static:

static BAR<T: const Default>: T = T::default();

And a use of it in another crate:

fn foo() -> &'static bool {
    &BAR<bool>
}

The static would be lowered to:

static BAR: LinkedList<Instance> = LinkedList::new();

And the use would be lowered to:

static BAR_bool_node: LinkedListNode<Instance> = {
    LinkedListNode::new(Instance {
        type_id: TypeId::of::<bool>(),
        address: &BAR_bool_storage as *const bool as *mut (),
    })
};
static BAR_bool_storage: bool = bool::default();
static BAR_bool_address: AtomicPtr<bool> = AtomicPtr::new(ptr::null_mut());

unsafe fn BAR_bool_get() -> &'static bool {
    let ptr = BAR_bool_address.load(Relaxed);
    if !ptr.is_null() {
        return &*ptr;
    }
    let ptr = __rust_register_generic_static(&BAR, &BAR_bool_node) as *mut bool;
    BAR_bool_address.store(ptr, Relaxed);
    &*ptr
}

fn foo() -> &'static bool {
    unsafe { BAR_bool_get() }
}

With the runtime containing:

/// A generic static instance
struct Instance {
    type_id: TypeId,
    address: *mut (),
}

pub fn __rust_register_generic_static(
    list: &'static LinkedList<Instance>,
    node: &'static LinkedListNode<Instance>,
) -> *mut () {
    loop {
        let state = list.state();

        for instance in list {
            if instance.type_id == node.type_id {
                // There is a registered instance already
                return instance.address;
            }
        }

        // Try to insert `node` into the list. This will fail if the list changed
        // since `state` was read (`try_insert` is a hypothetical method name).
        if list.try_insert(state, node) {
            return node.address;
        }
    }
}

I see:

  • instead of a global Map<TypeId, *mut ()>, there could be one such "map" per generic static
  • it is a linked list of statically allocated nodes instead of a HashMap
  • instead of heap-allocating the storage, each crate statically allocates its own copy, and whoever registers first wins
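The three points above can be sketched in ordinary Rust roughly as follows. This is only an illustration under assumptions: `Registry` and `Node` are made-up names standing in for the per-static list and its nodes, and insertion is a plain compare-exchange on the list head (nodes are never removed, so an unchanged head means an unchanged list).

```rust
use std::any::TypeId;
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

/// One candidate instance of a generic static, contributed by one module.
pub struct Node {
    pub type_id: TypeId,
    pub address: *mut (),
    next: AtomicPtr<Node>,
}

// `type_id` and `address` are never written after construction.
unsafe impl Sync for Node {}

impl Node {
    pub fn new<T: 'static>(storage: &'static T) -> Node {
        Node {
            type_id: TypeId::of::<T>(),
            address: storage as *const T as *mut (),
            next: AtomicPtr::new(ptr::null_mut()),
        }
    }
}

/// The per-generic-static "map": a push-only lock-free list.
pub struct Registry {
    head: AtomicPtr<Node>,
}

impl Registry {
    pub const fn new() -> Registry {
        Registry { head: AtomicPtr::new(ptr::null_mut()) }
    }

    /// Returns the canonical address for `node.type_id`: the address of
    /// whichever node for that type was registered first.
    pub fn register(&self, node: &'static Node) -> *mut () {
        loop {
            let head = self.head.load(Ordering::Acquire);

            // Linear search for an already registered instance.
            let mut cur = head;
            while !cur.is_null() {
                let n = unsafe { &*cur };
                if n.type_id == node.type_id {
                    return n.address;
                }
                cur = n.next.load(Ordering::Relaxed);
            }

            // Not found: try to push `node` at the front. The exchange
            // fails if another thread pushed in the meantime; then rescan.
            node.next.store(head, Ordering::Relaxed);
            let new_head = node as *const Node as *mut Node;
            if self
                .head
                .compare_exchange(head, new_head, Ordering::AcqRel, Ordering::Acquire)
                .is_ok()
            {
                return node.address;
            }
        }
    }
}
```

If two modules each bring their own `bool` node, the first `register` call returns its own address and the second call returns the first node's address, so both modules end up using the same storage.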

Although a linked-list search is linear, the map can be implemented as a tree to make the search logarithmic (play; this implementation is not lock-free, but it should be relatively easy to make it lock-free).

Sounds great!

But what about unloading? If the winner (i.e. the first to register) is a DLL and not the main module, and that DLL is unloaded afterwards, then the other accessors will be left with a pointer into the static part of the (first) DLL.

But how could this be safe at all (until Rust has an ABI and ways of checking ABI stability across executable images)?

Yes, to make unloading safe something needs to be heap-allocated. Thank you for the pointer!

Well, at least if everything is compiled using the same compiler version, it should be safe.

I wonder if it could be gated on whether 'alloc' is present: With 'alloc' available, generic statics could use variations on the helpers you wrote. Without alloc, generic statics should be local to each loaded module, and the dylib target could not be used. Doesn't that strike a fair balance between safety and flexibility?

That requirement/assumption, in my opinion, is sooo much more dangerous than any inconvenience offered for dylib+no_std usage.

@jespersm Rust does not support dynamically unloaded libraries. They are unsound precisely because you can unload a library with 'static data, but 'static is supposed to last the lifetime of the program.


I thought the most common reason to even consider dynamic linking (for release/production builds) in the first place is that, for some use cases, “use the same toolchain for everything” is simply not an option. Do you have some other use case in mind here?