Hi folks,
One of the things I miss in Rust is static data that depends on generic type parameters.
The text below is my proposal.
I’d like to hear feedback.
Type-dependent statics
TL;DR
Implement either:
- static data: T = ... where T is a function type parameter, or
- an allocate_static<T>() intrinsic
The problem
Currently, writing generic code with per-type storage is either impossible or expensive.
Something like this C++ code is not possible in Rust:
    #include <iostream>
    #include <string>
    using namespace std;

    template <typename T>
    const T& singleton() {
        // One instance per instantiated type T
        static T t{};
        return t;
    }

    int main() {
        cout << "pointer to empty string: " << &singleton<string>() << "\n";
        cout << "pointer to zero int: " << &singleton<int>() << "\n";
        return 0;
    }
A singleton function can be implemented in Rust using a global hash map
keyed by type. This is inefficient because every access requires locking.
I think Rust would gain from being able to implement the singleton()
function efficiently.
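For reference, here is a minimal sketch of the hash-map workaround in current stable Rust (1.70 or later, for OnceLock). The REGISTRY name and the extra Send and 'static bounds are my assumptions, not part of the proposal. Every instance is leaked on purpose so that it can be handed out as &'static T.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

type Registry = Mutex<HashMap<TypeId, &'static (dyn Any + Send + Sync)>>;

/// Returns the process-wide default instance of `T`, creating it on first use.
fn singleton<T: Default + Send + Sync + 'static>() -> &'static T {
    // A single map shared by all instantiations of this function.
    static REGISTRY: OnceLock<Registry> = OnceLock::new();
    let registry = REGISTRY.get_or_init(|| Mutex::new(HashMap::new()));

    // Every call takes the lock; this is the cost the proposal wants to avoid.
    // Note: `T::default()` runs under the lock, so it must not call
    // `singleton` itself or it will deadlock.
    let mut map = registry.lock().unwrap();
    let entry: &'static (dyn Any + Send + Sync) =
        *map.entry(TypeId::of::<T>()).or_insert_with(|| {
            // Leak the value so the reference is valid for `'static`.
            let leaked: &'static T = Box::leak(Box::new(T::default()));
            leaked as &'static (dyn Any + Send + Sync)
        });
    entry
        .downcast_ref::<T>()
        .expect("entry is stored under its own TypeId")
}
```

Calling singleton::<String>() twice returns the same address, but each call pays for a mutex lock, a hash lookup and a downcast.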
Example use cases
I can think of several use cases for that feature:
Singleton function
Sometimes a pointer to a default object of some type is desired:
something like Default for &T, for arbitrary T.
    fn singleton<T: Sync + Default>() -> &'static T { ... }

    struct MyData<T> {
        message1: Option<Message<T>>,
    }

    impl<T> MyData<T>
    where
        Message<T>: Sync + Default,
    {
        fn get_message_or_default(&self) -> &Message<T> {
            // Fall back to the shared default instance when no message is set
            self.message1.as_ref().unwrap_or(singleton())
        }
    }
Lazy-init can be implemented without macros and can be used in generic code
There’s a lazy-static crate which uses macros (it cannot be used in generic code to create static data per type parameter).
With the proposed API, lazy_static-like functionality can be implemented
as a library function:
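    // The `K` type parameter is used to avoid clashes
    // between different functions that use the same object type,
    // e.g. `HashMap<String, u32>`
    fn lazy_init<T: Sync, K>(init: impl FnOnce() -> T) -> &'static T {
        // Lazily initialized storage (proposed syntax: one static per `<T, K>`)
        static data: AtomicPtr<T> = AtomicPtr::new(ptr::null_mut());
        let mut p = data.load(Ordering::Acquire);
        if p.is_null() {
            let new = Box::into_raw(Box::new(init()));
            match data.compare_exchange(ptr::null_mut(), new, Ordering::AcqRel, Ordering::Acquire) {
                // We won the race: the static now owns the value
                Ok(_) => p = new,
                // Another thread won: free our value and use theirs
                Err(existing) => {
                    unsafe { drop(Box::from_raw(new)) };
                    p = existing;
                }
            }
        }
        unsafe { &*p }
    }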
    struct FooMarker;

    fn foo() {
        let my_map = lazy_init::<_, FooMarker>(|| {
            let mut m = HashMap::new();
            m.insert(0u32, "foo");
            m.insert(1, "bar");
            m.insert(2, "baz");
            m
        });
        // my_map is of type `&HashMap<u32, &str>` here
        // it is initialized on first function invocation
    }
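For comparison, this is roughly what the non-generic case looks like in current stable Rust with std::sync::OnceLock (available since 1.70). The function name my_map and the static name MAP are only illustrative. The static's type has to be spelled out in full and cannot mention a type parameter of the enclosing function, which is exactly the limitation discussed here.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Returns a lazily initialized map; the closure runs at most once.
fn my_map() -> &'static HashMap<u32, &'static str> {
    // The type is concrete: it cannot depend on a generic parameter.
    static MAP: OnceLock<HashMap<u32, &'static str>> = OnceLock::new();
    MAP.get_or_init(|| {
        let mut m = HashMap::new();
        m.insert(0, "foo");
        m.insert(1, "bar");
        m.insert(2, "baz");
        m
    })
}
```

A generic lazy_init::<T, K>() as proposed would remove the need to write one such function (or macro invocation) per static.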
Global small object allocator
Small objects of fixed size can be allocated more efficiently than with malloc (faster and with smaller overhead).
    /// Allocator for objects of type `T`
    struct Arena<T> { ... }

    impl<T> Arena<T> {
        /// Allocates uninitialized memory for one `T`
        fn allocate(&self) -> *mut T { ... }
        /// Returns memory obtained from `allocate` to the arena
        fn deallocate(&self, ptr: *mut T) { ... }
    }

    /// Similar to `Box<T>` but uses `Arena<T>` for allocation.
    struct SmallObjectBox<T>(*mut T);

    impl<T> SmallObjectBox<T> {
        /// Allocates a new box without needing to pass a pointer to the arena
        fn new(t: T) -> Self {
            let t_ptr = singleton::<Arena<T>>().allocate();
            unsafe { ptr::write(t_ptr, t) };
            SmallObjectBox(t_ptr)
        }
    }

    impl<T> Drop for SmallObjectBox<T> {
        fn drop(&mut self) {
            unsafe { ptr::drop_in_place(self.0) };
            singleton::<Arena<T>>().deallocate(self.0);
        }
    }
singleton::<Arena<T>>() allows fast access to the global per-T allocator.
Per type object counter
This simple tool can be used for debugging memory leaks (e. g. too many pooled objects).
    /// Holds construction and drop counts for objects of type `T`
    struct ObjectCount<T> {
        construct_count: AtomicUsize,
        drop_count: AtomicUsize,
        marker: marker::PhantomData<T>,
    }

    /// Place this struct inside an object you are counting.
    /// Note it has zero memory overhead.
    struct ObjectCounter<T>(marker::PhantomData<T>);

    impl<T> Default for ObjectCounter<T> {
        fn default() -> Self {
            // Increment counter on construction
            singleton::<ObjectCount<T>>().construct_count.fetch_add(1, Ordering::Relaxed);
            ObjectCounter(marker::PhantomData)
        }
    }

    impl<T> Drop for ObjectCounter<T> {
        fn drop(&mut self) {
            // And increment another counter on drop
            singleton::<ObjectCount<T>>().drop_count.fetch_add(1, Ordering::Relaxed);
        }
    }

    impl<T> ObjectCounter<T> {
        /// Constructed object count
        fn constructed_count() -> usize {
            singleton::<ObjectCount<T>>().construct_count.load(Ordering::Relaxed)
        }

        /// Live object count
        fn object_count() -> usize {
            let counts = singleton::<ObjectCount<T>>();
            counts.construct_count.load(Ordering::Relaxed)
                - counts.drop_count.load(Ordering::Relaxed)
        }
    }
    struct MyPreciousResource {
        object_counter: ObjectCounter<MyPreciousResource>,
    }

    struct MyOtherResource {
        object_counter: ObjectCounter<MyOtherResource>,
    }

    fn how_many_objects() {
        println!("precious resource: {} constructed, {} live",
            ObjectCounter::<MyPreciousResource>::constructed_count(),
            ObjectCounter::<MyPreciousResource>::object_count());
        println!("other resource: {} constructed, {} live",
            ObjectCounter::<MyOtherResource>::constructed_count(),
            ObjectCounter::<MyOtherResource>::object_count());
    }
This simple low-overhead tool can be useful to analyze performance problems when specialized tools are not available or not applicable.
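To show what this costs today, here is a sketch of the same counter in current stable Rust. Since a static cannot depend on T, each counted type has to provide its own static through a trait. The Counted trait, its counts() method and live_count() are hypothetical names I made up for illustration.

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Construction and drop counts for one type.
struct ObjectCount {
    construct_count: AtomicUsize,
    drop_count: AtomicUsize,
}

impl ObjectCount {
    const fn new() -> Self {
        ObjectCount {
            construct_count: AtomicUsize::new(0),
            drop_count: AtomicUsize::new(0),
        }
    }
}

/// Hypothetical trait: every counted type hands out its own static.
trait Counted {
    fn counts() -> &'static ObjectCount;
}

/// Zero-sized member that updates the counters of `T`.
struct ObjectCounter<T: Counted>(PhantomData<T>);

impl<T: Counted> Default for ObjectCounter<T> {
    fn default() -> Self {
        T::counts().construct_count.fetch_add(1, Ordering::Relaxed);
        ObjectCounter(PhantomData)
    }
}

impl<T: Counted> Drop for ObjectCounter<T> {
    fn drop(&mut self) {
        T::counts().drop_count.fetch_add(1, Ordering::Relaxed);
    }
}

impl<T: Counted> ObjectCounter<T> {
    /// Number of objects constructed and not yet dropped.
    fn live_count() -> usize {
        let c = T::counts();
        c.construct_count.load(Ordering::Relaxed) - c.drop_count.load(Ordering::Relaxed)
    }
}

struct MyPreciousResource {
    _counter: ObjectCounter<MyPreciousResource>,
}

// This impl is the per-type boilerplate that type-dependent statics would remove.
impl Counted for MyPreciousResource {
    fn counts() -> &'static ObjectCount {
        static COUNTS: ObjectCount = ObjectCount::new();
        &COUNTS
    }
}
```

With the proposed feature the Counted impl disappears and ObjectCounter<T> works for any T without per-type registration.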
Proposed API 1: allow statics inside functions
There are (at least) two alternative APIs which could provide the desired functionality.
The first API is a language extension similar to C++ function-local statics.
Code like this should be possible:
    fn foo_bar<A, B>() {
        static mut data: A = A::new();
        // work with data
    }
An instance of the data static is created for each set of function type parameters
(for each function instantiation).
However, this can be confusing for users, e.g. some may assume that these invocations:
    foo_bar::<String, u32>();
    foo_bar::<String, u64>();
create only one data static, while the compiler should create two statics
because the function is instantiated twice.
The main drawback of this approach is that it further complicates the Rust language specification.
Another drawback is backward incompatibility: currently a static cannot depend on function type parameters and is created only once for all function instantiations (play).
The main advantage of this approach is its zero overhead.
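The current behaviour behind the backward-compatibility concern can be observed with a small sketch like the following. The count_calls function and CALLS static are illustrative names. In today's Rust the static is a single item shared by every instantiation of the generic function.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Returns how many times this function has been called, for any `T`.
fn count_calls<T>() -> usize {
    // One static for ALL instantiations: it is not duplicated per `T`.
    static CALLS: AtomicUsize = AtomicUsize::new(0);
    CALLS.fetch_add(1, Ordering::Relaxed) + 1
}
```

Calling count_calls::<u32>() and then count_calls::<u64>() yields consecutive numbers, because both instantiations increment the same counter. Under proposed API 1 each instantiation would get its own counter, so code relying on the shared static would change meaning.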
Proposed API 2: allocate_static intrinsic
The previous approach is user-friendly but requires significant language changes.
Alternatively, the desired feature can be implemented as a simple compiler intrinsic:
fn allocate_static<T>() -> *mut T;
For each instantiation, this function reserves memory of size size_of::<T>()
in the mutable part of the data section.
The function returns a pointer to that memory area, which contains zeros on the first invocation.
The singleton() function described above can be implemented like this:
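    struct SingletonHolder<T> {
        // 0: initial
        // 1: locked
        // 2: initialized
        state: AtomicU8,
        data: T,
    }

    fn singleton<T: Default + Sync>() -> &'static T {
        let holder: *mut SingletonHolder<T> = allocate_static::<SingletonHolder<T>>();
        // `state` is atomic, so sharing a reference to it is fine
        let state = unsafe { &(*holder).state };
        loop {
            if state.load(Ordering::Acquire) == 2 {
                return unsafe { &(*holder).data };
            }
            if state.compare_exchange(0, 1, Ordering::Acquire, Ordering::Relaxed).is_ok() {
                // lock acquired
                unsafe { ptr::write(ptr::addr_of_mut!((*holder).data), T::default()) };
                // release lock
                state.store(2, Ordering::Release);
            }
        }
    }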
I think this API requires minimal changes to Rust and is relatively easy to implement.
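Since allocate_static does not exist, the three-state protocol can only be tried out today with a concrete static. The sketch below does that for String. The constant names, the use of UnsafeCell<MaybeUninit<T>> instead of a bare T field, and the default_string function are my assumptions. Note that if T::default() panics, the state stays locked and other callers spin forever.

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicU8, Ordering};

const INITIAL: u8 = 0;
const LOCKED: u8 = 1;
const INITIALIZED: u8 = 2;

struct SingletonHolder<T> {
    state: AtomicU8,
    data: UnsafeCell<MaybeUninit<T>>,
}

// Safety: `data` is written exactly once, by the thread that moved `state`
// to LOCKED, and is only read after `state` is INITIALIZED.
unsafe impl<T: Send + Sync> Sync for SingletonHolder<T> {}

impl<T> SingletonHolder<T> {
    const fn new() -> Self {
        SingletonHolder {
            state: AtomicU8::new(INITIAL),
            data: UnsafeCell::new(MaybeUninit::uninit()),
        }
    }
}

impl<T: Default> SingletonHolder<T> {
    fn get(&'static self) -> &'static T {
        loop {
            if self.state.load(Ordering::Acquire) == INITIALIZED {
                let slot: &'static MaybeUninit<T> = unsafe { &*self.data.get() };
                // Safety: INITIALIZED is stored only after the write below.
                return unsafe { slot.assume_init_ref() };
            }
            if self
                .state
                .compare_exchange(INITIAL, LOCKED, Ordering::Acquire, Ordering::Relaxed)
                .is_ok()
            {
                // Lock acquired: we are the only writer.
                let slot: &mut MaybeUninit<T> = unsafe { &mut *self.data.get() };
                slot.write(T::default());
                // Release the lock and publish the value.
                self.state.store(INITIALIZED, Ordering::Release);
            } else {
                // Another thread is initializing; wait for it.
                std::hint::spin_loop();
            }
        }
    }
}

/// Concrete stand-in for `singleton::<String>()`.
fn default_string() -> &'static String {
    static HOLDER: SingletonHolder<String> = SingletonHolder::new();
    HOLDER.get()
}
```

With the intrinsic, the HOLDER static would be replaced by allocate_static::<SingletonHolder<T>>() and the function could be generic over T.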