In systems programming, partial initialization is often unavoidable (for example, the Linux kernel has a lot of self-referential structs). At the moment Rust facilitates partial initialization through the std::mem::MaybeUninit<T> type. However, the relevant functions of this interface are unsafe, which makes partial initialization error-prone and in general more difficult to deal with than the safe parts of Rust.
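For reference, here is a minimal sketch of what field-by-field initialization looks like today with MaybeUninit. The struct Foo and the function make_foo are only illustrative names; the pattern itself (raw field pointers via addr_of_mut! followed by assume_init) is the one described in the standard library documentation.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

struct Foo {
    a: u64,
    b: String,
}

fn make_foo() -> Foo {
    let mut slot = MaybeUninit::<Foo>::uninit();
    let ptr = slot.as_mut_ptr();
    // SAFETY: `ptr` points to aligned, writable memory for a `Foo`.
    // `addr_of_mut!` yields field pointers without creating references to
    // uninitialized data, and `write` does not drop the previous contents.
    unsafe {
        addr_of_mut!((*ptr).b).write("i am already initialized".to_owned());
        addr_of_mut!((*ptr).a).write(10);
    }
    // SAFETY: both fields were written above. The compiler does not check
    // this claim; forgetting one of the writes is undefined behavior.
    unsafe { slot.assume_init() }
}

fn main() {
    let foo = make_foo();
    println!("{} {}", foo.a, foo.b);
}
```

Both unsafe blocks carry an invariant that only the programmer can uphold, which is the part a type-system-based approach would try to move into the compiler.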
I found these attempts to fix this problem, but none of them were implemented:
- the &uninit T pointer RFC
- partially initialized types, again discussed here
I want to suggest an approach similar to the partially initialized types, because I think this is a case for the type system. However, I think an approach that does not introduce new syntax is not only possible but beneficial. This feature is very important for the ergonomics of some applications, but adding syntax does not seem to offer any real improvement over the approach I found, and leaving the syntax as is also makes integration with existing code much easier.
The code to add to the stdlib[1]:
//// In core::marker
/// This union is used to mark that a field should be uninitialized,
/// because it will be initialized later and this will be checked by
/// the compiler through the type system.
pub union Uninit<T> {
_value: core::mem::ManuallyDrop<T>,
empty: (),
}
impl<T> Uninit<T> {
/// Creates a new Uninit. This function is necessary because Uninit<T>
/// is a union; if we could use a struct instead, it might be more
/// ergonomic to write just `Uninit`.
pub fn new() -> Self {
Uninit { empty: () }
}
}
/// This trait marks all types which do not contain an Uninit.
/// It helps ensure soundness when using partially initialized types:
/// if multiple points in the code could create mutable references to the
/// same data, one of them may change the type of that data (by initializing
/// parts of it), thus invalidating the type seen through the other refs.
pub unsafe auto trait FullyInit {}
/// Uninit does not implement FullyInit and thus all partially initialized
/// types generated by the compiler will also not implement this trait.
impl<T> !FullyInit for Uninit<T> {}
//// -----
//// In core
/// Compiler built in macro in type position that creates (and reuses if
/// occurring multiple times) a partially initialized type replacing all
/// fields in the list by Uninit<$field>.
#[macro_export]
macro_rules! partial_uninit {
(($($field:ident),* $(,)?) init($($init:ident),* $(,)?) $typ:ty) => {
/* compiler built-in */
};
(($($field:ident),* $(,)?) $typ:ty) => {
/* compiler built-in */
};
}
//// -----
//// In alloc::sync
/// additional FullyInit bound to disable cloning Arc<!FullyInit>, because
/// this way you could get two Arc<Mutex<T>> and
/// Arc<Mutex<partial_uninit!((...) T)>> to refer to the same data, which
/// is not sound, because you could initialize it a second time (not dropping the
/// value present)
impl<T: ?Sized + FullyInit> Clone for Arc<T> {
fn clone(&self) -> Self {
// unchanged...
}
}
//// -----
//// In alloc::rc
/// additional FullyInit bound to disable cloning Rc<!FullyInit>, because
/// this way you could get two Rc<RefCell<T>> and
/// Rc<RefCell<partial_uninit!((...) T)>> to refer to the same data, which
/// is not sound, because you could initialize it a second time (not dropping the
/// value present)
impl<T: ?Sized + FullyInit> Clone for Rc<T> {
fn clone(&self) -> Self {
// unchanged...
}
}
//// -----
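To give an idea of what the built-in would have to generate, here is a rough hand-written approximation on stable Rust. It is only a sketch under assumptions: FooUninitB and init_b are hypothetical names standing in for partial_uninit!((b) Foo) and for assigning the missing field, and Uninit is a local copy of the proposed marker type. Unlike the proposal, the conversion here builds a new Foo instead of changing the type of the value in place.

```rust
use std::mem::ManuallyDrop;

// Local stand-in for the proposed `core::marker::Uninit<T>`.
#[allow(dead_code)]
union Uninit<T> {
    _value: ManuallyDrop<T>,
    empty: (),
}

impl<T> Uninit<T> {
    fn new() -> Self {
        Uninit { empty: () }
    }
}

struct Foo {
    a: u64,
    b: String,
}

// Hypothetical hand-written version of `partial_uninit!((b) Foo)`:
// the same fields, with `b` replaced by `Uninit<String>`.
struct FooUninitB {
    a: u64,
    #[allow(dead_code)]
    b: Uninit<String>,
}

impl FooUninitB {
    // Consumes the partial value and returns a fully initialized `Foo`.
    // Unions have no drop glue, so discarding the placeholder never
    // touches a `String` that was not written.
    fn init_b(self, b: String) -> Foo {
        Foo { a: self.a, b }
    }
}

fn main() {
    let partial = FooUninitB { a: 0, b: Uninit::new() };
    let foo = partial.init_b("hello world".to_owned());
    println!("Foo {{ a: {}, b: {:?} }}", foo.a, foo.b);
}
```

Writing one such shadow struct per combination of uninitialized fields by hand does not scale, which is why the proposal asks the compiler to create (and reuse) these types.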
How using this API would look:
// you can use any struct type with this approach.
struct Foo {
a: u64,
b: String,
}
// create Foo step by step
fn make_foo1() -> Foo {
let mut foo = Foo {
// this is a special type in std::marker. The compiler automatically coerces any type
// that contains it to the correct uninitialized variant.
a: Uninit::new(),
b: "i am already initialized".to_owned(),
};
println!("{}", foo.b); // printing initialized field is ok
foo.a = 10; // assigning to uninit field is ok, this does not drop foo.a
foo
}
// naming the partially initialized type is done using a type position macro containing the list of
// uninitialized fields; only fields accessible to this module can be mentioned here.
fn make_partial_foo1() -> partial_uninit!((b) Foo) {
Foo {
a: 0,
b: Uninit::new(),
}
}
// then another function could take a partially initialized Foo and fully initialize it
fn init_foo1(mut foo: partial_uninit!((b) Foo)) -> Foo {
foo.b = "hello world".to_owned();
foo
}
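// multiple uninitialized fields: every listed field has to be assigned
// before the value can be returned as a fully initialized Foo
fn init_foo2(mut foo: partial_uninit!((a, b) Foo)) -> Foo {
foo.a = 42;
foo.b = "hello again".to_owned();
foo
}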
// initializing through a pointer requires additional information about what you are initializing,
// thus transforming the pointer (in this case to &mut Foo)
fn init_foo3(foo: &mut partial_uninit!((a) init(a) Foo)) {
foo.a = 4242;
// forgetting this assignment (or having at least one path where it is not initialized) would
// produce an error along the lines of "foo.a is not initialized, foo.a needs to be initialized
// in this function due to this `init(a)`"
}
// the type variant is also suitable for use with generics
fn init_foos(foos: Vec<partial_uninit!((b) Foo)>) -> Vec<Foo> {
foos.into_iter().map(init_foo1).collect()
}
// also calling generic functions with it is safe
fn create_boxed_foo() -> Box<partial_uninit!((b) Foo)> {
Box::new(Foo {
a: 0,
b: Uninit::new(),
})
}
// even initializing through Arc<Mutex<_>> is fine
fn init_foo4(foo: Arc<Mutex<partial_uninit!((a, b) init(a, b) Foo)>>) -> Arc<Mutex<Foo>> {
foo.lock().unwrap().a = 0;
foo.lock().unwrap().b = "bar".to_owned();
foo
}
// you can also write explicit impl blocks for the type variant:
impl partial_uninit!((a) Foo) {
// this self type also is special, you are allowed to perform a
// pointer transformation!
fn init(self: &mut partial_uninit!((a) init(a) Foo)) {
self.a = 0;
}
}
// even implementing traits is allowed!
impl Debug for partial_uninit!((a) Foo) {
fn fmt(&self, f: &mut Formatter) -> Result {
write!(f, "Foo {{ a: <uninit>, b: {} }}", self.b)
}
}
The problem that this function (init_foo4) could create
fn bad() {
let foo: Arc<Mutex<partial_uninit!((a, b) Foo)>> = Arc::new(Mutex::new(Foo {
a: Uninit::new(),
b: Uninit::new(),
}));
let foo2: Arc<Mutex<partial_uninit!((a, b) Foo)>> = Arc::clone(&foo);
let foo = init_foo4(foo);
// now this is bad: we have references to the same data with two different types
let _: Arc<Mutex<Foo>> = foo;
let _: Arc<Mutex<partial_uninit!((a, b) Foo)>> = foo2;
}
is mitigated by the introduction of the FullyInit marker trait.
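The effect of such a bound can be approximated on stable Rust. In the sketch below everything is a stand-in: FullyInit is an ordinary trait implemented by hand (auto traits and negative impls are unstable), Shared is a hypothetical wrapper playing the role of Arc with the extra bound, and PartialFoo uses Option fields in place of Uninit.

```rust
use std::sync::{Arc, Mutex};

// Hand-implemented stand-in for the proposed auto trait.
trait FullyInit {}

#[allow(dead_code)]
struct Foo {
    a: u64,
    b: String,
}
impl FullyInit for Foo {}
// Mimics how an auto trait would propagate through containers.
impl<T: FullyInit> FullyInit for Mutex<T> {}

// Stand-in for a partially initialized Foo: deliberately no FullyInit impl.
#[allow(dead_code)]
struct PartialFoo {
    a: Option<u64>,
    b: Option<String>,
}

// Hypothetical wrapper standing in for `Arc` with the additional bound.
struct Shared<T>(Arc<T>);

impl<T> Shared<T> {
    fn new(value: T) -> Self {
        Shared(Arc::new(value))
    }
}

// Cloning is only available when the payload is fully initialized.
impl<T: FullyInit> Clone for Shared<T> {
    fn clone(&self) -> Self {
        Shared(Arc::clone(&self.0))
    }
}

fn main() {
    let full = Shared::new(Mutex::new(Foo { a: 1, b: "x".to_owned() }));
    let full2 = full.clone();
    println!("handles to full value: {}", Arc::strong_count(&full2.0));

    let _partial = Shared::new(Mutex::new(PartialFoo { a: None, b: None }));
    // Rejected by the compiler, because `Mutex<PartialFoo>: FullyInit`
    // does not hold:
    // let _partial2 = Shared::clone(&_partial);
}
```

With only one handle to the partially initialized value, there is no second pointer whose type could go stale when the first one is used to initialize the data.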
While it is too strict to ask for T: FullyInit to allow Arc<T>: Clone, this requirement may be softened later without breaking compatibility. You still need to allow the creation of Arc<!FullyInit>, because you may want to pin something using a reference count and then set its fields, kinda like this:
/// Binding to some C struct:
#[repr(C)]
pub struct DoubleListNode {
next: *const (),
prev: *const (),
}
#[repr(C)]
pub struct SomethingInAList {
node: DoubleListNode,
data: u64,
}
impl SomethingInAList {
pub fn new(data: u64) -> partial_uninit!((node) SomethingInAList) {
SomethingInAList {
node: Uninit::new(),
data,
}
}
}
impl partial_uninit!((node) SomethingInAList) {
pub fn init(self: Pin<&mut partial_uninit!((node) init(node) SomethingInAList)>) {
unsafe {
// SAFETY: We are pinned, which means node is also pinned.
let mut node = Pin::map_unchecked_mut(self, |s| &mut s.node);
node.next = &mut *node as *const DoubleListNode as *const ();
node.prev = &mut *node as *const DoubleListNode as *const ();
}
}
}
fn main() {
let sial = Arc::pin(Mutex::new(SomethingInAList::new(42)));
unsafe {
// SAFETY: sial is pinned in the Arc and we never misuse the Mutex (the mutex api is very
// bad for this purpose, but imagine if it would play nicely here [we would not need this
// unsafe])
Pin::new_unchecked(&mut *sial.lock().unwrap()).init();
}
let _: Pin<Arc<Mutex<SomethingInAList>>> = sial;
}
Benefits of this approach:
- requires no new syntax.
- makes simple partial initialization safe and easy.
- gives more complex tools to those who have more complicated paths for uninitialized data.
- having a compiler-supported way of partial initialization avoids having to write unsafe constructor functions with the invariant that another init function needs to be called before the returned value is used.
Problems to tackle:
- how does the compiler do its built-in magic? I do not know whether it is even possible to achieve the described type coercion of pointers mid-function, or which parts of the compiler would need changing.
- would it be beneficial/easy to try to generalize the required properties of this feature to provide support for even more compiler-based checking? (For example, tagging raw pointers with a designated cleanup strategy, so C FFI writers would be able to know which pointers need to be freed and which would need Box::from_raw etc.) In the context of this issue I think speed of implementation matters more than immediate generalization, so separating those two would be a good idea, if the general approach is needed and feasible.
- better syntax for partial_uninit!
- improving names
- is FullyInit a sound concept? It is rather arbitrary that types like Rc and Arc need to handle this, because unsoundness only arises in combination with RefCell/Mutex.
- FullyInit makes Arcs too restrictive, because you cannot store an Arc<!FullyInit> even if you cannot get a &mut reference to the inner value (e.g. your application stores a partially initialized struct in some state accessible by multiple threads for some time of your program [because they need to read it]; at some point you take control of all the Arcs, drop all but one and then call into_inner, after which you fully initialize your struct and use it. With the bound as proposed this would result in a compile error even though it would actually be sound[2]).
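The last scenario can be sketched on stable Rust to show why it would be sound. This is only an illustration under assumptions: PartialFoo (with an Option standing in for the Uninit field) and finish are hypothetical names, and the Arc is cloned freely here, which is exactly what the proposed FullyInit bound would forbid.

```rust
use std::sync::Arc;
use std::thread;

// Stand-in for a partially initialized Foo; `b` is not set yet.
struct PartialFoo {
    a: u64,
    b: Option<String>,
}

struct Foo {
    a: u64,
    b: String,
}

// Takes back exclusive ownership and completes the value.
// Returns None if other handles are still alive.
fn finish(shared: Arc<PartialFoo>) -> Option<Foo> {
    let partial = Arc::into_inner(shared)?;
    Some(Foo {
        a: partial.a,
        b: partial.b.unwrap_or_else(|| "initialized late".to_owned()),
    })
}

fn main() {
    let shared = Arc::new(PartialFoo { a: 42, b: None });

    // Several threads only read the part that is already initialized.
    let readers: Vec<_> = (0..4)
        .map(|_| {
            let local = Arc::clone(&shared);
            thread::spawn(move || local.a)
        })
        .collect();
    for reader in readers {
        // Joining guarantees the thread's handle has been dropped.
        assert_eq!(reader.join().unwrap(), 42);
    }

    let foo = finish(shared).expect("all other handles were dropped");
    println!("{} {}", foo.a, foo.b);
}
```

No reader ever gets a &mut to the shared value, and initialization only happens after into_inner has proven that a single owner is left, so nothing observes the type change.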
I did not mark this as a pre-RFC because I have never written one and thus know very little about the process. If no big issues arise and most of the problems can be formalized, I would gladly try to turn this into my first RFC.
[1] The documentation on these elements is nowhere near sufficient/expressive; I added it to explain each of the new items in stdlib.
[2] You could fix this using an additional trait implemented for Mutex<T>/RefCell<T> (probably UnsafeCell<T>) where T: !FullyInit and then disallow clone for that trait, but this seems like a workaround with a high chance of unsoundness, so I think having a tighter restriction at the beginning is safer.