Hi, folks, I'm a PhD student interested in Rust's type system and operational semantics, and I would like to do some research on self-referential types in Rust. Here are some preliminary ideas about how to make self-referential types work. (Since this is my first research project as a PhD student, I would appreciate any feedback or suggestions if you find errors in my reasoning.)
This post was moved from Zulip.
Motivation
Historically, there was no way to express that values of certain types cannot be safely moved. Since coroutines and generators were added to Rust, this has become a problem, as self-referential types have become more and more common.
Pin was introduced in rfc#2349 to tackle this issue. It works well for expressing a mutable reference to an immovable value (though there are some inconveniences due to the lack of field projections when using Pin). However, there is still no way to initialize an immovable type in place.
Inspired by a couple of macros in the Rust-for-Linux project, I think that initializing an immovable type is more like defining a closure that initializes a value in a certain place, instead of directly constructing a value somewhere else and copying it to the expected address.
Guide-level explanation
The core goal of this proposal is to introduce a way to initialize an immovable type safely in Rust, define its operational semantics and prove its soundness, i.e. that no safe code can cause undefined behaviour.
As a first step, I do not yet address how self-referential types can be expressed safely.
The most straightforward way to define an initialization is as a closure. A closure has several advantages over the direct construction of a value of an immovable type:
- we don't need to care about the relocation problem, since values are just initialized in place;
- closures make indirect initialization, such as initializing a value on the heap, easy to express;
- closures are reusable and easy to compose with other closures when initializing a nested struct with a field of immovable type.
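To make the last two points a bit more concrete, here is a rough sketch of what closure-based initialization can look like in today's stable Rust, using raw pointers where the proposal would use an output reference. The names box_init, init_inner and init_outer are hypothetical helpers made up for this illustration, and the closures are only sound when they are called with a valid, writable pointer, which is exactly the kind of obligation the proposal would like the compiler to check.

```rust
use std::mem::MaybeUninit;
use std::ptr::{addr_of, addr_of_mut};

struct Inner {
    value: u32,
    // Points at `value` of the same struct once initialized.
    this: *const u32,
}

struct Outer {
    id: u8,
    inner: Inner,
}

/// Hypothetical helper: allocate on the heap, then run `init` on the
/// still-uninitialized allocation.
///
/// # Safety
/// `init` must write every field of `T` before returning.
unsafe fn box_init<T>(init: impl FnOnce(*mut T)) -> Box<T> {
    let mut slot: Box<MaybeUninit<T>> = Box::new(MaybeUninit::uninit());
    init(slot.as_mut_ptr());
    // SAFETY: the caller promises that `init` fully initialized the slot.
    unsafe { Box::from_raw(Box::into_raw(slot).cast::<T>()) }
}

/// Hypothetical initializer for `Inner`; it assumes `p` is valid for writes.
fn init_inner(value: u32) -> impl FnOnce(*mut Inner) {
    move |p: *mut Inner| unsafe {
        addr_of_mut!((*p).value).write(value);
        addr_of_mut!((*p).this).write(addr_of!((*p).value));
    }
}

/// Hypothetical initializer for `Outer`, composed from `init_inner`.
fn init_outer(id: u8, value: u32) -> impl FnOnce(*mut Outer) {
    move |p: *mut Outer| unsafe {
        addr_of_mut!((*p).id).write(id);
        // Delegate the nested field to its own initializer.
        init_inner(value)(addr_of_mut!((*p).inner));
    }
}

/// Returns `(id, value, whether `this` points at `value` inside the box)`.
fn demo() -> (u8, u32, bool) {
    let outer = unsafe { box_init(init_outer(7, 42)) };
    let in_place = std::ptr::eq(outer.inner.this, &outer.inner.value);
    (outer.id, outer.inner.value, in_place)
}
```

The sketch only compares addresses and never dereferences the stored pointer, because reading through it while the Box is in use raises aliasing questions that are outside the scope of this illustration.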
Consider the following self-referential struct Foo.
struct Foo {
    a: u32,
    b: *mut u32,
}
We can define a closure that initializes a place named foo.
|foo| {
    foo.a = 0_u32;
    foo.b = &raw mut foo.a;
}
Then we have to:
- specify the type of foo, which is uninitialized at the beginning of the closure, and must be fully initialized at the end of the closure;
- specify the operational semantics of this field-by-field initialization, especially those intermediate partially initialized states;
- specify how this closure could be used, and what kind of values could be passed to the parameter foo;
- consider its extensibility, such as:
  - How does it interact with panicking and unwinding?
  - How to handle fallible initialization?
  - Can it be extended to support enums, tuples or arrays?
Reference-level explanation
The Output reference type
The type of the place argument of the initializer closure needs to be an output reference. Here I choose &'a out T as the syntax of the output reference type. The basic rules of an output reference are:
- when the lifetime starts, i.e. when an output reference is created, the place and all subplaces in it (places created by field projections, constant indexing or subslicing) are uninitialized;
- while the lifetime is alive, the place and its subplaces are writable, and a subplace is readable only after a value has been written to it;
- when the lifetime ends, the place must be fully initialized, otherwise a compile error should be emitted.
The way to create an output reference is borrowing (the syntax is &out place), similar to the ways of creating an immutable or mutable reference.
An output reference can be created by (output) borrowing from an uninitialized place (including moved-from places), or by unsafely (output) borrowing from a dereferenced raw pointer.
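As background for what counts as an uninitialized place, today's stable Rust already tracks the initialization state of whole local variables, including moved-from ones. The short sketch below uses only existing syntax (no &out); the function names are made up for this illustration.

```rust
/// A declared-but-uninitialized local becomes readable once every path
/// has assigned it.
fn delayed_init(flag: bool) -> u32 {
    // Declared but uninitialized: reading `x` here is rejected (error E0381).
    let x: u32;
    if flag {
        x = 1;
    } else {
        x = 2;
    }
    // Every path has assigned `x`, so the compiler now allows reads.
    x
}

/// A moved-from place is uninitialized again and may be written to.
fn reinit_after_move() -> String {
    let mut s = String::from("first");
    drop(s); // `s` is moved-from here
    s = String::from("second"); // writing to the moved-from place is allowed
    s
}
```

The proposed output borrow would start from the same kind of place, but let the initialization happen through a reference and field by field, which the current analysis does not allow for a local such as let foo: Foo;.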
Initializer closure
An initializer closure is a closure of type impl FnOnce(&out T). Conservatively, for now, we can assume that output references are only allowed to be used in initializer closures. They cannot be used in ADTs or as an output parameter of a function, which I think will make it much easier to define their behaviour. (We may define a special trait Init for initializer closures so that the user does not have to write &out T.)
Here is an example of using an initializer closure:
// an immovable self-referential type
struct Foo {
    a: u32,
    b: *mut u32,
}
// define an initializer closure
let init: impl FnOnce(&out Foo) = |foo| {
    foo.a = 0_u32;
    foo.b = &raw mut foo.a;
};
// calls an initializer closure on an uninitialized value
let foo: Foo;
init(&out foo);
// use the initialized value
println!("{}, {}", foo.a, unsafe { *foo.b });
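For comparison, the same example can be approximated in today's stable Rust with MaybeUninit and raw pointers. This is only a sketch under the assumption that a raw pointer stands in for the proposed &out Foo; init_foo and demo are hypothetical names, and none of the checks the proposal asks for are performed by the compiler here.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

struct Foo {
    a: u32,
    b: *mut u32,
}

/// Stand-in for the proposed `|foo: &out Foo| { ... }`.
///
/// # Safety
/// `foo` must be valid for writes and properly aligned.
unsafe fn init_foo(foo: *mut Foo) {
    unsafe {
        addr_of_mut!((*foo).a).write(0_u32);
        addr_of_mut!((*foo).b).write(addr_of_mut!((*foo).a));
    }
}

/// Runs the initializer on a stack slot and reads the result back.
/// Returns `(a, *b, whether b points at a)`.
fn demo() -> (u32, u32, bool) {
    let mut slot = MaybeUninit::<Foo>::uninit();
    let p: *mut Foo = slot.as_mut_ptr();
    unsafe {
        init_foo(p);
        // All accesses go through `p`, and `slot` is never moved,
        // so the pointer stored in `b` stays valid.
        let a = (*p).a;
        let b = (*p).b;
        (a, *b, b == addr_of_mut!((*p).a))
    }
}
```

Note that calling slot.assume_init() here would move the value and leave b pointing at the old location, which is the relocation problem the in-place approach is meant to avoid.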
Aliasing
Output borrows are exclusive. As with mutable borrows, while an output borrow is alive, the place cannot be borrowed again in any way (immutable, mutable, and output borrows are all disallowed).
Coercions
Once initialized, an output reference &out T can be coerced to &mut T or &T.
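The closest existing analogue I know of is MaybeUninit::write, which initializes a slot and hands back an ordinary &mut T. The sketch below uses it to illustrate the direction of the coercion; init_then_use and demo are hypothetical names, and unlike the proposal the slot itself still has to be treated as possibly uninitialized afterwards.

```rust
use std::mem::MaybeUninit;

/// Writes a value into `slot` and returns an ordinary mutable reference to it.
fn init_then_use(slot: &mut MaybeUninit<Vec<u32>>) -> &mut Vec<u32> {
    let v: &mut Vec<u32> = slot.write(vec![1, 2]);
    v.push(3);
    v
}

/// Returns the length observed through a shared reference.
fn demo() -> usize {
    let mut slot = MaybeUninit::<Vec<u32>>::uninit();
    let v = init_then_use(&mut slot);
    let shared: &Vec<u32> = v; // `&mut T` coerces further to `&T`
    let len = shared.len();
    // `MaybeUninit` never drops its contents on its own.
    unsafe { slot.assume_init_drop() };
    len
}
```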
Panicking and unwinding
A panic inside an initializer closure should leave the place behind the output reference in an uninitialized state (any fields that were already initialized are dropped during the cleanup process), and the place should be inaccessible to the caller during unwinding.
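In today's Rust this cleanup has to be written by hand, typically with a drop guard. The following is a minimal sketch of that pattern, assuming a two-field struct; Noisy, Pair, FirstGuard and init_pair are hypothetical names, and the drop counter exists only to make the behaviour observable.

```rust
use std::mem::MaybeUninit;
use std::panic::{self, AssertUnwindSafe};
use std::ptr::{self, addr_of_mut};
use std::sync::atomic::{AtomicUsize, Ordering};

static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Noisy(u32);
impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

struct Pair {
    first: Noisy,
    second: Noisy,
}

/// Drops `first` if the initializer is left while the guard is still armed.
struct FirstGuard {
    first: *mut Noisy,
    armed: bool,
}
impl Drop for FirstGuard {
    fn drop(&mut self) {
        if self.armed {
            // SAFETY: the guard is only created after `first` was written.
            unsafe { ptr::drop_in_place(self.first) };
        }
    }
}

/// # Safety
/// `pair` must be valid for writes and properly aligned.
unsafe fn init_pair(pair: *mut Pair, fail: bool) {
    unsafe {
        let first = addr_of_mut!((*pair).first);
        first.write(Noisy(1));
        let mut guard = FirstGuard { first, armed: true };
        if fail {
            panic!("initializer failed halfway");
        }
        addr_of_mut!((*pair).second).write(Noisy(2));
        guard.armed = false; // fully initialized: the caller owns both fields
    }
}

/// Returns `(did the initializer panic, fields dropped during unwinding)`.
fn panicking_init() -> (bool, usize) {
    let mut slot = MaybeUninit::<Pair>::uninit();
    let p = slot.as_mut_ptr();
    let before = DROPS.load(Ordering::SeqCst);
    let result = panic::catch_unwind(AssertUnwindSafe(|| unsafe { init_pair(p, true) }));
    // `slot` is still treated as uninitialized; nothing else is dropped.
    (result.is_err(), DROPS.load(Ordering::SeqCst) - before)
}

/// Returns the two field values after a successful initialization.
fn successful_init() -> (u32, u32) {
    let mut slot = MaybeUninit::<Pair>::uninit();
    unsafe {
        init_pair(slot.as_mut_ptr(), false);
        let pair = slot.assume_init_mut();
        let values = (pair.first.0, pair.second.0);
        slot.assume_init_drop();
        values
    }
}
```

With output references, the compiler would presumably generate the equivalent of FirstGuard from its knowledge of which subplaces are initialized at each program point.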
Drawbacks
It introduces a brand-new reference type and borrow kind, which risks increasing Rust's complexity.
In addition, it is hard to extend initializer closures to fallible ones. Consider a fallible initializer closure of type impl FnOnce(&out T) -> Result<(), E>. The initialization state of the output reference after calling the closure depends on the discriminant of the returned Result, but there is no simple way to require that the user inspects the discriminant of the returned Result. They can simply ignore the result if no extra rules are added to type checking.
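To illustrate the coupling between the Result and the initialization state, here is a sketch of how a library wrapper can tie the two together in today's Rust: the only path to the value goes through the Ok arm. parse_into and try_init are hypothetical names, the correctness of the initializer is still an unchecked unsafe contract, and for brevity the wrapper returns the value by move, which an in-place version for immovable types could not do.

```rust
use std::mem::MaybeUninit;
use std::num::ParseIntError;

/// Hypothetical fallible initializer: writes to `slot` only on success.
fn parse_into(slot: &mut MaybeUninit<u32>, text: &str) -> Result<(), ParseIntError> {
    let value = text.parse::<u32>()?;
    slot.write(value);
    Ok(())
}

/// Hypothetical wrapper: a `T` is only produced when `init` returned `Ok`.
///
/// # Safety
/// `init` must fully initialize the slot whenever it returns `Ok(())`.
unsafe fn try_init<T, E>(
    init: impl FnOnce(&mut MaybeUninit<T>) -> Result<(), E>,
) -> Result<T, E> {
    let mut slot = MaybeUninit::<T>::uninit();
    init(&mut slot)?;
    // SAFETY: `init` returned `Ok`, so by the caller's contract `slot` is initialized.
    Ok(unsafe { slot.assume_init() })
}
```

A caller that ignores the returned Result here simply never gets a value, whereas with a bare impl FnOnce(&out T) -> Result<(), E> the place itself would remain reachable in an unknown state.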
Rationale and alternatives
Uninitialized reference type
rfc#2534 proposed an uninitialized reference type &uninit T to support partial initialization. However, the meaning of the uninitialized reference type is unclear, because the referent is expected to be initialized after a certain point in the program.
&mut MaybeUninit<T> vs. &out T
The former type has the same ability to write values into a given place, but using &mut MaybeUninit<T> has some disadvantages compared with &out T:
- it is not forced to be fully initialized at the end of the initializer closure;
- we cannot create an &mut MaybeUninit<T> from an uninitialized let x: T;, and we have to work with a let x = MaybeUninit::<T>::uninit() instead, which requires an additional move to get a value of type T.
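Both points show up in even a small example written against today's API. The sketch below uses a hypothetical Point type with fill and build_point helpers; the comments mark where the missing check and the extra move occur.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

struct Point {
    x: i32,
    y: i32,
}

/// Initializes both fields through the raw pointer behind `slot`.
fn fill(slot: &mut MaybeUninit<Point>) {
    let p = slot.as_mut_ptr();
    unsafe {
        addr_of_mut!((*p).x).write(3);
        addr_of_mut!((*p).y).write(4);
    }
}

fn build_point() -> Point {
    // A plain `let p: Point;` cannot be borrowed as `&mut MaybeUninit<Point>`,
    // so a separate slot is needed.
    let mut slot = MaybeUninit::<Point>::uninit();
    fill(&mut slot);
    // Nothing checked that `fill` wrote both fields; this `unsafe` is the
    // programmer's promise. `assume_init` also moves the value out of `slot`.
    unsafe { slot.assume_init() }
}
```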
Prior art
In C++, it is possible to construct a value in place through the placement new expression. However, the storage at the address passed to placement new is not guaranteed to be properly initialized afterwards.