Pre-RFC: Partial Initialization and Write Pointers

This is a large proposal, so I would like to run it by people here before posting it as an RFC.


  • Feature Name: partial_initialization_and_write_pointers

  • Start Date: 2018-08-26

  • RFC PR: (leave this empty)

  • Rust Issue: (leave this empty)

Summary

This RFC aims to allow direct initialization, as an optimization, and partial struct and enum initialization, for ergonomics. It does so through two new reference types, &out T and &uninit T (the names are not important to me).

Motivation

The builder pattern was created to work around the lack of partial initialization, but it has problems with large structs: the *Builder struct must necessarily be larger than the target struct, null-pointer optimizations notwithstanding. Moving large structs is also very expensive, and relying on the optimizer to eliminate the moves is fragile. &out T could serve as a way to place values directly into the desired memory location and to model write-only memory. &uninit T will serve the purposes of partial initialization and direct initialization.
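To make the size argument concrete, here is a minimal sketch in today's Rust. BarBuilder and its methods are hypothetical names invented for illustration, with one Option per field of the Bar struct used in the examples further down. On typical targets the builder comes out larger than Bar, because u32 and f32 have no spare bit patterns for Option to reuse.

```rust
use std::mem::size_of;

// The target struct, matching the `Bar` used in this proposal's examples.
struct Bar {
    d: u32,
    e: f32,
}

// Hypothetical builder: every field is optional until it has been set.
#[derive(Default)]
struct BarBuilder {
    d: Option<u32>,
    e: Option<f32>,
}

impl BarBuilder {
    fn d(mut self, d: u32) -> Self {
        self.d = Some(d);
        self
    }

    fn e(mut self, e: f32) -> Self {
        self.e = Some(e);
        self
    }

    // Completeness is only checked at runtime: `None` means a field was never set.
    fn build(self) -> Option<Bar> {
        Some(Bar { d: self.d?, e: self.e? })
    }
}

// Returns (size of Bar, size of BarBuilder) in bytes.
fn sizes() -> (usize, usize) {
    (size_of::<Bar>(), size_of::<BarBuilder>())
}
```

Note that build also moves the finished value out of the builder, which is the kind of move this proposal would like to avoid.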

Guide-level explanation

&out T is a write-only pointer to T, where T: !Drop. The bound is necessary because it is not safe to overwrite a Drop type without invoking its destructor. Writing through &out T does not run destructors.
&uninit T is a write-once pointer to T. After the first write, the value may be read (from then on it behaves exactly like &mut T). The first write does not run destructors.

For all examples, I will use these two structs

struct Foo { a: u32, b: String, c: Bar }
#[derive(Clone, Copy)]
struct Bar { d: u32, e: f32 }

impl std::ops::Drop for Foo {
    fn drop(&mut self) {
        println!("Dropping Foo {}", self.b);
    }
}

&uninit T

Using &uninit T, we can do partial initialization and directly initialize.

let x: Foo;

*(&uninit x.a) = 12;
*(&uninit x.b) = "Hello World".to_string();
*(&uninit x.c.d) = 11;
*(&uninit x.c.e) = 10.0;

This works because when we take an &uninit to x.a, we are implicitly also taking an &uninit to x, and the dot operator does not read from any memory location in x.

For ease of use, you can simply write

let x: Foo;

x.a = 12;
x.b = "Hello World".to_string();
x.c.d = 11;
x.c.e = 10.0;

and the compiler will infer that all of these need to use &uninit, because x was not initialized directly.
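For comparison, here is roughly what field-by-field initialization looks like in today's Rust, as a sketch using std::mem::MaybeUninit and std::ptr::addr_of_mut (both are in the standard library now, though they postdate this post). The make_foo function is a name made up for this example, and the Foo here has no Drop impl, to keep the sketch small. Every write needs unsafe, and nothing checks that all fields were written before assume_init, which is the gap &uninit T is meant to close.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

struct Bar {
    d: u32,
    e: f32,
}

struct Foo {
    a: u32,
    b: String,
    c: Bar,
}

// Hypothetical helper: builds a `Foo` one field at a time.
fn make_foo() -> Foo {
    let mut slot = MaybeUninit::<Foo>::uninit();
    let p = slot.as_mut_ptr();
    unsafe {
        // `addr_of_mut!` creates a raw pointer to each field without
        // creating a reference to uninitialized memory.
        addr_of_mut!((*p).a).write(12);
        addr_of_mut!((*p).b).write("Hello World".to_string());
        addr_of_mut!((*p).c.d).write(11);
        addr_of_mut!((*p).c.e).write(10.0);
        // SAFETY: every field of `Foo` was written above. The compiler
        // does not verify this; forgetting a field is undefined behavior.
        slot.assume_init()
    }
}
```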

Restrictions

Storing

You cannot store &uninit T in any way, not in structs, enums, unions, or behind any references. So all of these are invalid.

fn maybe_init(maybe: Option<&uninit T>) { ... }
fn init(ref_to_write: &mut &uninit T) { ... }
struct Temp { a: &uninit Foo }

Conditional Initialization

One restriction to &uninit T is that we cannot conditionally initialize a value. For example, none of these are allowed.

let x: Foo;
let condition = ...;

if condition {
    x.a = 12; // Error: Conditional partial initialization is not allowed
}
let x: Foo;
let condition = ...;

while condition {
    x.a = 12; // Error: Conditional partial initialization is not allowed
}
let x: Foo;

for ... {
    x.a = 12; // Error: Conditional partial initialization is not allowed
}

We forbid this because otherwise we can’t guarantee that the value is in fact initialized.

Note that the following is not conditional initialization of x.e, because by the end of the if-else block, x.e is guaranteed to be initialized.

let x: Bar;

x.d = 10;

if { ... any condition ... } {
    x.e = 1.0;
} else {
    x.e = 0.0;
}
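Today's Rust already performs this kind of analysis for whole local variables, so the rule above can be read as extending it to individual fields. A small sketch (pick_e is a made-up name):

```rust
// Deferred initialization of a whole local, as accepted by current Rust.
fn pick_e(condition: bool) -> f32 {
    let e: f32; // declared, but not yet initialized

    if condition {
        e = 1.0;
    } else {
        e = 0.0;
    }

    // Accepted: every path through the if-else assigned `e` exactly once.
    // Removing the `else` branch would make this a compile-time error.
    e
}
```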

Using partially initialized variables

let x: Bar;
x.d = 2;

// This is fine, we know that x.d is initialized
x.d.pow(4);
if x.d == 16 {
    x.e = 10.0;
} else {
    x.e = 0.0;
}
// This is fine, we know that x is initialized
assert_eq!(x.e, 10.0);

Functions and closures

You can accept &uninit T as arguments to a function or closure.

fn init_foo(foo: &uninit Foo) { ... }
let init_bar = |bar: &uninit Bar| { ... };

But if you do accept a &uninit T argument, you must write to it before returning from the function or closure.

fn valid_init_bar_v1(bar: &uninit Bar) {
    bar.d = 10;
    bar.e = 2.7182818;
}
fn valid_init_bar_v2(bar: &uninit Bar) {
    // you must dereference if you write directly to a &uninit T
    // This still does not drop the old value of bar
    *bar = Bar { d: 10, e: 2.7182818 };
}
fn invalid_init_bar_v1(bar: &uninit Bar) {
    bar.d = 10;
    // Error, bar is not completely initialized (Bar.e is not initialized)
}

fn invalid_init_bar_v2(bar: &uninit Bar) {
    bar.d = 10;
    if bar.d == 9 {
        return; // Error, bar is not completely initialized (Bar.e is not initialized)
    }
    bar.e = 10.0;
}
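The closest thing in today's Rust is probably a function that takes &mut MaybeUninit<T>. The sketch below uses made-up names (init_bar, make_bar) and assumes MaybeUninit::write, which returns a &mut T to the freshly written value. Returning that reference is only a convention: the compiler does not enforce that the slot was written, which is the guarantee the "must initialize before returning" rule would add.

```rust
use std::mem::MaybeUninit;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Bar {
    d: u32,
    e: f32,
}

// By convention, the returned `&mut Bar` points at the value written
// into `slot`. Nothing in the type system forces that, unlike the
// proposed `&uninit Bar`.
fn init_bar(slot: &mut MaybeUninit<Bar>) -> &mut Bar {
    slot.write(Bar { d: 10, e: std::f32::consts::E })
}

// Hypothetical caller: allocates the slot, lets `init_bar` fill it in place.
fn make_bar() -> Bar {
    let mut slot = MaybeUninit::<Bar>::uninit();
    *init_bar(&mut slot)
}
```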

If a closure captures a &uninit T, it becomes a FnOnce: because of the write-once semantics, destructors are not run on the first write, so the closure must not be called a second time.

let x: Foo;

let init = || x.a = 12; // init: FnOnce()  -> ()

Note on Panicky Functions: If a function panics, then all fields initialized in that function will be dropped. No cross-function analysis will be done.

&out T

Using &out T, we can directly initialize a value and guarantee write-only behavior.
This gives us a memory location to write to directly, instead of relying on move-elimination optimizations.

use super::os::OsFrameBuffer;

#[derive(Clone, Copy)]
struct Rgb(pub u8, pub u8, pub u8);
/// An abstraction that exposes a frame buffer allocated by the OS, which is unsafe to read from
struct FrameBuffer( OsFrameBuffer );

impl FrameBuffer {
    fn new(&uninit self) {
        self.0.init() // initialize frame buffer
    }

    fn write_to_pixel(&mut self, row: usize, col: usize) -> &out Rgb {
         self.0.get_out(row, col)
    }
}

This could be used like this

let buffer: FrameBuffer;
FrameBuffer::new(&uninit buffer);

*buffer.write_to_pixel(0, 0) = Rgb(50, 50, 255);
*buffer.write_to_pixel(10, 20) = Rgb(0, 250, 25);
// ...
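Without language support, the write-only part can be approximated in today's Rust with a wrapper that simply exposes no way to read. This is only a sketch: OutBytes is a made-up type, and it is a library convention rather than a compiler guarantee, since whoever owns the underlying storage can still read it.

```rust
// Hypothetical write-only view over a byte slice. It borrows the
// storage mutably and offers no method that reads from it.
pub struct OutBytes<'a> {
    inner: &'a mut [u8],
}

impl<'a> OutBytes<'a> {
    pub fn new(inner: &'a mut [u8]) -> Self {
        OutBytes { inner }
    }

    // Writes `byte` at `location`. Returns false, writing nothing,
    // if `location` is out of bounds.
    pub fn write(&mut self, location: usize, byte: u8) -> bool {
        match self.inner.get_mut(location) {
            Some(slot) => {
                *slot = byte;
                true
            }
            None => false,
        }
    }
}
```

While an OutBytes is alive, the borrow checker keeps everyone else away from the storage, which matches the rule that &out T borrows like &mut T.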

Constructors and Direct Initialization

Using &uninit we can create constructors for Rust!

struct Rgb(u8, u8, u8);

impl Rgb {
    fn init(&uninit self, r: u8, g: u8, b: u8) {
        self.0 = r;
        self.1 = g;
        self.2 = b;
    }
}

let color: Rgb;
color.init(20, 23, 255);

and we can do direct initialization

impl<T> Vec<T> {
    pub fn emplace_back(&mut self) -> &uninit T {
        ... // magic to allocate space and create pointer
    }
}

and maintain write-only buffers

struct WriteOnly([u8; 1024]);

impl WriteOnly {
    pub fn write(&out self, byte: u8, location: usize) {
        // Indexing like this is not currently possible, but we could imagine
        // an IndexOut that will handle this case.
        self.0[location] = byte;
    }
}
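For the emplace_back idea above, this is a sketch of how in-place construction at the end of a Vec can be written in today's Rust, assuming Vec::spare_capacity_mut and Vec::set_len. The name emplace_back_with and the closure-based shape are my own invention for illustration. The function has to be unsafe because nothing proves the closure actually wrote to the slot, which is the proof a returned &uninit T would carry.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: lets `init` construct a new last element in place.
///
/// # Safety
/// `init` must fully initialize the slot it is given.
unsafe fn emplace_back_with<T>(v: &mut Vec<T>, init: impl FnOnce(&mut MaybeUninit<T>)) {
    // Make sure there is at least one uninitialized slot past the end.
    v.reserve(1);
    let slot = &mut v.spare_capacity_mut()[0];
    init(slot);
    // SAFETY: the caller promised that `init` initialized the slot.
    // If `init` panics instead, the length is never bumped, so the
    // vector never exposes the unfinished element.
    unsafe { v.set_len(v.len() + 1) };
}
```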

Reference-level explanation

NOTE This RFC does NOT aim to create new raw pointer types, so no *out T or *uninit T. There is no point in creating these.

Rules of &uninit T

&uninit T should follow some rules so that it is easy to reason about &uninit T locally and maintain soundness

  • &uninit T follows the same rules as &mut T for the borrow checker
  • &uninit T can only be assigned to once
    • After being written to, a &uninit T is promoted to a &mut T
  • Writing does not drop old value.
    • Otherwise, it would not handle writing to uninitialized memory
    • More importantly, dropping requires at least one read, which is not possible with a write-only pointer
  • You cannot reference partially initialized memory
let x: Bar;

fn init_bar(bar: &uninit Bar) { ... }
fn init_u32(x: &uninit u32) { ... }

x.e = 10.0;

// init_bar(&uninit x); // compile time error: attempting to reference partially initialized memory
init_u32(&uninit x.d); // fine, x.d is completely uninitialized.
  • Functions and closures that take a &uninit T argument must initialize it before returning
    • You cannot return an &uninit T
  • You can take a &uninit T on any T that represents uninitialized memory. For example, of the following, only the first is OK.
let x: Foo;
let y = &uninit x; // fine, x is uninitialized
let x: Foo = Foo { a: 12, b: "Hello World".to_string(), c: Bar { d: 11, e: 10.0 } };
fn init(a: &uninit Foo) { ... }
init(&uninit x); // compile-time error: this function would overwrite the old value of x without dropping it

Rules of &out T

&out T should follow some rules so that it is easy to reason about &out T locally and maintain soundness

  • &out T follows the same rules as &mut T for the borrow checker
  • Writing does not drop old value.
    • Dropping requires at least one read, which is not possible with a write-only pointer
  • You can take a &out T on any T: Copy
    • because destructors are never run on write, T: Copy is necessary to guarantee no custom destructors. This bound can be changed once negative trait bounds land, then we can have T: !Drop. Changing from T: Copy to T: !Drop will be backwards compatible, so we can move forward with just a T: Copy bound for now.
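The reasoning behind the T: Copy bound can be checked in today's Rust: a type cannot implement both Copy and Drop, so a Copy type never has drop glue. The sketch below uses std::mem::needs_drop to show this, and std::ptr::write to show an overwrite that neither reads nor drops the old value. The overwrite and copy_types_need_no_drop functions are made-up names for this example.

```rust
use std::mem::needs_drop;
use std::ptr;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Bar {
    d: u32,
    e: f32,
}

// Overwrites `*slot` without reading or dropping the old value,
// which is how a write through the proposed `&out T` would behave.
fn overwrite<T: Copy>(slot: &mut T, value: T) {
    // SAFETY: `slot` is a valid, aligned, exclusive reference, and
    // `T: Copy` means there is no destructor being skipped.
    unsafe { ptr::write(slot, value) }
}

// Returns (does Bar need drop?, does String need drop?).
fn copy_types_need_no_drop() -> (bool, bool) {
    (needs_drop::<Bar>(), needs_drop::<String>())
}
```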

Coercion Rules

&T - (none) // no change
&mut T - &T, and &out T if T: Copy
&out T - (none) // for similar reasons to why &T does not coerce
&uninit T - &out T if T: Copy, and &T or &mut T once initialized

self

We will add &uninit self and &out self as sugar for self: &uninit Self and self: &out Self respectively. This is for consistency with &self and &mut self.

Panicky functions in detail

Because we can pass &uninit T and &out T to functions, we must consider what happens if a function panics. For example:

fn init_foo_can_panic(foo: &uninit Foo) {
    foo.b = "Hello World".to_string();
    foo.a = 12;
    
    if foo.a == 12 {
        // When we panic here, we should drop all values that are initialized in the function.
        // Nothing could have been initialized before the function because we have a &uninit T
        panic!("Oh no, something went wrong!");
    } 

    foo.c = Bar { d: 10, e: 12.0 };
}

fn out_bar_panics(bar: &out Bar) {
    // When we panic here, nothing behind the &out is dropped: a type behind &out can never have a destructor, so it doesn't matter
    panic!("Oh no, something went wrong!");
}

let x: Foo;

init_foo_can_panic(&uninit x);

let x: Bar;

out_bar_panics(&out x); // when we take a &out, we are asserting that the old value doesn't need to drop, and doesn't matter. This is fine because Bar is Copy and does not have a destructor.
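For reference, this is how unwinding already treats ordinary locals in today's Rust: values initialized before the panic are dropped, and a local that was declared but never initialized is skipped. The sketch below counts drops to show it; Noisy, DROPS and the function names are made up, and it assumes the default panic = "unwind" setting. The proposal asks for the same treatment of fields initialized through a &uninit T inside the panicking function.

```rust
use std::panic;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many `Noisy` values have been dropped.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Noisy;

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

fn init_then_panic() {
    let _first = Noisy; // initialized before the panic: dropped on unwind
    let _second: Noisy; // declared but never initialized: nothing to drop
    panic!("Oh no, something went wrong!");
}

// Runs the panicking function and reports how many drops happened.
fn drops_during_unwind() -> usize {
    DROPS.store(0, Ordering::SeqCst);
    let result = panic::catch_unwind(init_then_panic);
    assert!(result.is_err());
    DROPS.load(Ordering::SeqCst)
}
```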

Drawbacks

This is a significant change to the language and introduces a lot of complexity. Partial initialization can be solved entirely through the type-system as shown here. But this does have its problems, such as requiring an unstable feature (untagged_unions) or increased size of the uninitialized value (using enums).

Rationale and alternatives

T: !Drop for &out T

Once negative trait bounds become stable, the bounds for &out T will change to T: !Drop

Allow Drop types to be partially initialized

Then they would only be dropped if all of their fields are initialized

Placement-new

Placement new would help with initializing large structs.

As sugar

This could be implemented as sugar, where all fields of structs that are partially initialized are turned into temp-variables that are then passed through the normal pipeline.

For example

let x: Bar;
x.d = 10;
x.e = 12.0;

would desugar to

let x: Bar;
let xd = 10;
let xe = 12.0;
x = Bar { d: xd, e: xe };

But this would not be able to replace placement new as it can’t handle &uninit T through function boundaries. Also this would not solve the problem of direct-initialization.

Prior art

Out pointers in C++, (not exactly the same, but similar idea)

&out T in C#

Unresolved questions

N/A (I don’t have any right now)


edit: Added Panicky Function sub-section due to @rkruppe’s insights

added &out T by C# to prior arts and alternative syntax due to @earthengine’s suggestion

removed lots of unnecessary spaces and newlines

edit 2:

Incorporating @gbutler’s proposal of splitting &uninit T into &out T and &uninit T

edit 3:

Used @gbutler’s example of FrameBuffer that interfaces hardware for &out T


Aside from partial initialisation, work has been done to guarantee return value optimisation in Rust (#47954).
I don't know its current status.


This seemingly needs to include not panicking (if not, please explain how the problems caused by returning without initializing don't apply to unwinding), but we have no panic analysis in the language, so it seems hard-to-infeasible to enforce in the compiler.


Oh, that reminds me, I haven’t thought about return values. I don’t think it would make sense to allow write pointers to be returned from a function, as they are supposed to point to uninitialized memory and shouldn’t live long enough to be returned anyway. But I could see someone trying something like this.

struct Foo { a: i32, b: String } 
impl Foo {
    pub fn set_a(&mut self) -> &write i32 {
        &mut self.a as &write _
    }
}

I think this should be an anti-pattern. It should be written like:

impl Foo {
    pub fn set_a(&mut self, setter: impl FnOnce(&write i32)) {
        setter(&mut self.a as &write _)
    }
}

This way the pointer is contained, and we get guarantees that it will be set in the closure.
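The same scoping trick works in today's Rust with an ordinary &mut, as in the sketch below (this is an adaptation of the snippet above, not part of the proposal). The reference cannot escape the call to set_a, but unlike the proposed &write i32, nothing forces the closure to actually assign to it.

```rust
struct Foo {
    a: i32,
    b: String,
}

impl Foo {
    // The `&mut i32` exists only for the duration of the `setter` call,
    // so it cannot outlive this method.
    pub fn set_a(&mut self, setter: impl FnOnce(&mut i32)) {
        setter(&mut self.a)
    }
}
```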

Thinking more on this, I also need to see how to handle panics.

Would it be possible to keep track of all the variables that are initialized before the panic, and only drop those values?

For local variables that’s exactly what happens, but when crossing function boundaries, I don’t see a way to avoid a “was initialized?” flag, and at that point it has the same runtime costs as &mut Option<T> (which is way more flexible).
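To illustrate the &mut Option<T> comparison, here is a small sketch in today's Rust (maybe_init and flag_costs_space are made-up names). The Option is the "was initialized?" flag: the callee may return without initializing, the caller checks at runtime, and for a type like u32 the flag takes extra space.

```rust
use std::mem::size_of;

// The callee is allowed to return without initializing the slot;
// the `Option` records whether it did.
fn maybe_init(slot: &mut Option<String>, succeed: bool) {
    if !succeed {
        return;
    }
    *slot = Some("Hello World".to_string());
}

// For a type with no spare bit patterns, the flag costs extra space.
fn flag_costs_space() -> bool {
    size_of::<Option<u32>>() > size_of::<u32>()
}
```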


Well, since it should not be possible to take a reference to a partially initialized value, you should not be able to pass those off to functions, for example, this is invalid.

struct Foo { a: u32, b: u32, c: String }
fn init(init_me: &write Foo) {
    init_me.a = 0;
    init_me.b = 0;
    init_me.c = "".to_string();
}

let foo: Foo;

init(&write foo);

let bar: Foo;

bar.a = 10;

init(&write bar); // Error: Taking a reference to a partially initialized value 'bar'

So inside the function, you can assume that the entire &write T is uninitialized, and must be initialized. So if anything panics, only the values initialized in the function need to be dropped.


You would probably want to add a “prior art” section about C#'s “out” argument, as the rules are similar.

This also suggests an alternative name: &out T rather than &write T.

So, I thought about it a bit more, and we can allow taking a write pointer to a partially initialized value, but we cannot drop any of the old values when doing so; to do that, we would need to drop the old values manually, maybe using a function like this:

unsafe fn drop_partial<T>(value: &mut T) {
    use std::mem::{uninitialized, replace};
    drop(replace(value, uninitialized()));
}

and use it like this

struct Foo { a: u32, b: String }
fn some_fn(x: &write Foo) { ... }

let x: Foo;
x.b = "Hello World".to_string();

// we can take a mutable ptr here because x.b
// has been initialized in the current function;
// for the same reason it is safe to drop. Also, we are writing to x.b through x
// in the very next line, due to the rules of &write T,
// so the uninitialized value stored in x.b will be overwritten with something safe
unsafe { drop_partial(&mut x.b); }
some_fn(&write x);

edit: forgot to put example of usage

let a: Foo; f(&write a.b);

?

I don’t understand what you’re asking.

It looks like this proposal attempts to create quasi-linear types for one particular case, which to me personally looks a bit ad-hoc.

I had in mind a less complex &out T, which will be essentially &mut T, except:

  • It will be a write-only reference, in other words you will not be able to dereference it, or get &T and &mut T in safe code.
  • It will be possible to create &out T only for Copy types. Well, the right bound is !Drop, but until we get negative trait bounds, a Copy bound will do. This is needed to prevent issues with leaking data (or dropping uninitialized data), as we will not be able to read data from &out T. (maybe this restriction can be relaxed in unsafe code)
  • It will be safe to create &out T to uninitialized memory.

No other restrictions will be applied to &out T, i.e. you will be able to store it in structs and do other usual stuff. This way we will get “read-only” (&T), “read-write” (&mut T) and “write-only” (&out T) references, so I think it will be quite natural addition, and of course you will be able to write &out self as well.

&mut T can be automatically coerced to &out T, in the same way as it does with &. Thus changing &mut T to &out T in function signatures will be a backwards compatible change.

The most notable use-case can look like this:

let reader: impl io::Read = ...;
let buffer = Uninit::<[u8; 128]>::uninitialized();
// note `get_out` is safe and returns `&out [u8; 128]`
// `read_exact` here takes `&out [u8]` (to which `&out [u8; 128]` can be converted)
reader.read_exact(buffer.get_out())?;
// the only unsafe line, which is safe, as we are sure that buffer is fully written
let data = unsafe { buffer.into_inner() };

Yes, this approach provides fewer guarantees than the one proposed in the OP and relies on documentation to determine whether the reference was fully written or not. But on the other hand it is significantly less complex, and easier to implement and explain. Plus it can be used in existing APIs without breaking changes.


This seems nice, but I would like to be able to use this feature without unsafe code. It also doesn’t solve the problem of expensive moves (when you call buffer.into_inner() you move a potentially large object around). Also, I would like to allow Drop types; a number of the rules are there so that Drop types are safe and not confusing.

This is a fairly advanced feature, and goes into the realm of micro-optimization, and is in general not needed. But in the cases that it is needed, it would be good to have Drop types, to be able to directly initialize into the allocated memory and to be completely unsafe free.

But I do want to note, we don’t have to be completely safe around Drop types, because Rust does not guarantee that destructors run, we even have a type ManuallyDrop, and a function forget that explicitly allows you to not run destructors. This would just be really confusing to new people, so everything related to drop types in this rationale can just be lints that are error-by-default.
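As a quick illustration of that point, the sketch below counts destructor runs in today's Rust using std::mem::forget and std::mem::ManuallyDrop. Tracked and destructor_runs are made-up names. Only the explicit drop call bumps the counter; the other two values are discarded in safe code without their destructors ever running.

```rust
use std::cell::Cell;
use std::mem::{self, ManuallyDrop};

// Increments the shared counter when its destructor runs.
struct Tracked<'a>(&'a Cell<u32>);

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// Returns the drop count after each of three ways of discarding a value.
fn destructor_runs() -> (u32, u32, u32) {
    let drops = Cell::new(0);

    // An ordinary drop runs the destructor.
    drop(Tracked(&drops));
    let after_drop = drops.get();

    // `mem::forget` is safe and skips the destructor.
    mem::forget(Tracked(&drops));
    let after_forget = drops.get();

    // A `ManuallyDrop` wrapper going out of scope also skips it.
    {
        let _kept = ManuallyDrop::new(Tracked(&drops));
    }
    let after_manually_drop = drops.get();

    (after_drop, after_forget, after_manually_drop)
}
```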

Rationale for Rules

&write T can only be assigned to once
This rule prevents unintended writes and allows for Drop types to be used safely, without fear of memory leaks.

After being written to &write T are promoted to a &mut T
This is mostly just ergonomics, to allow free usage of the value we just created.

Writing does not drop old value.
This is necessary for handling uninitialized memory. Note the semantics are different from &mut T, so it would cause confusing bugs to allow this coercion if Drop types are allowed.

You can only reference partially initialized memory through a &write T
This is also to ensure that uninitialized memory does not get read from.

&write T follows the same rules as &mut T for the borrow checker
This is so we have a xor relation between many reads and one write, to ensure memory safety, for the same reasons as &mut T.

If you access fields of T using a &write T

  • T: !Drop , this is only initially to ease of implementation (don’t have to worry about drops)
  • This restriction could be removed
    This is just in-case there was some unsoundness with &write pointers to Drop types

You cannot conditionally assign to a &write T
This removes the need for unsafe code.

Functions and closures that take a &write T argument must initialize before returning
This also ensures that we don’t need unsafe code, and prevents bugs where you take a &write reference, but you don’t need it.

&write T cannot be coerced to and from another type
This would be confusing if drop types are allowed, otherwise, it is unnecessary

  • You can take a &write T on any T
    Just a note, I don’t know if I need to say it, but I will.

This is a big restriction! It almost destroys the main use case of this feature: passing it as a function argument and expecting the function to initialize it.

Why not: &write T or &out T can only be uninitialized until first write. So the compiler will ensure there is no "old value" to drop. And the following:

let v: &out u32;
...
v=10;
v=20;

desugar to

let v: &out u32;
...
v=10;
let v = v as &mut u32; 
v=20;

I'm not following this. Why would initializing a value behind a reference require dereferencing it or converting it to & or &mut?

I think @newpavlov’s suggestion is a lot nicer as it seems generally simpler and less ad-hoc.

We could allow safely coercing &out T to &mut T in the event that &out T is successfully written to, but the mechanism to ensure that sounds hard to come by, especially if it requires reasoning about partiality. Similar to Pin, just providing an unsafe interface for now seems good enough for getting started with the feature.

Ah, yes, neat. Could &out T also be used to represent write-only memory buffers (like the ones needed for OpenGL types of things in some cases)? For example, there are some low-level video buffer APIs where the buffer is write-only, and if you attempt to read it, it causes a page-fault/segmentation-violation from the kernel. Could &out [u8] slices be used to represent that, for example?

Yes, absolutely!


Perhaps two different types of references would be most useful:

  • &out T is a write-only reference to a !drop type
  • &uninit T is a write-once, then readable reference to any type

&out T borrowing rules would be that you could have any number of &out T references simultaneously, but, no concurrent &mut T or &T references only have a single &out T reference in effect, the same as &mut T. &uninit T would have the same borrowing rules as &mut T and you could not have an &uninit T concurrent with an &mut T.

Once fully initialized you could coerce an &uninit T to an &mut T or an &T. A !drop &uninit T could be coerced at any time to an &out T. A drop &uninit T could not be coerced to a &out T. You could never coerce an &mut T to an &uninit T.

So, we would then have 4 reference types:

  • &T - that we all know and love (read-only, multiple-concurrent)
  • &mut T - that we all know and love (read/write, exclusive)
  • &out T - (write-only, multiple-concurrent, !drop only types)
  • &uninit T - (write-once, then read, exclusive, drop|!drop types)

Valid coercions would be:

  • &T (none)
  • &mut T -> &T or &out T
  • &out T -> &T or &mut T (no, this would be unsafe)
  • &uninit T -> &T or &mut T once fully initialized (until then, may not be coerced) or -> &out T if it is a !drop type (at any time)

Caveat: Allowing coercions from/to &mut T and &T for &out T kind of defeats the real benefit of &out T (being write-only memory). So, it would be nice to NOT allow that so the coercions (safe coercions that is) would be:

  • &T (none)
  • &mut T -> &T or &out T
  • &out T (none, once an out always an out)
  • &uninit T -> &T or &mut T once fully initialized (until then, may not be coerced) OR to &out T if it is a !drop type (at any time)

It seems like having both a new &out T and a new &uninit T would be useful for different orthogonal purposes. &out T would be especially useful for things like write-only video buffers or other type of specially mapped pages. &uninit T would be useful for in-place initialization without copy/move.


If you have a struct with multiple fields, can you pass a write pointer to its subfields for initialization? Or do you need to pass a write pointer to the whole thing?