[Pre-RFC] Partially Initialized Types

There’s been a lot of talk about &uninit references, MaybeUninit, placement-new…

I’d like to propose an alternative to all of them: Partially Initialized Types. They’re, arguably, the cleanest approach to all of this.

Consider types:

struct Foo { a: i32, b: i32 }
struct Bar { a: i32, b: Foo }

Then, we can have:

let x: Bar(b(a)) = Bar { b: Foo { a: 1 } };

However, passing these to functions is mostly pointless:

fn foo(x: &mut Bar()) {
    // can't do anything with x here
}
foo(&mut x); // but can still call it

So we need a way to specify type/state mutations:

fn bar(x: &mut Bar() -> Bar(a)) {
  x.a = 2;
}
bar(&mut x);

assert_eq!(x.a, 2);
assert_eq!(x.b.a, 1);
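
For comparison, here is a rough sketch of how the same field-by-field initialization has to be written in today's Rust, using MaybeUninit and raw pointers. The function name init_bar and the value written to b.b are assumptions made for illustration. Unlike the proposed Bar(b(a)), every field has to be written before assume_init is called, and the compiler does not check that.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

#[derive(Debug, PartialEq)]
struct Foo { a: i32, b: i32 }

#[derive(Debug, PartialEq)]
struct Bar { a: i32, b: Foo }

// Hypothetical helper: builds a `Bar` one field at a time.
fn init_bar() -> Bar {
    let mut slot = MaybeUninit::<Bar>::uninit();
    let p = slot.as_mut_ptr();
    unsafe {
        // Roughly the `Bar(b(a))` state: only `b.a` is written so far.
        addr_of_mut!((*p).b.a).write(1);
        // Roughly what `bar` does in the proposal: initialize `a`.
        addr_of_mut!((*p).a).write(2);
        // Today's Rust requires the remaining field too (value is arbitrary).
        addr_of_mut!((*p).b.b).write(3);
        // SAFETY: all fields of `Bar` have been written above.
        slot.assume_init()
    }
}
```

The proposal would move the bookkeeping that the SAFETY comment does by hand into the type of x.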

What you write and how the compiler looks at it are slightly different: if we accept - to mean uninitialized, ~ to mean ignored, and + to mean initialized, we generally have:

Foo() = Foo(~a, ~b)
Foo(a) = Foo(+a, ~b)
fn foo(x: Foo() -> Foo(b)) = fn foo(x: Foo(~a, -b) -> Foo(~a, +b))
fn foo(x: Foo(b) -> Foo()) = fn foo(x: Foo(~a, +b) -> Foo(~a, -b))

In object position (anything that is not a reference: return types, let bindings, etc.), ~ is treated as -. So:

let x: Foo(a) = ...;
let y: Foo() = x; // deinitializes x.a, without dropping x.
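
As a point of reference, the state transitions above can be approximated in current Rust with a typestate-style generic struct. This is only a sketch under assumptions: FooP, empty, init_a, init_b and forget_a are hypothetical names, a missing field is modelled as the unit type, and it only works by value, whereas the proposal changes the state behind a &mut.

```rust
/// Hypothetical stand-in for a partially initialized `Foo`: a field that is
/// not initialized has type `()`, an initialized one has type `i32`.
#[derive(Debug, PartialEq)]
struct FooP<A, B> {
    a: A,
    b: B,
}

/// Roughly `Foo()`: nothing initialized.
fn empty() -> FooP<(), ()> {
    FooP { a: (), b: () }
}

/// Roughly `Foo(~a, -b) -> Foo(~a, +b)`: accepts any state of `a`.
fn init_b<A>(x: FooP<A, ()>, b: i32) -> FooP<A, i32> {
    FooP { a: x.a, b }
}

/// Roughly `Foo(-a, ~b) -> Foo(+a, ~b)`: accepts any state of `b`.
fn init_a<B>(x: FooP<(), B>, a: i32) -> FooP<i32, B> {
    FooP { a, b: x.b }
}

/// Roughly `let y: Foo() = x;` for the `a` field: the value is discarded.
fn forget_a<B>(x: FooP<i32, B>) -> FooP<(), B> {
    FooP { a: (), b: x.b }
}
```

Calling init_b(init_a(empty(), 1), 2) yields a FooP<i32, i32>, and passing a value in the wrong state is a compile error, which is the property the proposal wants to get without the extra type parameters.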

You can then have, with specialization:

fn place_back() -> Place<T()> {
    // allocate stuff
}
impl Drop for Place<T()> {
    fn drop(&mut self) {
        self.vec.deallocate();
    }
}
impl Drop for Place<T(..)> {
    fn drop(&mut self) {
        // it's initialized, don't do anything
    }
}

Finally, this is the discussion that led to this: https://github.com/rust-lang/rfcs/pull/2534

Somewhat related, I’ve been looking at a collection of issues involving possible UB from partially-uninitialized arrays in arrayvec, smallvec, and BTreeMap.

Unfortunately, without a complex dependent type system, it seems difficult to encode these particular partially-initialized values as types…

Rust had structural records once. If they came back and played nicely with FRUs then they’d serve this purpose for structs. Annotating the missing fields sounds like an interesting twist.

What are FRUs?

For example, this program prints "true" if optimizations are enabled, when built with current rustc:

fn main() {
  let x: Option<[&u8; 1]> = unsafe { Some(std::mem::uninitialized()) };
  println!("{}", x.is_none());
}

What?!? I understand that std::mem::uninitialized() is UB, but it seems wrong to me that Some(UB).is_none() can ever be true. I, for one, would consider this a compiler bug.

“Functional record update” is the jargon for the ..foo syntax you can use at the end of a struct expression.
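
A small illustration of that syntax, with a made-up Config struct and with_width helper (both are assumptions, not anything from the thread):

```rust
#[derive(Debug, Clone, PartialEq)]
struct Config {
    width: u32,
    height: u32,
    title: String,
}

// Hypothetical helper: overrides `width` and takes every other field from `base`.
fn with_width(base: Config, width: u32) -> Config {
    Config { width, ..base }
}
```

Note that ..base appears in the struct expression (a value), not in a type.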

This intuition is unfortunately incompatible with a layout "optimization" we really want and need to do. Option<&T> should be, and in fact is guaranteed to be, the same size as a pointer with None as null. Then there's no way around Some(mem::zeroed::<&T>()) producing the exact same bytes in memory as None, so there's your first instance of Some(UB).is_none() == true. Uninitialized memory is trickier to reason about but even if we ignore all the undefs and poisons and the reasons for introducing them and naively treat uninitialized memory as "memory initialized to a non-deterministic bit string", obviously that bit string can happen to be zero, so the possibility of Some(mem::uninitialized()).is_none() is inevitable.

I would also reject any demand of the form "UB should at least be well-behaved in regards to X" on principle, but with this example it is especially indisputable that we cannot have our cake and eat it too.

PS: The above is not really how LLVM ends up producing the machine code it does, but I want to avoid digging up and relitigating those specifics. "Some(UB).is_none() should never be true" is already impossible due to (guaranteed!) layout decisions rustc makes, and that's that.
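
The layout point can be checked directly. The sketch below uses hypothetical function names and relies on the guarantee documented in std::option that all-zero bytes are a valid None for Option<&T>; it does not touch uninitialized memory.

```rust
use std::mem::{size_of, transmute};

// `Option<&T>` is guaranteed to be the same size as a plain pointer.
fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&u8>>() == size_of::<*const u8>()
}

// All-zero bytes, reinterpreted as `Option<&u8>`, are `None`.
fn all_zero_bytes_is_none() -> bool {
    let zeros = [0u8; size_of::<&'static u8>()];
    // SAFETY: std documents that transmuting all-zero bytes to `Option<&T>`
    // is sound and produces `None`.
    let x: Option<&'static u8> =
        unsafe { transmute::<[u8; size_of::<&'static u8>()], Option<&'static u8>>(zeros) };
    x.is_none()
}
```

So any bit pattern that happens to be all zeros reads back as None, whatever expression was supposed to have produced it.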

UB isn't contained. There isn't UB that's "not that bad so it's ok". All UB can cause anything to happen. (If there were different kinds, we'd want different keywords for them to keep them separate.)

I really appreciate that Rust has done a good job of drawing the careful line of "I know you need to be extra careful using this, but misusing it isn't UB, so it's not marked unsafe".

What are structural records?

I believe this refers to the distinction between “structural typing” (if two types have the same structure, they’re the same type, full stop) and “nominal typing” (two types with the same structure are nonetheless different types as long as they have different names). Where exactly you draw the line between the two often devolves into semantics, but it’d be reasonable to say that Rust is mostly nominally typed, with a few exceptions like tuples and arrays that are structurally typed. And “records” in some functional languages are basically the same thing as structs, so “structural records” is probably equivalent to what any non-functional programmer would call “anonymous structs”.
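
A quick sketch of that distinction in current Rust. All names here (Meters, Feet, length_in_meters, make_pair, sum_pair) are made up for the example:

```rust
// Nominal: same structure, different names, therefore different types.
struct Meters { value: f64 }
struct Feet { value: f64 }

fn length_in_meters(m: Meters) -> f64 {
    m.value
}

// Structural: any `(i32, i32)` is the same type, wherever it was written.
fn make_pair() -> (i32, i32) {
    (1, 2)
}

fn sum_pair(p: (i32, i32)) -> i32 {
    p.0 + p.1
}
```

sum_pair(make_pair()) compiles because both sides spell the same tuple type, while length_in_meters(Feet { value: 1.0 }) would be rejected even though Feet and Meters have identical fields.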

Oh. This is absolutely not anonymous structs.

Also there are no conflicts with ..foo, as that goes on the value, not on the type.

Ok, thanks!

Another problem I have with this method is that it makes changing private field names a breaking change, and it leaks internal information such as private fields. For example,

struct Foo { a: i32, pub b: &'static str }

impl Foo {
    fn init_a(foo: &mut Foo() -> Foo(a)) {
        foo.a = unknown_number_generator();
    }

    fn get_a(&self(a)) -> i32 {
        self.a
    }
}

// in a different crate

// use Foo

fn main() {
    let foo = Foo {};
    Foo::init_a(&mut foo);
    // what is the type of foo here?
    // and what would we be able to do
    // with foo
    handle_foo(&mut foo);

    // continue on
}

fn handle_foo(foo: &mut Foo(a) -> Foo(a, b)) {
    if foo.get_a() < 0 {
        foo.b = "a is negative";
    } else {
        foo.b = "a is positive";
    }
}

How would we be able to write something like handle_foo without leaking private types? This doesn’t seem like too much to ask.

One way would be to split the function like this:

fn handle_foo(foo_a: i32, foo: &mut Foo() -> Foo(b)) {
    if foo_a < 0 {
        foo.b = "a is negative";
    } else {
        foo.b = "a is positive";
    }
}

and pass foo.a to the function, but then we have no guarantees that foo_a is the same as foo.a. It’s too trusting of the users of this function.

Outside code cannot name private fields, and instead has to rely on the compiler to know stuff.

It gets exposed as _ and may only be used with functions that access those (because it’d be inconstructible otherwise).

e.g. Foo(_, _) (not to be confused with how rustdoc displays private fields) could mean any 2 private fields, and only the compiler can tell which is which.

Without those functions, external code can only rely on any combination of public fields, or Foo(..) (fully initialized). e.g. Foo() Foo(bar) Foo(baz) Foo(bar, baz) Foo(..) if Foo has public fields “bar” and “baz” and any number of private fields.

Alternatively, for simplicity, we could do a “private type in public API” hack, as it’s easier: using private fields in PITs in public APIs would lead to a compilation error.

(IIRC tuple structs have a similar issue to them? rustdoc definitely leaks them)

A little idea: we could have “constructor fns” of the form:

struct Foo {
}
impl Foo {
    fn foo(&mut self() -> Self) {
    }
}

This could be invoked either as Foo::foo(), such that things like Vec::new can be converted to this form without affecting existing code, or on an uninitialized variable/uninitialized memory, so that things like placement new just work.

How is foo implemented if you have fields on Foo, some of which need arguments to initialize?

You just set them…?

How would you call the function? Sorry if this sounds stupid, I just want to see details.

If you have something like

let foo = Foo::foo();

the compiler translates this to

let foo: Foo();
foo.foo();

i.e. all struct constructors would implicitly and necessarily use in-place construction.

As for fields on Foo, if you have something like:

struct Foo {
    bar: u16,
}
impl Foo {
    fn foo(&mut self() -> Self, bar: u16) {
        self.bar = bar;
    }
}

You’d be forced to have that “self.bar = bar” (or self.bar = something_else) line in the constructor.
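
The closest thing available today is probably an in-place initializer over MaybeUninit. A minimal sketch, where init_in is a hypothetical name and the caller has to provide the uninitialized slot explicitly:

```rust
use std::mem::MaybeUninit;

struct Foo {
    bar: u16,
}

impl Foo {
    // Hypothetical in-place constructor: writes a `Foo` into caller-provided
    // storage and hands back a reference to the now-initialized value.
    fn init_in(slot: &mut MaybeUninit<Foo>, bar: u16) -> &mut Foo {
        slot.write(Foo { bar })
    }
}
```

Here nothing stops a caller from reading the slot without ever calling init_in, which is the gap the &mut self() -> Self form is meant to close.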
