Pre-RFC: Allow passing uninitialized values to functions


#1

There is a lot of code that needs to pass a pointer to an FFI function which initializes the pointee. For example:

enum Foo{}
extern {
    fn init_Foo(foo: *mut Foo);
}
let mut foo: Foo = unsafe { mem::uninitialized() };
unsafe { init_Foo(&mut foo as *mut Foo); }

If Rust allowed passing uninitialized values to functions, it could become:

enum Foo{}
extern {
    fn init_Foo(inits foo: *mut Foo);
}
let foo: Foo;
init_Foo(&mut foo as *mut Foo);

The advantage is that you cannot accidentally use an uninitialized variable.
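
For comparison, here is a rough sketch of how the same pattern can be written with std::mem::MaybeUninit, so that no uninitialized Foo value is ever materialized. The one-field struct Foo and the Rust-side init_foo that stands in for the FFI function are assumptions made only so the example is self-contained.

```rust
use std::mem::MaybeUninit;

// Hypothetical stand-in for the FFI type; a real binding would mirror the C layout.
#[repr(C)]
struct Foo {
    value: i32,
}

// Hypothetical stand-in for the extern initializer, written in Rust so the sketch runs.
unsafe fn init_foo(foo: *mut Foo) {
    // Write without reading or dropping the (uninitialized) old contents.
    unsafe { foo.write(Foo { value: 42 }) };
}

fn make_foo() -> Foo {
    let mut slot = MaybeUninit::<Foo>::uninit();
    unsafe {
        init_foo(slot.as_mut_ptr());
        // Sound only because init_foo fully initialized the slot.
        slot.assume_init()
    }
}
```

The caller still has to promise, via unsafe, that the callee really initialized the value; that promise is the part the inits marker is meant to move into the signature.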

Detailed design

A function which has inits before the binding name takes an uninitialized mut ref or mut raw ptr. If you pass an initialized value to such a function, it will be dropped. Functions must initialize all bindings marked with inits. inits bindings are only allowed in function signatures.

Drawbacks

More complex language

Unresolved questions

  • Keep the name inits or choose another name.
  • Maybe extend to allow partially uninitialized values.

#2

There was RFC #98 “Uninitialized Pointers” which was postponed due to Rust 1.0 (tracking issue: #268).


#3

This is different from the &uninit proposal in at least one significant way: It doesn’t introduce a new kind of pointer. Having yet another pointer type was one of the concerns on RFC PR #98. But this difference also has downsides:

  • The Pre-RFC includes inits arguments of type &mut T, but such an argument would not really be a value of type &mut T. It is in fact entirely incompatible with &mut T since writing through &mut T assumes there is an existing value and drops that, while an inits pointer doesn’t drop existing contents (but only on the first write ಠ_ಠ). The following points assume we get rid of this and restrict inits to *mut T.
  • Instead of a new pointer type, it introduces a “tag” in the signature (which must necessarily be part of the function type, b/c you can do different things with it) that doesn’t come from any of the argument or return types. Not unprecedented (unsafe fn) but in theory creates a similar combinatorial explosion problem as a new pointer type, and feels weirder.
  • Taking a &mut to an uninitialized variable becomes legal, but the resulting pseudo-&mut T must only be used in certain ways (at the very least coercions to *mut T must be allowed). Speccing what is allowed and what not sounds like “fun”.
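
The incompatibility in the first point can be seen in today's Rust. The sketch below counts drops: assigning through a &mut T drops the old value, while ptr::write does not. The Noisy type and the drops_observed function are made up for this illustration.

```rust
use std::cell::Cell;
use std::ptr;

// Counts how many times a value of this type has been dropped.
struct Noisy<'a>(&'a Cell<u32>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// Returns (drops seen after a plain assignment, drops seen after ptr::write).
fn drops_observed() -> (u32, u32) {
    let counter = Cell::new(0);
    let mut slot = Noisy(&counter);
    let r: &mut Noisy<'_> = &mut slot;

    // Assignment through &mut T drops the previous value first.
    *r = Noisy(&counter);
    let after_assign = counter.get();

    // ptr::write overwrites without dropping the previous value (it is leaked here).
    unsafe { ptr::write(r, Noisy(&counter)) };
    let after_write = counter.get();

    (after_assign, after_write)
}
```

An inits pointer would need the ptr::write behaviour on its first write and the assignment behaviour afterwards, which is why it cannot simply be typed as &mut T.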

#4

Changing the behavior of the language depending on the name of the function you call is (hopefully) never going to be accepted.


#5

In the D language there’s a similar feature, “out” arguments, which are initialized on entry to the function and are passed by reference:

void foo(out uint x) {}
void bar(out uint x) {
    x = 5;
}
void main() {
    import std.stdio;
    uint y = 10;
    foo(y);
    writeln(y); // Prints: 0
    bar(y);
    writeln(y); // Prints: 5
}

But compared to this D feature, your proposal is better, more elegant and avoids some troubles.


#6

This proposal is for syntax at the binding site, not part of the name of the function.


#7

Pointers (like all types) can be nested, and this awkwardly doesn’t support that. At the very least that means closures won’t work, since it’s a tuple under the hood.


#8

We badly need some way to talk about initialized-ness at the type level. I think we could do this by splitting &mut pointers into 4 different types of pointers: &mut, &out, &in and &uninit. &mut pointers are initialized at both the start and the end of the borrow. &out pointers are initialized at the start of the borrow but the data has to be moved out before the end of the borrow. &in pointers are uninitialized at the start of the borrow but have to have data moved in before the end of the borrow. &uninit pointers are uninitialized at both the start and end of the borrow.

This would allow you to write code like this:

let x: u32;
write_to_u32(&in x);
// x is now initialized  

The problem is, this requires linear types so that we can enforce things like “this pointer must be written to before the end of the borrow”. And in order to enforce linearity we’ll also need some form of unwinding-awareness built in to the language.
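
For comparison, the &in idea can be loosely approximated today with MaybeUninit, though without any compiler-enforced obligation. In this sketch the body of write_to_u32 and the demo function are assumptions; nothing forces the callee to write, and the returned &mut u32 is the only evidence that it did.

```rust
use std::mem::MaybeUninit;

// Hypothetical analogue of `fn write_to_u32(x: &in u32)`: takes an uninitialized
// slot and hands back a reference to the value it wrote.
fn write_to_u32(slot: &mut MaybeUninit<u32>) -> &mut u32 {
    slot.write(7)
}

fn demo() -> u32 {
    let mut x = MaybeUninit::<u32>::uninit();
    // The compiler does not track that `x` is now initialized; only the
    // returned reference can be used safely.
    let initialized: &mut u32 = write_to_u32(&mut x);
    *initialized += 1;
    *initialized
}
```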


#9
  1. If I understand correctly, &out sounds similar to &move (RFC tracking issue #998: postponed #965 “&own”, postponed #1617 “&move”, closed #1180 “Interior<T>”, open #1646 “DerefMove”)?

  2. Aside from completeness, is there any reason for an &uninit pointer according to this scheme?

(Minor nit: your names &in and &out are the opposite of what earlier languages with a similar feature, like C# and D, use (out = uninit → init). That is going to be confusing.)

Why? MIR already knows where to insert drops; it could also be used to enforce linearity.


#10

If I understand correctly, &out sounds similar to &move (RFC tracking issue #998: postponed #965 “&own”, postponed #1617 “&move”, closed #1180 “Interior<T>”, open #1646 “DerefMove”)?

Yeah, I’m definitely not the first person to make suggestions along these lines, I was just too lazy to dig out all the old RFCs and see which ones most closely match the way I’ve been thinking about it. Having read those links now, &own and &move look exactly the same as my &out, although &in and &out have also been suggested before.

Aside from completeness, is there any reason for an &uninit pointer according to this scheme?

Someone may want a region of memory that they could temporarily keep something in, so long as it’s gone before the memory is returned. This would be useful if you wanted to have a safe memory allocation API. I’m thinking that you could move something into an &uninit pointer, then re-borrow it as &out to toy with the data in place (or re-borrow it as a &mut, but then you’d still be obliged to take the data out again after dropping the &mut).

(Minor nit: your names &in and &out are the opposite of what earlier languages with a similar feature, like C# and D, use (out = uninit → init). That is going to be confusing.)

True, what would better names be? &take and &place?

Why? MIR already knows where to insert drops; it could also be used to enforce linearity.

Suppose we have these two functions:

fn bar(x: &in MyType) { ... }

fn foo() {
    let x: MyType;
    bar(&in x);
}

Now suppose bar panics. foo needs to know whether there’s data in x or not. Maybe in this situation we could say that a function which takes an &out/&in/&uninit always returns an empty pointer (dropping the data if need be).
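
The same ambiguity already exists if you emulate this with MaybeUninit. In the sketch below, the Noisy drop counter, the extra counter parameter on bar, and the use of catch_unwind are all assumptions added to make the effect observable: bar writes and then panics, and foo has no way to know from the types whether x holds a value.

```rust
use std::cell::Cell;
use std::mem::MaybeUninit;
use std::panic::{self, AssertUnwindSafe};

// Counts how many times a value of this type has been dropped.
struct Noisy<'a>(&'a Cell<u32>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// Hypothetical stand-in for `bar(x: &in MyType)`: initializes the slot, then panics.
fn bar<'a>(slot: &mut MaybeUninit<Noisy<'a>>, counter: &'a Cell<u32>) {
    slot.write(Noisy(counter));
    panic!("bar failed after writing");
}

// Returns (whether bar panicked, how many values were dropped).
fn foo() -> (bool, u32) {
    let counter = Cell::new(0);
    let mut x = MaybeUninit::uninit();
    let result = panic::catch_unwind(AssertUnwindSafe(|| bar(&mut x, &counter)));
    // foo cannot tell from the types whether x holds a value. MaybeUninit never
    // drops its contents, so the value written by bar is leaked.
    (result.is_err(), counter.get())
}
```

Here the value is silently leaked; a real &in design would have to pick a rule (such as "always empty after a panic") so that the caller's drop logic stays well defined.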


#11

Aside from completeness, is there any reason for an &uninit pointer according to this scheme?

Actually, thinking about this a bit more, I think &uninit would often be more useful than &in. Consider a read function that doesn’t require passing it initialized data to be written over. We can make it take either an &uninit or an &in.

fn read_uninit(buffer: &uninit [u8]) -> io::Result<(&out [u8], &uninit [u8])>;
fn read_in(buffer: &in [u8]) -> io::Result<(&mut [u8], &in [u8])>;

It has to return the written and the still-uninitialized data as two separate slices. We could use read_uninit like this:

fn foo() -> io::Result<()> {
    let buffer: [u8; 1024];
    let (x, _) = read_uninit(&uninit buffer[..])?;
    println!("got {:x}", x);
    drop(*x);
    Ok(())
}

Presumably we wouldn’t even need the drop(*x) because the compiler could automatically add drops for any &out pointers that haven’t been moved out of yet at the end of a function. The read_in version would be harder to use though.

fn foo() -> io::Result<()> {
    let buffer: [u8; 1024];
    let (x, y) = read_in(&in buffer[..])?;
    println!("got {:x}", x);

    // What do we do here?

    Ok(())
}

In this version we’re left with a &mut pointer that we can’t move out of and an &in pointer that we’re obliged to write something to before we can drop it. So we end up having to initialize the rest of the buffer anyway before the end of the function, so that it is clearly in either an entirely-initialized or an entirely-uninitialized state.
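
As a rough illustration of the read_uninit shape in today's Rust, here is a sketch over a slice of MaybeUninit<u8>. The extra src parameter (standing in for the reader), the plain tuple return without io::Result, and the demo function are assumptions made to keep the example self-contained.

```rust
use std::mem::MaybeUninit;
use std::slice;

// Hypothetical analogue of read_uninit: copies from `src` into the front of `buffer`
// and returns (initialized prefix, still-uninitialized remainder).
fn read_uninit<'a>(
    src: &[u8],
    buffer: &'a mut [MaybeUninit<u8>],
) -> (&'a mut [u8], &'a mut [MaybeUninit<u8>]) {
    let n = src.len().min(buffer.len());
    let (filled, rest) = buffer.split_at_mut(n);
    for (slot, byte) in filled.iter_mut().zip(src) {
        slot.write(*byte);
    }
    // Sound because the loop above initialized every element of `filled`,
    // and MaybeUninit<u8> has the same layout as u8.
    let filled = unsafe { slice::from_raw_parts_mut(filled.as_mut_ptr() as *mut u8, n) };
    (filled, rest)
}

fn demo() -> (Vec<u8>, usize) {
    let mut buffer = [MaybeUninit::<u8>::uninit(); 8];
    let (data, rest) = read_uninit(b"abc", &mut buffer);
    (data.to_vec(), rest.len())
}
```

Because u8 has no destructor, nothing is lost when the buffer goes away partially initialized; for types that need dropping, the caller would have to drop the initialized prefix by hand, which is the obligation &out is meant to track.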


#12

Or swap in and out. I’m against using &out for purpose incompatible with the notion of “output parameter”.

Since we are not reading y, the compiler should allow us to not initialize it? That is, it is fine to keep a &place pointer uninitialized before the first “read” operation, and MIR should not insert “drop” calls if there are no “write” operations. This feels like reintroducing type-state though.

(BTW this example means we need five split_at methods: split_at, split_at_mut, split_at_in, split_at_out, split_at_uninit. That does not look good unless we have generic mutability (RFC #976, closed).)


#13

Since we are not reading y, the compiler should allow us to not initialize it? That is, it is fine to keep a &place pointer uninitialized before the first “read” operation, and MIR should not insert “drop” calls if there are no “write” operations. This feels like reintroducing type-state though.

The problem is, if you drop x and y then what state is buffer left in? It’s neither initialized nor uninitialized. This is why you have to write to the &in pointer.

In general, if the compiler finds an unfulfilled obligation on an &out pointer it can always just fulfill the obligation itself by adding a drop. But it doesn’t work the other way around for an &in pointer because the compiler needs something to write to it.


#14

@Ericson2314’s unduly neglected Stateful MIR proposal seems highly relevant here.

Personally (this is the naming scheme I was advocating since before 1.0):

  • &move for owned references you can move out of. Intuitively: & (a read-only reference) -> &mut (a reference you can mutate) -> &move (a reference you can move out of), in order of increasing capability.

  • &out for references that refer to an uninitialized value and need to be initialized, by analogy with precisely the “out parameters” of C# and D referenced above.
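
For a feel of what &move would permit, the closest approximation in today's Rust is probably a &mut Option<T> that the callee empties. This is only a sketch; consume and demo are hypothetical names, and the Option wrapper is exactly the runtime bookkeeping a real &move would make unnecessary.

```rust
// Approximates a hypothetical `&move String` with `&mut Option<String>`:
// the callee may move the value out, leaving None behind.
fn consume(slot: &mut Option<String>) -> usize {
    match slot.take() {
        // `s` is owned here and is dropped at the end of the arm.
        Some(s) => s.len(),
        None => 0,
    }
}

fn demo() -> (usize, bool) {
    let mut owner = Some(String::from("hello"));
    let len = consume(&mut owner);
    // After the call the original binding no longer holds a value.
    (len, owner.is_none())
}
```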

But I think mahkoh at some point also used the same in/out naming as canndrew (or did they have &own?), and IIRC gereeter used &uninit for what I’m referring to as &out, and canndrew as &in, so more or less every possible name has been used for every possible concept by this point, which is presumably our penance for having tried to build the tower of Babel.

Coming back down to Earth, I don’t think e.g. &out would be backwards compatible: I’ve literally seen code with &out in it, being a shared reference to a variable named out. Presumably the same thing hypothetically applies to &uninit as well.
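
A minimal example of the kind of code that is valid today and that a new &out operator would have to keep working (the function name is made up):

```rust
// `out` is an ordinary identifier today, so `&out` is a shared borrow of it.
fn borrow_out() -> i32 {
    let out = 5;
    let r = &out;
    *r + 1
}
```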


#15

so more or less every possible name has been used for every possible concept by this point

Scrap all of them and invent three completely new words? How about &zep, &fim and &gob? It doesn’t matter which one is which. I’m only being semi-sarcastic.

Coming back down to Earth, I don’t think e.g. &out would be backwards compatible: I’ve literally seen code with &out in it, being a shared reference to a variable named out. Presumably the same thing hypothetically applies to &uninit as well.

Ah, this ol’ chestnut. Looks like another case for contextual keywords: if &out is followed by another identifier then it’s taking an &out reference to a variable (as in &out foo), otherwise it’s taking an immutable reference to a variable named out.


#16

The read thing is possible to do in a crate: buffer.