"out" function arguments?


#1

This topic has been discussed a few times in the past. Some of the notes written below are probably wrong or sub-optimal.

&T and T function arguments are “in”.

&mut function arguments are “inout”.

So “out” arguments are missing.

Inside a function, a &out argument such as “x: &out u32” behaves like an uninitialized reference variable:

fn foo(x: &out u32) {
    println!("{}", x); // Error, x is not initialized yet.
}

fn foo(data: &out [u32; 5]) {
    data = [0; 5]; // OK.
    // Optionally use data here.
}

trait Foo {
    fn bar(&out self); // OK.
}

But are “out” arguments common enough in Rust to justify this type system feature (and increased complexity)? Rust has tuple return types that avoid some of the usages of out arguments in other languages:

fn foo(x: u32) -> (u32, u64);
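As a minimal sketch of how a tuple return stands in for two out arguments (the name div_rem is made up for illustration):

```rust
/// Returns the quotient and the remainder together, instead of
/// writing them through two "out" parameters.
/// Panics if `divisor` is zero, like the `/` operator does.
fn div_rem(dividend: u32, divisor: u32) -> (u32, u32) {
    (dividend / divisor, dividend % divisor)
}
```

A caller destructures the result directly, e.g. `let (q, r) = div_rem(17, 5);`, so neither value ever exists in an uninitialized state.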


An example usage. Given a simple helper function:

#[inline]
fn fill_u8(data: &out [u8], value: u8) {
    unsafe {
        std::ptr::write_bytes(data.as_mut_ptr(), value, data.len());
    }
}

User code like this:

arr[0] = b'a';
fill_u8(&out arr, b'0');

Gives a warning like:

warning: value assigned to `arr[0]` is never read

It’s a situation like:

fn main() {
    let mut x;
    x = 1;
    x = 2;
    println!("{}", x);
}

Statically verifying that code like this is correct is not obvious:

fn foo(v: &out Vec<usize>) {
    for (i, el) in v.iter_mut().enumerate() {
        *el = i;
    }
}

fn foo(v: &out [usize]) {
    for i in 0 .. v.len() {
        v[i] = i;
    }
}

Here v.len() is read before it’s written.
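For comparison, here is a rough sketch of how current Rust can separate “the length is readable” from “the elements are not yet initialized”, using a slice of MaybeUninit. The function name fill_indices is hypothetical, and this is only one possible encoding, not what an &out design would necessarily look like:

```rust
use std::mem::MaybeUninit;

/// Writes `i` into every slot `i` and returns the now-initialized slice.
/// The slice length is ordinary initialized data; only the elements
/// are uninitialized on entry.
fn fill_indices(out: &mut [MaybeUninit<usize>]) -> &mut [usize] {
    for (i, slot) in out.iter_mut().enumerate() {
        slot.write(i); // write-only access: the old contents are never read
    }
    // SAFETY: the loop above initialized every element, and
    // `MaybeUninit<usize>` has the same layout as `usize`.
    unsafe { std::slice::from_raw_parts_mut(out.as_mut_ptr() as *mut usize, out.len()) }
}
```

The compiler does not check that every element was written; that obligation sits in the unsafe block, which is exactly the part a static &out check would have to prove.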


Even if you add new features to a type system, there are usually other coding patterns left that the type system can’t model (well enough) yet. Here is a common Rust pattern: to reduce the number of heap allocations, buffers are passed to a function that’s called many times. The contents of the buffers are ignored; only their capacity matters. It’s roughly the opposite of “out”:

fn foo(buffer1: &mut Vec<u32>, buffer2: &mut HashMap<u32, u32>) {
    buffer1.clear(); // Implicit at entry?
    buffer2.clear(); // Implicit at entry?

    // ... Uses buffer1 and buffer2 here.
}
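A concrete, runnable version of this pattern might look like the sketch below. The function count_distinct and its parameter names are invented for illustration:

```rust
use std::collections::HashMap;

/// Counts the distinct values in `input`, reusing caller-owned buffers.
/// Only the capacity of `sorted` and `counts` matters on entry.
fn count_distinct(
    input: &[u32],
    sorted: &mut Vec<u32>,
    counts: &mut HashMap<u32, u32>,
) -> usize {
    // Discard stale contents; `clear` keeps the allocated capacity.
    sorted.clear();
    counts.clear();

    sorted.extend_from_slice(input);
    sorted.sort_unstable();
    for &value in sorted.iter() {
        *counts.entry(value).or_insert(0) += 1;
    }
    counts.len()
}
```

Nothing in the signature tells the caller that the previous contents are thrown away, which is the gap the “implicit at entry” comments point at.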

The D language has “out” arguments too, but their semantics differ from the one explained above:

import std.stdio;
void foo(out uint[] v2, out uint[3] a2) {
    writeln("B: ", v2, " ", a2);
}
void main() {
    uint[] v1 = [1, 2, 3]; // Heap-allocated.
    uint[3] a1 = [10, 20, 30];
    writeln("A: ", v1, " ", a1);
    foo(v1, a1);
    writeln("C: ", v1, " ", a1);
}

Outputs:

A: [1, 2, 3] [10, 20, 30]
B: [] [0, 0, 0]
C: [] [0, 0, 0]

So dynamic arrays get their length set to zero, and fixed-size arrays get zeroed out, at the beginning of the foo() function. This costs more run-time work compared to the design above (that just tags an out reference variable as uninitialized and requires the programmer to initialize it) but it makes both the compiler and the semantics simpler.
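The D behaviour can be imitated by hand in today’s Rust by resetting the arguments at the top of the function. This is only an approximation under my own assumptions (the name reset_then_report is made up, and a cleared Vec keeps its allocation, which may differ from what D does with the array):

```rust
/// Emulates D-style `out` semantics: both arguments are reset before
/// the body runs, so the body never observes the caller's old contents.
fn reset_then_report(v2: &mut Vec<u32>, a2: &mut [u32; 3]) -> String {
    v2.clear();   // length becomes zero, like the D dynamic array
    *a2 = [0; 3]; // the fixed-size array is zeroed

    // From here on the function sees only the reset values.
    format!("B: {:?} {:?}", v2, a2)
}
```

The reset is real run-time work done on every call, which is the cost mentioned above.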

The Ada design of “out” arguments is instead closer to the Rust ideas shown above:

https://en.wikibooks.org/wiki/Ada_Programming/Subprograms


This is another more niche pattern not covered by Rust “&out”, given:

fn perm swap(&mut self, i: usize, j: usize) {...}

A function like this:

fn shuffle(data: &mut [u32], rng: &mut XorShift128) {
    for i in (1 .. data.len()).rev() {
        data.swap(i, rng.uniform(0, i));
    }
}

Could be annotated like:

fn shuffle(data: &perm [u32], rng: &mut XorShift128) {

&perm means that you can read data freely but can’t write its items directly; you can only swap them. So after the shuffle the items of data are the same, just in different positions. Swap-based sorting and permuting functions are natural examples of this not very common coding pattern. At this level of specificity it’s probably better to use dependent typing, liquid typing, or code proofs instead.
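Without any language change, a similar restriction can be approximated with a wrapper type that exposes only reads and swaps. The sketch below is an assumption-laden illustration: Perm and reverse are hypothetical names, and element types with interior mutability (such as Cell) could still be modified through the shared slice:

```rust
/// A view over a slice that allows reading and swapping elements, but
/// offers no way to overwrite one. (Hypothetical type, not a std API.)
pub struct Perm<'a, T> {
    data: &'a mut [T],
}

impl<'a, T> Perm<'a, T> {
    pub fn new(data: &'a mut [T]) -> Self {
        Perm { data }
    }

    /// Shared, read-only access to the items.
    pub fn as_slice(&self) -> &[T] {
        &*self.data
    }

    /// The only mutating operation: exchange two positions.
    pub fn swap(&mut self, i: usize, j: usize) {
        self.data.swap(i, j);
    }
}

/// Reverses the items using swaps only, so the result is a permutation
/// of the input by construction.
pub fn reverse<T>(mut view: Perm<'_, T>) {
    let len = view.as_slice().len();
    for i in 0..len / 2 {
        view.swap(i, len - 1 - i);
    }
}
```

A shuffle or an in-place sort written against Perm instead of &mut [T] could not lose or duplicate an item, at the price of an extra wrapper at every call site.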


#2

This too reads the field length before it’s written…

I’ve found only a few instances where I could use an “out” argument in my Rust code, so perhaps the answer is negative.


#3

A few discussions on out/uninit references that are arguably still active:

My impression is also that out/uninit references are lacking compelling motivation. In particular, &mut MaybeUninit<T> seems to cover a lot of the same use cases.

The discussion on that RFC I linked above left me with a vague impression that &out/&uninit don’t have any advantages over &mut MaybeUninit<T> after all, but I’m not entirely sure I understood it so cc @cramertj and @Yato
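For reference, this is roughly what an out parameter looks like with &mut MaybeUninit<T> today. The function names are made up for the example:

```rust
use std::mem::MaybeUninit;

/// "Out" parameter expressed with today's API: the callee writes the
/// slot and gets back a reference to the freshly written value.
fn init_answer(out: &mut MaybeUninit<u32>) -> &mut u32 {
    out.write(42)
}

/// Caller side: the compiler does not track that `slot` was written,
/// so the final `assume_init` needs `unsafe`.
fn make_answer() -> u32 {
    let mut slot = MaybeUninit::<u32>::uninit();
    init_answer(&mut slot);
    // SAFETY: `init_answer` always writes the slot before returning.
    unsafe { slot.assume_init() }
}
```

The callee side is entirely safe code; the unchecked step is the caller’s claim that initialization really happened.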


#4

PITs could have a readable length and unreadable elements, although I haven’t fully worked out how PITs are supposed to interact with arrays.


#5

At the risk of duplicating one of those existing conversations:

What’s the advantage of using PITs for this as opposed to [MaybeUninit<T>; 42]?


#6

PITs are statically checked (partial) (un)initialization state, whereas MaybeUninit is pretty much just “good luck, hope your code is correct, see you at runtime”.


#7

Could you provide a concrete example of incorrect code that would compile with (sensible usage of) MaybeUninit and fail to compile with PITs? I haven’t seen one in any of these existing threads and can’t think of one myself; all I’ve seen so far is syntax and some dependent typing speculation.


#8

You can’t safely do emplace_back with MaybeUninit.

Yet, it’s trivial with PITs.

struct Vec<T> {
// whatever
}

impl<T> Vec<T> {
  fn place_back<'a>(&'a mut self) -> PlaceBack<'a, T()> {
    self.reserve(1);
    debug_assert!(self.capacity() > self.len());
    unsafe {
      // whatever
    }
  }
}

impl<'a, T> Drop for PlaceBack<'a, T()> { // uninitialized or partially initialized
  fn drop(&mut self) {
    unsafe { mem::replace(self.value, mem::uninitialized()); } // or something, we need to explicitly drop here
  }
}
impl<'a, T> Drop for PlaceBack<'a, T(..)> { // fully initialized
  fn drop(&mut self) {
    let len = self.vec.len();
    unsafe { self.vec.set_len(len + 1) }
  }
}

(It’s also safe to leak the PlaceBack, whether uninitialized or fully initialized; at worst it leaks memory.)


#9

One thing that I like about Rust is that I’ve never needed out parameters, thanks to move semantics, Result, and safe composable types. I find out parameters to be an anti-pattern and avoid them even in C++, where doing so is harder.

So I hope we can do without them.


#10

Thank you for posting the link to my RFC.

The motivation for &out and &uninit is to create a safe way to create write-only contracts and to express uninitialized values. I don’t particularly like MaybeUninit, because it is unsafe, and it is already possible to do everything MaybeUninit does just using other unsafe code. Also I think MaybeUninit is trying to limit the usage of core::mem::uninitialized, which is wildly unsafe, and hard to use correctly.

Currently, there is no safe way to handle uninit without overhead (of at least an Option), or to enforce a write-only contract, which is useful when interfacing with certain hardware, as a powerful design tool, or as proof of intent. All of these are useful and important. I believe having a safe way to do this is a net positive for Rust.
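To make the write-only idea concrete, here is a small sketch of how such a contract is usually approximated today with a library type. WriteOnly is a hypothetical name, and the restriction comes from the API surface (no read method) rather than from the language, with an unsafe constructor carrying the validity requirement:

```rust
use std::ptr;

/// A handle that can only be written through, e.g. a memory-mapped
/// output register. (Hypothetical type for illustration.)
pub struct WriteOnly<T> {
    ptr: *mut T,
}

impl<T> WriteOnly<T> {
    /// # Safety
    /// `ptr` must be valid for writes and properly aligned for as long
    /// as the handle is used.
    pub unsafe fn new(ptr: *mut T) -> Self {
        WriteOnly { ptr }
    }

    /// Volatile write; there is deliberately no `read` method.
    /// The previous value is overwritten without being dropped.
    pub fn write(&mut self, value: T) {
        // SAFETY: guaranteed by the contract of `new`.
        unsafe { ptr::write_volatile(self.ptr, value) }
    }
}
```

With something like &out, the same intent could be stated in a function signature without a wrapper type or an unsafe constructor.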


This is something I love about Rust, but sometimes it is helpful to show intent in a compiler enforceable way, which &out could help with. Also &uninit can help build abstractions like the Placement New, and the box syntax as just library items with a little sugar, which I think is good.


Placement New (with Vec)


impl<T> Vec<T> {
    fn emplace_back<F: FnOnce(&uninit T)>(&mut self, init: F) {
        // This code is taken from the Vec push implementation in the std lib
        // and adapted to use &uninit to show how it will be used for placement new
        if self.len == self.buf.cap() {
            self.reserve(1);
        }
        unsafe {
            let end: &uninit T = &uninit *self.as_mut_ptr().add(self.len);
            init(end); // this line has been changed for the purposes of placement new
            self.len += 1;
        }
    }
}

With some sugar for placement new, this could be called like this:

vec.emplace_back() <- value;

This implementation is panic safe: if the function init panics, self.len hasn’t been incremented yet, so the Vec won’t drop the uninitialized slot. Also, due to the rules around &uninit (that I laid out in the RFC), the function init is guaranteed to initialize the uninit if it doesn’t panic, so this implementation is safe.
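For comparison, a version that compiles on current stable Rust might look like the sketch below, using Vec::spare_capacity_mut and a MaybeUninit slot. The free function emplace_back_with is a hypothetical name, and it has to be unsafe because nothing forces the closure to write the slot:

```rust
use std::mem::MaybeUninit;

/// Appends one element by letting `init` construct it directly in the
/// vector's spare capacity.
///
/// # Safety
/// `init` must fully initialize the slot it is given; the compiler
/// cannot check this, which is the gap `&uninit` is meant to close.
unsafe fn emplace_back_with<T>(vec: &mut Vec<T>, init: impl FnOnce(&mut MaybeUninit<T>)) {
    vec.reserve(1);
    // After `reserve(1)` there is at least one uninitialized slot.
    let slot = &mut vec.spare_capacity_mut()[0];
    init(slot);
    // If `init` panicked we never get here, so the length still
    // excludes the slot and nothing uninitialized is dropped.
    let len = vec.len();
    // SAFETY: the caller promises `init` initialized the slot at `len`.
    unsafe { vec.set_len(len + 1) };
}
```

The panic-safety argument is the same as above: the length is only bumped after the closure returns.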


#11

You can do this with the swap method on slices (slice::swap), or with std::mem::swap.


#12

Why do you prefer this over the PIT version? This has a closure that the PIT version does not! It’s also not as nice for passing around (the PIT version can be returned, whereas this cannot).


#13

Because adding &uninit to Rust is a smaller, more manageable change, and it is easier to learn than PITs. In addition, &uninit is more limited, and therefore less likely to lead to confusing code.

I have a few problems with PITs and your write-up of them

  • I think it is easy for PITs to make confusing code easy to write
  • It looks like it would have to leak implementation detail in order to be effective
  • The syntax is alien, and looks a lot like function calls or definitions, even though it has nothing to do with functions
  • Most importantly, I don’t see enough details on how PITs will work for me to trust them
    • Whenever I see you talk about them, you tend to gloss over the details about how they would work
    • For example, what are you not allowed to do when PITs
    • How are panics handled?

With respect to the Placement New implementation

My implementation only requires one additional function, whereas yours requires a separate type, with a Drop impl. Also, closures aren’t evil, they can be quite helpful.

How? You can pass generic functions around like normal functions. playground link


#14

Okay, let’s go over each of these:

I think it is easy for PITs to make confusing code easy to write

While arguably true, I do believe PITs add something important to the language: the ability to specify, inspect, and interact with the intermediate types from partially initialized values. E.g.

let x: Foo;
// can't do anything with x
x.a = Bar;
// can't do anything with x, but can interact with x.a!
x.b = Baz;
// can now use x, x.a and x.b.

PITs define the first line as Foo(), the second as Foo(a), and the third as Foo(..) or Foo(a,b), and allow you to interact with x in all three.
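The nearest thing current Rust offers is the typestate pattern, where each intermediate state is a separate, hand-written type. The sketch below is not the PIT proposal itself: the types FooEmpty and FooWithA and the field types are invented here, and the value moves between types instead of being tracked in place:

```rust
#[derive(Debug, PartialEq)]
struct Bar(u32);
#[derive(Debug, PartialEq)]
struct Baz(&'static str);

/// No field set yet: plays the role of `Foo()`.
struct FooEmpty;
/// Only `a` set: plays the role of `Foo(a)`.
struct FooWithA {
    a: Bar,
}
/// Fully initialized: plays the role of `Foo(..)`.
struct Foo {
    a: Bar,
    b: Baz,
}

impl FooEmpty {
    fn set_a(self, a: Bar) -> FooWithA {
        FooWithA { a }
    }
}

impl FooWithA {
    /// `a` can already be used while `b` is still missing.
    fn a(&self) -> &Bar {
        &self.a
    }

    fn set_b(self, b: Baz) -> Foo {
        Foo { a: self.a, b }
    }
}
```

Every intermediate state needs its own declaration here, which is the boilerplate PITs would generate implicitly.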

It looks like it would have to leak implementation detail in order to be effective

This is an error:

mod foo {
pub struct Bar {
    a: i32
}
}
let x: foo::Bar(a) = /* whatever */; // error: a is not visible in this context

Instead, use one of Bar() or Bar(..), for uninitialized or fully initialized respectively. This leaks no more implementation details than tuple structs, or moving/copying/etc structs with private fields (something you can’t do with C opaque types).

The syntax is alien, and looks a lot like function calls or definitions, even though it has nothing to do with functions

I simply picked the first syntax that came to mind and didn’t conflict with existing syntax. The syntax is very much open to discussion.

For example, what are you not allowed to do when PITs

Uh, huh? “When PITs?”

How are panics handled?

Just like any other type - they Drop.

let x: Foo;
x.a = Bar;
panic!(); // calls Foo(a)::drop (if any) XOR Foo()::drop (if any) (in that order), then drops x.a
// XOR means it won't ever call both, altho I'm not sure if Foo()::drop is a good idea - should you be able to impl Drop for uninitialized types/values? probably not.

#15

Oops, I meant what are you not allowed to do with PITs. More clearly, what restrictions are there on using PITs. For example, with &uninit you can’t conditionally partially initialize anything, and you can’t store &uninit.


#16

Situation:

Let’s say I have this struct and these functions in crate A

/// Crate A

struct Foo {
   a: u32,
   b: String
}

impl Foo {
    fn init_a(&mut self() -> Self(a)) {
        self.a = 10;
    }

    fn init_b(&mut self() -> Self(b), string: String) {
        self.b = string;
    }

    fn a(&self(a)) -> u32 {
        return self.a;
    }

    fn b(&self(b)) -> &str {
        return &self.b;
    }
}

And I am using this in crate B

/// Crate B

fn large_function() {
    // ... code ...
    
    let foo = Foo {};
    foo.init_a();
    
    // ... code ...

    // this code is complicated and we want to refactor
    // it so that it is easier to maintain (this is a bit contrived)

    let string = fetch_string_from_somewhere_using_id(foo.a());

    foo.init_b(string);
    
    // ... code ...
}

We want to move initializing b to another function so that the code is cleaner, and because we are using this pattern in multiple places and we want to factor it out.

Question:

How would I factor out

let string = fetch_string_from_somewhere_using_id(foo.a());
foo.init_b(string);

into a function in crate B, without modifying crate A? With PITs this seems like a reasonable thing to want. Could I do it, and if so, how?

I don’t think you could do this sort of refactoring because you can’t name the field a, which is needed to get the string.


#17

If you have branches, you must initialize the same things on both branches. E.g.

// this is fine
let foo: Foo;
if x == 1 {
    foo.a = bar;
} else {
    foo.a = baz;
}

// this errors
let foo: Foo;
if x == 1 {
    foo.a = bar;
}
// error: expected Foo(a), got Foo(). (or something, the actual error message needs some work)
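Current Rust already applies the same rule to whole local variables (not to individual fields), which may help as a point of reference. A minimal sketch, with a made-up function name:

```rust
/// Deferred initialization of a whole local: accepted because every
/// branch assigns `label` before it is read.
fn describe(x: u32) -> &'static str {
    let label: &'static str;
    if x == 1 {
        label = "one";
    } else {
        label = "other";
    }
    // Removing the `else` branch makes this line fail to compile with
    // error E0381 (the binding is possibly uninitialized).
    label
}
```

PITs would extend this per-branch check from “the whole variable” to “this set of fields”.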

As for storage: it’s not exactly allowed, but it’s not exactly disallowed either. You can have

struct Foo {
    a: Bar()
}

But you’re never allowed to initialize Foo.a. At the same time, you can have:

struct Foo {
    a: Bar
}
let foo: Foo(a()) = Foo {};
foo.a.b = Baz;

But in this case, the PIT is part of let foo, not struct Foo.

There’s also some “subtyping” stuff, e.g.

struct Foo {
    a: Bar(b, c) // let's say Bar has a Bar.d as well, so that this is a PIT
}
let foo: Foo(a(b)) = Foo { a: Bar { b: Baz } }; // valid

#18

Oh. Hmm…

I see the problem! It’ll probably be a while until I can come up with a proper solution…

How do you like pub(name) field: Type? Hmm…

If you could use type aliases with PITs, and required all public interfaces to use public types…

/// Crate A

pub struct Foo {
   a: u32,
   b: String
}

pub type Foo_A = Foo(a);
pub type Foo_B = Foo(b);

impl Foo {
    pub fn init_a(&mut self() -> Foo_A) {
        self.a = 10;
    }

    pub fn init_b(&mut self() -> Foo_B, string: String) {
        self.b = string;
    }

    pub fn a(&self: Foo_A) -> u32 {
        return self.a;
    }

    pub fn b(&self: Foo_B) -> &str {
        return &self.b;
    }
}

(something to make the PITs publicly nameable without directly exposing the fields…)


#19

The problem is more fundamental. Even if you have a solution to this, I don’t think it could be accepted, because it would have to leak private information into a public interface.

This is similar

pub type FooExtern = Foo;

struct Foo;

When this is compiled you get a warning:

warning: private type `Foo` in public interface (error E0446)
 --> src/lib.rs:2:1
  |
2 | pub type FooExtern = Foo;
  | ^^^^^^^^^^^^^^^^^^^^^^^^^
  |
  = note: #[warn(private_in_public)] on by default
  = warning: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!
  = note: for more information, see issue #34537 <https://github.com/rust-lang/rust/issues/34537>

Note this

this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!

Due to this, I don’t think any solution that leaks private information, even behind aliases, would work.


#20

Hmm. How would we benefit from partially initialized private fields in public interfaces?