[lang-team-minutes] feature status report: placement in and box

Executive summary: We are not ready to stabilize the placement in and box forms yet. We would like the forms to have a better user experience before stabilization.


Bullet point summary:

  • overloaded box EXPR is not implemented yet
  • Consider desugaring from HIR to MIR (rather than source to source) to address inference issues with box EXPR.
  • Is box EXPR too easily confused with Box<T>? Should we change the syntax?
  • Probably need to add autoref to <- to support vec <- 22, vec.back() <- 22, and arena <- 22
  • Consider reworking Placer in light of generic associated types (aka “Associated Type Constructors”).
  • Allocator integration?

Status Report: Placement In and Box

During the language team meeting on January 19th, the team discussed the box EXPR and placement in PLACER <- EXPR syntactic forms.

(To see the original goals for these forms, see RFC 809.)

In particular, the key question we wanted to answer was: Where are these forms on their respective paths towards stabilization? Are we happy with their current semantics and should stabilize them as is, or do they need further revision before stabilization? (Or perhaps are they not pulling their weight and should be deprecated?)

Current implementation status: Support for <- and the traits for overloading the operators has landed. But box EXPR is not yet overloaded; it is tied solely to Box<T>.

Box Syntax

The reason that box EXPR has not yet been overloaded is that the desugaring-based implementation that works for <- does not generalize nicely to box EXPR, due to type-inference issues (the inference ends up trying to assign unsized types to stack locals in the desugaring).

  • We may want to switch from a source-to-source desugaring to a HIR->MIR desugaring. This may help address the type inference issues.

While discussing overloading box EXPR, a common concern was that this form is simply too confusing, because box sounds like Box, and so the current non-overloaded semantics is in fact the one that everyone expects.

  • An alternative might be to syntactically overload <-
  • e.g. things like let x: Rc<_> <- EXPR being the new way to say let x: Rc<_> = box EXPR
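As a rough illustration of what choosing the container from the expected type means, here is a sketch in today's Rust in which an ordinary generic function picks Box, Rc or Arc from the type annotation, much as let x: Rc<_> <- EXPR would. FromBoxed and boxed are hypothetical names, not the RFC's traits. Because this is a plain function call, the value is still built before the call, so the sketch shows only the type-directed selection, not in-place construction.

```rust
use std::rc::Rc;
use std::sync::Arc;

/// Hypothetical trait: wrap a value in the container type `Self`.
trait FromBoxed<T> {
    fn from_boxed(value: T) -> Self;
}

impl<T> FromBoxed<T> for Box<T> {
    fn from_boxed(value: T) -> Self {
        Box::new(value)
    }
}

impl<T> FromBoxed<T> for Rc<T> {
    fn from_boxed(value: T) -> Self {
        Rc::new(value)
    }
}

impl<T> FromBoxed<T> for Arc<T> {
    fn from_boxed(value: T) -> Self {
        Arc::new(value)
    }
}

/// Stand-in for an overloaded `box EXPR`: the container type is inferred
/// from the expected type at the use site.
fn boxed<P: FromBoxed<T>, T>(value: T) -> P {
    P::from_boxed(value)
}
```

For example, let x: Rc<_> = boxed(5); yields an Rc<i32>, and changing the annotation to Box<_> yields a Box<i32> instead.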

Placement In Syntax and Protocol

While placement PLACER <- EXPR is implemented, it is agreed that the current semantics is not ready for stabilization. In particular, the placement protocol is designed to always consume the PLACER. This consumption semantics works for some placers (like &Arena since it is Copy), but it is likely to rule out desirable uses.

  • See RFC Issue 1286 for further discussion of the protocol’s consumption of the placer.

  • The discussion at RFC Issue 1286 focused largely on changing the trait definitions rather than the desugaring. However, the discussion did expand to include options like doing auto-ref on the PLACER in PLACER <- EXPR. This way we could support vec <- 22; and vec.back() <- 22; and arena <- 22; which require &mut self, self, and &self respectively.

Related to the placer-consumption question, the lang team discussed whether or not making vec <- 55; “work” (as a way to push that integer onto the end of the vector) is desirable.

  • Currently one must instead write vec.place_back() <- 55;
  • (Requiring place_back seems pretty onerous, in terms of convincing newcomers that this matches “best practices”.)
  • (Obviously supporting vec <- 55; requires the change that the placement protocol not consume the PLACER; but that implication is only one-directional.)
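A minimal sketch of the consumption issue, assuming a simplified one-method Placer trait (hypothetical, not the protocol's actual trait set) whose method takes self by value. Implementing it for &mut Vec<T> rather than Vec<T> keeps the vector usable after each placement, but the caller has to write &mut vec explicitly; that explicit borrow is what auto-ref on the PLACER would insert. The body simply calls push, so this models ownership only, not in-place construction.

```rust
/// Hypothetical, simplified placer: one method that consumes `self`,
/// mirroring how the placement protocol consumes the PLACER.
trait Placer<T> {
    fn place(self, value: T);
}

/// Implemented for `&mut Vec<T>`, not `Vec<T>`: consuming a `&mut`
/// borrow leaves the vector itself usable afterwards.
impl<'a, T> Placer<T> for &'a mut Vec<T> {
    fn place(self, value: T) {
        // A real placer would hand out a slot; pushing keeps the sketch short.
        self.push(value);
    }
}

/// Stand-in for `PLACER <- EXPR` under consuming semantics.
fn place_in<P: Placer<T>, T>(placer: P, value: T) {
    placer.place(value);
}

fn demo() -> Vec<i32> {
    let mut vec = Vec::new();
    // With auto-ref, `vec <- 22` could insert this `&mut` implicitly.
    place_in(&mut vec, 22);
    // Only the borrow was consumed, so `vec` can be placed into again.
    place_in(&mut vec, 55);
    // `place_in(vec, 55)` would not compile: `Vec<i32>` is not a Placer here.
    vec
}
```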

Finally, it is possible that future language features like generic associated types may yield much better designs for the Placer trait.

Allocator Integration

Some people have asked about how placement in will integrate with allocators. Work on landing allocators stalled somewhat, but now that #[may_dangle] support has landed, @pnkfelix plans to put up a PR for allocator support in libcollections soon.

It remains to be seen how well the placement syntax and protocol will mesh with allocators (RFC 1398).

Timeline for Work

It is not clear if we will put effort into placement-in or box in 2017, since it is not really on the roadmap (except perhaps if one regards the syntax v <- 55 as a big boon for usability).


Thank you for the thorough update! I agree that this is not really on the 2017 roadmap. <- doesn’t feel like a huge usability bump to me.


I agree that it’s confusing for box and Box to mean two different things. What about sticking to a single syntax for both placement-in and boxing, and using Box <- EXPR, Rc <- EXPR, etc.? To me this feels like the most obvious answer.

For Rc this would require creating a new definition called Rc in the value namespace, which would be automatically imported along with the type per existing rules.

For Box, since the struct is currently defined as tuple-like, the name exists in the value namespace as a constructor, but that’s only a private implementation detail (the field is private so any attempt to use Box as a value, even without calling it, will produce a privacy error). Box could be changed to a non-tuple-like struct and then the same applies as above.
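The namespace idea can be tried on stable Rust today. In this sketch (all names hypothetical) a braced struct Shared occupies only the type namespace, so a constant with the same name can occupy the value namespace and act as the placer. Since <- itself is not available, Shared.place(value) stands in for the proposed Shared <- value.

```rust
use std::rc::Rc;

/// A braced (non-tuple) struct occupies only the type namespace...
pub struct Shared<T> {
    inner: Rc<T>,
}

impl<T> Shared<T> {
    pub fn get(&self) -> &T {
        &self.inner
    }
}

/// Hypothetical placer token.
pub struct SharedPlacer;

/// ...so a value with the same name can live in the value namespace.
#[allow(non_upper_case_globals)]
pub const Shared: SharedPlacer = SharedPlacer;

impl SharedPlacer {
    /// What `Shared <- value` might call under the proposal (method name assumed).
    pub fn place<T>(self, value: T) -> Shared<T> {
        // In a struct expression, `Shared` resolves to the type, not the constant.
        Shared { inner: Rc::new(value) }
    }
}
```

In let s: Shared<i32> = Shared.place(5);, the annotation resolves to the struct and the expression resolves to the constant, which is the same split the proposal relies on for Rc.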

(One might ask what would happen to box patterns. For now, Box <- EXPR could be special-cased as a pattern. Eventually, this could turn into a way to overload the pattern matching system. Imagine if EXPR <- PAT were a valid pattern (maybe with special rules to avoid syntactic ambiguity), and you had something like trait PlacementPattern<RHS: Pattern> { fn try_match(...) -> Option<...>; }…)


The one thing of note for this feature is the performance benefit, as the box syntax allows for direct placement on the heap without a copy on the stack, if I remember correctly. Feel free to correct me.

@GuillaumeGomez would know as well.

I confirm!

While it is a big win for performance, it is also a win for correctness and portability.

Currently putting a big array on the heap always works in release builds (due to optimizations) but sometimes panics or segfaults in debug builds because it is first put on the stack (which overflows) and then copied to the heap (and whether this works, panics, or segfaults, is platform dependent).

With placement in, this always works.

What is the motivation for wanting vec <- 22 to be sugar for appending? C++ requires an explicit call to emplace_back for such a behavior, and although I’m not saying we shouldn’t aim to do better than C++, it seems unprecedented. Couldn’t we just change the definition of Vec::push to use placement-new internally?

C++ can do it by calling constructors on arbitrary memory locations. So emplace_back roughly just has to allocate its memory location, then forward constructor arguments to new(ptr) ValueType(args...).

Vec::push can’t deal in constructor arguments - we don’t even have constructors. It only takes the value to be pushed, so that value has to be created as an argument to the function before the function ever gets a chance to do something fancy with it.

Given Rust’s model of construction, I don’t know how you could “emplace” without language-level support.

(Hmm, I may have totally misunderstood @bstrie’s context, as @jethrogb pointed out and then deleted…)

I think that without concrete uses in, say, libcollections, this feature should not be established.

This may be a foolish question, but why is new syntax even needed in the first place? My understanding is that the purpose of box EXPR is to force an optimization that will occur anyway in release mode. Why not add an attribute that can be specified on Box::new, Vec::push, etc. that forcibly inlines the argument?

For box EXPR/Box::new, that may be possible, but it would have to be a rather ad-hoc, magical attribute. And it would presumably come with some weird rules for the implementation of new to guarantee that the optimization actually applies (remember that this is intended to apply not just to Box but also to Rc, Arc, and third-party types). So I doubt whether that feature would actually be easier to implement or less of an addition to the language. Then there’s the fact that box EXPR is simply more ergonomic, and nicely matches box patterns, which are desirable too.

With “placement in” syntax, e.g. as replacement for Vec::new (edit: Vec::push, sorry for the confusion), it’s not really possible to avoid new syntax, see this old FAQ (particularly question #4).

Box patterns? That would be a definite usability improvement for working with boxes…


I suppose what I’m asking is whether we’ve considered some equivalent mechanism to how C++ forwards arguments (I’m not 100% on how this pans out in C++, so this is just wishful thinking on my part).

With "placement in" syntax, e.g. as replacement for Vec::new (edit: Vec::push, sorry for the confusion), it's not really possible to avoid new syntax, see this old FAQ

I remain unconvinced that we cannot simply use closures for emplacement, i.e. do this: vec.emplace(|| LargeValue { ... })


The FAQ (A12) claims that this is incompatible with fallible constructors, i.e. the new syntax allows one to write vec <- try!(run_code()), which will bail out from the current function on error, whereas the vec.emplace(|| try!(run_code())) form cannot do that, obviously.

The catch here is that in order for run_code() to be fallible, it must return a Result<T,E>, which wraps the value we want to emplace. Well, guess where this intermediate value will be stored? That's right: on the stack. Which defeats the whole purpose of emplacement. The vec <- try!(run_code()) form will appear to work, but will not in fact do what the programmer intended!
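The size part of this argument is easy to check: a Result that wraps a value can never be smaller than the value itself, so whatever temporary holds the Result before ? (the modern spelling of try!) unwraps it is at least as large as the value being emplaced. The Large type and the run_code and build functions are illustrative assumptions.

```rust
use std::mem::size_of;

/// A large value we would like to construct in its final location.
type Large = [u8; 4096];

/// A fallible constructor can only hand back its result wrapped in `Result`.
fn run_code() -> Result<Large, String> {
    Ok([0u8; 4096])
}

/// Size of the intermediate that exists before `?` unwraps it.
fn intermediate_size() -> usize {
    size_of::<Result<Large, String>>()
}

fn emplace_size() -> usize {
    size_of::<Large>()
}

fn build() -> Result<Box<Large>, String> {
    // The `Result` is produced first; only then is its payload moved into
    // the box (semantically; the optimizer may or may not elide the copy).
    let value = run_code()?;
    Ok(Box::new(value))
}
```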

What would work with arrows is this: vec <- run_code(try!(compute_argument())). However computation of arguments is pretty easy to factor out of the construction expression:

let arg = try!(compute_argument());
vec.emplace(|| run_code(arg));

So... do we gain enough by adding all that placement machinery into the language, when a simple solution works nearly as well?
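The closure-based alternative can be sketched on stable Rust with Vec::spare_capacity_mut. Emplace and emplace are hypothetical names, and compute_argument and run_code are illustrative stand-ins for the functions in the post. The sketch reserves the slot before running the closure, but whether the closure's result is built directly in that slot or in a temporary first is still left to the optimizer, which is exactly the guarantee under debate.

```rust
/// Hypothetical extension trait (not a std API).
trait Emplace<T> {
    fn emplace<F: FnOnce() -> T>(&mut self, make: F);
}

impl<T> Emplace<T> for Vec<T> {
    fn emplace<F: FnOnce() -> T>(&mut self, make: F) {
        // Reserve first so the destination exists before the value is built.
        self.reserve(1);
        let len = self.len();
        // Write the closure's result into the first uninitialized slot. Whether
        // `make()` is built there directly or in a temporary is up to the optimizer.
        self.spare_capacity_mut()[0].write(make());
        // SAFETY: the slot at index `len` was initialized just above.
        unsafe { self.set_len(len + 1) };
    }
}

/// Illustrative fallible step, factored out of the construction.
fn compute_argument(ok: bool) -> Result<u64, String> {
    if ok {
        Ok(3)
    } else {
        Err("bad input".to_string())
    }
}

/// Illustrative infallible constructor of a large value.
fn run_code(arg: u64) -> [u64; 512] {
    [arg; 512]
}

fn push_large(vec: &mut Vec<[u64; 512]>, ok: bool) -> Result<(), String> {
    let arg = compute_argument(ok)?; // `?` replaces the older `try!`
    vec.emplace(|| run_code(arg));
    Ok(())
}
```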


A12 is about desugaring PLACE <- EXPR (or whatever the syntax would be) to something like PLACE.emplace(|| EXPR). Such a desugaring would silently change the meaning of EXPR w.r.t. control flow. It’s true that people could write out the closure form themselves, and then they’d have no one to blame but themselves if they tried to return (or break, or …) from the closure.

So, I misspoke when I said new syntax is strictly necessary. If we require people to write out closures manually, that could generate the desired code. For that matter, we could also have a macro that does the current desugaring (as long as we decided that we don’t want auto-ref). One of the placement RFCs in fact gave an example implementation in the form of a macro.

So I am again reduced to an appeal to ergonomics: Placement is strictly more efficient than other forms of construction, so it should be more convenient than those other methods, to ensure it is used over them.
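A sketch of the macro route mentioned above, assuming simplified, locally defined Placer and InPlace traits (the names and shapes are assumptions; the unstable std traits of the time also had a separate Place trait for the pointer method). PlaceBack is a hypothetical placer that appends to a Vec. The value expression is spliced in rather than wrapped in a closure, so a ? inside it returns from the enclosing function, which is the control-flow behavior A12 is concerned with.

```rust
use std::ptr;

// Simplified, locally defined placement traits (names and shapes assumed).
trait Placer<T> {
    type Place: InPlace<T>;
    /// Consumes the placer and allocates the destination.
    fn make_place(self) -> Self::Place;
}

unsafe trait InPlace<T> {
    type Owner;
    /// Raw pointer to the (still uninitialized) destination.
    fn pointer(&mut self) -> *mut T;
    /// Called once the destination has been written.
    unsafe fn finalize(self) -> Self::Owner;
}

/// Hypothetical placer that appends to a `Vec`.
struct PlaceBack<'a, T> {
    vec: &'a mut Vec<T>,
}

impl<'a, T> Placer<T> for PlaceBack<'a, T> {
    type Place = PlaceBack<'a, T>;
    fn make_place(self) -> Self::Place {
        self.vec.reserve(1);
        self
    }
}

unsafe impl<'a, T> InPlace<T> for PlaceBack<'a, T> {
    type Owner = ();
    fn pointer(&mut self) -> *mut T {
        let len = self.vec.len();
        // SAFETY: `make_place` reserved room for one element past `len`.
        unsafe { self.vec.as_mut_ptr().add(len) }
    }
    unsafe fn finalize(self) -> Self::Owner {
        let len = self.vec.len();
        // SAFETY: the caller has written the slot at index `len`.
        unsafe { self.vec.set_len(len + 1) }
    }
}

/// Macro spelling of `PLACER <- EXPR`, in desugaring order: make the place,
/// take its pointer, evaluate EXPR, write, finalize. `$value` is spliced in
/// (no closure), so `?` inside it acts on the caller.
macro_rules! place {
    ($placer:expr, $value:expr) => {{
        let mut place = Placer::make_place($placer);
        let raw = InPlace::pointer(&mut place);
        let value = $value;
        unsafe {
            ptr::write(raw, value);
            InPlace::finalize(place)
        }
    }};
}

/// On a parse error, `?` returns early; the reserved slot is never
/// finalized, so the vector is left unchanged.
fn push_parsed(v: &mut Vec<u32>, s: &str) -> Result<(), std::num::ParseIntError> {
    place!(PlaceBack { vec: v }, s.parse::<u32>()?);
    Ok(())
}
```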


AFAIK, C++ uses:

  1. Variadic template methods that basically accept any possible combination of arguments.
  2. Perfect forwarding (rvalue references + reference collapsing + std::forward) to forward the arguments to the constructor.
  3. The fact that constructors are already a special case, not just static methods that return a value.

I think especially point 1 is problematic. It means unrestricted overloading with variable arguments and without any constraints (traits), which is very far from how generics work in Rust.


Yes, especially the fact that this, inside a C++ constructor, is a pointer to as-yet-uninitialized memory.

I think a fairly direct parallel in Rust would be something like:

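impl<T> Vec<T> {
    // `reserve_uninitialized_slot` is a hypothetical helper that returns a
    // pointer to spare capacity just past the last element.
    fn emplace_back<C>(&mut self, ctor: C)
        where C: FnOnce(*mut T)
    {
        let this: *mut T = self.reserve_uninitialized_slot();
        ctor(this);
        // ... then bump the length, plus some panic handling
    }
}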

The closure's environment can take the place of arbitrary variadic templates, but making the initialization happen in place is the tricky part to me.
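A compiling variant of that idea, assuming an extension trait (EmplaceBack and emplace_back_with are hypothetical names) that hands the closure a &mut MaybeUninit<T> instead of a raw *mut T. The closure's captures play the role of C++'s forwarded constructor arguments. The method has to be unsafe because nothing forces the closure to initialize the slot; the panic handling falls out of bumping the length only after the closure returns.

```rust
use std::mem::MaybeUninit;

/// Hypothetical extension trait (not a std API).
trait EmplaceBack<T> {
    /// # Safety
    /// `init` must fully initialize the slot it receives before returning.
    unsafe fn emplace_back_with<F>(&mut self, init: F)
    where
        F: FnOnce(&mut MaybeUninit<T>);
}

impl<T> EmplaceBack<T> for Vec<T> {
    unsafe fn emplace_back_with<F>(&mut self, init: F)
    where
        F: FnOnce(&mut MaybeUninit<T>),
    {
        // Make room for one more element, then hand out the uninitialized slot.
        self.reserve(1);
        let len = self.len();
        let slot = &mut self.spare_capacity_mut()[0];
        init(slot);
        // Bump the length only after `init` returns: if it panics, the slot
        // stays outside `len`, so nothing uninitialized is ever dropped.
        unsafe { self.set_len(len + 1) };
    }
}

fn demo() -> Vec<[u64; 4]> {
    let mut v: Vec<[u64; 4]> = Vec::new();
    for fill in 7u64..=8 {
        // `fill` is captured by the closure, standing in for constructor arguments.
        // SAFETY: the closure writes the whole slot before returning.
        unsafe {
            v.emplace_back_with(|slot| {
                slot.write([fill; 4]);
            })
        };
    }
    v
}
```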

Whatever solution is chosen must make the following program always correct (playground):

const N: usize = 512;  // works on my platform for 256 but not 512 =/
type FixedSizeMatrix = [[f64; N]; N];

fn main() {
    let u: Box<FixedSizeMatrix> = Box::new([[0.; N]; N]);
}

The current issue is that inside Box::new(EXPR), the result of EXPR is allocated on the stack and then passed to new, which moves it to the heap. It works in release builds because the optimizer removes the unnecessary stack allocation.
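For this particular example, one workaround on stable Rust, which sidesteps rather than answers the language question, is to build the matrix in heap memory one row at a time and then convert the boxed slice into a boxed array. The largest stack temporary is then roughly a single row (4 KiB for N = 512) rather than the full 2 MiB matrix. zeroed_matrix is an illustrative name.

```rust
use std::convert::TryInto;

const N: usize = 512;
type FixedSizeMatrix = [[f64; N]; N];

/// Builds the matrix in heap memory row by row instead of first
/// materializing the whole 2 MiB array on the stack.
fn zeroed_matrix() -> Box<FixedSizeMatrix> {
    // `vec![row; N]` clones one row-sized value into each heap slot.
    let rows: Box<[[f64; N]]> = vec![[0.0; N]; N].into_boxed_slice();
    // Box<[T]> -> Box<[T; N]> checks the length and reuses the allocation.
    rows.try_into().expect("slice has exactly N rows")
}
```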

IIUC placement in is a language construct (analogous to placement new) to force the result of EXPR to be allocated at a concrete memory location, which solves this problem (and others). I do not want to run into, what Bjarne describes as: "Something must be done. This is something. Therefore we must do this".

The original placement box RFC says:

This provides a way for a user to specify (1.) how the backing storage for some datum should be allocated, (2.) that the allocation should be ordered before the evaluation of the datum, and (3.) that the datum should preferably be stored directly into the backing storage (rather than allocating temporary storage on the stack and then copying the datum from the stack into the backing storage)

After re-reading the RFC, I see that there are many use-cases being discussed. But if I focus on the simplest example shown above, I don't feel that the following question has been answered with sufficient clarity in the RFC and associated discussions:

Why cannot the example above be made to work correctly and reliably without extra syntax?

That is, without focusing on other usages / applications of a possible placement in language feature, could the language require that when the result of an expression is directly consumed by another expression, the expression must be inlined when possible (for some definition of possible) and the temporary storage guaranteed to be eliminated?


Not in the presence of panicking and other divergent expressions. We want the heap storage to be allocated before the expression is evaluated. If the expression panics, the heap storage needs to be freed without executing the destructor. Maybe it can be done without custom syntax, but it can’t be done without a placer interface that the containers have to implement.
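A minimal sketch of that allocate-first ordering for a Box, using std::alloc directly (box_with and DeallocGuard are hypothetical names). The heap slot is allocated before the value expression runs; if the expression panics, a guard frees the slot during unwinding without running T's destructor, since nothing was ever written there. Because make() still returns its value in the ordinary way, this fixes the ordering and cleanup only, not the stack copy.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::ptr;

/// Frees a raw allocation when dropped, without touching its contents.
struct DeallocGuard {
    ptr: *mut u8,
    layout: Layout,
}

impl Drop for DeallocGuard {
    fn drop(&mut self) {
        // SAFETY: `ptr` came from `alloc` with this same `layout`.
        unsafe { dealloc(self.ptr, self.layout) }
    }
}

/// Hypothetical helper: allocate first, then evaluate `make`, then write.
fn box_with<T, F: FnOnce() -> T>(make: F) -> Box<T> {
    let layout = Layout::new::<T>();
    if layout.size() == 0 {
        // Zero-sized types need no heap slot.
        return Box::new(make());
    }
    // 1. Allocate the heap slot before the value expression runs.
    // SAFETY: `layout` has a non-zero size.
    let raw = unsafe { alloc(layout) };
    if raw.is_null() {
        handle_alloc_error(layout);
    }
    let guard = DeallocGuard { ptr: raw, layout };
    // 2. Evaluate. If this panics, unwinding drops `guard`, which frees the
    //    slot; no `T` destructor runs because no `T` was ever written.
    let value = make();
    // 3. Success: disarm the guard, write the value, and hand it to `Box`.
    std::mem::forget(guard);
    // SAFETY: `raw` is a live global-allocator block with `T`'s layout.
    unsafe {
        ptr::write(raw as *mut T, value);
        Box::from_raw(raw as *mut T)
    }
}
```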

I think there are two separate issues: what the syntax looks like for the user and what the syntax looks like for the library.

Since user code is more common, obviously we should optimize for that case. In particular, it would be great if you could just write Box::new(expr) or Rc::new(expr) or vec.push(expr) and have the compiler do the right thing.

As far as I can tell, the only reason this optimization can’t be done completely automatically is order of evaluation issues in the case of placement (note that there is no such concern for placeless allocation like Box::new and Rc::new).

In the case of Vec::push, doing the allocation changes the state of the vec, and there’s interesting questions surrounding what will happen if EXPR involves borrowing vec, as well as if it panics or completes abruptly. This sounds similar to the Non Lexical Lifetime issue, so maybe we should wait to see how they decide to handle stuff like vec.push(vec.capacity()) first.
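A small illustration of the ordering question, assuming a current compiler (which accepts vec.push(vec.capacity()) thanks to two-phase borrows). With an ordinary call, the argument is evaluated before push allocates; under an allocate-first order like placement's, the same expression observes the grown capacity. The function names are illustrative.

```rust
/// Ordinary call: `v.capacity()` is evaluated before `push` runs (and
/// allocates), so the stored value is the old capacity.
fn call_order() -> Vec<usize> {
    let mut v: Vec<usize> = Vec::new();
    v.push(v.capacity());
    v
}

/// Allocate-first order, as a placement desugaring would do: storage is
/// reserved before the value expression is evaluated.
fn allocate_first_order() -> Vec<usize> {
    let mut v: Vec<usize> = Vec::new();
    v.reserve(1);
    let value = v.capacity();
    v.push(value);
    v
}
```

Since Vec::new does not allocate, call_order stores 0, while allocate_first_order stores a capacity of at least 1.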