This post is meant to collect questions (and answers) about Rust's placement syntactic form, the placement protocol, and the traits that data types use to implement their support for the form.
@pnkfelix will collect questions into this top description as they come up, and revise the entries (both questions and answers) in response to public discussion in the comment thread. (You can also suggest changes to him via other channels.)
Q(1): What is the strange acronym-like "NWBI<-" in the post title?
A(1): Summary: That is Felix's attempt at humor.
Our placement form has quite a history with a variety of syntactic keywords associated with it (`new`, `box`, `in`, `<-`). Felix formed the pseudo-acronym "NWBI<-" from that series of tokens.
Furthermore, if the `<-` token looks vaguely like an "e", then one might well pronounce the character string "NWBI<-" as "newbie."
Funny, right? (Well, ... there's a reason Felix is a developer and not a comedian.)
Anyway, there is a reason I put this answer first: since the syntax has gone through so many iterations, it can be hard to tell whether people are talking about the same or different things when one says "left-arrow" and another says "placement-in" and another says "box".
This document will usually use the phrase placement-`in` when talking about this feature in the context of Rust (even though it no longer uses the `in` keyword), and use the phrase placement-`new` when talking about the feature in the context of C++. We will occasionally also mention "placement-`box`", since Rust is likely to offer an overloaded form of the "box" operator, and it will serve a purpose similar to that of placement-`in`.
Nearly all of the code examples in this document will use the "current or expected future" syntax for a feature. This means that placement-`in` will be written `PLACE <- EXPR`, and placement-`box` will be written `box EXPR`.
Q(2). What do the placement forms do? (that is, what are their semantics?)
A(2). In the abstract, given types `P` and `T`, where `P` implements the `Placer` trait, and two Rust expressions `PLACE` and `VALUE` (of types `P` and `T` respectively), the `PLACE <- VALUE` expression works by:
- evaluate `PLACE` to a placer `p`,
- ask the placer `p` to allocate memory suitable for storing an instance of `T`,
- evaluate `VALUE` into the previously allocated memory (that is, do not put it onto a temporary stack slot),
- finally, if `P` supports it, the `PLACE <- VALUE` expression returns a handle-value that acts as a reference to (or, potentially, owner of) the allocated value.

The type of the handle-value is an associated type of `<P as Placer>`; if the placer does not support such handles, then its associated type will be the unit type `()`.
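
To make the four steps concrete, here is a minimal sketch that spells them out by hand in stable Rust. Everything in it is an assumption for illustration: the `Placer` and `Place` traits are simplified stand-ins (not the actual protocol traits), `Back` and `BackPlace` are an invented placer for the end of a `Vec<T>`, and `place_in` is a hand-written substitute for what `PLACE <- VALUE` would expand to. `VALUE` is passed as a closure only so that the sketch can control when it runs.

```rust
use std::ptr;

/// Hypothetical, simplified stand-in for the `Placer` trait.
trait Placer<T> {
    type Place: Place<T>;
    /// Step 2: reserve memory suitable for one `T`.
    fn make_place(self) -> Self::Place;
}

/// Hypothetical stand-in for a place: reserved, not yet initialized memory.
trait Place<T> {
    /// Type of the handle-value produced by the whole expression.
    type Owner;
    /// Address that `VALUE` is written to.
    fn pointer(&mut self) -> *mut T;
    /// Step 4: commit the write and build the handle-value.
    /// Safety: the memory at `pointer()` must hold an initialized `T`.
    unsafe fn finalize(self) -> Self::Owner;
}

/// Invented example placer: the back of a `Vec<T>`.
struct Back<'a, T>(&'a mut Vec<T>);

struct BackPlace<'a, T> {
    vec: &'a mut Vec<T>,
}

impl<'a, T> Placer<T> for Back<'a, T> {
    type Place = BackPlace<'a, T>;
    fn make_place(self) -> BackPlace<'a, T> {
        let Back(vec) = self;
        vec.reserve(1); // allocate before VALUE is evaluated
        BackPlace { vec }
    }
}

impl<'a, T> Place<T> for BackPlace<'a, T> {
    type Owner = &'a mut T;
    fn pointer(&mut self) -> *mut T {
        let len = self.vec.len();
        // First uninitialized slot; in bounds because `make_place` reserved it.
        unsafe { self.vec.as_mut_ptr().add(len) }
    }
    unsafe fn finalize(self) -> &'a mut T {
        let BackPlace { vec } = self;
        let len = vec.len();
        // The caller promises that the slot at `len` is initialized.
        unsafe { vec.set_len(len + 1) };
        &mut vec[len]
    }
}

/// Hand-written stand-in for the expansion of `PLACE <- VALUE`.
fn place_in<T, P, F>(placer: P, value: F) -> <P::Place as Place<T>>::Owner
where
    P: Placer<T>,
    F: FnOnce() -> T,
{
    let mut place = placer.make_place(); // steps 1 and 2
    let raw = place.pointer();
    unsafe {
        // Step 3. A real implementation would have the compiler emit VALUE
        // straight into `raw`; `ptr::write` only approximates that.
        ptr::write(raw, value());
        place.finalize() // step 4
    }
}

fn demo() -> Vec<u32> {
    let mut v = vec![1, 2];
    // Stands in for `let slot = Back(&mut v) <- 40;`
    let slot: &mut u32 = place_in(Back(&mut v), || 40u32);
    *slot += 2; // the handle-value refers to the placed element
    v
}
```

Calling `demo()` yields a vector whose last element was written through the handle-value. Note that this sketch does no cleanup if `VALUE` panics; it only leaves the reserved capacity in the vector.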
The placement-`box` expression is similar, except that it is only used when the place-allocation does not need any separate input to dictate from where the place is allocated. (The most obvious example is the `Box<T>` type, which allocates memory off of the global heap.)
Given types `B` and `T` where `B` implements the `Boxed` trait, the Rust expression `VALUE` of type `T`, and an expression context `C[]` expecting a value of type `B`, the `box VALUE` expression when evaluated in context `C[]` works by:
- ask the type `B` to allocate memory suitable for storing an instance of `T`,
- evaluate `VALUE` into the previously allocated memory (do not put it onto a temporary stack slot),
- finally, `box VALUE` returns its handle-value: an instance of `B` that internally holds a reference to, or potentially owns, the allocated memory.

A final detail (but a very important one) is how these forms deal with cleanup when `panic!` occurs within the `VALUE` expression.
Ideally the temporary memory allocated by `PLACE` would be deallocated as the stack unwinds. The protocol defined by the `Placer` API supports this, by having the temporary memory for storing `T` kept in a `Place` wrapper struct that implements `Drop`.
If the evaluation does not `panic!`, then the `Place` is forgotten (via `mem::forget`) so that its destructor does not run. If the evaluation of `VALUE` or the final construction of the handle-value does panic, then the destructor for the `Place` is in charge of rolling back the transaction (e.g. deallocating the reserved memory if necessary, as well as any other cleanup).
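
The forget-on-success, drop-on-panic pattern can be modeled in a few lines of stable Rust. This is only a sketch under loud assumptions: `Pool`, `Slot`, and `place_in` are invented names, and the "reserved memory" is modeled as a plain counter rather than a real allocation, so that the rollback is easy to observe.

```rust
use std::cell::Cell;
use std::mem;
use std::panic::{self, AssertUnwindSafe};

/// Invented placer: tracks how many slots are currently reserved.
struct Pool {
    reserved: Cell<usize>,
}

/// Invented place guard: owns one reservation until it is committed.
struct Slot<'a> {
    pool: &'a Pool,
}

impl Pool {
    fn make_place(&self) -> Slot<'_> {
        self.reserved.set(self.reserved.get() + 1);
        Slot { pool: self }
    }
}

impl Drop for Slot<'_> {
    fn drop(&mut self) {
        // Rollback: only runs if the slot was not forgotten.
        self.pool.reserved.set(self.pool.reserved.get() - 1);
    }
}

/// Hand-written stand-in for `PLACE <- VALUE`, with VALUE as a closure.
fn place_in<T>(pool: &Pool, value: impl FnOnce() -> T) -> T {
    let place = pool.make_place();
    let v = value(); // may panic; unwinding then drops `place`
    mem::forget(place); // success: skip the destructor, keep the reservation
    v
}

fn demo() -> (usize, usize) {
    let pool = Pool { reserved: Cell::new(0) };

    let _ok = place_in(&pool, || 7);
    let after_success = pool.reserved.get();

    // The panic happens after the reservation was made.
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        place_in(&pool, || -> u32 { panic!("VALUE panicked") })
    }));
    assert!(result.is_err());

    (after_success, pool.reserved.get())
}
```

In `demo`, the successful placement leaves one reservation behind, and the panicking one is rolled back by the destructor, so the count is 1 both times it is read. (The panic message is still printed to stderr by the default panic hook.)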
Q(3). What are some examples of where one would use the placement forms?
A(3). Below are some thoughts:
- Arenas are a good match for placement-`in`: you want to allocate `T` into memory managed by the arena itself, and return `&T` from the expression `arena <- T_EXPR`.
- `Vec<T>` is capable of supporting placement-`in` as a more efficient alternative to `vec.push(T_EXPR)`; the latter is defined to evaluate `T_EXPR` into a temporary stack slot, and then pass that value into the `vec.push` invocation.
- `Box<T>` is the current user of the `box VALUE` form.
- `Rc<T>` could also use `box VALUE`. Instead of doing `let x = Rc::new(T_EXPR)`, you would write `let x: Rc<T> = box T_EXPR;`. This would avoid an intermediate stack allocation, and it also may yield more concise code in cases where the compiler can infer the expected type `Rc<_>`.

Q(4). Does Rust need a placement feature at all?
In other words: Can we not just rely on a compiler to do this kind of optimization for us, so that we continue writing `vec.push(EXPR)`, and compiler optimizations remove the intermediate temporary stack storage?
A(4). We cannot in general rely on the compiler to do this for us.
As said by @eddyb:
the evaluation order differs between emplace and non-emplace. LLVM cannot and AFAIK Rust does not want to reorder side-effects such as allocations.
To be concrete, consider the suggested example. The `vec.push(EXPR)` method call is defined to evaluate in this order:
- Evaluate `vec` and `EXPR` (into a temporary stack slot)
- Invoke the `Vec::push` method
- The `Vec::push` method body will do the allocation of storage (if capacity is exceeded).

In particular, the side-effect of doing storage allocation is defined as coming after the evaluation of `EXPR`.
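
This ordering can be observed on stable Rust. In the small sketch below (the function name `demo` is ours), the argument to `push` reads the vector's capacity, and it still sees a vector with no storage, because the argument is evaluated before the body of `Vec::push` gets a chance to allocate.

```rust
fn demo() -> (usize, usize) {
    let mut v: Vec<usize> = Vec::new(); // no storage allocated yet

    // The argument is evaluated first, so it observes capacity 0.
    // Only afterwards does `Vec::push` run and allocate.
    v.push(v.capacity());

    // (capacity seen by the argument, capacity after the push)
    (v[0], v.capacity())
}
```

The first component is 0, while the second is at least 1 (the exact growth policy is up to `Vec`).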
In some cases LLVM is able to inline and optimize to such a degree as to remove the intermediate stack slot, and we hypothetically could try to leverage that for low level routines.
But this does not resolve the more general problem. In the general case, we want users to be able to place allocations into arenas that they define, and in those cases, the side-effects of maintaining the arena storage cannot be automatically optimized by LLVM.
Q(5). Should protocol stabilization wait for feature X?
The placement-`in` protocol is defined as using various `unsafe` methods and invariants that implementors of the protocol are responsible for maintaining. Should we instead add more general purpose features like `&uninit` references?
(See e.g. this comment from reem and this comment from glaebhoerl)
A(5).
- We are not sure that adding `&uninit` would pay for itself (even after hypothetical improvements to the `Placer` API).
- Even if we had `&uninit`, we would still need to address partial cleanup (i.e., the place destructor, as discussed in "What do the placement forms do?").

Q(6). How did we decide on the syntax `PLACE <- EXPR` for placement-`in`? Why not `<alternative syntax here>`?
A(6). The form has gone through a number of iterations, and during each change, a large number of variants to the syntax have been proposed.
The iterations that were approved by the designers have been:
- `box (PLACE) VALUE`, where the `PLACE` is optional, and the whole `(PLACE)` could be omitted if `VALUE` does not start with a parenthesis.
- `in PLACE { VALUE }`
- `PLACE <- VALUE`

(I do not currently plan to list all of the suggested variants, though I would be happy to throw them into an appendix if there is demand. Nor do I plan to list the full pro/cons list for every variant.)
I will state the constraints we are trying to meet (which tend to favor our current syntax), as well as the drawbacks to the current syntax (as stated by commenters, not necessarily the FAQ author).
Constraints
Here are some of the constraints we wanted to meet:
- don't add new keywords,
- backwards compatible: don't change how old + stable programs parse,
- don't introduce parsing ambiguities (this rules out for example `in PLACE VALUE`),
- since `PLACE` will be evaluated before `EXPR`, have `PLACE` come before (i.e. to the left of) `EXPR` in the syntax,
- furthermore, there was a strong argument that the syntax should be "lightweight" (where `P <- V` is more lightweight than say `box (P) V` or `in P { V }`). The reasoning (such as presented in this comment from @petrochenkov) is that the placement form may become the preferred way to e.g. push onto the end of a vector, and therefore the syntax needs to be at least as easy to write as `vec.push(value)`.

Drawbacks
Here are some stated drawbacks of the `PLACE <- VALUE` syntax. I do not currently plan to rebut any of these in this document; I just want to list them so that it does not seem like I am trying to claim that the syntax is flawless.
- It is so similar to an assignment `lvalue = rvalue` that we'll have to explain the difference between the two (comment)
- The combined form `let x = y <- z;` is ugly (comment)
  - in particular, it can lead the reader to think that `y` is being assigned to `x`, when it is `z` (or at least a handle to a boxed `z`) that is assigned to `x` (comment)
- It is an operator sigil that is (mostly) unused in other languages
- It may end up in code that looks like line noise (comment)
- The `<-` operator is not likely to produce useful results from a search engine (as compared to a syntax with a dedicated keyword, like `in`)

For further reference, see:
Q(7). Why does placement-`in` use an expansion-based implementation?
A(7). Largely due to simplicity of implementation. Many of the other language constructs (e.g. `for` and `while` loops) also use an expansion-based implementation.
However, we are not strictly wedded to an expansion-based implementation. It may be that we will need to switch to something that is not expansion-based. For example, if we were to add support for auto-ref (so that `PLACE <- VALUE` will automatically turn into `&mut PLACE <- VALUE` if necessary), then that might be difficult to do in an expansion-based implementation.
Q(8). Why does the Placer API have `Placer::make_place` take `self` rather than `&mut self`?
(That is, taking `self` forces the programmer to insert `&mut`-borrows; wouldn't using `&mut self` allow the `&mut`-borrow to be automatically injected by auto-ref on method dispatch?)
A(8). (This question is actually a bit ambiguous as written. It could be complaining about the protocol API in terms of the traits one must implement, or it could be complaining about the need to write `&mut` in the client code. The current answer assumes the former; I hope to revise it or fork off another question to address the latter.)
The short answer is: It would indeed be nice if `Placer::make_place` took `&mut self` rather than `self`. There are two reasons for the current API that passes `self`. The first issue probably will not matter in the long term, but the second issue is likely to flummox potential redesigns of the API.
- The current implementation relies on UFCS in the macro-expansion, and UFCS does not use method-call syntax. In other words, the emitted code is `Placer::make_place($PLACER)`, rather than the `$PLACER.make_place()` that you might expect.
The fact that the `$PLACER` is passed in the argument list means that you would not get auto-ref for that argument. So the distinction between `self` and `&mut self` is more significant than one might think, in the current implementation.
(Furthermore, placement-`in` as originally envisaged also encouraged the use of constants as Placers to support using placement-`in` syntax for `Box<T>`, i.e. something like `let b = (BOX_PLACER <- VALUE)`. Switching from `self` to `&mut self` would in fact introduce a *new* place that would now require a borrow, i.e. the above would have to be written `let b = ((&mut BOX_PLACER) <- VALUE);`. Note that this again is a consequence of the implementation's use of UFCS.)
But perhaps a future placement-`in` expansion could correct for this, so this is currently more of an excuse for why `self` is acceptable, rather than a reason for why we *cannot* use `&mut self`. Let us move on to explore what problems `&mut self` causes.
- To pass `&mut self` instead of `self` probably requires higher-kinded types (HKT). I hope to elaborate more on this answer, but for now, you can refer to my comment here.
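
The auto-ref point from the first item is easy to check in isolation. The sketch below uses an invented trait `MakePlace` and type `Counter` (neither is part of the protocol) whose method takes `&mut self`, and calls it both with method-call syntax and with the function-call (UFCS) form that the expansion emits.

```rust
/// Invented trait whose method takes `&mut self`.
trait MakePlace {
    fn make_place(&mut self) -> usize;
}

struct Counter(usize);

impl MakePlace for Counter {
    fn make_place(&mut self) -> usize {
        self.0 += 1;
        self.0
    }
}

fn demo() -> (usize, usize) {
    let mut c = Counter(0);

    // Method-call syntax: auto-ref inserts the `&mut` borrow for us.
    let a = c.make_place();

    // Function-call (UFCS) syntax: the borrow must be written out.
    // `MakePlace::make_place(c)` is rejected with a type mismatch,
    // because arguments in an argument list do not get auto-ref.
    let b = MakePlace::make_place(&mut c);

    (a, b)
}
```

So as long as the expansion passes `$PLACER` in an argument list, a `&mut self` signature pushes the explicit `&mut` onto whoever writes the placer expression.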
Q(9). Which stdlib datatypes currently support placement-`in`?
A(9). None, currently.
We are still finalizing the protocol API. We have not added `Placer` support to any of the standard library types.
I believe the plan is to add support to the library types, revising the protocol if necessary (and/or possible, in the case of clear improvements) as we go along, and only stabilize the protocol after we have concrete evidence that it has the right semantics and performance characteristics.
(Hopefully this answer will be revised in the relatively near future.)
Q(10). Which stdlib datatypes should support placement-`in`?
A(10). Obvious candidates here:
- placement-`in`: `Vec`, `TypedArena`
- placement-`box`: `Box`, `Rc`, `Arc`
- also, the forthcoming `Allocator` API may add a further twist, such as perhaps combining allocator-parametric `Box` with placement-`in`.

Q(11). What are some potential future language features that we should consider integrating with placement-`in`, if possible?
A(11). For now, here are some off-the-cuff thoughts on this:
- If we add higher-kinded types then that might allow nice changes to the protocol. (But we may not want to wait for that.)
- If we add `&uninit` references then that might allow nice changes to the protocol. (But we may not want to wait for that.)
- Specifying memory as non-moving types may be a good match for placement-`in`. (See RFC issue 417: "Support for intrusive data structures and unmoveable types" in the references.)
  In particular, one might want to use a special form like `PLACE <- VALUE` for initializing such memory (since such types by definition cannot move).
  Note that in practice we may want to revise/extend the protocol in such cases so that we feed the target address into the construction of the `VALUE` itself.
  Update (2015/12/19): a recent conversation with members of the Compiler and Servo teams led me to realize that (I think) we can already accomplish the above with the current protocol, or nearly so, if you twist your mind accordingly.
  - The main idea is this: `let handle: Box<T> = place <- kernel;` would be the way to initialize non-moving memory of type `T` at `place`, based on a kernel value of type `K` (note: not `T`).
  - The key is that you would allocate the memory, and the `place` would know where that is, and it would also have somewhere to stash the kernel value.
  - Then, the `finalize` method of the `place` would be in charge of actually initializing the final memory for `T`, and that code has access to the address of where the `T` is located.
  - It's not the prettiest thing in the world, but I was super surprised when it was revealed, because I had been assuming that we would need to change/extend the Placer protocol to support this use case. The main question here is: Can one readily write such a `finalize` method in a way that still deals properly with any panics that it encounters from its subroutine invocations? I don't know yet, but I don't have an immediate counter-example either, so that's good news, right?

Q(12). Why can't we leverage closures rather than add new syntax and/or this complex trait-based protocol?
In particular, why not use a simpler desugaring that would wrap the `VALUE` expression in a once-function closure? This way, you would still force the expected order-of-evaluation (do the allocation first, then run the closure, letting return value optimization handle writing the result into the backing storage).
A(12). Ignoring the issue that this relies on return-value optimization actually kicking in (which @pnkfelix found to not be generally reliable, and in any case we wouldn't want debug builds to differ so much in runtime behavior that they could e.g. stack overflow with ease) ... the most obvious place where this completely falls down is that it does not do the right thing for something like this:
let b: Handle<T> = placer <- try!(run_code());
because under the once-function desugaring, that gets turned into something like:
placer.make_place(|| -> T { try!(run_code()) })
which will not do the right thing when `run_code` returns an `Err` variant, and in fact will not even type-check.
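
Here is a rough illustration of that failure mode in today's Rust, where `?` is the modern spelling of `try!`. All names are assumptions made up for the sketch: `emplace_with` is a hypothetical closure-based API that reserves storage first and then runs the closure, and `run_code` and `fill` are toy functions.

```rust
/// Hypothetical closure-based API: reserve first, then run the closure.
fn emplace_with<T>(v: &mut Vec<T>, make: impl FnOnce() -> T) -> &mut T {
    v.reserve(1);
    v.push(make());
    v.last_mut().expect("an element was just pushed")
}

/// Toy fallible computation.
fn run_code(ok: bool) -> Result<u32, String> {
    if ok {
        Ok(5)
    } else {
        Err("run_code failed".to_string())
    }
}

fn fill(v: &mut Vec<u32>, ok: bool) -> Result<(), String> {
    // Not accepted by the compiler: inside the closure, `?` tries to
    // return from the closure (whose return type is `u32`), not from `fill`.
    //
    //     emplace_with(v, || run_code(ok)?);
    //
    // Workaround: run the fallible part before the allocation, which
    // gives up the evaluation order that placement is meant to provide.
    let value = run_code(ok)?;
    emplace_with(v, || value);
    Ok(())
}
```

With the workaround, `fill` behaves correctly on both the `Ok` and the `Err` path, but only because the value is computed up front and then moved into the closure, which is exactly the temporary that placement was supposed to avoid.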
Q(Ω). What kind of FAQ is this? You left out my question ...!
A(Ω). If you think the FAQ is missing an entry, or if one of the questions/answers needs expansion, feel free to add a note to this comment thread. @pnkfelix will try to keep the top-most part up to date for as long as the FAQ lives on this internals thread.
(At some point the FAQ may migrate to another location, like a wiki or perhaps a book like the rustonomicon -- at that point the text here will be amended or replaced with a forwarding pointer to the new home.)
References:
- RFC issue 405: "`box` syntax" (issue #405 in rust-lang/rfcs on GitHub)
- RFC PR 470: placement box with Placer trait for overloading, and its Pre-RFC.
  - Side note: There is a lot of useful information in this RFC PR that did not get copied into later RFC texts, which probably represents a failure somewhere on the part of @pnkfelix and/or the RFC process itself. In particular, I found very few mentions of the once-function desugaring (and why it was abandoned) in the later RFC texts, even though it was one of those ideas that needs to be documented because it seems like a great solution until you actually implement it ... i.e., exactly the kind of variation that deserves documentation.
- RFC PR 809: "overloaded-`box` and placement-`in`": https://github.com/rust-lang/rfcs/pull/809
- RFC 809 current text (`text/0809-box-and-in-for-stdlib.md` at master in rust-lang/rfcs on GitHub)
- RFC PR 1228: "Place left arrow syntax (`place <- expr`)": https://github.com/rust-lang/rfcs/pull/1228
- RFC 1228 current text (`text/1228-placement-left-arrow.md` at master in rust-lang/rfcs on GitHub)
- RFC PR 98: "Uninitialized Pointers" (pull request #98 by gereeter in rust-lang/rfcs on GitHub)
- RFC issue 417: "Support for intrusive data structures and unmoveable types" (issue #417 in rust-lang/rfcs on GitHub)