So here’s some details since there seems to be some confusion of how this would work:
For now I will just assume there’s a String::from_literal
constructor and we’re just interested in supporting this as a manual optimization. The merits and details of any coercion are orthogonal.
These are what I consider the major benefits:
- improved performance for some code
- a path forward to coercions for improved ergonomics
These are what I consider the major concerns:
- breaking the pointer-stability assumptions of tricky unsafe code
- introducing latency in critical sections of code
- how to expose
capacity
in these semantics
- whether Vec should also support this optimization
- wanting to introduce Cow/Rc-like APIs so people can probe the state
But before I dig into those, we need to understand the implementation.
Implementation
String is currently a wrapper around a Vec<u8>
(ptr, len, cap), and this detail is exposed at the API-level by as_mut_vec
. Because they’re defined in the same library, String can mess around with Vec’s fields all it wants, and we can just make sure Vec handles it correctly.
&str
is a “fat pointer” (ptr, len).
To store a String literal in itself, String will set Vec’s (RawVec’s) cap
field to 0.
RawVec already has to check for cap = 0
in its Drop implementation, so this will incur no overhead there. Vec may need to add its own cap = 0
check if we want to make Vec itself Cow to avoid dropping the contents. However this will easily be optimized into the same branch RawVec has, at no cost (RawVec needs to keep the branch for its other consumers).
Many Vec APIs already check for len == cap
for if they should reallocate. This should be changed to len >= cap
, which will incur no overhead.
However the reallocation code currently won’t do the right thing. double
will need to be updated to take a used_cap
argument, and check for it in the already existing cap = 0
branch. This will be basically free, since it will be adding a branch to an already-cold path.
The reserve functions will also need to update their fast-path branches. This is relatively sensitive since reserve evaporating has historically been very important for optimizing iterator loops into memcopies. I’m cautiously optimistic since it will just be changing cap - len >= requested_extra
into something like cap != 0 && cap - len >= requested_extra
, which is hopefully “obvious” to the optimizer due to cap increasing by a usize.
This leaves the mutating operations which don’t increase capacity, like pop
and DerefMut<[T]>
. These need new checks for cap=0
to do the right thing. Currently they guard on len
. We can either implement this logic at the level of Vec, so that it becomes Cow<'static, [T]>
, or we can implement this logic at the level of String
. The latter will probably be more error-prone, but is easier to justify (see other sections for discussion).
So pretty much all cost will be concentrated in extra branches in functions like pop
. No special bitflags are needed. 0 cap is already a sentinel in Vec.
Performance
There are a few places where this optimization has an opportunity for significant performance wins. If code doesn’t fall into these cases, then it’s mostly just moving the cost of allocating a fresh string a bit further down the road.
-
Bundling allocations. String::from("hello").push_str("!!!")
will perform two allocations today (if you want to argue for LLVM optimizing this away, imagine the two ops are far away). With this optimization it will only need to be one allocation, as the COW operation will know that it also needs some slack space for the “!!!”. String::from("hello").clear()
will be able to do zero allocations. String::from("hello").truncate(1)
will still do 1 allocation as it does today, but can save a bunch of copying, and use a smaller allocation.
-
Code that requests Strings because it needs to store the data longterm, but doesn’t actually mutate them. This code could be changed to use Box<str>
without any issue. However requiring Box<str>
can itself be a performance issue, as in the cases where one actually needs to dynamically construct a String, one might (probably?) have some slack space in the allocation, and that means reallocating to construct a Box<str>
. Cow<'static str>
could also be used for this case; see below.
-
Code that requests Strings, but usually doesn’t need to mutate them. This code could be changed to use Cow<'static, str>
. However Cow
being a general construct can’t do interesting type-specific optimizations. In particular, every access will need to branch on “str or String” (although this is a particularly cacheable/optimizeable branch). Building this logic into String will eliminate an extra branch for almost every operation (as it will be unnecessary, or folded into an existing check).
The operations that would probably need a new branch are:
- as_mut_vec
- remove
- drain (when it isn’t equivalent to truncate()/clear())
(edit: some things were removed here, operations like truncate/pop/clear are actually fine)
Related to &mut str
:
- as_mut_str
- DerefMut::deref_mut
- make_lowercase_ascii
- make_uppercase_ascii
- IndexMut::*
Clone::clone would probably gain an extra branch as a performance optimization (no-op if cap = 0).
My general rule of thumb here is that it’s basically fine to add a predictable branch before an O(n) operation, but sketchy to add one to an O(1) operation (as it may be tossed in a tight loop). So basically only pop
, truncate
, and clear
qualify, but truncate
and clear
aren’t things you would do in a tight loop. So basically only pop
is even kind-of concerning.
In general there aren’t many operations you can perform on a String in place, because basically every unicode operation can lead to a change in byte-wise length. &mut str
is basically “hello yes I would like to do ASCII things”.
By contrast, &mut [T]
is a really useful type, and I’d be very concerned to make the Vec -> &mut [T]
transformation expensive.
Pointer Stability
The “operations which will need a branch” are also those that will now be able to trigger a reallocation, which couldn’t before. Any unsafe code which was relying on this not happening will be broken if it’s passed a String containing a string literal.
It’s not clear to me if our documentation actually guarantees we never shrink allocations when items are removed, but it’s certainly a design decision we’ve long held while implementing the Rust standard library. Certainly I wouldn’t be surprised to see e.g. Servo relying on it.
@Kimundi’s StableAddress trait currently assumes all bets are off if you take a mutable reference, so that’s encouraging.
@arielb1’s DerefPure trait will be pseudo-violate by this – DerefMut will be observably pure on its own, but Deref followed by DerefMut is observably impure.
Pointer stability is the aspect that worries me the most. I could be convinced it’s not a big deal, though.
Latency Concerns
Some operations which are supposed to be O(1)
will suddenly sometimes be O(n)
. This implies an O(n) operation was turned into an O(1)
somewhere else, but it can sometimes matter where the expensive operation was. For instance, if the expensive operation suddenly starts happening in a latency-sensitive section of code, that’s a problem.
I’m pretty ambivalent about this. If someone Really Really cares they can do some dummy operation that triggers a COW. If this is significantly desirable we can provide an operation that for doing precisely this (I think reserve(0)
will probably always work).
Exposing Capacity
The trickiest piece of implementation appears to be exposing capacity in a way that:
- does the right thing if you use it to calculate the size of target buffers (cap = len)
- does the right thing when fed into things like from_raw_parts (cap = 0)
edit: please see discussion below, this is the most difficult constraint, and may be intractable.
These appear to be fundamentally conflicting goals. I’m inclined to say that the first use-case is probably doing something wrong, and can be broken with abandon (in which case, we simply expose the capacity as stored – 0).
Our reserve
APIs to take additional space, and not absolute space. This is because the overwhelming use-case is vec.reserve(size_of_thing_i_want_to_insert)
. So there’s no reason to include capacity
in such a calculation.
If you want to know how big of a buffer you need to reserve for the contents of a buffer, you should be using len
, and not capacity
.
In the remaining cases, I expect cap = 0
will probably just do the right thing, in that it will tell you that you need a new buffer? I’m a bit concerned about underflow bugs.
If you reject this and want cap = len
, then we need to add a into_raw_parts
API that does cap = 0
, and tell everyone using from_raw_parts
to be sure they got cap
from into_raw_parts
. That’s a bit of a mess, IMO.
Should Vec Do It Too?
As noted in previous sections; the increased utility of O(1) operations on Vec makes me less comfortable with Vec itself supporting this operations. That’s about it.
Should String Provide is_unique
/make_unique
?
I’m too tired from writing this. Whatever.