So here’s some details since there seems to be some confusion of how this would work:
For now I will just assume there’s a
String::from_literal constructor and we’re just interested in supporting this as a manual optimization. The merits and details of any coercion are orthogonal.
These are what I consider the major benefits:
- improved performance for some code
- a path forward to coercions for improved ergonomics
These are what I consider the major concerns:
- breaking the pointer-stability assumptions of tricky unsafe code
- introducing latency in critical sections of code
- how to expose
capacity in these semantics
- whether Vec should also support this optimization
- wanting to introduce Cow/Rc-like APIs so people can probe the state
But before I dig into those, we need to understand the implementation.
String is currently a wrapper around a
Vec<u8> (ptr, len, cap), and this detail is exposed at the API-level by
as_mut_vec. Because they’re defined in the same library, String can mess around with Vec’s fields all it wants, and we can just make sure Vec handles it correctly.
&str is a “fat pointer” (ptr, len).
To store a String literal in itself, String will set Vec’s (RawVec’s)
cap field to 0.
RawVec already has to check for
cap = 0 in its Drop implementation, so this will incur no overhead there. Vec may need to add its own
cap = 0 check if we want to make Vec itself Cow to avoid dropping the contents. However this will easily be optimized into the same branch RawVec has, at no cost (RawVec needs to keep the branch for its other consumers).
Many Vec APIs already check for
len == cap for if they should reallocate. This should be changed to
len >= cap, which will incur no overhead.
However the reallocation code currently won’t do the right thing.
double will need to be updated to take a
used_cap argument, and check for it in the already existing
cap = 0 branch. This will be basically free, since it will be adding a branch to an already-cold path.
The reserve functions will also need to update their fast-path branches. This is relatively sensitive since reserve evaporating has historically been very important for optimizing iterator loops into memcopies. I’m cautiously optimistic since it will just be changing
cap - len >= requested_extra into something like
cap != 0 && cap - len >= requested_extra, which is hopefully “obvious” to the optimizer due to cap increasing by a usize.
This leaves the mutating operations which don’t increase capacity, like
DerefMut<[T]>. These need new checks for
cap=0 to do the right thing. Currently they guard on
len. We can either implement this logic at the level of Vec, so that it becomes
Cow<'static, [T]>, or we can implement this logic at the level of
String. The latter will probably be more error-prone, but is easier to justify (see other sections for discussion).
So pretty much all cost will be concentrated in extra branches in functions like
pop. No special bitflags are needed. 0 cap is already a sentinel in Vec.
There are a few places where this optimization has an opportunity for significant performance wins. If code doesn’t fall into these cases, then it’s mostly just moving the cost of allocating a fresh string a bit further down the road.
String::from("hello").push_str("!!!") will perform two allocations today (if you want to argue for LLVM optimizing this away, imagine the two ops are far away). With this optimization it will only need to be one allocation, as the COW operation will know that it also needs some slack space for the “!!!”.
String::from("hello").clear() will be able to do zero allocations.
String::from("hello").truncate(1) will still do 1 allocation as it does today, but can save a bunch of copying, and use a smaller allocation.
Code that requests Strings because it needs to store the data longterm, but doesn’t actually mutate them. This code could be changed to use
Box<str> without any issue. However requiring
Box<str> can itself be a performance issue, as in the cases where one actually needs to dynamically construct a String, one might (probably?) have some slack space in the allocation, and that means reallocating to construct a
Cow<'static str> could also be used for this case; see below.
Code that requests Strings, but usually doesn’t need to mutate them. This code could be changed to use
Cow<'static, str>. However
Cow being a general construct can’t do interesting type-specific optimizations. In particular, every access will need to branch on “str or String” (although this is a particularly cacheable/optimizeable branch). Building this logic into String will eliminate an extra branch for almost every operation (as it will be unnecessary, or folded into an existing check).
The operations that would probably need a new branch are:
- drain (when it isn’t equivalent to truncate()/clear())
(edit: some things were removed here, operations like truncate/pop/clear are actually fine)
Clone::clone would probably gain an extra branch as a performance optimization (no-op if cap = 0).
My general rule of thumb here is that it’s basically fine to add a predictable branch before an O(n) operation, but sketchy to add one to an O(1) operation (as it may be tossed in a tight loop). So basically only
clear qualify, but
clear aren’t things you would do in a tight loop. So basically only
pop is even kind-of concerning.
In general there aren’t many operations you can perform on a String in place, because basically every unicode operation can lead to a change in byte-wise length.
&mut str is basically “hello yes I would like to do ASCII things”.
&mut [T] is a really useful type, and I’d be very concerned to make the
Vec -> &mut [T] transformation expensive.
The “operations which will need a branch” are also those that will now be able to trigger a reallocation, which couldn’t before. Any unsafe code which was relying on this not happening will be broken if it’s passed a String containing a string literal.
It’s not clear to me if our documentation actually guarantees we never shrink allocations when items are removed, but it’s certainly a design decision we’ve long held while implementing the Rust standard library. Certainly I wouldn’t be surprised to see e.g. Servo relying on it.
@Kimundi’s StableAddress trait currently assumes all bets are off if you take a mutable reference, so that’s encouraging.
@arielb1’s DerefPure trait will be pseudo-violate by this – DerefMut will be observably pure on its own, but Deref followed by DerefMut is observably impure.
Pointer stability is the aspect that worries me the most. I could be convinced it’s not a big deal, though.
Some operations which are supposed to be
O(1) will suddenly sometimes be
O(n). This implies an O(n) operation was turned into an
O(1) somewhere else, but it can sometimes matter where the expensive operation was. For instance, if the expensive operation suddenly starts happening in a latency-sensitive section of code, that’s a problem.
I’m pretty ambivalent about this. If someone Really Really cares they can do some dummy operation that triggers a COW. If this is significantly desirable we can provide an operation that for doing precisely this (I think
reserve(0) will probably always work).
The trickiest piece of implementation appears to be exposing capacity in a way that:
- does the right thing if you use it to calculate the size of target buffers (cap = len)
- does the right thing when fed into things like from_raw_parts (cap = 0)
edit: please see discussion below, this is the most difficult constraint, and may be intractable.
These appear to be fundamentally conflicting goals. I’m inclined to say that the first use-case is probably doing something wrong, and can be broken with abandon (in which case, we simply expose the capacity as stored – 0).
reserve APIs to take additional space, and not absolute space. This is because the overwhelming use-case is
vec.reserve(size_of_thing_i_want_to_insert). So there’s no reason to include
capacity in such a calculation.
If you want to know how big of a buffer you need to reserve for the contents of a buffer, you should be using
len, and not
In the remaining cases, I expect
cap = 0 will probably just do the right thing, in that it will tell you that you need a new buffer? I’m a bit concerned about underflow bugs.
If you reject this and want
cap = len, then we need to add a
into_raw_parts API that does
cap = 0, and tell everyone using
from_raw_parts to be sure they got
into_raw_parts. That’s a bit of a mess, IMO.
Should Vec Do It Too?
As noted in previous sections; the increased utility of O(1) operations on Vec makes me less comfortable with Vec itself supporting this operations. That’s about it.
Should String Provide
I’m too tired from writing this. Whatever.