Small string optimization: remove as_mut_vec

Thus far my opinion on this issue is that its fine and maybe preferable for basic data structures to have clear and simple performance characteristics (like vec is always heap allocated). Adding clever optimizations under the hood that may or may not kick in is its own kind of hazard.

2 Likes

To validate/disprove claims about this crippling the String interface, can someone measure the performance impact of converting uses of as_mut_vec to into_bytes, plus a backwards move with String::from_utf8_unchecked when necessary? As currently implemented, that pair of moving transformations is little more than changing the facade on the same underlying buffer. It is reported that as_mut_vec is hardly being used, which is understandable given that it's unsafe and there almost always should be a safe way to mutate a String which works reasonably fast.

Iā€™m of the opinion that any data structure API that exposes the internals is a bad idea. I thought intuitively that Strings were SSO-able without a breaking change, so this was surprising to me.

3 Likes

Some of the points made by @strcat on GitHub need further evaluation, I think:

Small allocations take ~8ns on modern hardware.

Sorry, but without hard data backing it up against some realistic workload this is all as much hearsay as claiming that SSO/SVO is a magical speed bullet.

I don't think you realize how cheap small allocations are. Branch misses are much worse

So how many mispredicted branches are caused by frequent allocation and deallocation over varying block sizes? For workloads where SSO would be relevant, the predominant usage pattern of String would likely be something like: allocate, a few appends, a few sequential reads, deallocate. So the allocations are likely to be stacked fairly high compared to other operations, unlike workloads that tend to mutate bigger strings. It's hard to validate the claim that branches on accessing the buffer are the main culprit in most realistic use cases and not, for example, the chosen implementation of SSO, or suboptimal compiler behavior, without seeing any details of measurements made, which details have been scarce.

And the issue, as has been said in this discussion, is not about deciding to switch to SSO right now. It's about removing an unsafe, rarely used method that limits the options for changing the String implementation in a backwards-compatible way if a need to do so is discovered. There is only one such method so far identified, despite repeated claims to the contrary.

There is only one such method so far identified, despite repeated claims to the contrary.

from_raw_parts would have to be altered to only accept pointers that are at least 16-bit aligned (or possibly removed), so there are at least two. And that goes quite a bit deeper if we still allow Vec<u8> in other methods.

As far as strcat's performance numbers go: it's always best to validate, but he has contributed quite a bit to jemalloc, as well as Rust's allocator API, is writing his own allocator (which is benchmarking very well), and even got a Linux kernel patch in to improve allocator performance. I'd be inclined to trust his numbers, particularly in the absence of hard evidence demonstrating the opposite.

I think the whole question of ā€˜should we keep as_mut_vecā€™ is almost completely irrelevant to whether or not we will eventually do SSO. as_mut_vec can easily be adjusted to de-SSO the String if it is SSOā€™d because it takes &mut self, as @pepp noted above. I suspect that the performance regressions from that would be small compared with the performance regressions from requiring people to choose potentially slower alternatives to as_mut_vec. SSO is just an optimisation which is ideally used whenever it improves speed; losing one rare case where it could be used isnā€™t much of a problem in my eyes. Keeping as_mut_vec does not rule out SSO at all, so I donā€™t really see how it is a problem.

We are not really talking about C-strings but rather about std::string which also has it's length included. And I don't see why std::string would pay branches for the null byte. It could either calloc the buffer and just care about null bytes when reallocating or just write a null byte unconditionally every push.

Sure: It's pretty damn hard to evaluate this problem properly. One could analyse big open source projects of "similar" languages to get information about the usage pattern of strings. I think that even C++ application would give a good indication on how mutable strings are used. With that usage pattern one could create a benchmark and profile everything. And I really think that people use whats most easy to use and whats familiar. If you have to choose between String and SmallString you'd think that the latter is a special thing to do special things with. And again: That's fine if SSO is actually something that you would just use in special cases. Problem: We don't know yet and I hardly believe that we have some serious benchmarks before beta.

I absolutely agree. But in languages aimed to be low level standard design rules has to be thrown out the window sometimes I guess. But Vec is not necessary here. String could expose a rare pointer or something other that is not a library type. Creating those cross type dependencies is not a good thing IMO.

I think that this is a good question and just saying that these branches are worse than everything else is not a good idea. Often when we care about performance we have a lot of stuff and we iterate over it in a loop somehow. Remembering that CPUs are pretty damn good at what they are doing (read: branch prediction) we should think of how regular our branches are. And if our strings are usernames for example most strings will use SSO and therefore the branches are easily predictable. Same thing if we use strings that are mostly large. Only problem would be a scenario where our string lengths are somewhere around 23 with 50% above and 50% below and those strings are randomly distributed in our container. And yes, if we use branches we don't just pay for misprediction but for a few other things like preventing special optimization. But I don't think that's the real issue here.

Exactly! And yep, I am pretty sure as_mut_vec is the only method that really causes any trouble because it returns a reference to a Vec. Consuming the String and returning a new Vec wouldn't be a problem. Although I would say that we should remove all methods that include the Vec type from the string interface. Converting could be done via free function that use the unsafe "get-pointer" method of the String and the from_raw_parts method of the Vec. When doing this conversion with a StrVec it just needs to copy three pointers. Unconditionally. Should be fast enough I guess. And if this is not fast enough for your use case or you actually need a reference to the underlying Vec to mutate the buffer in a non UTF8 way, I would consider you the 0.1% of people that should write their own string implementations anyway.

So: Yes, this is not about "SSO is better", but about "this one method that is hardly used (and that could be replaced nearly without loss of performance with some other method) has huge implications and is preventing us from maybe doing smart stuff in the future".

EDIT:

I just don't see why he is "hiding" the hard data from us when he is good at such things.

Not really. To implement that method we would have to store the Vec inplace. Your idea, correct me if I misunderstood, is to just switch to the non-SSO way of storing the string like one would do when the string becomes larger than 23 bytes. Problem A: When we store the Vec inline we still need any indicator if we're using SSO. At least one bit. One easy way to do this is to use the LSB of the cap field as indicator and restricting the capacity to even numbers (which is not a huge restriction, especially when using doubling policy). We can't do that with Vec since Vec has it's own policy and we can't just use some bit's of it for other stuff. Problem B: Even if we would have solved A, we would potentially add even more branches to every method to check for the three cases (small and SSO, small but not SSO because as_mut_vec was called, big and not SSO).

There is no stated guarantee that from_raw_parts will use the passed buffer as is. Actually, does it even work for arbitrarily created buffers without leaking or corrupting the heap on them? This function needs some more explaining...

Also, tagged pointers are not the only scheme of encoding SSO.

Feel free to suggest a variant of SSO that doesnā€™t require at least one extra bit of informationā€¦ in the meantime, it seems pretty clear to me that this would be a backwards incompatible change to that function, since it doesnā€™t require alignment right now. The same goes for any functions allowing creation of a String from a Vec<u8>.

This is all assuming you want them to perform well, of course. If you have to make the existing methods all perform much worse in order to enable the optimization, it becomes much less attractive.

There is one linked up the thread that uses length <= capacity to encode in two bits, which is enough to also encode the length in the same byte.

And as said above, from_raw_parts permits backwards-compatible circumvention, unlike as_mut_vec.

And I mentioned another method in my post above in the "EDIT" section. Just want to make sure that you don't miss that spot since you all are apparently answering to fast for me ...

If you store length <= capacity in two bits, it is not a backwards compatible change, since you have now restricted the maximum string length to differ from the maximum vector length. On a 32 bit system, this is a pretty significant difference. Restricting the capacity to even numbers is also not backwards compatible if you want to maintain existing performance.

I'm not sure if you misunderstood the method or if I misunderstood you. But we don't actually "store" anything more. We just know that if the MSB of the len field is 1 the MSB of the cap field can't be 0. So we still have the same capacity and length.

I'd argue that it's hardly a difference. I don't think that allocators ever create blocks of odd length. So this additional byte shouldn't hurt performance at all. Has to be measured though :wink:

No restriction: it just cannot happen that the MSB of length is 1 while the MSB of capacity is 0. So these two bits are encoded in the first byte using a bit-mixed pattern for the fields of the non-SSO string/vector structure, and when these bits are "wrong", this means the string is in-place. It's kinky, but does not require the pointer to be aligned.

Speaking of pointers, if from_raw_parts is really only supposed to take buffers obtained from the standard allocator, that allocator could be guaranteed to always allocate word-aligned buffers, which is typically the case for other reasons as well.

I misunderstood the method. Okay, I agree that with the MSB method it would (seemingly, anyway) be backwards compatible (since it exploits some redundant information in the existing data structure).

What about renaming back to StrBuf and reserving String for (mostly) immutable string if the Box<str> thing does not turn out well?

Your idea, correct me if I misunderstood, is to just switch to the non-SSO way of storing the string like one would do when the string becomes larger than 23 bytes.

Correct me if Iā€™m wrong, but I think you misunderstand me. What I am proposing is that the current as_mut_vec implementation be changed to something like this:

fn as_mut_vec(&mut self) -> &mut Vec<u8> {
    if self.is_sso() {
        // This method changes the Stringā€™s representation to a non-SSO one;
        // that is, one using a pointer, length, and capacity.
        self.change_from_sso_to_normal_representation();
    }
    &mut self.vec
}

This would work regardless of how SSO gets implemented.

from_raw_parts is potentially backwards-incompatible to change, but it depends on the precise SSO implementation. Forcing a capacity with a multiple of 2 could be backwards compatible, as the Vec could simply be reallocated whenever the capacity is odd (I think). Itā€™s likely that most other SSO implementations could work with it, too.

@pepp The case of building a likely small string before sealing it in an immutable value ought to be supported in an easy-to-use, idiomatic way without making the developer think of different types and their tradeoffs. This could be done, e.g. with a box_from_iterator constructor and a macro:

let s: Box<str> = build_str!(base, "foo", suffix);

Well.. not exactly I think. To make this clear: We don't want our string to be larger than 3 words (ptr, len, cap). So a Vec has the same size as our String type which means that every bit of our type is used by the vector. Since you are returning a reference to that internal vector the user could call reserve on it with an odd number. And our String would then assume that it is using SSO because the LSB(it) is set. So this is not possible. Furthermore: If we want to use the method with the LSB(it) we have to make sure that the LSB(yte) is at the beginning or the end of our type. This is not always the case thanks to Endianness. So yeah... we have too many special restrictions on the cap, len, ptr fields (which are used by the vector) that it is impossible (I think) to do it this way.

@mzalaluev: The thing is that Box<str> currently does not work, is not promoted in The Book and we have no experience what ergonomics it will have. It can turn out that it will not stand to the task of being number one choice for strings and then the next best name String for suitable type will be taken by string type optimized for mutability which is arguably not the most common requirement.

I see several possibilities here:

  1. Status quo and do care about optimizations for String.
  2. Have &str, Box<str>, Rc<str> as primary types promoted in The Book and StrBuf, SmallString and others as domain specific types. Then str will need more work to stand to the task.
  3. Have &str and String as primary types promoted in The Book, where String is (mostly) immutable string type using whatever optimizations turn out to best for general case usage. Then current String will have to be renamed and part of it reused for new String for the starters.