Small string optimization: remove as_mut_vec

I wanted to start this thread, since I haven’t found much discussion on this topic.

SSO is an important optimization in modern languages. To give an example: more than half of all allocations in the Chromium browser are std::string allocations [1]. I don’t have a study or article at hand to prove the broader claim, but you can easily imagine that many strings in applications never exceed a certain length. Storing those strings inside the stack object itself, without allocating, would be a huge performance gain for applications that use large numbers of short strings.

On 64-bit architectures it’s possible to store strings of up to 23 bytes inside the object itself. 23 bytes goes a long way when you consider how many strings in practice are shorter than that.
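To make that figure concrete, here is a minimal sketch of an inline-or-heap string. It is not how libc++ or any real implementation packs its bits; a plain enum is used for clarity, so it is bigger than the 24 bytes (three words) a real implementation squeezes everything into by hiding the tag in spare bits.

const INLINE_CAP: usize = 23;

enum SsoString {
    // Short strings live directly in the object; no allocation at all.
    Inline { len: u8, buf: [u8; INLINE_CAP] },
    // Long strings fall back to an ordinary heap-allocated String.
    Heap(String),
}

impl SsoString {
    fn new(s: &str) -> Self {
        if s.len() <= INLINE_CAP {
            let mut buf = [0u8; INLINE_CAP];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            SsoString::Inline { len: s.len() as u8, buf }
        } else {
            SsoString::Heap(s.to_owned())
        }
    }

    fn as_str(&self) -> &str {
        match self {
            // The inline bytes were copied from a &str, so they stay valid UTF-8.
            SsoString::Inline { len, buf } => {
                std::str::from_utf8(&buf[..*len as usize]).unwrap()
            }
            SsoString::Heap(s) => s.as_str(),
        }
    }
}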

Therefore I think Rust needs this optimization, or at least the standard library should be left able to adopt it later. I was told that optimization work doesn’t matter much right now, while the language devs are trying to stabilize the language. BUT: the current interface of std::string::String prevents such an implementation.

For example, the method as_mut_vec would prevent SSO. It returns a mutable reference to a Vec, which means that an actual Vec has to be stored somewhere inside the String. That is not possible with SSO, where the bytes may live inline. Every workaround for this involves a heap allocation, which is exactly what you don’t want. I’m not sure whether other methods would also prevent SSO, but I will find out.

TL;DR: I propose to rethink the interface of std::string::String with SSO in mind. We can’t afford to prevent ourselves from ever using this optimization.

And by the way: I have a working implementation of std::string::String that uses SSO. It is still missing a few methods and trait implementations, but most of the functionality is there.

[1] https://groups.google.com/a/chromium.org/forum/#!msg/chromium-dev/EUqoIz2iFU4/kPZ5ZK0K3gEJ

4 Likes

So we can mostly ignore String here, since it’s basically a wrapper around Vec. There are logically three kinds of Vec that we’re concerned with here:

  • Vec always on Heap
  • Vec always on Stack
  • Vec adaptively on Stack or Heap

You are proposing that the third option be used unconditionally, I believe. But this would cost performance when a Vec really wants to be pure stack (because it is known a priori to be small) or pure heap (because it is known a priori to be big, or needs to be moved around a lot). Every access would have to branch on which “mode” the Vec is in.

I instead propose that we have three distinct types: HeapVec<T>, StackVec<T, n>, and HybridVec<T, n>. These can all in turn be implemented on top of an AbstractVec<T, Buf>, where Buf is a trait that abstracts over managing and accessing the buffer (ptr(), cap(), and reservation methods) with three implementers:

  • HeapBuf<T>
  • StackBuf<T, n>
  • HybridBuf<T, n>
struct AbstractVec<T, Buf> {
  // Shared length bookkeeping; how the storage is obtained lives entirely in Buf.
  len: usize,
  buf: Buf,
  // (A real Rust definition would also need a PhantomData<T> marker,
  // since T is otherwise unused.)
}

struct HeapBuf<T> {
  // Always heap-allocated: raw pointer plus capacity, like today's Vec.
  ptr: *mut T,
  cap: usize,
}

struct StackBuf<T, n> {
  // Always inline: exactly n elements stored directly in the object.
  elems: [T; n],
}

struct HybridBuf<T, n> {
  // Inline until it outgrows n elements, then spilled to the heap.
  // `Either` is just an illustrative two-variant sum type here.
  buf: Either<HeapBuf<T>, StackBuf<T, n>>,
}
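For illustration, a Buf trait along the lines described above might look roughly like this (names and signatures are mine, not a proposed API):

trait Buf<T> {
    // Where the elements currently live (stack or heap).
    fn ptr(&self) -> *mut T;
    // How many elements fit before more space is needed.
    fn cap(&self) -> usize;
    // Grow the buffer; `len` is how many initialized elements must be preserved.
    fn reserve(&mut self, len: usize, additional: usize);
}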

We are, however, missing the type-system tooling necessary to implement StackBuf<T, n> (generic integer parameters like n), and when we will get it is unclear.

In the future it should be easy enough to make Vec<T> an alias for HeapVec<T> (which is what it is now), HybridVec<T, 7>, or whatever else we decide is ideal.

3 Likes

I am proposing to use the third option unconditionally for the standard string type, yes. One has to think about how users will actually use the language: they will use whatever is short to type and shows up in the first result of their Google search. The standard string type is therefore used nearly everywhere and should default to something that’s good in most situations. I don’t think defaulting to an always-heap-allocated string is the right choice.

I am not saying that Rust definitely needs SSO. I am just saying that adding methods to the standard string interface that would rule out that optimization isn’t wise.

I think collections should have a minimal interface, and String as well as Vec have too many methods that are not important to the core functionality.

I don’t get why we need those as_vec-style methods anyway. String and Vec shouldn’t need to know about each other. If you want to copy the contents of a String into a Vec, you should use an iterator (or an unsafe raw-pointer approach). Who made the decision to add these methods to the String type? Why would it be bad to remove as_mut_vec?

1 Like

In this context I recall a presentation from CppCon where one of the C++ standardization committee members discussed the performance problems of the standard library collections.

He talked about two categories of problems:

  • inherent - std::map (binary tree) and std::list (doubly linked list); these structures simply lead to bad cache behaviour.

  • badly designed interface - std::deque (a sequence of array chunks) and std::unordered_map (hash map); the interface is specified in a way that rules out better implementations. For example, unordered_map makes buckets part of the interface.

I sympathize with @LukasKalbertodt’s proposal, as I see SSO as very important. In fact I am really surprised that it is not already implemented.

Are the as_vec and as_mut_vec methods actually useful? I would expect that building a new Vec from String::as_bytes and then building a new String from that Vec wouldn’t be much slower, and given how infrequently that is needed, it would be no problem in real applications.
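For reference, the round trip being described would look something like this; it costs one copy plus a UTF-8 re-validation on the way back:

fn main() {
    let s = String::from("hello");
    // String -> Vec<u8>: copy the bytes out instead of aliasing the String's buffer.
    let bytes: Vec<u8> = s.as_bytes().to_vec();
    // Vec<u8> -> String: from_utf8 re-checks validity (into_bytes would avoid the
    // copy by consuming the String instead).
    let back = String::from_utf8(bytes).unwrap();
    assert_eq!(back, "hello");
}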

On the other hand, SSO can be implemented even with as_mut_vec as it is, by lazily upgrading an optimized short string to a heap-allocated one. The as_vec method would be a problem here, though.
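A self-contained toy sketch of that lazy upgrade (illustrative only; the real as_mut_vec is an unsafe method, and the UTF-8 concerns are ignored here):

// Toy inline-or-heap byte buffer, just to show the "upgrade lazily" idea:
// handing out a &mut Vec<u8> forces the inline data onto the heap, and the
// value stays heap-backed from then on.
enum ByteRepr {
    Inline { len: u8, buf: [u8; 23] },
    Heap(Vec<u8>),
}

impl ByteRepr {
    fn as_mut_vec(&mut self) -> &mut Vec<u8> {
        if let ByteRepr::Inline { len, buf } = self {
            // Allocate only now, when a real Vec is actually demanded.
            *self = ByteRepr::Heap(buf[..*len as usize].to_vec());
        }
        match self {
            ByteRepr::Heap(v) => v,
            // We just upgraded above, so this arm can no longer be reached.
            ByteRepr::Inline { .. } => unreachable!(),
        }
    }
}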

2 Likes

There isn't actually an as_vec... I confused it with into_bytes, which consumes self and returns a Vec<u8>. That isn't really a problem, because the Vec has to be created anyway (we don't hand out a reference to a stored Vec). But into_bytes would still involve a heap allocation when SSO kicks in (i.e. when the string is stored in place).

And yes, one could implement as_mut_vec that way, but it would add more branches to other methods, because we can't go back to SSO once as_mut_vec has been called. That would be chaotic...

But again: I don't see why it's necessary to have any methods returning Vec at all.

I suppose that this discussion is related to this issue: https://github.com/rust-lang/rust/issues/20198 so I have made a comment there.

I saw that too, but there doesn’t seem to have been much activity there lately, so I assumed this would be a better place to discuss it…

And maybe it’s not just about as_mut_vec but about the whole String interface, with as_mut_vec being the worst part.

Can you, @strcat, tell me whether there is going to be a small-buffer optimization for Vec?

String shouldn't necessarily be a wrapper around Vec; it may be (and usually is) a wrapper around some other specialized byte container that supports SSO. So it's not necessary to reduce the problem to Vec and force the small-buffer optimization onto Vec.

So the statement above, that String is basically a wrapper around Vec, is incorrect.

This is certainly a good idea, but it is not how SSO is usually implemented.

There's no ideal general-purpose HybridVec<T, N> - both the choice of N and the implementation of HybridVec involve a lot of tradeoffs. Vec should stay as it is - simple and predictable.

Yes, you could implement HybridVec as some complex pseudo-tagged union if you wished, but sufficient enum optimizations might make that unnecessary.
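An example of the kind of enum optimization meant here: the compiler can already reuse spare bit patterns (niches) so that a tag costs no extra space in some cases.

fn main() {
    // `None` is represented by the null pointer pattern, so the Option adds no size.
    assert_eq!(
        std::mem::size_of::<Option<Box<u32>>>(),
        std::mem::size_of::<Box<u32>>()
    );
}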

I find that very hard to believe after looking at the string implementation in libc++.

A whole lot of discussion took place here: `String::as_mut_vec` prevents small string optimization · Issue #20198 · rust-lang/rust · GitHub

I agree with that. A string is used for completely different stuff than a vector. Applying the same optimizations is usually just wrong.

I would disagree too. SSO implementations can get pretty wild. I seriously doubt that a smart compiler would be able to do something like the SSO-23 technique described here: GitHub - elliotgoodrich/SSO-23: Memory optimal Small String Optimization implementation for C++

And as a question to the experienced Rust devs: do you think the language can still be changed in such a way? I don't know whether you are simply glad that String is stable and don't want to touch it anymore before the 1.0 release. So... is a change still possible?

Cross-posted to reddit to get more attention on this potential backwards-compatibility hazard.

I agree that it is important to remove potential roadblocks on the way to SSO. I’m for this change.

Edit: Having read some of the arguments in the linked issue, I’m no longer so sure about this…

I haven't had a chance to really dig into the issues -- would you mind summarizing the points you found compelling?

The points I found compelling were that:

  • strcat has measured it extensively and found it not to be a significant improvement in most real code;
  • projects that care a lot about string performance, like LLVM, rarely use it;
  • small allocations are extremely fast on modern hardware and are a one-time cost, while you pay for the extra branching and icache bloat every time a string is accessed;
  • for a variety of reasons (UTF-8 leading to limited mutability, for example) small string optimization is unlikely to be as useful for Rust as it is for C++ anyway;
  • it is comparatively easy for a project that does need SSO to swap out its string type, compared to many other large-scale changes (which I was able to verify for myself when iota recently switched to a new backend representation of a gap buffer with very few code changes).

2 Likes

FWIW, I think removing as_mut_vec is a feasible change before beta. That said, it enabled us to cut down on the number of unsafe (non-UTF-8-ensuring) methods, so we'd need to do some API design work at the very least.

This is not to say that I think we should make such a change; I need to get more familiar with the details of the debate before having an opinion here.

It’s not clear to me that the optimizations applied to C strings necessarily apply to UTF-8-encoded strings that carry their length and have no trailing null byte. Regardless, the design I laid out is mostly just for instructive purposes.

It’s also not clear to me that you really want the SSO to be totally transparent and “only” use the spare bits in the heap-allocating repr. I see nothing that makes three pointers’ worth of bytes a particularly optimal amount of space at which to branch between stack and heap allocation. The Hybrid design I lay out allows programmers to pick a reasonable n for their particular use case. So if they find that 180 bytes on the stack works really well for them, they can use that.

@pcwalton, any thoughts on this issue?

TLDR†: If SSO is useful, let there be a well-supported SmallStr type in the stdlib, backed by an equally well-supported SmallVec type. Make sure the docs are clear that String is definitely a wrapper around Vec<u8> and is specifically for amortising the cost of building strings. Once there exists a decent corpus of appropriate Rust code in the wild, re-evaluate based on said corpus, not hypothetical future issues.

I think it should stay. Rust is not C++ and doesn’t have to have the same problems. Even granting the assumption that SSO is desirable, I’ve seen no solid data indicating that it’s desirable in general (as thestringer pointed out, if a switch to SSO is made, everyone pays the branching costs), nor can I think of a compelling reason why someone couldn’t just use a separate “SmallStr” type.

That second point is especially true given that the “normal” way of passing around a string is via &str, not String.

The only way this makes sense to me is if String is renamed to VecStr (as the UTF-8 wrapper around Vec), and String becomes a type whose implementation details are deliberately left vague: it’s whatever is “generally most efficient”.

But even then, how do we know what’s “most efficient”? Rust doesn’t really have the corpus to derive meaningful statistics for this yet, does it? How do we know people aren’t just going to go with whatever is efficient for the underlying representation? An alternative is that the majority of people are lazy, and we end up with people using String in an inefficient manner that SSO would have been better for… or maybe not.

The more I think about this, the more convinced I am that this is all just guessing at future trends for a possible performance benefit, in exchange for a definite reduction in functionality today, which seems silly.

†: Too Long, Didn’t Read Rambling

1 Like