Short String optimization

I'm not sure that's true. I also wouldn't want to be in a situation where people are writing their own strings around Vec<u8> because they can't trust that the standard one is no-fail in the places they care about.

My recollection was that we decided we would never do this because we wanted String and Vec to avoid complex optimizations that make their semantics more difficult to understand (preferring to keep them “canonical implementations”). Apparently this decision never translated into documentation guarantees.

Can you elaborate? I imagine an SSO String would have an explicit way to turn a &str or whatever into something on the heap. If they really, really care, a user could do this:

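use std::ops::Deref;

// Newtype whose invariant is "the inner String is always on the heap".
struct HeapString(String);
impl HeapString {
  fn new(str: &str) -> Self {
    // `String::in_heap` is hypothetical: the explicit "force heap" constructor an SSO String might offer.
    let string = String::in_heap(str);
    Self(string)
  }
  fn into_inner(self) -> String { self.0 }
  // Unsafe because the caller must guarantee that `str` is already heap-allocated.
  unsafe fn new_unchecked(str: String) -> Self { Self(str) }
}
impl Deref for HeapString {
  type Target = String; // or str or whatever
  fn deref(&self) -> &String { &self.0 }
}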

I imagine that most branches involving is_heap would predict true anyway...

Sure, I can imagine wanting that. I think Rust tends to treat String a lot more like e.g. Java StringBuffer than std::string anyways. I wonder if it's possible to measure whether this would even be a notable improvement...

Then all of the String functions would have to check whether the String is heap-allocated or inlined, despite the HeapString guaranteeing that the branch only goes one way. It's an anti-pattern called an abstraction inversion.
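To make the objection concrete, here is a minimal sketch. Every name in it (SsoString, HeapOnly, the 22-byte inline capacity) is made up for illustration and is not a real or proposed std API. The point is only that the wrapper cannot remove the branch inside the wrapped type's methods:

```rust
/// Hypothetical SSO string: short contents live inline, long ones on the heap.
#[allow(dead_code)]
enum SsoString {
    Inline { len: u8, buf: [u8; 22] },
    Heap(String),
}

#[allow(dead_code)]
impl SsoString {
    fn new(s: &str) -> Self {
        if s.len() <= 22 {
            let mut buf = [0u8; 22];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            SsoString::Inline { len: s.len() as u8, buf }
        } else {
            SsoString::Heap(s.to_owned())
        }
    }

    fn is_heap(&self) -> bool {
        matches!(self, SsoString::Heap(_))
    }

    /// Every accessor has to branch on the representation.
    fn as_str(&self) -> &str {
        match self {
            SsoString::Inline { len, buf } => std::str::from_utf8(&buf[..*len as usize])
                .expect("inline bytes were copied from a valid &str"),
            SsoString::Heap(s) => s.as_str(),
        }
    }
}

/// Wrapper that promises "always heap", in the spirit of HeapString above.
#[allow(dead_code)]
struct HeapOnly(SsoString);

#[allow(dead_code)]
impl HeapOnly {
    fn new(s: &str) -> Self {
        HeapOnly(SsoString::Heap(s.to_owned()))
    }

    /// Still goes through the `match` in `SsoString::as_str`, even though
    /// only the `Heap` arm can ever be taken here.
    fn as_str(&self) -> &str {
        self.0.as_str()
    }
}
```

The invariant lives in HeapOnly, but the check lives in SsoString, so the wrapper pays for a feature it has promised never to use.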

You can turn this optimization on case by case by using SmallString, which is based on SmallVec. I would argue that this is better than turning SSO on for everything. In many cases SSO is a pessimization. It is in many ways similar to choosing between HashMap and BTreeMap.

The only argument that would have any solid ground here is a measurement showing that SSO improves the performance of the majority of existing Rust code.

Probably three years too late. I joined the discussion then myself arguing that it is safer and more future-proof to not define String as a wrapper around Vec and leave it as an implementation detail.

I think that without custom move constructors it will be a performance regression, not only because of the additional branching, but also because String moves will copy the whole struct, including the often unused [u8; N]. Plus I don’t think that SSO will result in more cache-friendly code for most Rust codebases, because thanks to the borrow checker it’s idiomatic in Rust to pass around &str instead of pervasively cloning substrings as you would in C++.
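A rough way to see the size concern, assuming an SSO layout that adds a separate inline buffer next to the existing fields rather than overlapping them. FatSsoString and bytes_per_move are hypothetical names for this sketch, and the 3-pointer size of String is what current implementations use, not a documented guarantee:

```rust
use std::mem::size_of;

/// Hypothetical SSO layout with a separate inline buffer of N bytes
/// stored alongside the usual heap-backed String.
#[allow(dead_code)]
struct FatSsoString<const N: usize> {
    heap: String,
    inline: [u8; N],
    inline_len: u8,
}

/// Upper bound on the bytes copied by one move of `T`: a Rust move is a
/// bitwise copy of the whole value unless the optimizer elides it.
#[allow(dead_code)]
fn bytes_per_move<T>() -> usize {
    size_of::<T>()
}
```

With N = 32, bytes_per_move::<FatSsoString<32>>() is at least 32 bytes larger than bytes_per_move::<String>(), and that cost is paid whether or not the inline buffer holds anything.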

Yeah, there's no particularly good way to optimize those calls in today's compiler. I mumbled something about how is_heap should be annotated to predict true (Does Rust even have a way to do this? Clang has some kind of non-standard attr I think.) or whatever, but I have nothing concrete to say as to the performance hit.

Yeah, that's what I've been saying. Absent that, I don't think making std::string::String an SSO string is worth it.

I'm glad to see the discussion this has generated. I think the issue is cleared up for me!

Yep, as an unstable intrinsic: rust/src/libcore/intrinsics.rs at ae366637fedf6f34185e54fc7b2d725b1a458ff6 · rust-lang/rust · GitHub
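On stable Rust, one way to approximate the same hint is to move the unlikely path into a function marked #[cold]. The sketch below assumes a made-up Repr enum and 22-byte inline buffer; the attribute is only a hint, and the optimizer is free to ignore it:

```rust
/// Hypothetical two-representation string, for illustration only.
#[allow(dead_code)]
enum Repr {
    Inline { len: u8, buf: [u8; 22] },
    Heap(String),
}

/// The inline path is assumed to be the rare one, so it is marked cold and
/// kept out of line. This biases code layout toward the heap arm.
#[cold]
#[inline(never)]
fn inline_as_str(len: u8, buf: &[u8; 22]) -> &str {
    buf.get(..len as usize)
        .and_then(|bytes| std::str::from_utf8(bytes).ok())
        .unwrap_or("")
}

#[allow(dead_code)]
fn as_str(r: &Repr) -> &str {
    match r {
        // Expected hot path: the string is on the heap.
        Repr::Heap(s) => s.as_str(),
        // Expected cold path: the call target is annotated #[cold].
        Repr::Inline { len, buf } => inline_as_str(*len, buf),
    }
}
```

Whether this actually helps would have to be measured; it changes nothing about the fact that the branch exists.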

@newpavlov String's size can be the same with and without SSO, so that in both cases, when the whole struct is moved via a memcpy, the same amount of work is done, e.g.,

union String {
    heap: (*mut u8, usize, usize),       // (ptr, size, capacity)
    sso: [u8; size_of::<usize>() * 3],   // [sso buffer..., sso size, sso bit]
}

where one:

  • uses 1 bit to signal whether the SSO is active (e.g. the highest pointer bit),
  • some bits to store the size (either the second highest pointer byte, or one crams this into the highest pointer byte along with the SSO bit),
  • the rest of the bits for the SSO buffer.

On 64-bit targets, if you use 1 byte for the SSO signaling bit and 1 byte for the size, you get a 22-byte-wide buffer. If you store the size and the SSO signaling bit in the same byte, you get a 23-byte-wide buffer, but you need to mask out the signaling bit every time you want to extract the size.
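The second variant (size and signaling bit sharing one byte) can be sketched in safe Rust as follows. This covers only the inline half of the union; InlineRepr, SSO_BIT and the choice of the last byte as the tag byte are assumptions of the sketch, and a real implementation would also need the heap representation to keep that bit clear:

```rust
use std::mem::size_of;

/// Total size: three pointers, matching the current size of `String`.
const SIZE: usize = 3 * size_of::<usize>();
/// Everything except the final tag byte is usable as inline storage.
const INLINE_CAP: usize = SIZE - 1;
/// Highest bit of the tag byte: set when the inline buffer is active.
const SSO_BIT: u8 = 0x80;
/// Remaining seven bits of the tag byte hold the length.
const LEN_MASK: u8 = 0x7f;

#[allow(dead_code)]
struct InlineRepr {
    bytes: [u8; SIZE],
}

#[allow(dead_code)]
impl InlineRepr {
    /// Returns None when the string does not fit inline.
    fn new(s: &str) -> Option<Self> {
        if s.len() > INLINE_CAP {
            return None;
        }
        let mut bytes = [0u8; SIZE];
        bytes[..s.len()].copy_from_slice(s.as_bytes());
        bytes[SIZE - 1] = SSO_BIT | (s.len() as u8);
        Some(InlineRepr { bytes })
    }

    fn is_inline(&self) -> bool {
        (self.bytes[SIZE - 1] & SSO_BIT) != 0
    }

    /// The signaling bit has to be masked out on every length read.
    fn len(&self) -> usize {
        (self.bytes[SIZE - 1] & LEN_MASK) as usize
    }

    fn as_str(&self) -> &str {
        std::str::from_utf8(&self.bytes[..self.len()])
            .expect("inline bytes were copied from a valid &str")
    }
}
```

On a 64-bit target SIZE is 24 and INLINE_CAP is 23, matching the numbers above; giving the length its own byte would drop the mask in len() at the price of one byte of capacity.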

With move constructors, moving the String could be less work when SSO is active (branch to detect SSO, branch on the small string buffer length, then move only the relevant part of the buffer) than when the buffer is heap-allocated. Without move constructors, moving the String when SSO is active is never more work than when the string is heap-allocated, unless one makes it larger than 3 pointers to get a larger small buffer size (AFAIK most C++ string implementations don’t do this).

It is unlikely that all the logic for deciding what to move and how to do so is more performant than just memcpying 3 pointers. This only becomes relevant if you use small strings with larger buffers, but the “beauty” of C++ SSO is that one can perform it without increasing the actual size of the std::string object.
