Can representation of Vec and String be changed?

That's almost a thought experiment: Would it be possible to change internal representation of Vec or String?

Currently their layout is (ptr, capacity, len), and layout of slices is (ptr, len). If Vec/String were (ptr, len, capacity) instead, then &&[]/&&str could be created without copying, and maybe that would be slightly more efficient (especially in places like iter.filter where this double indirection happens).

But I expect that the answer is no, they can't be changed, because there may already be code assuming their current layout for tricks like small string optimization or FFI. In that case there's the next trick question: if their layout is effectively frozen, could it be blessed as the official ABI that crates can actually rely on?

4 Likes

We don't define the order of the fields. In fact -Zrandomize-layout may change the order to something like (capacity, ptr, len) or (len, <extra padding>, capacity, ptr), though not (ptr, len, capacity) due to ptr and capacity being part of a single RawVec field.

14 Likes

See this zulip thread: https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Can.20we.20stabilize.20the.20layout.20of.20Vec.3CT.3E.20and.20.26.5BT.5D.3F/near/375022031

TL/DR: No, we're not at a point where we'd be willing to stabilize Vec/String layout.

And its follow-up, https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Can.20we.20stabilize.20the.20layout.20of.20.26.5BT.5D.20and.20.26str.3F/near/394683395, that we can probably stabilize the layout of &[T] and &str.

2 Likes

Side note on SSO: the docs state String's "buffer is always stored on the heap."

TL/DR: No, we're not at a point where we'd be willing to stabilize Vec /String layout.

Isn't the answer "yes, Vec/String layout are not stabilized and we can reorder the fields if a different ordering is optimal"? (I'm assuming that any reordering would be an invisible implementation detail and not subject to stabilization) (edit: I now see you're probably responding to the "next trick question" at the end of the OP, as opposed to the question at the start of the OP; I'll leave my post as-is but apologies for conflating the two)

4 Likes

The layout has changed several times just this year and will likely change again. I'm slowly working on two things that may affect layout. One is to add a big niche to capacity, the other one is to add a niche to slice-length. Both are based on the property that they can't exceed isize::MAX. For vec/string this will automatically affect layout. For slice references I'm not sure if it'll happen automatically but even if it is hardcoded it might change the decision what is considered the optimal layout.

There's also the reference-alignment-niche RFC which won't directly affect the layout but it mentions platform-specific niches based on address-space restrictions in its future possibilities section

So imo we should be careful about stabilizing reference layout too, it doesn't seem settled yet.

though not (ptr, len, capacity) due to ptr and capacity being part of a single RawVec field.

This non-possibility is also an implementation detail for now. I recall and experimental PR that tried to put all three into a single struct so any possible reordering is on the table.

1 Like

Just to note, this one probably can't happen because of ptr::slice_from_raw_parts. And even without that, slices of ZST can be >size::MAX elements long and then that pointer as cast to a non-ZST pointee type.

Yes, this is an argument about pointers, not references. But it'd be very strange for them to have different byte validity requirements.

4 Likes

If length is never supposed to exceed isize::MAX, it'd be better to formalize it sooner than later, since it's already useful to abuse it for arbitrary pointer metadata:

I don't see how it would be very strange. Pointers are the wild west, references have very restrictive requirements. To me they're not the same at all.

Only for slice references to non-ZST types. Pointers or ZST-references would be unaffected by such an optimization. I'm not even 100% sure yet if this will be workable. But if alignment niches in the pointer part of the references are (which would also be type-dependent) then I assume so will be niches in the length part.

I've been wondering the opposite: why do we need 3 metadata locally, making a String always (on 64bits) a whopping 24 bytes. If we had only (ptr, capacity) then len would be at the beginning of the heap segment (only if capacity > 0).

Moreover, since Rust is non-mut very often, we can do away with len, because I guess there is no useful case where a non-mut would have len < capacity.

Making it a single pointer would be possible -- ThinVec in thin_vec - Rust does that -- but that has different trade-offs. It doesn't allow Vec::from_raw_parts as it works today, for example, and has different implications for people wanting to re-use Vec's allocation.

So this is one of those places where I think Rust is intentionally making Vec the simplest thing, and letting the others be crates.

This is called a slice :slight_smile:

(You can use Box<[T]> instead of Vec<T> for cases where you want len == capacity only.)

7 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.