Can representation of Vec and String be changed?

kornel · November 28, 2023, 4:42pm

That's almost a thought experiment: Would it be possible to change internal representation of Vec or String?

Currently their layout is (ptr, capacity, len), and layout of slices is (ptr, len). If Vec/String were (ptr, len, capacity) instead, then &&[]/&&str could be created without copying, and maybe that would be slightly more efficient (especially in places like iter.filter where this double indirection happens).

But I expect that the answer is no, they can't be changed, because there may already be code assuming their current layout for tricks like small string optimization or FFI. In that case there's the next trick question: if their layout is effectively frozen, could it be blessed as the official ABI that crates can actually rely on?

bjorn3 · November 28, 2023, 4:53pm

We don't define the order of the fields. In fact -Zrandomize-layout may change the order to something like (capacity, ptr, len) or (len, <extra padding>, capacity, ptr), though not (ptr, len, capacity) due to ptr and capacity being part of a single RawVec field.

scottmcm · November 28, 2023, 5:15pm

See this zulip thread: https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Can.20we.20stabilize.20the.20layout.20of.20Vec.3CT.3E.20and.20.26.5BT.5D.3F/near/375022031

TL/DR: No, we're not at a point where we'd be willing to stabilize Vec/String layout.

And its follow-up, https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Can.20we.20stabilize.20the.20layout.20of.20.26.5BT.5D.20and.20.26str.3F/near/394683395, that we can probably stabilize the layout of &[T] and &str.

mjbshaw · November 28, 2023, 5:16pm

Side note on SSO: the docs state String's "buffer is always stored on the heap."

TL/DR: No, we're not at a point where we'd be willing to stabilize Vec /String layout.

Isn't the answer "yes, Vec/String layout are not stabilized and we can reorder the fields if a different ordering is optimal"? (I'm assuming that any reordering would be an invisible implementation detail and not subject to stabilization) (edit: I now see you're probably responding to the "next trick question" at the end of the OP, as opposed to the question at the start of the OP; I'll leave my post as-is but apologies for conflating the two)

the8472 · November 29, 2023, 12:11pm

The layout has changed several times just this year and will likely change again. I'm slowly working on two things that may affect layout. One is to add a big niche to capacity, the other one is to add a niche to slice-length. Both are based on the property that they can't exceed isize::MAX. For vec/string this will automatically affect layout. For slice references I'm not sure if it'll happen automatically but even if it is hardcoded it might change the decision what is considered the optimal layout.

There's also the reference-alignment-niche RFC which won't directly affect the layout but it mentions platform-specific niches based on address-space restrictions in its future possibilities section

So imo we should be careful about stabilizing reference layout too, it doesn't seem settled yet.

though not (ptr, len, capacity) due to ptr and capacity being part of a single RawVec field.

This non-possibility is also an implementation detail for now. I recall and experimental PR that tried to put all three into a single struct so any possible reordering is on the table.

CAD97 · November 29, 2023, 5:08pm

Just to note, this one probably can't happen because of ptr::slice_from_raw_parts. And even without that, slices of ZST can be >size::MAX elements long and then that pointer as cast to a non-ZST pointee type.

Yes, this is an argument about pointers, not references. But it'd be very strange for them to have different byte validity requirements.

kornel · November 29, 2023, 5:25pm

If length is never supposed to exceed isize::MAX, it'd be better to formalize it sooner than later, since it's already useful to abuse it for arbitrary pointer metadata:

the8472 · November 29, 2023, 5:47pm

I don't see how it would be very strange. Pointers are the wild west, references have very restrictive requirements. To me they're not the same at all.

Only for slice references to non-ZST types. Pointers or ZST-references would be unaffected by such an optimization. I'm not even 100% sure yet if this will be workable. But if alignment niches in the pointer part of the references are (which would also be type-dependent) then I assume so will be niches in the length part.

daniel-pfeiffer · December 6, 2023, 9:59pm

I've been wondering the opposite: why do we need 3 metadata locally, making a String always (on 64bits) a whopping 24 bytes. If we had only (ptr, capacity) then len would be at the beginning of the heap segment (only if capacity > 0).

Moreover, since Rust is non-mut very often, we can do away with len, because I guess there is no useful case where a non-mut would have len < capacity.

scottmcm · December 6, 2023, 10:12pm

Making it a single pointer would be possible -- ThinVec in thin_vec - Rust does that -- but that has different trade-offs. It doesn't allow Vec::from_raw_parts as it works today, for example, and has different implications for people wanting to re-use Vec's allocation.

So this is one of those places where I think Rust is intentionally making Vec the simplest thing, and letting the others be crates.

This is called a slice

(You can use Box<[T]> instead of Vec<T> for cases where you want len == capacity only.)

system · March 5, 2024, 10:12pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
`.into` for converting between Vec<u8>, Vec<char> and String language design	13	3300	November 29, 2022
Should Vec (and other contains) implement Display?	12	472	April 26, 2024
More about slice -> array conversions language design	3	654	July 28, 2020
A little proposal for string concatenation language design	13	1404	March 25, 2019
Another alternative to allocators for Vec and other collections libs	5	781	December 31, 2023

Can representation of Vec and String be changed?

Related Topics