[Pre-RFC] Fixed-capacity view of Vec

gnzlbg · November 29, 2018, 1:54pm

An interesting consequence of a fixed-capacity view of a vector, is that, because the vector cannot grow through the view, inserting element through it never invalidates references to elements in the vector. That is, one could iterate over the elements of such a view, while inserting elements through it, something that cannot be done with just Vec.

TimDiekmann · December 20, 2018, 4:10am

This could also be expanded to a fixed-capacity view of a String.

Additionally, there could be two variants: An owned and a borrowed one: FixedVec/String and FixedVec/StringView. The owned variant is constructed with a fixed capacity while the borrowed one is a view into another non-fixed one.

tbu · January 19, 2019, 11:25pm

I wrote a similar abstraction in the buffer crate. It allows you to pass uninitialized buffers of bytes around. This allows things like

let mut vec = Vec::with_capacity(1024);
if reader.read_buffer(&mut vec)?.len() != 0 {
    if vec[0] == 0 {
        // ...
    }
}

in safe Rust. It actually does have one reverse dependency now. But I use it extensively in crates not published on crates.io, it makes for a very good abstraction for 1) networking, 2) compression/decompression, 3) building protocol packets, all with no or minimal allocations. (You can use an uninitialized ArrayVec as well as a Vec with spare capacity.)

This all works without (or at least with little) monomorphisation cost, because each function taking a B: Buffer can immediately convert it into a BufferRef, so the function won’t be instantiated for each possible buffer type. The conversion makes sure that the underlying buffer gets resized before the function returns.

What do you think about this?

newpavlov · January 20, 2019, 5:50am

Can’t OP problem be solved by something like VecCapView? It will provide opaque view into capacity space and will allow to push data into it. Something like this:

let mut buf = Vec::with_capacity(12);
buf.extend_from_slice(b"hi");
// `cap_view` is somewhat similar to `split_at`, we also can add `cap_view_mut`,
// which will return `&mut [T]` instead of `&[T]`.
// Under the hood `VecCapView` contains mutable reference to the parent `Vec`
let (data: &[u8], mut cap_view: VecCapView) = buf.cap_view();
// values are written into memory, but not committed yet (i.e. `len` field
// is left unchanged, so if we'll forget the view, vector will not change),
// each method checks if there is enough capacity is left in the view
// returning an error otherwise
cap_view.push(b':')?;
cap_view.extend([b'f', b'o', b'o', b'='].iter().cloned())?;
cap_view.extend_from_slice(data)?;
// now it's time to "commit" data, this method consumes the view
// and simply bumps `len` field of the parent vector
cap_view.commit();

// assuming `data` is out of scope
assert_eq!(&buf[..], b"hi:foo=hi")

Shnatsel · January 20, 2019, 11:11am

I like the simplicity of the VecCapView idea. Why not automatically commit the length on dropping the view though?

Shnatsel · January 20, 2019, 11:21am

Buffer crate looks interesting. I don’t think you can actually read bytes from it without allocating another buffer and copying there, though? So you can’t implement RLE without copying?

Also, I have some safety concerns about BufferRef::new() - shouldn’t it be unsafe fn if setting the initialized parameter lets you view uninitialized memory? I’d split it in two constructors - make the current new() an unsafe fn and provide another safe one that always sets initialized to 0.

There’s also bytes crate that alread does all we need with BytesMut.split_to(), but has extra overhead due to atomics and reference counting.

newpavlov · January 20, 2019, 3:46pm

Yes, auto-commits are viable alternative, but I thought it could be useful to leave vector unchanged until we are sure that there is enough capacity for our data. I guess it's up to discussion.

tbu · January 20, 2019, 9:31pm

This is basically what the buffer crate does, except that it autocommits on drop and supports more than just Vec.

tbu · January 20, 2019, 9:37pm

I'm not sure we're talking about the same thing. Buffer allows you to pass an uninitialized buffer to a function, some space where it can write to. It's not a read-write-view of data, it always assumes its data is uninitalized, so you can't read from it.

It's not an unsafe function — if you pass it a valid &mut [u8] all the bytes are already initialized.

If you don't, that's also fine, but you're already in unsafe code by providing the slice.

tbu · January 20, 2019, 9:50pm

I suppose adding an unsafe constructor taking *mut [u8] might be good.

Shnatsel · January 27, 2019, 10:25pm

Okay, this is very interesting! I’d very much like to see one of the motivating examples rewritten to use the buffer crate to gauge safety and performance of this solution. But I already have two Rust projects in active development stage, so I’m not sure when I’ll be able to get around to it, if at all.

system · April 27, 2019, 10:34pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Make Vec::set_len enforce the len <= cap invariant	25	2156	April 14, 2019
Uninitialized memory	57	10756	March 25, 2019
Rust needs a safe abstraction over uninitialized memory language design	80	4804	March 21, 2025
MaybeUninit: consider supporting extract the value safely	47	1666	October 13, 2024
Canvas unsafe code in the wild Unsafe Code Guidelines	25	8270	March 25, 2019

[Pre-RFC] Fixed-capacity view of Vec

Related topics