[Pre-RFC] Fixed-capacity view of Vec

An interesting consequence of a fixed-capacity view of a vector, is that, because the vector cannot grow through the view, inserting element through it never invalidates references to elements in the vector. That is, one could iterate over the elements of such a view, while inserting elements through it, something that cannot be done with just Vec.

4 Likes

This could also be expanded to a fixed-capacity view of a String.

Additionally, there could be two variants: An owned and a borrowed one: FixedVec/String and FixedVec/StringView. The owned variant is constructed with a fixed capacity while the borrowed one is a view into another non-fixed one.

I wrote a similar abstraction in the buffer crate. It allows you to pass uninitialized buffers of bytes around. This allows things like

let mut vec = Vec::with_capacity(1024);
if reader.read_buffer(&mut vec)?.len() != 0 {
    if vec[0] == 0 {
        // ...
    }
}

in safe Rust. It actually does have one reverse dependency now. :open_mouth: But I use it extensively in crates not published on crates.io, it makes for a very good abstraction for 1) networking, 2) compression/decompression, 3) building protocol packets, all with no or minimal allocations. (You can use an uninitialized ArrayVec as well as a Vec with spare capacity.)

This all works without (or at least with little) monomorphisation cost, because each function taking a B: Buffer can immediately convert it into a BufferRef, so the function won’t be instantiated for each possible buffer type. The conversion makes sure that the underlying buffer gets resized before the function returns.

What do you think about this?

2 Likes

Can’t OP problem be solved by something like VecCapView? It will provide opaque view into capacity space and will allow to push data into it. Something like this:

let mut buf = Vec::with_capacity(12);
buf.extend_from_slice(b"hi");
// `cap_view` is somewhat similar to `split_at`, we also can add `cap_view_mut`,
// which will return `&mut [T]` instead of `&[T]`.
// Under the hood `VecCapView` contains mutable reference to the parent `Vec`
let (data: &[u8], mut cap_view: VecCapView) = buf.cap_view();
// values are written into memory, but not committed yet (i.e. `len` field
// is left unchanged, so if we'll forget the view, vector will not change),
// each method checks if there is enough capacity is left in the view
// returning an error otherwise
cap_view.push(b':')?;
cap_view.extend([b'f', b'o', b'o', b'='].iter().cloned())?;
cap_view.extend_from_slice(data)?;
// now it's time to "commit" data, this method consumes the view
// and simply bumps `len` field of the parent vector
cap_view.commit();

// assuming `data` is out of scope
assert_eq!(&buf[..], b"hi:foo=hi")

I like the simplicity of the VecCapView idea. Why not automatically commit the length on dropping the view though?

Buffer crate looks interesting. I don’t think you can actually read bytes from it without allocating another buffer and copying there, though? So you can’t implement RLE without copying?

Also, I have some safety concerns about BufferRef::new() - shouldn’t it be unsafe fn if setting the initialized parameter lets you view uninitialized memory? I’d split it in two constructors - make the current new() an unsafe fn and provide another safe one that always sets initialized to 0.

There’s also bytes crate that alread does all we need with BytesMut.split_to(), but has extra overhead due to atomics and reference counting.

Yes, auto-commits are viable alternative, but I thought it could be useful to leave vector unchanged until we are sure that there is enough capacity for our data. I guess it's up to discussion.

This is basically what the buffer crate does, except that it autocommits on drop and supports more than just Vec.

I'm not sure we're talking about the same thing. Buffer allows you to pass an uninitialized buffer to a function, some space where it can write to. It's not a read-write-view of data, it always assumes its data is uninitalized, so you can't read from it.

It's not an unsafe function — if you pass it a valid &mut [u8] all the bytes are already initialized.

If you don't, that's also fine, but you're already in unsafe code by providing the slice.

I suppose adding an unsafe constructor taking *mut [u8] might be good.

Okay, this is very interesting! I’d very much like to see one of the motivating examples rewritten to use the buffer crate to gauge safety and performance of this solution. But I already have two Rust projects in active development stage, so I’m not sure when I’ll be able to get around to it, if at all.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.