Make Vec::set_len enforce the len <= cap invariant

mcy · November 29, 2018, 5:01pm

(Writing this down quickly, so I don’t have time to dig up any of the citations mentioned).

It seems slightly insane to me that Vec::set_len doesn’t panic if the length is set past the allocated capacity. I find it very, very hard to imagine a world where this is useful (and, if it is, is a failing of Vec's other public APIs). I have vague memory of rather unpleasant bugs that have come up due to sloppy use of set_len in the standard library…

This API, as it stands, is so spectacularly unsafe (since it might cause drop to read unallocated memory!) that there isn’t even an equivalent for C++'s std::vector. I think it should either be removed (which would be unpleasant) or made so that pushing the length past reserved capacity is an unconditional panic. (Given that all array slicing is similarly instrumented, I don’t think this will be a performance regression, and will only break code that is breaking invariants in unsafe).

To elaborate on “spectacularly, hilariously dangerous”, being able to set set_len past the end of the allocated space opens us up to buffer overflows, whose prevention is one of Rust’s selling points. You never ever ever ever ever ever ever ever want to be able to look at unallocated memory, unless you’re something exotic like an allocator.

sfackler · November 29, 2018, 5:07pm

.get_unchecked(1..5)

mcy · November 29, 2018, 6:20pm

Sorry, almost all. I’m making the comparison with C++, where std::vector::operator[] is much more popular than std::vector::at, even though only at performs runtime bounds checks. I suppose it was an imperfect comparison, though hopefully I got my point across.

That said, I cannot imagine any situation in which doing an unchecked set_len on a Vec is at all acceptable. Marking it as unsafe and expecting non-std to not make mistakes (when, IIRC, even std makes mistakes) is not a solution.

Ixrec · November 29, 2018, 6:21pm

What’s the use case for set_len in safe code where a bounds check would be acceptable?

mcy · November 29, 2018, 6:33pm

There isn’t. set_len is already a hilariously unsafe function. In an ideal world we would remove it entirely, but I think that having a bikeshed about churn will delay this change. This change is intended to prevent the worst of potential bugs.

mbrubeck · November 29, 2018, 6:34pm

I agree that set_len would be a better API if it enforced this invariant.

I was a bit worried that this could break existing code that calls set_len first, then immediately grows the capacity. However, I don’t think there’s any supported way to do this. For example, this code:

let mut vec = Vec::new();
unsafe {
    vec.set_len(n);
    vec.reserve(1); // grow the capacity to n + 1
}

…doesn’t work because the reserve call will try to copy out-of-bounds memory into the new allocation, assuming it doesn’t fail some internal check first.

mcy · November 29, 2018, 6:36pm

This should be explicitly banned. Grow first, then extend the length. If it breaks people, it's because they were already doing something very dangerous.

cuviper · November 29, 2018, 6:41pm

I'm in favor of enforcing the invariant, but note that the documentation has an explicit example of this:

In this example, the vector gets expanded from zero to four items without any memory allocations occurring, resulting in vector values of unallocated memory:
let mut vec: Vec<char> = Vec::new();

unsafe {
    vec.set_len(4);
}

mcy · November 29, 2018, 6:52pm

I’m not sure which bothers me more… that such a spectacularly dangerous snippet is in the docs or that no one batted an eye on the commit that added that example.

cuviper · November 29, 2018, 7:02pm

The danger wasn’t noted on the pull request either, just a formatting nitpick:

In case it’s determined to be a breaking change to assert the invariant, an alternative would be to deprecate and recreate it. I don’t know what to call a new method though – checked_set_len? That’s a mouthful, and ideally the safer choice would be more ergonomic.

kornel · November 29, 2018, 11:31pm

I use this function relatively often to get Vec with uninitialized memory (usually to give its pointer to C for writing).

I’d love if it was less hilariously dangerous by checking it’s merely uninitialized, not out of bounds. I don’t think there’s a legit use case for out of bounds.

Alternatively, I’d be happy if there was a nicer explicit API for cheaply growing the Vec by writing to uninitialized memory.

mcy · November 29, 2018, 11:40pm

Agreed; I think something like this would be cool:

unsafe fn grow_and_initialize(&mut self, new_len: usize, initializer: impl FnOnce(&mut [T])) {
  self.reserve(new_len);
  f(self.raw_vec.as_mut());
  self.set_len(new_len);
}

I’d prefer if you could make this function safe, and pass an unsafe fn(&mut [T]), but the Fn traits aren’t that smart yet. =(

scottmcm · November 29, 2018, 11:53pm

I suspect that this will never be checked in release, as the optimizer would probably not be able to remove it, and a big point of this method is that it doesn’t do things like that.

I think people would welcome a debug_assert! for it, even though that won’t fire in user code right now because of how we ship liballoc. Precedent:

mcy · November 30, 2018, 12:02am

...What? How this this in any way a reasonable selling point? I refuse to believe that a predict-false blq is such a regression that you don't want it. If you need it for liballoc-internal reasons, add a method that's public to the crate only.

scottmcm · November 30, 2018, 12:12am

For panic-safety it’s critical to keep the length up-to-date, but that tends to be in a tight loop so you don’t want the branch and the landing pads and the LLVM-refuses-to-vectorize and similar.

This method is unsafe; it’s not the normal path. By the time you’re using it, you’re performance-optimizing something, so you care about small things. And the method isn’t any safer with the capacity check – it’s still trivial to get UB with it.

mcy · November 30, 2018, 12:18am

That has not stopped people from abusing this method. Something this close within reach should not be able to cause buffer overflows.

If "you" is liballoc, then you can just set self.len directly. If "you" is a non-std crate, you should probably be wrapping RawVec instead of fudging Vec's length.

This is not about UB. With the capacity check, it is impossible to have a buffer overflow. That is infinitely safer in my book. The security track record of set_len does not inspire confidence.

Buffer overflows are really really really really bad, right up there with UAFs. Like, "haha your key material has been leaked get ready to get pwn'd" bad. Like, ok, I admit that you can't smash the stack with Vec, so it isn't as amazingly, spectacularly bad as poorly-implemented VLAs, but it's still pretty spectacularly bad.

sfackler · November 30, 2018, 3:34am

RawVec is not publicly exposed, so non-std crates are not going to be using it.

Could you be more specific about set_len’s track record?

mcy · November 30, 2018, 2:26pm

Huh, I suppose I had a bad understanding of how the facade crates are put together. It seems odd that I can access unstable docs for alloc::raw_vec::RawVec.

In any case, there are surely crates that implement a length-less unsafe buffer like RawVec. I will stress again that, if you are doing work which benchmarks indicate needs to drop Vec invariants, there's a shiny new std::alloc API you can use. This whole thing falls into the "rethink your choices that lead to this situation" bucket.

I'm repeating what the security folks at work tell me from interacting with our Rust teams (I myself do not work on a Rust team). I'll try to get some citations out of them today, though I do recall something unpleasant about [impl Copy]::repeat and something about working around a miscompilation.

sfackler · November 30, 2018, 3:35pm

RawVec is not a shiny new std::alloc API. It's an internal implementation detail of Vec and HashMap that will never be stabilized.

mcy · November 30, 2018, 4:04pm

When I said std::alloc, I was referring to allocating things yourself in nightly with std::alloc::GlobalAlloc, which is not the same as poking at liballoc internals.

I also think that it’s hardly fair to call RawVec an implementation detail, when the nomicon describes its internals in detail (mind, not as API, but as a long-form example of safe unsafe usage), and when we get to touch other internal state of a Vec directly (i.e., the eponymous set_len).

Making alloc::raw_vec::RawVec inaccessible is fine. What I’m saying is that it’s possible to implement your own RawVec without allowing manipulation of Vec in extraordinarily dangerous ways. It’s pretty clear that people who have valid uses for set_len as it exists today can just implement a growable unsafe buffer themselves, instead of leaving a massive security footgun within arms’ reach.

Relatedly, I believe Scott made a point about panic-safety, which I get, but which completely misses my point. I already insist in abort-on-panic (something I can turn on for my binaries), so I don’t care about panic-safety here; I care about libraries, std or otherwise, reaching for set_len without understanding that possibly touching unallocated memory by mistake isn’t just UB, it’s asking to have your key material leaked and your service compromised.

(I do apologize for my rather rambling posts on this topic. Trying to make a point about security sometimes causes me to be a bit harsher than I intend to.)

Topic		Replies	Views
[Pre-RFC] Fixed-capacity view of Vec libs	51	10698	April 27, 2019
Uninitialized memory	57	10387	March 25, 2019
Rc is unsafe [mostly on 32-bit targets] due to overflow	72	10079	March 25, 2019
The Trusted Iterator Length Problem	29	7333	March 25, 2019
Types as Contracts: Implementation and Evaluation Unsafe Code Guidelines	20	3278	March 25, 2019

Make Vec::set_len enforce the len <= cap invariant

Related topics