Hey everyone! Recently I’ve been looking for vulnerabilities in unsafe code in popular Rust crates and trying to rewrite that unsafe code as safe code without degrading performance. My goal with the rewrite attempts was not to fix the entire ecosystem single-handedly, but to identify missing safe abstractions, good and bad language practices, and so on.
This is a proposal I came up with after discovering vulnerabilities in the png and claxon crates and trying to eliminate the relevant unsafe code.
The problem
Following up on my post Auditing popular crates: how a one-line unsafe has nearly ruined everything, I’ve taken another stab at eliminating the last unsafe block in the png crate without degrading performance, and pretty much hit a wall. There seems to be no way to accomplish what it needs using only safe Rust.
This is the code I would like to write (simplified compared to real code):
/// Repeats `repeating_fragment_len` bytes at the end of the buffer
/// until an additional `num_bytes_to_fill` in the buffer is filled
pub fn decode_rle_ideal(buffer: &mut Vec<u8>, repeating_fragment_len: usize, num_bytes_to_fill: usize) {
buffer.reserve(num_bytes_to_fill); // allocate required memory immediately, it's faster this way
let slice_to_repeat = &buffer[buffer.len()-repeating_fragment_len..]; // figure out what to repeat
buffer.extend(slice_to_repeat.iter().cycle().take(num_bytes_to_fill));
}
This encodes the intent very well, and can easily be rewritten to use vec.extend_from_slice() for optimal performance in case the compiler doesn’t figure it out.
Sadly, this doesn’t compile: we cannot append a slice of the vector to the vector itself, because appending to the vector might cause it to reallocate, invalidating the reference to our slice.
So in practice this was written instead:
/// Repeats `repeating_fragment_len` bytes at the end of the buffer
/// until an additional `num_bytes_to_fill` in the buffer is filled
pub fn decode_rle_working(buffer: &mut Vec<u8>, repeating_fragment_len: usize, num_bytes_to_fill: usize) {
buffer.reserve(num_bytes_to_fill); // allocate required memory immediately, it's faster this way
for _ in 0..num_bytes_to_fill {
// byte_to_copy variable is needed because buffer.push(buffer[i]) doesn't compile
let byte_to_copy = buffer[buffer.len() - repeating_fragment_len];
buffer.push(byte_to_copy);
}
}
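To make the intended semantics concrete, here is that push-based version as a self-contained, runnable sketch with a small usage check (the example input values are mine, not from the png crate):

```rust
/// Repeats `repeating_fragment_len` bytes at the end of the buffer
/// until an additional `num_bytes_to_fill` bytes are appended.
pub fn decode_rle_working(buffer: &mut Vec<u8>, repeating_fragment_len: usize, num_bytes_to_fill: usize) {
    buffer.reserve(num_bytes_to_fill);
    for _ in 0..num_bytes_to_fill {
        // A temporary is needed because buffer.push(buffer[i]) doesn't compile
        let byte_to_copy = buffer[buffer.len() - repeating_fragment_len];
        buffer.push(byte_to_copy);
    }
}

fn main() {
    // The buffer ends in the fragment [2, 3]; repeating it for 5 more bytes
    // should cycle through the fragment: 2, 3, 2, 3, 2.
    let mut buffer = vec![1u8, 2, 3];
    decode_rle_working(&mut buffer, 2, 5);
    assert_eq!(buffer, vec![1, 2, 3, 2, 3, 2, 3, 2]);
}
```
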
Except this was prohibitively slow (I even tried to get it to go fast under various constraints on the values, and failed). Resizing the vector up front so the memory would be zeroed before use was also prohibitively slow, so it was rewritten using the unsafe function set_len() to operate on a slice of the vector that could contain uninitialized memory:
/// Repeats `repeating_fragment_len` bytes at the end of the buffer
/// until an additional `num_bytes_to_fill` in the buffer is filled
pub fn decode_rle_vulnerable(buffer: &mut Vec<u8>, repeating_fragment_len: usize, num_bytes_to_fill: usize) {
buffer.reserve(num_bytes_to_fill); // allocate required memory immediately, it's faster this way
unsafe { // set length of the buffer up front so we can set elements in a slice instead of pushing
buffer.set_len(buffer.len() + num_bytes_to_fill);
}
for i in (buffer.len() - num_bytes_to_fill)..buffer.len() {
buffer[i] = buffer[i - repeating_fragment_len];
}
}
This implementation has a security vulnerability: if you pass it repeating_fragment_len set to 0, it will expose the contents of uninitialized memory in the output. For a PNG decoder this is a big deal.
But not only is this implementation vulnerable, it is also slow! We’re still copying bytes one by one instead of doing the memcpy() we know we could!
It is the worst of both worlds, and this is what was actually used in practice.
And while this example may seem a little contrived, predicting future data based on past input is a common pattern in decompressors of all kinds. Image and video decoders rely on it especially heavily, as do general-purpose decompressors such as gzip.
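As an aside, the standard library has since grown a safe way to append a copy of a vector’s own contents to itself: Vec::extend_from_within (stable since Rust 1.53, well after the code above was written). Applied in fragment-sized chunks, it covers this RLE pattern with no unsafe code at all. A minimal sketch (the function name and the input values are mine):

```rust
/// Safe RLE fill: repeat the last `repeating_fragment_len` bytes of the
/// buffer until `num_bytes_to_fill` extra bytes have been appended.
pub fn decode_rle_safe(buffer: &mut Vec<u8>, repeating_fragment_len: usize, num_bytes_to_fill: usize) {
    // This check also closes the repeating_fragment_len == 0 hole discussed above.
    assert!(repeating_fragment_len > 0 && repeating_fragment_len <= buffer.len());
    buffer.reserve(num_bytes_to_fill);
    let mut remaining = num_bytes_to_fill;
    while remaining > 0 {
        // The last `repeating_fragment_len` bytes are always a full copy of the
        // fragment, so copying a prefix of them continues the cycle correctly.
        let start = buffer.len() - repeating_fragment_len;
        let n = remaining.min(repeating_fragment_len);
        buffer.extend_from_within(start..start + n);
        remaining -= n;
    }
}

fn main() {
    let mut buffer = vec![1u8, 2, 3];
    decode_rle_safe(&mut buffer, 2, 5);
    assert_eq!(buffer, vec![1, 2, 3, 2, 3, 2, 3, 2]);
}
```

This still copies in fragment-sized chunks rather than via one large memcpy(), so it does not remove the need for the abstraction proposed below; it only shows one narrow escape hatch for this particular shape of code.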
Making things possible (and efficient)
If we could guarantee that a vector will not reallocate while we append to it, we could implement the function above both safely and efficiently.
Let’s introduce a fixed-capacity view of Vec that’s identical to Vec in every way, except that it panics on out-of-capacity appends instead of reallocating, along with a function as_fixed_capacity(), similar to as_slice(), to obtain such a view. Then decode_rle_ideal above can be rewritten as follows:
/// Repeats `repeating_fragment_len` bytes at the end of the buffer
/// until an additional `num_bytes_to_fill` in the buffer is filled
pub fn decode_rle_fixedcap(buffer: &mut Vec<u8>, repeating_fragment_len: usize, num_bytes_to_fill: usize) {
buffer.reserve(num_bytes_to_fill); // allocate required memory immediately, it's faster this way
{ // view buffer as fixed-capacity vector so we can append data to it while its contents are borrowed
let mut fixed_cap_buffer = buffer.as_fixed_capacity();
let slice_to_repeat = &fixed_cap_buffer[fixed_cap_buffer.len()-repeating_fragment_len..];
fixed_cap_buffer.extend(slice_to_repeat.iter().cycle().take(num_bytes_to_fill));
}
// fixed_cap_buffer went out of scope, we can use buffer normally again
}
What’s more, it’s fairly trivial to rewrite using more explicit slice copying that should map directly to memcpy():
/// Repeats `repeating_fragment_len` bytes at the end of the buffer
/// until an additional `num_bytes_to_fill` in the buffer is filled
pub fn decode_rle_fixedcap_memcpy(buffer: &mut Vec<u8>, repeating_fragment_len: usize, num_bytes_to_fill: usize) {
buffer.reserve(num_bytes_to_fill); // allocate required memory immediately, it's faster this way
{ // view buffer as fixed-capacity vector so we can append data to it while its contents are borrowed
let mut fixed_cap_buffer = buffer.as_fixed_capacity();
let slice_to_repeat = &fixed_cap_buffer[fixed_cap_buffer.len()-repeating_fragment_len..];
for _ in 0..(num_bytes_to_fill/repeating_fragment_len) {
fixed_cap_buffer.extend_from_slice(slice_to_repeat);
}
let remaining_bytes_to_fill = num_bytes_to_fill % repeating_fragment_len;
fixed_cap_buffer.extend(slice_to_repeat.iter().take(remaining_bytes_to_fill));
}
// fixed_cap_buffer went out of scope, we can use buffer normally again
}
Not only is this entirely safe, it is also faster than the real-world unsafe implementation! Now we get the best of both worlds!
Making things convenient
A really common thing to do in code handling binary data - decoders especially - is to preallocate a certain amount of memory that’s known in advance and write data into it. The common idiom seems to be creating a vector with the desired capacity, setting the vector’s length equal to its capacity, and then passing around slices of it. Once you sit down and do some profiling, you may find that zeroing the memory you’re going to overwrite anyway adds roughly 20% to the total execution time, so you swap that vec.resize() for unsafe { vec.set_len() }, and that’s where it all goes south.
In fact, this is what has happened in Claxon. It worked fine over the course of normal operation, but when given a specially crafted file it included contents of uninitialized memory in the output. This issue has been assigned RUSTSEC-2018-0004, see also my original report and the author’s explanation of why this bug occurred.
I don’t know why people choose to pass around slices instead of appending to a vector. I can only state that it’s a very common pattern in the multimedia decoding crates I’ve looked at. However, after trying to rewrite Claxon to pass a vector around instead of slices of it, I can make an educated guess.
Consider this trivial function:
let mut buffer = Vec::with_capacity(block_size);
unsafe {buffer.set_len(block_size)};
// -- snip --
fn decode_verbatim<R: ReadBytes>(input: &mut Bitstream<R>, bps: u32, buffer: &mut [i32]) {
for s in buffer {
*s = extend_sign_u32(try!(input.read_leq_u32(bps)), bps);
}
}
Rewriting it with neither unsafety nor the performance overhead of zeroing memory got me this:
let mut buffer = Vec::with_capacity(block_size);
// -- snip --
fn decode_verbatim<R: ReadBytes>(input: &mut Bitstream<R>, bps: u32, buffer: &mut Vec<i32>, bs: u16) {
for _ in 0..bs {
buffer.push(extend_sign_u32(try!(input.read_leq_u32(bps)), bps));
}
}
This is safe, but worse for two reasons. First, it requires explicitly passing around an extra length parameter; this not only adds boilerplate, it also lets you mess up. In fact, in the real-world code this complicated things enough that I did mess up the math somewhere, which is why the Vec-based rewrite doesn’t even work correctly. Second, unlike an iterator over a slice, this form makes it much harder for the optimizer to elide bounds checks, which makes the safe version slower.
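One partial mitigation that needs no new APIs is to feed Vec::extend an exact-size iterator instead of pushing in a loop: the standard library can reserve once up front, and there is no index arithmetic for the optimizer to check. A sketch with a stand-in sample source (read_sample and all values here are mine, not Claxon’s API):

```rust
// Stand-in for reading one decoded sample from the bitstream.
fn read_sample(state: &mut u32) -> i32 {
    *state = state.wrapping_mul(1664525).wrapping_add(1013904223);
    *state as i32
}

fn decode_block(buffer: &mut Vec<i32>, block_size: usize) {
    let mut state = 42u32;
    // `extend` with an exact-size iterator lets Vec reserve once and append
    // elements without per-index bounds checks or manual length bookkeeping.
    buffer.extend((0..block_size).map(|_| read_sample(&mut state)));
}

fn main() {
    let mut buffer = Vec::with_capacity(16);
    decode_block(&mut buffer, 16);
    assert_eq!(buffer.len(), 16);
}
```

This still doesn’t handle fallible reads gracefully: a try!/? inside the closure cannot propagate errors out of the calling function, which is part of why real decoder code ends up with the clumsier loop above.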
Could we make a better safe abstraction that would solve these problems? I believe we can, using fixed-capacity vectors.
If we could set the capacity explicitly when creating a fixed-capacity view of a vector, we could pass it around just as conveniently as a slice, without exposing uninitialized memory. We could even iterate over it without jumping through extra hoops to elide bounds checks if we add methods fill() and fill_with(), analogous to extend() and resize_with(), that never go beyond the fixed capacity. Then we’d get something like this:
let mut reusable_buffer = Vec::with_capacity(block_size);
let mut buffer = reusable_buffer.as_fixed_capacity(block_size);
// -- snip --
fn decode_verbatim<R: ReadBytes>(input: &mut Bitstream<R>, bps: u32, buffer: &mut FixedCapacityVec<i32>, bs: u16) {
buffer.fill_with(|| extend_sign_u32(try!(input.read_leq_u32(bps)), bps));
}
Now this is safe and fast!
What’s more, it considerably simplifies our decode_rle_fixedcap from earlier:
/// Repeats `repeating_fragment_len` bytes at the end of the buffer
/// until an additional `num_bytes_to_fill` in the buffer is filled
pub fn decode_rle_fixedcap2(buffer: &mut Vec<u8>, repeating_fragment_len: usize, num_bytes_to_fill: usize) {
// view buffer as fixed-capacity vector so we can append data to it while its contents are borrowed
let mut fixed_cap_buffer = buffer.as_fixed_capacity(buffer.len() + num_bytes_to_fill);
let slice_to_repeat = &fixed_cap_buffer[fixed_cap_buffer.len()-repeating_fragment_len..];
fixed_cap_buffer.fill(slice_to_repeat.iter().cycle());
}
And the same goes for the manually optimized function:
/// Repeats `repeating_fragment_len` bytes at the end of the buffer
/// until an additional `num_bytes_to_fill` in the buffer is filled
pub fn decode_rle_fixedcap2_memcpy(buffer: &mut Vec<u8>, repeating_fragment_len: usize, num_bytes_to_fill: usize) {
// view buffer as fixed-capacity vector so we can append data to it while its contents are borrowed
let mut fixed_cap_buffer = buffer.as_fixed_capacity(buffer.len() + num_bytes_to_fill);
let slice_to_repeat = &fixed_cap_buffer[fixed_cap_buffer.len()-repeating_fragment_len..];
for _ in 0..(num_bytes_to_fill/repeating_fragment_len) {
fixed_cap_buffer.extend_from_slice(slice_to_repeat);
}
fixed_cap_buffer.fill(slice_to_repeat.iter());
}
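To illustrate that much of the proposed API surface is expressible today, here is a hypothetical sketch of a panic-on-overflow wrapper (all names are placeholders from this proposal, not real std API). Note the important caveat: a purely safe wrapper like this still cannot hand out a borrow of its contents while appending; that capability is exactly what would require the guaranteed-no-reallocation invariant inside the standard library.

```rust
/// Hypothetical sketch of a fixed-capacity view over a Vec: appends panic
/// instead of reallocating once `cap` is reached.
struct FixedCapacityVec<'a, T> {
    inner: &'a mut Vec<T>,
    cap: usize,
}

impl<'a, T> FixedCapacityVec<'a, T> {
    fn new(inner: &'a mut Vec<T>, cap: usize) -> Self {
        // Reserve up front so appends within `cap` never reallocate.
        inner.reserve(cap.saturating_sub(inner.len()));
        FixedCapacityVec { inner, cap }
    }

    fn len(&self) -> usize {
        self.inner.len()
    }

    fn push(&mut self, value: T) {
        assert!(self.inner.len() < self.cap, "fixed capacity exceeded");
        self.inner.push(value);
    }

    /// Like the proposed `fill_with`: append values until capacity is reached.
    fn fill_with(&mut self, mut f: impl FnMut() -> T) {
        while self.inner.len() < self.cap {
            self.inner.push(f());
        }
    }
}

fn main() {
    let mut buffer: Vec<i32> = Vec::new();
    let mut view = FixedCapacityVec::new(&mut buffer, 4);
    view.push(7);
    view.fill_with(|| 0);
    assert_eq!(view.len(), 4);
    assert_eq!(buffer, vec![7, 0, 0, 0]);
}
```
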
Concerns, caveats, etc
So far I’ve been focusing on binary formats and multimedia decoding, because that’s where memory management bugs are most common and have the highest impact. This may not be relevant in other domains, but I still consider memory safety in this domain critical.
So far I have only looked in depth at crates with actual vulnerabilities. So while this is based on real-world failures, I still have only two samples, and this might not generalize well. If you have use cases for something like this, please speak up.
All names are subject to bikeshedding.
References to the production code discussed: the vulnerable function from the png crate, and my attempt to rewrite Claxon safely.
Also, I’d like to note that memory disclosure vulnerabilities are probably under-discovered in Rust code compared to use-after-free or buffer overflows, because the only tool that can discover them, Memory Sanitizer, currently does not work with anything that uses the Rust standard library. I literally had to write my own tool to discover the bug in Claxon; I’ll probably make a blog post announcing it soon.
At this point I’m looking for feedback on this proposal. Does it sound reasonable? Do you have use cases for this thing? Do you see a better way to solve this problem? Let me know!