New type kind: RawData


Hi, folks.

I have an idea.

Rust has special traits (kinds), supported by the compiler: Send, Sized, Copy etc.

I think that one important type kind is missing: RawData (better name suggestions are welcome).

RawData means that it is safe to read uninitialized data of that type (in the sense that it won’t crash the process) and that it is safe to write random bits to it.

RawData types are:

  • all ints
  • tuples, structs and fixed size arrays of RawData

Pointers and enums are not RawData even if they are Copy.

Note that variables of a RawData type still have to be initialized like any other variable.
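
To make the definition above concrete, here is a minimal sketch in current Rust syntax. It assumes RawData is written as an ordinary unsafe marker trait with manual impls. The proposal asks for a compiler-supported kind instead, so the manual impls (and the helper is_raw_data, which is hypothetical) only stand in for what the compiler would derive automatically.

```rust
/// Hypothetical marker trait: every bit pattern is a valid value of the type.
/// It is `unsafe` because the compiler cannot check that claim here.
pub unsafe trait RawData {}

// All integer types qualify.
macro_rules! impl_raw_data {
    ($($t:ty),*) => { $(unsafe impl RawData for $t {})* };
}
impl_raw_data!(u8, u16, u32, u64, u128, usize, i8, i16, i32, i64, i128, isize);

// Fixed-size arrays and tuples of RawData qualify as well
// (only pairs are covered in this sketch).
unsafe impl<T: RawData, const N: usize> RawData for [T; N] {}
unsafe impl<A: RawData, B: RawData> RawData for (A, B) {}

/// Hypothetical helper: compiles only when `T` is RawData.
pub fn is_raw_data<T: RawData>() -> bool {
    true
}
```

With these impls, is_raw_data::<(u32, [u8; 16])>() compiles, while is_raw_data::<&u8>() or is_raw_data::<Option<u8>>() is rejected at compile time, which matches the rule that pointers and enums are not RawData.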

RawData has two major applications.

1) To avoid initialization of a buffer before reading into it.

Currently, to read data into a buffer, the user has to initialize the buffer with zeros first, which is unnecessary work:

let mut buffer: Vec<u8> = Vec::new();
// Unnecessary memset, because data
// is overwritten in the next line
buffer.grow(1 << 20, 0);

This problem can be solved with the unsafe set_len function, which is inconvenient because the code becomes, well, unsafe.
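
For comparison, this is roughly what the set_len pattern looks like in current Rust syntax. It is only a sketch: the helper name append_with and its fill callback are made up for illustration, and the bytes are written before set_len is called, so nothing uninitialized is ever read.

```rust
use std::mem::MaybeUninit;

/// Appends `n` bytes produced by `fill` without zeroing the memory first.
/// `append_with` is a hypothetical helper, not a standard library function.
pub fn append_with(buf: &mut Vec<u8>, n: usize, mut fill: impl FnMut(usize) -> u8) {
    // Make room for `n` more elements; the new capacity stays uninitialized.
    buf.reserve(n);

    // View the unused capacity as possibly-uninitialized slots and fill
    // the first `n` of them.
    let spare: &mut [MaybeUninit<u8>] = buf.spare_capacity_mut();
    for (i, slot) in spare.iter_mut().take(n).enumerate() {
        slot.write(fill(i));
    }

    let new_len = buf.len() + n;
    // SAFETY: `reserve` guaranteed at least `n` spare slots, and the loop
    // above initialized exactly the first `n` of them.
    unsafe { buf.set_len(new_len) };
}
```

The unsafe block is the part the proposal would like to move out of user code.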

If Rust had a RawData trait, Vec<T> could have a function:

impl <T : RawData> Vec<T> {
    fn grow_no_init(&mut self, len: uint) { ... }
}

That fn grow_no_init(..) solves the performance issue without hurting application safety (in the sense that the application won’t crash, see below).

For example, with that or similar function, Reader::push_at_least(..) can be implemented without unsafe code.

2) For parsers

Sometimes it is convenient to map serialized data onto a structure. For example, to parse an IPv6 header one could use a struct:

struct Ipv6Header {
    ver_cls_label: u32,
    payload_length: u16,
    next_header: u8,
    hop_limit: u8,
    src_address: Ipv6Address,
    dest_address: Ipv6Address,
}

&[T], where T is RawData, could have a method:

// return None if alignment does not match
fn bitcast<U : RawData>(&self) -> Option<U> { ... }

So it can be used like this:

let packet_data: &[u8] = ...
let header_slice = packet_data
if header_slice.len() < 1 {
    // buffer does not contain enough data
}
let header = header_slice[0];
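
A minimal sketch of such a bitcast in current Rust syntax follows. It makes several assumptions: RawData is a hand-written unsafe marker trait, bitcast is a free function rather than a method on slices, and it returns a slice (Option<&[U]>), because the usage above treats the result as a slice. Only padding-free integer types get an impl here, so every byte behind the input slice is initialized.

```rust
use std::mem::{align_of, size_of, size_of_val};

/// Hypothetical marker trait: any bit pattern is a valid value.
pub unsafe trait RawData {}
unsafe impl RawData for u8 {}
unsafe impl RawData for u16 {}
unsafe impl RawData for u32 {}

/// Reinterprets a slice of `T` as a slice of `U`.
/// Returns `None` if the alignment does not match or if the byte length
/// is not a whole number of `U` values.
pub fn bitcast<T: RawData, U: RawData>(data: &[T]) -> Option<&[U]> {
    let byte_len = size_of_val(data);
    let addr = data.as_ptr() as usize;
    if size_of::<U>() == 0
        || addr % align_of::<U>() != 0
        || byte_len % size_of::<U>() != 0
    {
        return None;
    }
    // SAFETY: the pointer is aligned for `U`, the byte length is a whole
    // number of `U` values, the result borrows from `data`, and RawData
    // promises that any bit pattern is a valid `U`.
    Some(unsafe {
        std::slice::from_raw_parts(data.as_ptr() as *const U, byte_len / size_of::<U>())
    })
}
```

Note that the multi-byte fields of a header parsed this way would still be in network byte order and need converting before use.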

Safety concerns

Because grow_no_init() can leak sensitive information that was previously stored in the heap and then freed, grow_no_init() should probably still be unsafe. However, it is still safer than set_len().

Anyway, the mere presence of the RawData type kind does not allow users to read uninitialized data. RawData just describes some properties of a type. It is up to library authors to decide whether their functions like grow_no_init should be safe or unsafe.


Does reading uninitialized data count as ‘memory safe’? It could leak sensitive information: certainly the pointers that ASLR is trying to protect, and possibly application data.


Good point!

You are right, functions like grow_no_init() indeed make programs less ‘memory safe’ in the sense of leaking address space information.

If I understand correctly, a similar issue caused Heartbleed.

grow_no_init() (as well as current unsafe { vec.reserve() + vec.set_len(); }) should be used carefully.

So I should probably adjust my proposal. RawData type kind itself does not affect security at all, grow_no_init() does.

grow_no_init() should still be applicable only to vecs of RawData, but it should be marked unsafe for the reason you mentioned. So you still have to use unsafe to call grow_no_init(), but unlike a call to vec.reserve(); vec.set_len();, you can be sure that your program won’t start crashing if the vector’s type parameter changes.


Maybe a better approach would be to have a type that keeps track of how much it is filled?


I don’t understand.


In order to prevent reading uninitialized data, I presume.

So, if you have this:

let mut buffer: Vec<u8> = Vec::new();
// Unnecessary memset, because data
// is overwritten in the next line
my_reader.read_into(&mut buffer); // writes only 10 bytes in buffer

you should not be able to read more than 10 bytes from this buffer (so you can’t read what was previously in the memory where it is placed). However, now that I say it, it seems that both your examples already have this (e.g. header_slice.len()).
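
One way to picture a type that keeps track of how much of it is filled is the sketch below, in current Rust syntax. The name TrackedBuf and its methods are hypothetical. The storage is allocated without being zeroed, and only the prefix that has actually been written can be viewed as bytes.

```rust
use std::mem::MaybeUninit;

/// Hypothetical buffer that remembers how many bytes have been written.
pub struct TrackedBuf {
    storage: Vec<MaybeUninit<u8>>,
    filled: usize,
}

impl TrackedBuf {
    /// Allocates `cap` slots without initializing them.
    pub fn with_capacity(cap: usize) -> Self {
        let mut storage: Vec<MaybeUninit<u8>> = Vec::with_capacity(cap);
        storage.resize_with(cap, MaybeUninit::uninit);
        TrackedBuf { storage, filled: 0 }
    }

    /// Copies as much of `data` as fits; returns the number of bytes stored.
    pub fn append(&mut self, data: &[u8]) -> usize {
        let n = data.len().min(self.storage.len() - self.filled);
        let dst = &mut self.storage[self.filled..self.filled + n];
        for (slot, &byte) in dst.iter_mut().zip(data) {
            slot.write(byte);
        }
        self.filled += n;
        n
    }

    /// Only the bytes written so far are readable.
    pub fn filled(&self) -> &[u8] {
        let init = &self.storage[..self.filled];
        // SAFETY: the first `filled` slots were initialized by `append`.
        unsafe { std::slice::from_raw_parts(init.as_ptr() as *const u8, init.len()) }
    }
}
```

After a reader appends 10 bytes to a 16-byte TrackedBuf, filled() returns exactly those 10 bytes and the remaining 6 slots stay unreachable from safe code.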


Sorry, I meant:


Code that does read_into needs a linear memory area to write to. It needs a slice, not a Vec, for example because it passes that mutable slice to zlib or to the read syscall.