Pre-RFC: RawSlices


#1

Getting raw unchecked access to a buffer is an important use case that Rust, as a systems language, needs to be able to provide. Obviously, it should be discouraged, especially when a developer is prototyping out an implementation. However, it still needs to be something that can be done when performance matters.

Currently the way we provide raw access is sporadically offered via some unsafe_* methods on slices, and by some utilities provided by the ptr module. Everything you need is there, but it’s confusing and often cumbersome to do. Making unsafe code complicated and hard to write is a recipe for disaster. This is code that needs to have the most care taken while written.

To alleviate these issues, this RFC proposes to introduce raw slice types in the form of *const[T] and *mut[T]. These types would consist of a rawptr, and a length. They would expose the following APIs:

pub Trait RawSlice<T> {
    /// Converts the rawslice into a slice
    unsafe fn as_slice(&self) -> &[T];
    /// Converts the rawslice into a rawptr
    fn as_ptr(&self) -> *const T;

    /// Reads the data at the given index and interprets it as a value of T. 
    /// This does not move the value out.
    unsafe fn read(&self, index: uint) -> T;
}
pub Trait RawMutSlice<T> : RawSlice<T> {
    //// Converts the rawslice into a mutable slice
    unsafe fn as_mut_slice(&self) -> &mut[T];
    /// Converts the rawslice into a mutable rawptr
    fn as_mut_ptr(&self) -> *mut T;

    /// Writes a value to the given index without reading or destroying whatever
    /// data might exist at that index. Appropriate for initializing unitialized 
    /// data.
    unsafe fn write(&mut self, index: uint, val: T);
    /// Reads the data at the given index and interprets it as a value of T. 
    /// Then zeros out the memory without calling any destructors.
    unsafe fn read_and_zero(&mut self, index: uint) -> T;
    /// Swaps the value at the given index.
    unsafe fn replace(&mut self, index: uint, val: T) -> T;
    /// Copies the contents of the given rawslice into this one, assuming that they 
    /// might have overlapping regions of memory.
    unsafe fn copy(&mut self, from: *const[T]);
    /// Copies the contents of the given rawslice into this one, 
    /// assuming they don't have any overlapping memory.
    unsafe fn copy_nonoverlapping(&mut self, from: *const[T]);
}

These types would also provide unchecked indexing and subslicing via the [] operator.

In addition, existing extension traits would be modified accordingly:

pub trait ImmutableSlice<'a, T> {
    ...
    /// Converts the slice into a rawslice for unsafe manipulation
    fn as_raw(&self) -> *const[T];
}

pub trait MutableSlice<'a, T> {
    ...
    /// Converts the mutable slice into a mutable rawslice for unsafe manipulation
    fn as_mut_raw(self) -> &mut[T];
}

pub trait RawPtr<T> {
    ...
    /// Converts the ptr into a rawslice with the given length
    fn as_raw_slice(&self, len: uint) -> *const[T];
}

pub trait RawMutPtr<'a, T> {
    ...
    /// Converts the mutable ptr into a mutable rawslice with the given length
    fn as_mut_raw_slice(&self, len: uint) -> &mut[T];
}

All unsafe_* methods would then be removed from regular slices, in favour of coercing them to raw slices first.

Case study: Porting insertion sort from rawptrs to rawslices

Old code:

fn insertion_sort<T>(v: &mut [T], compare: |&T, &T| -> Ordering) {
    let len = v.len() as int;
    let buf_v = v.as_mut_ptr();

    // 1 <= i < len;
    for i in range(1, len) {
        // j satisfies: 0 <= j <= i;
        let mut j = i;
        unsafe {
            // `i` is in bounds.
            let read_ptr = buf_v.offset(i) as *const T;

            // find where to insert, we need to do strict <,
            // rather than <=, to maintain stability.

            // 0 <= j - 1 < len, so .offset(j - 1) is in bounds.
            while j > 0 &&
                    compare(&*read_ptr, &*buf_v.offset(j - 1)) == Less {
                j -= 1;
            }

            // shift everything to the right, to make space to
            // insert this value.

            // j + 1 could be `len` (for the last `i`), but in
            // that case, `i == j` so we don't copy. The
            // `.offset(j)` is always in bounds.

            if i != j {
                let tmp = ptr::read(read_ptr);
                ptr::copy_memory(buf_v.offset(j + 1),
                                 &*buf_v.offset(j),
                                 (i - j) as uint);
                ptr::copy_nonoverlapping_memory(buf_v.offset(j),
                                                &tmp as *const T,
                                                1);
                mem::forget(tmp);
            }
        }
    }
}

New code:

fn insertion_sort<T>(v: &mut [T], compare: |&T, &T| -> Ordering) {
    let len = v.len() as int;
    let buf_v = v.as_mut_slice();

    // 1 <= i < len;
    for i in range(1, len) {
        // j satisfies: 0 <= j <= i;
        let mut j = i;
        unsafe {
            // `i` is in bounds.
            let read_ptr = &buf_v[i] as *const T;

            // find where to insert, we need to do strict <,
            // rather than <=, to maintain stability.

            // 0 <= j - 1 < len, so j - 1 is in bounds.
            while j > 0 &&
                    compare(&*read_ptr, &buf_v[j - 1])) == Less {
                j -= 1;
            }

            // shift everything to the right, to make space to
            // insert this value.

            // j + 1 could be `len` (for the last `i`), but in
            // that case, `i == j` so we don't copy. The
            // offset by j is always in bounds.

            if i != j {
                let tmp = ptr::read(read_ptr);
                buf_v[j + 1, ..].copy(buf_v[j, i]);
                buf_v.write(j, tmp);
                mem::forget(tmp);
            }
        }
    }
}

The resultant code is more concise, and the use of slicing syntax makes it more clear what exactly is happening. Note that an explicit length computation is completely dropped, as it falls naturally out of the range that the slice is over.

In general, this API creates a much nicer migration path from safe slice code to unsafe slice code. Just coerce the slice to a rawslice, shadow the variable, and the rest of the code should stay the same. This minimizes the chances of a translation error causing a bug. If bugs do occur, commenting out the shadow instantly restores safe code for debugging purposes. Ideally, developers should be more likely to start with safe code if they know that the migration should be painless.


#2

FWIW, we already have *const [T] and *mut [T] in the language (they were introduced along with DST). They are currently not indexable, but I would imagine that adding unsafe unchecked indexing and slicing to them would not be a particularly major change to the language.

I do like the sound of this proposal: it makes unsafe code seem much more of a part of the language, rather than a set of functions that work on unsafe pointers alone. I like the way that this proposal makes translating & to * require very few changes other than adding in an unsafe block. Making code do things unsafely most of the time should be as simple as adding in an unsafe block, changing a few types to use *, and of course doing a good audit of the code. Having to throw a whole bunch of methods with long and complex names about your code only makes it harder to audit. The only change I’d like to make would be maybe to move the as_raw method to the std::slice::Slice trait as a default method (implemented as self.as_slice().as_raw()), allowing as_raw to be called on Vecs and so on too.


#3

Awesome! That’s the hardest part of the proposal. :P

That doesn’t super fit with the “use the safe way first” transition path, but I don’t have any strong feelings on the matter.


#4

Is this just a nice way to support arrays without bounds-checking, or something more?


#5

Gankro – *const [T] and *mut [T] are already types due to the DST work. They are quite similar to what you wrote except that they do perform bounds checking when indexed (this basically falls out of DST). I guess we could in principle maybe omit the bounds checks if we wanted.


#6

+1 for unchecked indexing, I have half a mind to ask for it on regular slices too. The slice API is just that awesome.

Has anyone actually seen a buffer overflow fail! caused by idiomatic Rust code?


#7

The problem of security vulnerabilities is that they only appear on very specific input, so you don’t see all the buffer overflow it prevents – if the buffer overflows actually happened, fixing security vulnerabilities would be much easier in C.


#8

Just an update on this: on a lark (read long car ride) I hacked out an implementation of the proposed API using the DST *const/mut[T]. Works perfectly, implementation is trivial… except for the slicing sugar, which still needs some compiler magic to get unsafe impls of.

I added a len function to RawSlice in my actual implementation, as this was an obvious oversight.


#9

Full RFC: https://github.com/rust-lang/rfcs/pull/365