Ergonomics of raw-pointer slices

kornel · September 9, 2022, 12:56pm

In unsafe code it's common to have pointer + length pairs, but due to strict requirements of &mut they can't always be turned into a safe slice.

Rust has a type for this: *mut [T]. However, this type is currently painful to use:

All slice convenience methods are implemented for safe slices only.
Index doesn't support unsafe
Raw slices don't even support indexing by range, which is safe (in the sense that creation of new raw pointers is safe). raw_slice[a..b] could be equivalent of the safe ~~ptr.add()~~ ptr.wrapping_add() with their length appropriately checked and adjusted.
SliceIndex has an unstable get_unchecked for raw pointers, but is missing checked variants that work on raw slices. Bounds checking is still valuable, even if the pointer is risky.

I'm bringing this in the context of the firecracker vm buffer overflow vulnerability. It basically needed unsafe slices, and implemented them with a bug. A get_host_address function returned a pointer valid for *mut T only (one element and no more), but the caller of this function interpreted it as a pointer for an entire buffer of some greater length — a classic buffer overflow.

I presume these are raw pointers to shared mutable memory, so &mut [u8] wouldn't be appropriate. But if *mut [T] was as easy to use as &mut [T], then this code wouldn't operate on raw pointers without a size so much, and the confusion between get() -> *mut T and get() -> *mut [T] could have been caught.

RalfJung · September 9, 2022, 1:37pm

No it cannot be safe, since this function lets you safely create a raw slice with arbitrary length. (It could be safe if it used wrapping_add, though.)

But I fully agree that our raw pointer ergonomics in general are pretty bad (obligatory link to Gankra's blog post), and for slices it is particularly bad. Some currently unstable APIs that can hopefully help:

Tracking Issue for raw slice len() method (slice_ptr_len, const_slice_ptr_len) · Issue #71146 · rust-lang/rust · GitHub
Tracking Issue for raw slice getters (slice_ptr_get) · Issue #74265 · rust-lang/rust · GitHub
Tracking Issue for `*mut [T]::split_at_mut` · Issue #95595 · rust-lang/rust · GitHub

dlight · September 9, 2022, 2:32pm

I think the lesson from this is that ergonomics of unsafe code matters, and just discouraging unsafe usage by making it noisier isn't a good safety strategy.

That's why I think Rust should have indexing operators for raw pointers (just like C), as long as they are used only inside unsafe blocks (except for range indexes). So this works:

let x: *mut i8 = ..;

unsafe { x[10] } // same as unsafe { x.offset(10) }

let y: *mut [i8] = ..;

unsafe { y[10] } // to decide: either panics on bounds checking, or UB

x[1..2] // same as std::ptr::slice_from_raw_parts(..), doesn't need unsafe

This could be accomplished by another set of indexing traits, like UnsafeIndex and UnsafeIndexMut, with a compiler-enforced constraint that a type can't implement at same time both unsafe and safe indexing traits for the same Idx.

Another way to accomplish this is allowing to mark fn index and fn index_mut as unsafe when implementing Index and IndexMut (and make functions generic on Index and IndexMut accept only safe implementations, with a new syntax like fn f<T: Index + unsafe>(x: T) to accept both safe and unsafe). But this sounds a lot trickier, and seems to require first class support of effects or something like it.

CAD97 · September 9, 2022, 3:40pm

See the keywords generics initiative.

It would be nice to not block on it, but like we have experimental T: const Trait we could also have T: unsafe Index. A useful observation is that like Trait: const Trait, it also holds that Trait: unsafe Trait.

The one big caveat is that unsafe Trait is confusable with unsafe trait, and probably even unusable generically since there's extra preconditions you don't know about.

So the "most correct" solution is probably an UnsafeIndex[Mut] trait family which are supertraits of the Index[Mut] traits, and default implemented if Index[Mut] are implemented. Roughly it'd probably look something like

/// Used for unsafe immutable indexing.
///
/// This trait is not possible to soundly use in a generic
/// context. `UnsafeIndex::index_unsafely` may have
/// arbitrary unsafe preconditions, so you must know the
/// concrete type in order to satisfy them. If `Index` is
/// implemented, `index` and `index_unsafely` have
/// identical behavior (`index_unsafely` has no unsafe
/// preconditions other than those of `index`).
///
/// Typically, `index_unsafely` still does bounds checking.
/// The unsafety comes from assuming the *receiver* is
/// valid for indexing, not that the index is.
trait UnsafeIndex<Idx: ?Sized> {
    type Output: ?Sized;
    unsafe fn index_unsafely(this: &Self, index: Idx) -> &Self::Output;
}

impl<T, Idx> UnsafeIndex<Idx> for T
where T: Index<Idx> {
    type Output = <Self as Index<Idx>>::Output;
    unsafe fn index_unsafely(this: &Self, index: Idx) -> &Self::Output {
        &this[index]
    }
}

One of few use cases for a safe trait with an unsafe function

HeroicKatora · September 9, 2022, 5:05pm

If the pointer can be derefenced, but uniqueness is too strong then &Cell<[T]> is another alternative. However, ergonomics for that type are in similarly neglected state. We can turn it into individual cells (&[Cell<T>]) and back but this is very redundant. Mutable functionality from slice I've missed in particular but that should be (un-)intuitively sound:

copy_from_slice/clone_from_slice with interesting design choices with regards to overlap, but those make naive implementation non-trivial which I see as a argument for such functionality.
fill and fill_with, with all their std-given specialization support.
get with full index support, since this is safe actually via Index
swap where the alternative is two accesses to the original and a swap.
reverse, rotate etc. with similar argument as memcpy.
windows because this is actually the equivalent of impossible iterator slice::windows_mut.
other iterator adaptors from slice are arguably 'just' cute in terms of less duplication of conversion.

Slightly annoying facts:

sort, starts_with, contains and other value judgments that require access to the element are hard to write and/or justify. Not that they wouldn't be awesome, but it's not clear if there are feasible and clear semantics. Comparison traits take &self so all elements need to be copied out of their cells if they need to be invoked.

More musings:

Which functions from &str could be added to &Cell<str>?
Is &[Cell<T>]::to_vec(&self) -> Vec<T> of any use?
Special methods that apply just to &Cell<[impl Copy]> which justify complexity with functions that do not exist on regular slices?

SkiFire13 · September 10, 2022, 8:10am

x[10] being the same as x.offset(10) has two problems:

it is not consistent with references:
- with references a place is produced, and if you want a reference you need to explicitly reborrow
- with raw pointers a raw pointer is produced, and if you want a place you need to explicitly dereference
  - also, making this consistent with references requires the raw reference operator (&raw) and this may become a footgun for those who don't know that
  - note that C implements the first one, so consistency here would also be intuitive, however in C the reference operator is equivalent to our raw reference operator, so this naturally leads to the footgun I was talking before
this requires a totally different kind of trait
- right now the Index trait has to yield a reference, and this is fundamentally incompatible with what you want
- there have been talks to GATify the Index trait, which should allow this, but I haven't seen a solid proposal yet.
- UnsafeIndex/UnsafeIndexMut don't seem a great (or complete) solution to me because someone might want to return some custom struct with a lifetime, so you're back to the GAT problem

notgull · September 11, 2022, 3:22am

I actually ran into this myself while attempting to design a Vec<T> replacement. Internally, the Vec<T> consisted of the capacity and a NonNull<[T]> to unify both the pointer to the allocation and the length. As a bonus, using the as_ref() method, it was easy to convert into a slice. The fact that it's hard to just "get the pointer" or "get the length" made me move away from that design after a while.

Honestly, it was probably a bad idea in the first place. I probably should've just used NonNull<T> and usize. However, I do think pointer usage could be a little better in these cases.

system · December 10, 2022, 3:23am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Pre-RFC: RawSlices	9	2138	March 25, 2019
Raw pointer ergonomics	22	1839	May 3, 2023
Why is transmuting raw pointers considered safe? language design	54	1756	June 10, 2021
Split keyword `unsafe` into `unsafe` / `checked` language design	8	731	June 2, 2024
Array reference from raw parts missing? libs	7	745	November 10, 2022

Ergonomics of raw-pointer slices

Related topics