Pre-RFC Type property functions

Summary

Two functions are added to the standard library that allow querying specific type related information.

Motivation

I'm suggesting the standardization of two functions:

const fn trivial_drop<T>() -> bool;

const fn has_direct_interior_mutability<T>() -> bool;

The main motivation is to provide a standardized way to do something that is already being done, and to provide a sound foundation for optimizations surrounding types with direct interior mutability.

A specific example:

Summary
#[inline(always)]
unsafe fn merge_up<T, F>(
    mut src_left: *const T,
    mut src_right: *const T,
    mut dest_ptr: *mut T,
    is_less: &mut F,
) -> (*const T, *const T, *mut T)
where
    F: FnMut(&T, &T) -> bool,
{
    // This is a branchless merge utility function.
    // The equivalent code with a branch would be:
    //
    // if !is_less(&*src_right, &*src_left) {
    //     ptr::copy_nonoverlapping(src_left, dest_ptr, 1);
    //     src_left = src_left.wrapping_add(1);
    // } else {
    //     ptr::copy_nonoverlapping(src_right, dest_ptr, 1);
    //     src_right = src_right.wrapping_add(1);
    // }
    // dest_ptr = dest_ptr.add(1);

    let is_l = !is_less(&*src_right, &*src_left);
    let copy_ptr = if is_l { src_left } else { src_right };
    ptr::copy_nonoverlapping(copy_ptr, dest_ptr, 1);
    src_right = src_right.wrapping_add(!is_l as usize);
    src_left = src_left.wrapping_add(is_l as usize);
    dest_ptr = dest_ptr.add(1);

    (src_left, src_right, dest_ptr)
}

#[inline(always)]
unsafe fn merge_down<T, F>(
    mut src_left: *const T,
    mut src_right: *const T,
    mut dest_ptr: *mut T,
    is_less: &mut F,
) -> (*const T, *const T, *mut T)
where
    F: FnMut(&T, &T) -> bool,
{
    // This is a branchless merge utility function.
    // The equivalent code with a branch would be:
    //
    // if !is_less(&*src_right, &*src_left) {
    //     ptr::copy_nonoverlapping(src_right, dest_ptr, 1);
    //     src_right = src_right.wrapping_sub(1);
    // } else {
    //     ptr::copy_nonoverlapping(src_left, dest_ptr, 1);
    //     src_left = src_left.wrapping_sub(1);
    // }
    // dest_ptr = dest_ptr.sub(1);

    let is_l = !is_less(&*src_right, &*src_left);
    let copy_ptr = if is_l { src_right } else { src_left };
    ptr::copy_nonoverlapping(copy_ptr, dest_ptr, 1);
    src_right = src_right.wrapping_sub(is_l as usize);
    src_left = src_left.wrapping_sub(!is_l as usize);
    dest_ptr = dest_ptr.sub(1);

    (src_left, src_right, dest_ptr)
}

/// Merge v assuming the len is even and v[..len / 2] and v[len / 2..] are sorted.
///
/// Adapted from crumsort/quadsort.
unsafe fn parity_merge<T, F>(v: &[T], dest_ptr: *mut T, is_less: &mut F)
where
    F: FnMut(&T, &T) -> bool,
{
    // SAFETY: the caller must guarantee that `dest_ptr` is valid for v.len() writes.
    // Also `v.as_ptr` and `dest_ptr` must not alias.
    //
    // The caller must guarantee that T cannot modify itself inside is_less.
    // merge_up and merge_down read left and right pointers and potentially modify the stack value
    // they point to, if T has interior mutability. This may leave one or two potential writes to
    // the stack value un-observed when dest is copied onto of src.

    // It helps to visualize the merge:
    //
    // Initial:
    //
    //  |ptr_data (in dest)
    //  |ptr_left           |ptr_right
    //  v                   v
    // [xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx]
    //                     ^                   ^
    //                     |t_ptr_left         |t_ptr_right
    //                                         |t_ptr_data (in dest)
    //
    // After:
    //
    //                      |ptr_data (in dest)
    //        |ptr_left     |           |ptr_right
    //        v             v           v
    // [xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx]
    //       ^             ^           ^
    //       |t_ptr_left   |           |t_ptr_right
    //                     |t_ptr_data (in dest)
    //
    //
    // Note, the pointers that have been written, are now one past where they were read and
    // copied. written == incremented or decremented + copy to dest.
    let len = v.len();
    let src_ptr = v.as_ptr();

    let len_div_2 = len / 2;

    let mut ptr_left = src_ptr;
    let mut ptr_right = src_ptr.wrapping_add(len_div_2);
    let mut ptr_data = dest_ptr;

    let mut t_ptr_left = src_ptr.wrapping_add(len_div_2 - 1);
    let mut t_ptr_right = src_ptr.wrapping_add(len - 1);
    let mut t_ptr_data = dest_ptr.wrapping_add(len - 1);

    for _ in 0..len_div_2 {
        (ptr_left, ptr_right, ptr_data) = merge_up(ptr_left, ptr_right, ptr_data, is_less);
        (t_ptr_left, t_ptr_right, t_ptr_data) =
            merge_down(t_ptr_left, t_ptr_right, t_ptr_data, is_less);
    }

    let left_diff = (ptr_left as usize).wrapping_sub(t_ptr_left as usize);
    let right_diff = (ptr_right as usize).wrapping_sub(t_ptr_right as usize);

    if !(left_diff == mem::size_of::<T>() && right_diff == mem::size_of::<T>()) {
        panic_on_ord_violation();
    }
}

This function is a fundamental piece of fast data-parallel sorting algorithms. However it is not safe for types that can modify their stack value as part of the user provided comparison function is_less. One example for such a problematic type would be Mutex<Option<Box<str>>>. As part of a call to is_less it could replace the option with None freeing the inner String. If this modification is not observed in the slice the user has access to after the sort completes, dropping the slices backing storage would free the String again -> UB. A large percentage of types, have no direct interior mutability, eg. u64, String. It would be safe to do this optimization for such types. However currently there is no way to query this information, even though the compiler has access to it. This leads to a situation where people are using, functional but inappropriate ways to approximate this property. For example using unsafe specialization to figure out if a type is Copy or not. As UnsafeCell does not implement Copy, at least for now.

Guide-level explanation

const fn trivial_drop<T>() -> bool;

This function returns true for types that have a trivial Drop implementation, false otherwise. Trivial Drop implementation means, no code can be executed as part of Drop.

const fn has_direct_interior_mutability<T>() -> bool;

This function returns true for types that have direct interior mutability, false otherwise. Direct interior mutability is the ability for a type to modify its stack value through a shared reference. u64, String, and Box<RefCell<u64>> would all return false, in contrast RefCell<u64>, Cell<String> and Mutex<u64> would all return true.

Reference-level explanation

The compiler has access to the relevant information, the functions would have to implemented with intrinsics.

Drawbacks

This information could be used to build unsound abstractions, and may embolden people to do incorrect assumptions.

Rationale and alternatives

Some alternatives would be:

  • A) Ignore the possible optimizations.
  • B) Adjust the code so it is safe for types with direct interior mutability.
  • C) Add an unsafe marker trait.

A, seems like a bad fit for Rust, a language that markets itself as an efficient way to exploit hardware resources. B might not always be feasible, for the provided example it is possible but would make the code significantly slower and complex. That's not a property that should be ignored, especially because often in such performance critical code, unsafe code is being used. C, is really problematic too. It creates this two class 'society', where builtin u64 ... sort(), and #[derive(Copy)] struct X{u32: a, u32: b} ... sort_by_key(|x| x.a) would have vastly different performance.

Regarding Copy specialization, this check is already being done in the standard library for optimizations purposes see as approximation for trivial Drop. The fact that this exists is testament to:

  • Such optimizations are important enough that they are done in practice
  • Crude approximations can and will go wrong see TrustedRandomAccessNoCoerce

What is the impact of not doing this?

Rust being slower and more power-inefficient than it could be. And people using wrong approximations, which may break code with future updates, and or stop the future improvements to Rust because incorrect assumptions have proliferated.

If this is a language proposal, could this be done in a library or macro instead? Does the proposed change make Rust code easier or harder to read, understand, and maintain?

Currently with the exception of proc macros that disassemble generated code and analyze optimizations, there is no way to surface said information in user code.

Prior art

There was a RFC with a similar goal, the Freeze auto trait https://github.com/rust-lang/rfcs/pull/2944. The main motivation for declining said PR, was a worry that it may add another possibility for library authors to inadvertently break SemVer. This new proposed design avoids this problem by adding functions that can be used for optimizations purposes, but not for specialization. Similar to mem::size_of and mem::align_of which also query type related information, and are also used to inform algorithmic optimization decisions.

Unresolved questions

  • Should these functions be const functions? One motivation to have functions instead of auto traits, was to avoid specializations inadvertently breaking with updates. I could imagine a future where const functions influence code compilation in a way that makes it fail based on properties calculated within them. But arguably that ship has already sailed with many existing const functions creating the same potential problems.

Future possibilities

It may make sense to group these functions in a module for example core::type_info, as they may not be the last type properties deemed important enough that it can be queried.


Disclaimer, I'm not attached to those names, I'm sure better ones can be found.

You should point out the difference between trivial_drop and needs_drop in std::mem - Rust

3 Likes

Oh neat, I didn't know about needs_drop, why is rust/into_iter.rs at afaf3e07aaa7ca9873bdb439caec53faffa4230c · rust-lang/rust · GitHub necessary if this function exists? At least a course analysis of the documentation would suggest it could be used there.

I believe, because it’s used to conditionally implement another trait, which might be something not possible with a const fn like that. Nonetheless, now that you mention it, it might be possible to re-design that system so that it can work with needs_drop. E.g. with an additional associated const on the trait in question that specifies whether or not the trait is actually to be “considered implemented”. It’s been a while since I contributed to that stuff, so I would need to review the system to know for sure.

Further analysis seems to suggest that this is only an internal trait, and used for optimizations. As such I see no reason why it couldn't be implemented as a const function, and then used with if else style programming, which might even yield less code duplication, and stuff like rust/iterator.rs at f55b0022db8dccc6aa6bf3f650b562eaec0fdc54 · rust-lang/rust · GitHub

Why functions specifically? We're stuck with functions for legacy reasons for size_of and friends, but this could be a sealed TypeProperties trait with const TRIVIAL_DROP: bool; and such on it.

(Not entirely sure whether that's better, but it seems work a rationale discussion at least.)

Personally I'm open for both solutions. All I want is a solid foundation to build my optimizations on. However as noted in the prior art there was a RFC for a trait based solution, and that was dismissed. I hope this will avoid the mentioned worries.

It's worth noting that the exact same semver questions apply to const fn as to traits, as const fn can be near trivially used in the type system via const generics. If we want to prevent all semver pitfalls, then it needs to be a non-const function. If the intent is just to use it in a constant-folded if; this is sufficient. (The documentation would state that branching on the function should trivially optimize out.)

If these are provided, whether via trait or const, it needs to be made clear in documentation that the value/applicability is not stable, and should only be used for optimization; it is valid for any type to give the conservative answer spuriously. This means that any abstraction that only works for T: Freeze or T: TrivalDestruct is on shaky grounds.

If we do decide to expose this information, then I'd suggest it should be via the Freeze trait for "no shared mutation before indirection" and const <T as Destruct>::SKIPPABLE: bool for "guaranteed trivial drop". (...but this makes Destruct no longer object safe, so there's no standard trait object for knowing how to destruct a value but no more...)

const fn versions in the medium term could make sense, though, as these API points involve bigger questions than solely that of exposing this information.

As noted in my open questions, I could see us being extra cautious and going with non const functions, that would be totally sufficient for the intended use case. Maybe a little ugly if you need it for things like the size of a stack array (fixed sized slice), but even those can be worked around.

That said it would be somewhat inconsistent with:

pub const fn needs_drop<T>() -> bool
where
    T: ?Sized,

This function exists today and is const, as are all other type property functions such as size_of and align_of.

That's used in where bounds for specialization. where {T::NEEDS_DROP==false} doesn't work yet, does it?

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.