Intervals use case


#1

Here I try to translate a bit of code from a little Minesweeper game written in the interesting Whiley language (that contains a SMT solver). A cell of the Minesweeper matrix is represented as:

// An exposed square is one which has been exposed by the player, and 
// displays its "rank".  The rank is the count of bombs in the eight 
// directly adjacent squares.
public type ExposedSquare is {
    int rank,
    bool holdsBomb
} where rank >= 0 && rank <= 8

public type HiddenSquare is {
    bool holdsBomb,
    bool flagged
}

public type Square is ExposedSquare | HiddenSquare

(Code from: https://github.com/Whiley/WyBench/blob/master/src/107_minesweeper/Minesweeper.whiley ).

To represent a rank in Rust I can use a u8, but it goes against the good idea of “Making illegal states unrepresentable” ( https://fsharpforfunandprofit.com/posts/designing-with-types-making-illegal-states-unrepresentable/ ). So I’ve used an enum:

#[derive(Debug)]
enum AdjacentBombs { B0, B1, B2, B3, B4, B5, B6, B7, B8 }

But to convert numbers from and to AdjacentBombs you need bug-prone switches that are perhaps worse than using an u8:

fn to_adjacent_bombs(n_bombs: u8) -> Option<AdjacentBombs> {
    use AdjacentBombs::*;

    match n_bombs {
        0 => Some(B0),
        1 => Some(B1),
        2 => Some(B2),
        3 => Some(B3),
        4 => Some(B4),
        5 => Some(B5),
        6 => Some(B6),
        7 => Some(B7),
        8 => Some(B8),
        _ => None,
    }
}

fn to_n_bombs(ab: AdjacentBombs ) -> u8 {
    use AdjacentBombs::*;

    match ab {
        B0 => 0,
        B1 => 1,
        B2 => 2,
        B3 => 3,
        B4 => 4,
        B5 => 5,
        B6 => 6,
        B7 => 7,
        B8 => 8,
    }
}

fn main() {
    let b1 = AdjacentBombs::B3;
    println!("{:?}", b1);
    let b2 = to_n_bombs(b1);
    println!("{}", b2);
    let b3 = to_adjacent_bombs(b2);
    println!("{:?}", b3);
}

You can try to “improve” the code like this:

#[repr(u8)]
#[derive(Debug, Copy, Clone)]
enum AdjacentBombs {
    B0 = 0,
    B1 = 1,
    B2 = 2,
    B3 = 3,
    B4 = 4,
    B5 = 5,
    B6 = 6,
    B7 = 7,
    B8 = 8,
}

fn to_adjacent_bombs<T>(n_bombs: T) -> Option<AdjacentBombs>
where usize: From<T> {
    const ALL_BOMBS: [AdjacentBombs; 9] = [
        AdjacentBombs::B0, AdjacentBombs::B1,
        AdjacentBombs::B2, AdjacentBombs::B3,
        AdjacentBombs::B4, AdjacentBombs::B5,
        AdjacentBombs::B6, AdjacentBombs::B7,
        AdjacentBombs::B8];

    let n_bombs = usize::from(n_bombs);
    if n_bombs > 8 { return None; }
    Some(unsafe { *ALL_BOMBS.get_unchecked(n_bombs) })
}

fn to_n_bombs(ab: AdjacentBombs ) -> u8 {
    ab as u8
}

Or with this:

fn to_adjacent_bombs(n_bombs: u8) -> Option<AdjacentBombs> {
    if n_bombs > 8 { return None; }
    Some(unsafe { std::mem::transmute::<u8, AdjacentBombs>(n_bombs) })
}

But it uses unsafe code, and it’s bug-prone, and long still.

There are also crates that try to help, like: https://crates.io/crates/enum_primitive

This seems a bit better solution, that could be supported:

use std::convert::TryFrom;

#[repr(u8)]
#[derive(Debug, Copy, Clone, TryFrom)]
enum AdjacentBombs {
    B0 = 0,
    B1,
    B2,
    B3,
    B4,
    B5,
    B6,
    B7,
    B8,
}

fn main() {
    let b1 = AdjacentBombs::B3;
    println!("{:?}", b1);
    let b2 = u8::from(b1);
    println!("{}", b2);
    let b3 = AdjacentBombs::try_from(b2);
    println!("{:?}", b3);
}

But for the type system of a reliable language like Rust I really prefer intervals (subsets) of integral values (using a hypotetical syntax):

enum Cell {
    Exposed { holds_bomb: bool, rank: u8[0 ... 8] },
    Hidden  { holds_bomb: bool, flagged: bool },
}

(A different and more general solution is to introduce refinement types in Rust, but it’s probably too much complex for general Rust code and compiler).

(To avoid “boolean blindness” ( https://existentialtype.wordpress.com/2011/03/15/boolean-blindness/ ) you can also replace those bools with FlagState { Unflagged, Flagged } and BombPresence { Missing, Present } enumerations, but this is less important in this specific data structure and program).

Once you have defined a subset type:

type Rank = u8[0 .. 9];

you should be able to convert Rank->u8 with a u8::from() and u8->Option with a Rank::try_from().


#2

Sounds like something that can be done in an external crate once const-generic is available.

struct LimitedU8<const RANGE: Range<u8>>(u8);

impl<const RANGE: Range<u8>> TryFrom<u8> for LimitedU8<RANGE> {
    fn try_from(a: u8) -> Result<Self> {
        if RANGE.contains(a) {
            Ok(LimitedU8(a))
        } else {
            Err(...)
        }
    }
}

impl<const RANGE: Range<u8>> From<LimitedU8<RANGE>> for u8 {
    fn from(a: LimitedU8<RANGE>) -> u8 { a.0 }
}

type Rank = LimitedU8<{0..9}>;

#3

Subsets of integral values is a complex feature that most probably requires compiler implementation (or compiler support) if you want to implement it well enough. You need compiler optimizations, type system support for super- and sub- typing, optimizations to remove lot of the overhead at compile-time, nicer built-in syntax, also having them as built-ins reduces compilation time (compared to thousands of lines of generics-heavy library-level code), having them as built-ins encourages programmers to use them everywhere (making their usage idiomatic).

Two of the most important things in my Rust programs are integral numbers and arrays (slices, arrays, vecs) and I think both aren’t handled with enough care and love by the Rust language/compiler yet.


#4

What optimizations do you need for from() and try_from(), which the normal inliner cannot provide or will spend a lot of time?

Does this even matter in Rust? Only types differing in lifetime involves sub-/super-typing. Interval type certainly won’t.

Agreed (although I don’t see this is important enough to warrant new syntax).


#5

What about a custom derive version of the existing enum_primitive macro (from the crate of the same name)?

It’s not as nice as const generics, but it’d mean you could do something like this:

#[derive(Debug, FromPrimitive, ToPrimitive)]
enum AdjacentBombs { B0, B1, B2, B3, B4, B5, B6, B7, B8 }

// where the `FromPrimitive` and `ToPrimitive` traits are declared as:

pub trait FromPrimitive {
  fn from_usize(n: usize) -> Option<Self>;
  ...
}

pub trait ToPrimitive {
  fn to_usize(self) -> usize;
}

// corresponding `TryFrom<{integer}>` and `From<AdjacentBombs>` impls elided

Which would skip the need for the error-prone manual conversions.

I created a similar issue on the enum_primitive crate but am yet to hear back from the author.


#6

Thank you for your answers.

What optimizations do you need for from() and try_from(), which the normal inliner cannot provide or will spend a lot of time?

Ranged (subset of) integral values need their bounds to be enforced. This needs to give a compile-time error:

let x: u8[0 .. 4] = 8;

In general if we want most people to use them often then the compiler needs to move as many such tests as possible from run-time to compile-time. I don’t know if the current optimizer is able to do this well enough. Perhaps a specialized optimization pass is needed (Swift language has some of this despite it doesn’t have ranged integrals).

Does this even matter in Rust? Only types differing in lifetime involves sub-/super-typing. Interval type certainly won’t.

You are right.

Agreed (although I don’t see this is important enough to warrant new syntax).

You need slices and arrays to support ranged intervals natively as indexes:

let v = [0u8; 8];
// len() of an array needs to be a compile-time const.
let idx: u8[0 .. v.len()] = 3;
v[idx] = 10; // No array bound test here.

Alternative simpler and less bug-prone version:

let v = [0u8; 8];
// .range returns the range of valid indexes for v.
let idx: u8[v.range()] = 3;
v[idx] = 10; // No array bound test here.

If you also want subranges of enumerations with name-only variants (Haskell Enum typeclass to give a sequence to enum variants: http://zvon.org/other/haskell/Outputprelude/Enum_c.html ):

#[derive(Enum)]
enum Day { Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday }

type Weekday = Day[Monday ... Friday];

With your standard syntax it becomes like (still using the Enum trait):

type Weekday = Limited<Day, {Day::Monday ... Day::Friday}>;

In a reliable system language ranged integrals should be a basic feature like arrays. This feature is composed with some other parts that plug into each other:

  1. Ranged integrals. This means subsets of the various integers, char, enumerations with name-only variants (static predicates could be added later if desired);
  2. Value-range analysis for integral values, that’s needed to optimize the ranged integrals conversions and arithmetic operations;
  3. Slice-length analysis to convert slices <=> arrays safely, cleanly and avoiding some run-time bound tests.

The feature 2 allows to replace many “as” and try_from() with from() where the compiler is able to statically infer that a integral value is inside the bounds of the target type:

fn foo(x: u32[0 .. 256]) -> u8 {
    // Ranged value fits in u8, so no need of try_from():
    u8::from(x)
}
fn bar(x: u32) -> u8 {
    if x < 256 {
        // Value-range analysis sees it fits in
        // u8, so no need of try_from():
        u8::from(x)                    
    } else {
        0
    }
}

An example of the desire for feature 3 from real Rust code:

There’s code like: a[ 1] = mask_21bits & (load4(&self.0[ 2..]) >> 5);

Where load3 and load4 are like:

pub fn load4(input: &[u8]) -> i64 {
       (input[0] as i64)
    | ((input[1] as i64) << 8)
    | ((input[2] as i64) << 16)
    | ((input[3] as i64) << 24)
}

This is bug-prone and requires useless run-time bound tests. How the code should look:

a[ 1] = mask_21bits & (load4(&self.0[ 2..6]) >> 5);

Where load3 and load4 are like:

pub fn load4(input: &[u8; 4]) -> i64 {
       i64::from(input[0])
    | (i64::from(input[1]) << 8)
    | (i64::from(input[2]) << 16)
    | (i64::from(input[3]) << 24)
}

(Or better).

Another (#4) related feature is strongly-typed indexing for array and slices, that reduces run-time array bound tests. This means allowing you to define a type for an array/slice/vec that is only allowed to be indexed with a specified type of indexes. This is handy to not mismatch indexes when you have to handle more than one slice at a time, and allows to remove some run-time bounds. If later you want to support this for dynamically sized slices then you need dynamically ranged integrals (instead of integrals with bounds known at compile-time).


#7

Sounds like we need a LiquidRust!


#8

This could be made into a lint, just like fixed-array bounds checking

fn main() {
    let m = [1, 2, 3, 4];
    println!("{:?}", m[5]); //~ WARN this expression will panic at run-time
}

But custom lints are unstable (requires compiler plugin), and I’m not sure about clippy’s policy on third-party crates.

Given that miri is going to integrate with rustc soon, the following should be able to check the bounds for a compile-time constant:

#[inline]
const fn new(value: u8) -> Self {
    assert!(RANGE.contains(value), "value is outside of range {:?}", RANGE);
    LimitedU8(value)
}

However,

  1. Due to how const fn works, the check is still done at runtime if you write let x = LimitedU8::<{0 .. 9}>::new(10) afaik;
  2. From::from cannot be a const fn.

I don’t see why a specialized optimization pass is needed, const-folding and inlining should be enough to eliminate the runtime check when it can be proven.

If the compiler can recognize v.len() is a compile-time constant, then you could as well write it like this:

let idx = LimitedUsize::<{0 .. v.len()}>::new(3);

If not, then your syntax is impoosible too. We don’t really support types that depend on runtime numbers.

For the foo, if we support RFC 1932 one could have

impl<T, const RANGE: Range<T>> From<Limited<T, RANGE>> for T 
    where T: Ord + Bounded
    with RANGE.start >= T::MIN && RANGE.end <= T::MAX 
{
    ...
}

For the bar, I see that the type-system alone cannot fix it, it will require value-range analysis (or some kind of type-states) as you said. But why not just write it as u8::try_from(x).unwrap_or(0)…

How would that work?

let mut m: Range<usize[2 .. 6]> = 2 .. 6;
m = 3 .. 5;
let subarray: &[T; 4/*????*/] = &array[m];

For strongly-typed indexing/slicing to work, the type of 2 .. 6 cannot be a Range<usize[2 .. 6]>, but something like CompileTimeRange<usize, 2, 6>, and this type is unrelated to interval type.

If you don’t use the the x[y..z] notation we could use a const-generic function, but the constraint is out-of-scope for RFC 2000.

let subarray: &[i64; 4] = self.0.subarray::<{2 .. 6}>();

impl<T, const N: usize> [T; N] {
    fn subarray<const RANGE: Range<usize>>(&self) -> &[T; {RANGE.end - RANGE.start}] 
    with
        RANGE.start <= RANGE.end && RANGE.end <= N
    {
        unsafe { self.as_ptr().offset(RANGE.start.into()) as &[T; whatever] }
    }
}

#9

Is there some plan make it possible to force a const fn to be evaluated at compile time, so one could to something like this without making the value const?


#10

As noted with array indices, this already is a lint for regular ints:

let x: u8 = 500; // triggers a warning

Well… not specifically, but we already have constant promotion. So any value that could be a constant will be a constant and if const eval of it fails it will report an error.

I’m not sure I understand the question. The right hand side of the assignment is constant in your case… The left hand side obviously isn’t since it’s a let binding. let x = 5 + 6; also has a “constant” right side.


#11

I seem to have misremembered a bit how this worked since it doesn’t work with constant statements either, but what I mean is if it was possible to make the two bottom functions behave like the first one (especially useful for branching code on size_of/align_of etc):

Obviously with optimizations turned on llvm will optimize this down to simply a return value, but without opts, test1 and test2 in this example is evaluated at run-time. In the bounded value this would mean one could create a bounded value and do something akin to static_assert in c++ to give a compile-time error like @kennytm suggested without needing more advanced bounds on generic values (even though that would of course be neat to have some day).


#12

Have you tried using -Zmir-opt-level=3? If yes, then it’s just that the constant propagation isn’t powerful enough yet. But miri is in the process of moving to rustc, once that’s through these functions should all contain the same MIR.