Some examples of simple stronger static typing


#1

This is a post of musings. While Rust design takes its code correctness and performance from ML-family and C++ (and other) languages, the Ada language is another rich source of ideas worth copying.

To better focus my mind I often like to use a concrete and not too much long but complete program example. This time I use this not too much hard to solve Euler Project Problem: https://projecteuler.net/problem=244

There are many ways to solve this problem efficiently. The starting solution I use is this C++ one:

That C++ solution is not the fastest possible (it copies strings too much), but it’s an example of good enough C++ code that’s sufficiently efficient and sufficiently clean to read and understand.

On my laptop with gcc V.6.1.0 compiling with -Ofast this code finds the right answer (96356848 that corresponds to the LLURRDLLLURRDLUURULDLURDRRULDDRD moves) in about 0.12 seconds.

If in that C++ code I replace the usage of with <unordered_set> the run-time goes down to about 0.08 seconds.


This is my first direct Rust translation:

use std::collections::{HashSet, VecDeque};

fn e244() -> u64 {
    const N_CELLS: usize = 16;
    type Configuration = [u8; N_CELLS];

    #[derive(Clone)]
    struct GameNode {
        state: u32,
        move_sequence: Vec<u8>
    }

    impl GameNode {
        fn new(config: &Configuration) -> Self {
            assert!(config.iter().filter(|&&c| c == b'_').count() == 1);
            GameNode {
                state: GameNode::compute_hash(config),
                move_sequence: vec![]
            }
        }

        fn compute_hash(config: &Configuration) -> u32 {
            let mut hash = 0;

            for (i, &c) in config.iter().enumerate() {
                match c {
                    b'_' => {
                        hash |= 1 << (i + N_CELLS);
                        hash &= !(1 << i);
                    }
                    b'R' => { hash |= 1 << i; },
                    _ => (),
                }
            }

            hash
        }

        fn find_empty_pos(&self) -> u32 {
            for i in 0 .. N_CELLS as u32 {
                if (self.state & (1 << (i + N_CELLS as u32))) != 0 {
                    return i;
                }
            }
            panic!();
        }

        fn move_internal(&mut self, base: u32, shift: i32) {
            self.state ^= (1 << (base + N_CELLS as u32)) |
                          (1 << (base as i32 + N_CELLS as i32 + shift));
            if (self.state & (1 << (base as i32 + shift))) != 0 {
                self.state |= 1 << base;
            } else {
                self.state &= !(1 << base);
            }
            self.state &= !(1 << (base as i32 + shift));
        }

        fn move_left(&mut self) -> bool {
            let empty_pos = self.find_empty_pos();
            if empty_pos % 4 == 3 { return false; }
            self.move_internal(empty_pos, 1);
            self.move_sequence.push(b'L');
            true
        }

        fn move_right(&mut self) -> bool {
            let empty_pos = self.find_empty_pos();
            if empty_pos % 4 == 0 { return false; }
            self.move_internal(empty_pos, -1);
            self.move_sequence.push(b'R');
            true
        }

        fn move_up(&mut self) -> bool {
            let empty_pos = self.find_empty_pos();
            if empty_pos > 11 { return false; }
            self.move_internal(empty_pos, 4);
            self.move_sequence.push(b'U');
            true
        }

        fn move_down(&mut self) -> bool {
            let empty_pos = self.find_empty_pos();
            if empty_pos < 4 { return false; }
            self.move_internal(empty_pos, -4);
            self.move_sequence.push(b'D');
            true
        }
    }

    fn visit(starting: &Configuration, destination: &Configuration) -> Vec<u8> {
        let node = GameNode::new(starting);
        let destination_status = GameNode::compute_hash(destination);

        let mut visited = HashSet::new();
        let mut bfs_queue = VecDeque::new();
        bfs_queue.push_back(node);

        while let Some(top) = bfs_queue.pop_front() {
            if top.state == destination_status {
                return top.move_sequence;
            }
            if visited.contains(&top.state) {
                continue;
            }
            visited.insert(top.state);

            // The clone() are called about 285_000 times.
            let mut next = top.clone();
            if next.move_right() { bfs_queue.push_back(next); }

            let mut next = top.clone();
            if next.move_left() { bfs_queue.push_back(next); }

            let mut next = top.clone();
            if next.move_down() { bfs_queue.push_back(next); }

            let mut next = top.clone();
            if next.move_up() { bfs_queue.push_back(next); }
        }

        vec![] // No solution.
    }

    fn checksum(move_sequence: &[u8]) -> u64 {
        move_sequence
        .iter()
        .fold(0, |acc, &c| (acc * 243 + u64::from(c)) % 100_000_007)
    }

    const STARTING: &'static Configuration =
        b"_RBB\
          RRBB\
          RRBB\
          RRBB";
    const DESTINATION: &'static Configuration =
        b"_BRB\
          BRBR\
          RBRB\
          BRBR";

    checksum(&visit(&STARTING, &DESTINATION))
}

fn main() {
    println!("{}", e244() == 96_356_848);
}

That first version improves over the C++ code in varius small ways, I’ve removed the useless “depth” field, I’ve made Configuration a stronger fixed-size type, I’ve used a fold(), Vec instead of C++ strings, I have made most ingegral types unsigned, and I’ve cleaned up the code a little in various other parts. But it’s essentially the same code.

Compiled with -O on a Nightly Rustc, that first Rust version runs in about 0.06 seconds, faster than the unordered_set C++ version.

I have also tried to replace the Vec of move_sequence with a SmallVec<[u8; 16]>, to remove some heap allocations, but the performance gets worse or it’s the same :frowning:


But in a statically typed language I often like to use stronger (more strict) types, so this second Rust version uses two enums instead of generic u8 (but the enums contain exactly the same data as the Vec). With this stronger types the Rust compiler will not allow you to put a wrong unsigned byte inside the array in functions like move_right():

fn e244() -> u64 {
    mod e244 {
        use std::collections::{HashSet, VecDeque};

        #[derive(Clone, Copy, PartialEq, Eq)]
        #[repr(u8)] enum Cell { E_ = b'_', R = b'R', B = b'B' }

        #[derive(Clone, Copy)]
        #[repr(u8)] enum Move { L = b'L', R = b'R', U = b'U', D = b'D' }

        const N_CELLS: usize = 16;
        type Configuration = [Cell; N_CELLS];

        #[derive(Clone)]
        struct GameNode {
            state: u32,
            move_sequence: Vec<Move>
        }

        impl GameNode {
            fn new(config: &Configuration) -> Self {
                assert!(config.iter().filter(|&&c| c == Cell::E_).count() == 1);
                GameNode {
                    state: GameNode::compute_hash(config),
                    move_sequence: vec![]
                }
            }

            fn compute_hash(config: &Configuration) -> u32 {
                let mut hash = 0;

                for (i, &c) in config.iter().enumerate() {
                    match c {
                        Cell::E_ => {
                            hash |= 1 << (i + N_CELLS);
                            hash &= !(1 << i);
                        }
                        Cell::R => { hash |= 1 << i; },
                        _ => (),
                    }
                }

                hash
            }

            fn find_empty_pos(&self) -> u32 {
                for i in 0 .. N_CELLS as u32 {
                    if (self.state & (1 << (i + N_CELLS as u32))) != 0 {
                        return i;
                    }
                }
                panic!();
            }

            fn move_internal(&mut self, base: u32, shift: i32) {
                self.state ^= (1 << (base + N_CELLS as u32)) |
                              (1 << (base as i32 + N_CELLS as i32 + shift));
                if (self.state & (1 << (base as i32 + shift))) != 0 {
                    self.state |= 1 << base;
                } else {
                    self.state &= !(1 << base);
                }
                self.state &= !(1 << (base as i32 + shift));
            }

            fn move_left(&mut self) -> bool {
                let empty_pos = self.find_empty_pos();
                if empty_pos % 4 == 3 { return false; }
                self.move_internal(empty_pos, 1);
                self.move_sequence.push(Move::L);
                true
            }

            fn move_right(&mut self) -> bool {
                let empty_pos = self.find_empty_pos();
                if empty_pos % 4 == 0 { return false; }
                self.move_internal(empty_pos, -1);
                self.move_sequence.push(Move::R);
                true
            }

            fn move_up(&mut self) -> bool {
                let empty_pos = self.find_empty_pos();
                if empty_pos > 11 { return false; }
                self.move_internal(empty_pos, 4);
                self.move_sequence.push(Move::U);
                true
            }

            fn move_down(&mut self) -> bool {
                let empty_pos = self.find_empty_pos();
                if empty_pos < 4 { return false; }
                self.move_internal(empty_pos, -4);
                self.move_sequence.push(Move::D);
                true
            }
        }

        fn visit(starting: &Configuration, destination: &Configuration) -> Vec<Move> {
            let node = GameNode::new(starting);
            let destination_status = GameNode::compute_hash(destination);

            let mut visited = HashSet::new();
            let mut bfs_queue = VecDeque::new();
            bfs_queue.push_back(node);

            while let Some(top) = bfs_queue.pop_front() {
                if top.state == destination_status {
                    return top.move_sequence;
                }
                if visited.contains(&top.state) {
                    continue;
                }
                visited.insert(top.state);

                // The clone() are called about 285_000 times.
                let mut next = top.clone();
                if next.move_right() { bfs_queue.push_back(next); }

                let mut next = top.clone();
                if next.move_left() { bfs_queue.push_back(next); }

                let mut next = top.clone();
                if next.move_down() { bfs_queue.push_back(next); }

                let mut next = top.clone();
                if next.move_up() { bfs_queue.push_back(next); }
            }

            vec![] // No solution.
        }

        fn checksum(move_sequence: &[Move]) -> u64 {
            move_sequence
            .iter()
            .fold(0, |acc, &c| (acc * 243 + c as u64) % 100_000_007)
        }

        pub fn solve() -> u64 {
            use self::Cell::*;

            const STARTING: Configuration = [E_, R, B, B,
                                              R, R, B, B,
                                              R, R, B, B,
                                              R, R, B, B];

            const DESTINATION: Configuration = [E_, B, R, B,
                                                 B, R, B, R,
                                                 R, B, R, B,
                                                 B, R, B, R];
            checksum(&visit(&STARTING, &DESTINATION))
        }
    }

    e244::solve()
}

fn main() {
    println!("{}", e244() == 96_356_848);
}

Unfortunately this second Rust version runs in about 0.08 seconds, despite I think the run-time data is the same. So I think this is some kind of performance bug worth exploring and fixing.

I have had to use the “mod e244 {}” suggestion (as shown here pre-RFC: support `use Enum::*` for function-local enums ) because I wanted to nest the code in a function and I needed the glob use of the Cell variants to allow me to write short configurations inside the solve() function. So I approve the efforts to allow glob enum imports in functions.

Despite this second Rust version is slower, overall I like it more than the first Rust version because in my opinion having magic char constants spread in your code is often bad even for 150 lines long programs.


You can also make this first (and similarly the second) program a little less bug-prone (to be more sure functions like move_up() are correct) if you define Configuration as a 4x4 matrix (it’s less bug-prone because every index is bounded in 0…4 intervals and your up and down movements are better constrained, while in the 1D case you can move around in the linearized matrix in a wrong way):

enum TABLE_SIDE: usize = 4;
type Configuration = [[Cell; TABLE_SIDE]; TABLE_SIDE];

The memory used is the same and even the layout of the data in memory is the same. But then iterating on all the cells become a bit less handy, you need two loops (in theory a good optimizer could unroll the loops for both those little 1D and 2D arrays, giving the same access performance).

For a matrix like [[Cell; 4]; 4] I’d probably like some more handy way to perform a maximally efficient flattened (linearized) iteration on all the matrix cells (equivalent to performing an iteration on a [Cell; 16] mem::transmute of the matrix. D language allows you to do it safely:

void main() {
    uint[4][4] m = 1;
    foreach (ref x; cast(uint[16])m) {
        x++;
    }
}

It’s sufficiently safe because the D compiler verifies that lengths match correctly at compile-time:

void main() {
    const uint[4][4] m = 1;
    foreach (immutable x; cast(uint[15])m) {
    }
}

test.d(3,35): Error: cannot cast expression m of type uint[4][4] to uint[15] because of different sizes

Like in Rust:

#![feature(type_ascription)]
fn main() {
    let m = [[1u32; 4]; 4];
    for &x in unsafe { std::mem::transmute(m) : [u32; 15] }.iter() {
    }
}

error[E0512]: transmute called with differently sized types: [[u32; 4]; 4] (512 bits) to [u32; 15] (480 bits)


If you look at this second Rust version with Ada glasses on (or even D language glasses), you see many small expressivity limitations and problems.

This is part of the skeleton of the second version:

fn e244() -> u64 {
    mod e244 {
        fn checksum(...) -> u64 {
            //...
        }

        pub fn solve() -> u64 {
            //...
            checksum(...)
        }
    }

    e244::solve()
}

Code like this is not DRY:

fn foo() -> u64 {
    fn bar() -> typeof(return) {
    }
    bar()
}

D language allows you to write DRY code in that case:

ulong foo() {
    typeof(return) bar() {
        return 0;
    }
    return bar();
}
void main() {}

typeof(return) is the type of the return value of the function where you use it, so here typeof(return) equals to the return type of foo() function. So bar() must return the same type as foo, and this is what you want, and it’s more DRY. If you later want foo() to return an uint, you have to change the type only in one point.

In Rust you can do:

type T = u64;
fn foo() -> T {
    fn bar() -> T {
        0
    }
    bar()
}
fn main() {}

But this forces you to define a type at outer level. It’s not as convenient.


This is how I have defined the moves and cell state in the second Rust version:

#[repr(u8)] enum Cell { E_ = b'_', R = b'R', B = b'B' }
#[repr(u8)] enum Move { L = b'L', R = b'R', U = b'U', D = b'D' }
type Configuration = [Cell; 16];

use self::Cell::*;
const STARTING: Configuration = [E_, R, B, B, R, R, B, B, R, R, B, B, R, R, B, B];
const DESTINATION: Configuration = [E_, B, R, B, B, R, B, R, R, B, R, B, B, R, B, R];

Unlike a u8 array like b"_RBBRRBBRRBBRRBB", those arrays of enums allow you to catch the bug at compile time if you write by mistake (this contains a ‘P’):

b"_RBBRRBBRPBBRRBB"

But in Ada the enumerations have a quite different design, and they are rather handy for certain tasks. Instead of using the Ada syntax I use a Rust-Ada syntax mix:

enum Cell : u8char = '_', 'R', 'B';
const STARTING: String(16) of Cell = "_RBBRRBBRRBBRRBB";

The Ada compiler verifies at compile-time that string contains only chars of specified subset of all 8 bit ASCII chars.

(Ada also allows you to do other things, like to iterate on the whole range of an enum (from ‘_’ to ‘B’ here). (The D language standard library has a template that allows you to do the same), to denote and find the first and last variant of an enumeration, and so on).

Here he explains how using the very strong and very precise types of Ada felt almost like building and assembling together the cogs and mechanisms of the physical machine, to build a software simulation:

http://www.adacore.com/adaanswers/lorenz-code

Ada allowed me to concisely express the algorithms I wanted to implement. Computing the Fourier amplitudes of the frequencies of interest, convoluting the demodulated signal with the symbol patterns of the Baudot alphabet, and extracting the teleprinter symbol stream with clock recovery based on a phase locked loop were all easily implemented in Ada. Representing the SZ42 in software by declaring data types for the wheels and putting them together to a data type for the entire machine felt a bit like building a real machine piece by piece. While implementing the cryptographic attacks on the SZ42, I could concentrate on the design of the algorithms, while their representation in Ada was straight forward.<


The Move sequences are represented as:

#[repr(u8)]
enum Move { L = b'L', R = b'R', U = b'U', D = b'D' }

struct GameNode {
    state: u32,
    move_sequence: Vec<Move>
}

But there are only four of those enums, so a well designed language could offer a way to represent them with just two bits. Ada has a “Pack” pragma, that seems usable, but not very well defined:

https://en.wikibooks.org/wiki/Ada_Programming/Pragmas/Pack

But in a well designed language not everything needs to be build-in, so you can replace move_sequence with this kind of vector: http://zitsen.github.io/nbits_vec.rs/nbits_vec/struct.NbitsVec.html

The disadvantage of using such representation is the loss of some correctness assurance, because you have manually to translate from/to the Move enum representation.

While something this allows you to let ther compile handle those details correctly (bit_packed here equals to an “u2” value):

#[repr(bit_packed)]
enum Move { L = b'L', R = b'R', U = b'U', D = b'D' }

Then you have to benchmark your program to see if the time needed by the bit twiddling to handle the vector of u2 is worth the performance gain from having and cloning about 285_000 of about four times shorter vectors.


In the second Rust version the Configuration is defined as:

#[repr(u8)] enum Cell { E_ = b'_', R = b'R', B = b'B' }
type Configuration = [Cell; N_CELLS];

In Ada you can do better, because you can add boty dynamic and static predicates to types:

With something like an Ada 2012 Static_Predicate you can verify at compile-time that a Configuration contains exactly one empty cell (here I have named it “E_”). So the data and literas you put in your code are correct and don’t need to be verified at run-time.

A feature like a well implemented Static_Predicate in Rust will also allow you to define “good enough” Ada-style ranged types in library code, if you also have few empty (unsafe?) traits like Associative, Distributive, Commutative, etc the compiler uses to optimize (and simplify) user-defined integral values (including maybe Bigints) as well as built-in integral types.

I think this feature is sufficiently important to make Rust stronger (and more efficient with user-defined types).

Once you’re statically sure a Configuration contains one empty item, you can remove the run-time pre-condition from the GameNode::new():

fn new(config: &Configuration) -> Self {
    assert!(config.iter().filter(|&&c| c == Cell::E_).count() == 1);
    GameNode {
        state: GameNode::compute_hash(config),
        move_sequence: vec![]
    }
}

If you also add something like an Ada-like Dynamic_Predicate to the GameNode struct, the D language too allows you to do add struct/class invariants, so you can be more sure a method like find_empty_pos() never panics at run-time.


Ada offers something better to write a line of code like this:

.fold(0, |acc, &c| (acc * 243 + c as u64) % 100_000_007)

It allows you to extract the value representation of an enum without using hard cast like that “as 64” in Rust.

Ada also allows you to index an array with a contigous enumeration, this is both safe, efficient, and allows sometimes to write at a bit higher level than the equivalent code you have to write in C/Rust languages.

Ada language also allows you to define subtypes, an example is a subset of another enumeration. This allows you to statically enforce certain variables only contain a subset of the variants of a larger enum. In some cases this is quite handy. Here you can see a situation where the D language lacks this feature:

http://rosettacode.org/wiki/Universal_Turing_machine#Nearly_Strongly_Typed_Version

This associative arrays in D language is typed less strongly than the equivalent data in the Ada code at the top of the page:

// The first index of this 'rules' matrix is a subtype of State
// because it can't contain H, but currently D can't enforce this,
// statically unlike Ada language.
Rule[EnumMembers!Symbol.length][EnumMembers!State.length - 1] mRules;

In D language there’s a feature I miss in Rust, that Rust could introduce, the Value Range Analysis (VRA). In Rust it’s even more useful because values are immutable by default. An example where it’s handy:

fn find_empty_pos(&self) -> u32 {
    for i in 0 .. N_CELLS as u32 {
        if (self.state & (1 << (i + N_CELLS as u32))) != 0 {
            return i;
        }
    }
    panic!();
}

N_CELLS is a compile-time constant, so you know “i” is in a range that is contained in an u32 value. Also “i + N_CELLS” is < 32, so shifting too fits in an u32 and it can’t shift too much or by a negative number. So with VRA you don’t actually need those casts to be safe. Rust doesn’t like implicit type casts, so there’s another solution:

fn find_empty_pos(&self) -> u32 {
    for i in 0 .. u32::from(N_CELLS) {
        if (self.state & (1 << u32::from(i + N_CELLS))) != 0 {
            return i;
        }
    }
    panic!();
}

Currently that code doesn’t compile because N_CELLS is an usize, but once you have VRA the u32::from() could be made smarter and allow that provably safe type conversion from the correctly bounded usize to u32. This allows to remove some dangerous “as” from the code, keeping the code as fast as before (and safer).


In the code there’s a function that performs bit twlidding, it returns a value in 0…16:

fn find_empty_pos(&self) -> u32 {
    for i in 0 .. N_CELLS as u32 {
        if (self.state & (1 << (i + N_CELLS as u32))) != 0 {
            return i;
        }
    }
    panic!();
}

In Ada you can statically enforce that with a ranged type:

type Range_Type is range 0 .. 15;

The result of find_empty_pos() is later given as ‘base’ argument to the move_internal() function:

fn move_internal(&mut self, base: u32, shift: i32) {
    self.state ^= (1 << (base + N_CELLS as u32)) |
                  (1 << (base as i32 + N_CELLS as i32 + shift));
    if (self.state & (1 << (base as i32 + shift))) != 0 {
        self.state |= 1 << base;
    } else {
        self.state &= !(1 << base);
    }
    self.state &= !(1 << (base as i32 + shift));
}

Here the input ‘base’ value could still be of the same ranged type, so you can remove some of those “as” hard casts and replace them with u32::from() as I explained before about Value Range Analysis.

Ranged types (and subranges) allow you to be more explicit, and they help catch some of your bugs very well. Rust panics on integral overflows, this has already helped me quickly find many bugs in my Rust code. In C/D those bug are much harder to catch. The ranged values are similar but better, they help you to catch other bugs more precisely. One advantage of having a ranged value is that its correctness is verified only when the value is created. This means that the program doesn’t need to verify its bounds again later. And if your program has a bug, the code panics at run-time as early as possible, when the wrong value is created and not when it’s later used. Catching wrong values early, when you create them, helps a lot the debugging, because the error you receive is where you create the mistake.

Another nice thing about most of this stuff is that it’s optional. If you prefer to represent your 0…16 value in an u32 you can do it. So you can increase the complexity of your types and your code only when and where you want, where you think it adds value to your code (or in high integrity code).

Ranged values are also useful to be used as array indexes. If you move around such values the compiler still knows their range, so if you are using fixed-size arrays, it can remove bound tests in more cases (and even if you are using Vecs, the compiler can still remove some bound tests, if it knows the vec is large enough, using a “Slice Length Analysis”).


This simple program doesn’t allow me to show all the possible ideas you can get from Ada (and other languages). One Ada feature I’ve not shown here is that Ada allows you to optionally define the type (like a newtype of usize in Rust, and more) of the index of an array. Later that array can be indexed only by values of that type. So if you are using two or more different arrays with several indexes, the compiler statically avoids you to use the wrong indexes by mistake. A simple example is avoiding mixing the index of the rows with the index of the columns of a 2D matrix. Idiomatic Rust code uses less direct array indexing compared to Ada code, but you can still find lot of Rust code that uses arrays and indexes.

This post is already long enough, so better leave the other stuff for later. I hope to see some more optional stronger typing in Rust :slight_smile:


#2

I have laboriously analysed the asm output of the two versions, and I think I have found the performance bug (or one of them). It’s the clone().

This first version uses Vec<u8>:

fn main() {
    let v = vec![b'U'; 50];

    let mut total: u32 = 0;
    for _ in 0 .. 10_000_000 {
        let v2 = v.clone();
        total += v2[28] as u32;
    }

    println!("{}", total);
}

The second version uses a Vec<Move>:

fn main() {
    #[allow(dead_code)]
    #[derive(Clone, Copy)]
    #[repr(u8)] enum Move { L = b'L', R = b'R', U = b'U', D = b'D' }

    let v = vec![Move::U; 50];

    let mut total: u32 = 0;
    for _ in 0 .. 10_000_000 {
        let v2 = v.clone();
        total += v2[28] as u32;
    }

    println!("{}", total);
}

To me the first version runs in about 0.54 seconds, while the second in about 1.12 seconds.

Edit: and manually implementing this unsafe workaround clone() makes the second version as fast as the first:

impl Clone for GameNode {
    fn clone(&self) -> Self {
        use std::mem::transmute;
        let aux: &[u8] = unsafe { transmute(&self.move_sequence[..]) };
        let move_sequence2: Vec<Move> = unsafe { transmute(aux.to_vec()) };

        GameNode {
            state: self.state,
            move_sequence: move_sequence2
        }
    }
}

So to use stronger typing that can statically catch bugs I’ve used some unsafe code…


#3

You should report this. There’s no fundamental reason for this happening, the codegen should be identical.


#4
            // The clone() are called about 285_000 times.
            let mut next = top.clone();
            if next.move_right() { bfs_queue.push_back(next); }

            let mut next = top.clone();
            if next.move_left() { bfs_queue.push_back(next); }

            let mut next = top.clone();
            if next.move_down() { bfs_queue.push_back(next); }

            let mut next = top.clone();
            if next.move_up() { bfs_queue.push_back(next); }

This could probably be rewritten to only clone if a change succeeds:

            if let Some(next) = top.move_right() { bfs_queue.push_back(next); }
            if let Some(next) = top.move_left() { bfs_queue.push_back(next); }
            if let Some(next) = top.move_down() { bfs_queue.push_back(next); }
            if let Some(next) = top.move_up() { bfs_queue.push_back(next); }

#5

Right, I should start writing Rust bug reports & enhancement requests soon.


#6

A nice idea I didn’t manage to find, and the resulting code is probably more idiomatic. With this change the Vec.clone() gets called about 215_000 times instead of about 285_000 times.


#7

On top of that you might be able to allocate the moves in something like:

struct Moves {
    moves: [Move; 16],
    count: usize,
    base: Option<Rc<Moves>>
}

And once the array is full, you’d Rc-allocate it and share it between all of the directions created next.
The inline array length is 16 there but you can adjust it for optimal performance.


#8

The solving strategy used in the original C++ is not the fastest for this problem, it’s just sufficiently readable and allowed me to use it for the purposes of my post. If you want a faster solution, it’s better to throw away that solution and start from scratch, this runs in about 0.03 seconds:

fn main() {
    fn e244() -> u64 {
        // Inputs.
        const S: u32 = 0b_1100_1100_1100_1100;
        const T: u32 = 0b_0101_1010_0101_1010;

        fn create_moves() -> [[i8; 4]; 16] {
            let mut moves = [[0i8; 4]; 16];
            for gap in 0 .. 16 {
                for dir in 0 .. 4 {
                    if match dir { 0 => (gap & 3) == 3,
                                   1 => (gap & 3) == 0,
                                   2 => gap >= 12,
                                   _ => gap < 4 } {
                        moves[gap][dir] = -1;
                    } else {
                        moves[gap][dir] = gap as i8 + match dir { 0 => 1,
                                                                  1 => -1,
                                                                  2 => 4,
                                                                  _ => -4 };
                    }
                }
            }
            moves
        }

        const M: usize = 1 << 20;
        const NIL: u32 = std::u32::MAX;

        let moves = create_moves();

        let mut todo = vec![0u32; M];
        todo[0] = S;

        let mut dist = vec![NIL; M];
        dist[S as usize] = 0;

        let mut ways = vec![0u64; M];
        ways[S as usize] = 1;

        let mut checksum = vec![0u64; M];

        //const DIRS: [u8; 4] = b"LRUD"; // Error.
        const DIRS: [u8; 4] = [b'L', b'R', b'U', b'D'];

        let mut n_done = 0;
        let mut n_todo = 1;
        let mut step = 0;

        while dist[T as usize] == NIL {
            step += 1;
            let last_ntodo = n_todo;
            while n_done < last_ntodo {
                let pos = todo[n_done] as usize;
                n_done += 1;
                let gap = pos >> 16;
                for dir in 0 .. 4 {
                    let mut new1 = i32::from(moves[gap][dir]);
                    if new1 != -1 {
                        new1 = pos as i32
                               & 0xffff
                               & !((1 << gap)
                               | (1 << new1))
                               | (new1 << 16)
                               | (((pos as i32 >> gap) & 1) << new1)
                               | (((pos as i32 >> new1) & 1) << gap);
                        let new1 = new1 as usize;

                        if dist[new1] == NIL {
                            dist[new1] = step;
                            ways[new1] = 0;
                            checksum[new1] = 0;
                            todo[n_todo] = new1 as u32;
                            n_todo += 1;
                        }

                        if dist[new1] == step {
                            ways[new1] += ways[pos];
                            checksum[new1] += (checksum[pos] * 243 +
                                               ways[pos] * DIRS[dir] as u64)
                                              % 100_000_007;
                        }
                    }
                }
            }
        }
        checksum[T as usize]
    }
    assert_eq!(e244(), 96_356_848);
}

The purposes of the top post are to show situations where I’d like Rust to offer me ways to write more precise types and to express more static information, to see if some people like some of that stuff, if there are some reasons that forbid those features to be added to Rust, and it’s a presentation of material for several future enhancement requests. The solution of that Euler Project problem is just an example of typical code that I’d like to write better in Rust.