The problem with array/slice/vector indexes


#1

Here I’ve discussed the dangers of wild-west casts in Rust code:

The two main points of that thread are still unaddressed. True casts are dangerous, and Rust has to do what it can to help the programmers minimize their number, replacing as many as possible with safer alternatives. (I’ll discuss Value Range Analysis better in another specific post.)


The main source of casts are the “as usize” casts needed to index arrays/vectors/slices. I am now porting C code to Rust, and this is an example:

const N1: usize = 324;
const N2: usize = 729;

struct Sudoku {
    r: [[u16; 9]; N1],
    c: [[u16; 4]; N2]
}

type Tsc = [u8; N1];
type Tsr = [i8; N2];

impl Sudoku {
    pub fn new() -> Sudoku {
        let mut s = Sudoku { r: [[0; 9]; N1], c: [[0; 4]; N2] };
        let mut nr = [0; N1];
        let mut r = 0;

        for i in 0u16 .. 9 {
            for j in 0u16 .. 9 {
                for k in 0u16 .. 9 {
                    s.c[r] = [9 * i + j,
                              (i / 3 * 3 + j / 3) * 9 + k + 81,
                              9 * i + k + 162,
                              9 * j + k + 243];
                    r += 1;
                }
            }
        }

        for r in 0 .. N2 {
            for c2 in 0 .. 4 {
                let k = s.c[r][c2] as usize;
                s.r[k][nr[k]] = r as u16;
                nr[k] += 1;
            }
        }
        s
    }

    #[inline(always)]
    fn forward(&self, sr: &mut Tsr, sc: &mut Tsc, c: u16, min: &mut u8, min_c: &mut u16) {
        for &rr in self.r[c as usize].iter() {
            // Take a pointer to avoid repeated bounds checks.
            let srrr = &mut sr[rr as usize];
            *srrr += 1;
            if *srrr == 1 {
                for &cc in self.c[rr as usize].iter() {
                    // Take a pointer to avoid repeated bounds checks.
                    let sccc = &mut sc[cc as usize];
                    *sccc -= 1;
                    if *sccc < *min {
                        *min = *sccc;
                        *min_c = cc;
                    }
                }
            }
        }
    }

    #[inline(always)]
    fn revert(&self, sr: &mut Tsr, sc: &mut Tsc, c: u16) {
        for &rr in self.r[c as usize].iter() {
            // Take a pointer to avoid repeated bounds checks.
            let srrr = &mut sr[rr as usize];
            *srrr -= 1;
            if *srrr == 0 {
                for &i in self.c[rr as usize].iter() {
                    sc[i as usize] += 1;
                }
            }
        }
    }

    #[inline(always)]
    fn update(&self, sr: &mut Tsr, sc: &mut Tsc, r: u16, v: i32) -> u32 {
        let mut min = 10;
        let mut min_c = 0;
        for &i in self.c[r as usize].iter() {
            sc[i as usize] = sc[i as usize].wrapping_add((v << 7) as u8);
        }
        for &c in self.c[r as usize].iter() {
            if v > 0 { // Move forward.
                self.forward(sr, sc, c, &mut min, &mut min_c);
            } else {
                self.revert(sr, sc, c);
            }
        }
        return ((min as u32) << 16) | (min_c as u32);
    }

    pub fn solve(&self, inp: &str) -> Vec<String> {
        let mut sc: Tsc = [9u8; N1];
        let mut sr: Tsr = [0i8; N2];
        let mut cr = [-1i8; 81];
        let mut cc = [-1i16; 81];
        let mut s = [0; 81];
        let mut s8 = [48u8; 81];
        let mut hints = 0;

        for i in 0 .. 81 {
            let c = inp.as_bytes()[i];
            s[i] = -1;
            if c >= b'1' && c <= b'9' {
                s[i] = (c - b'1') as i32;
                self.update(&mut sr, &mut sc, (i as u16 * 9 + s[i] as u16), 1);
                hints += 1;
                s8[i] = c;
            }
        }

        let mut ret: Vec<String> = vec![];
        let mut i: i32 = 0;
        #[derive(PartialEq, Eq)] enum Dir { Left, Right }
        let mut dir = Dir::Right;
        let mut cand: u32 = 10 << 16;

        loop {
            while i >= 0 && i < 81 - hints {
                if dir == Dir::Right {
                    let mut min = (cand >> 16) as u8;
                    cc[i as usize] = (cand & 0xffff) as i16;
                    if min > 1 {
                        for (c, &v) in sc.iter().enumerate() {
                            if v < min {
                                min = v;
                                cc[i as usize] = c as i16;
                                if min <= 1 { break; }
                            }
                        }
                    }
                    if min == 0 || min == 10 {
                        cr[i as usize] = -1;
                        dir = Dir::Left;
                        i -= 1;
                    }
                }
                let c = cc[i as usize];
                if dir == Dir::Left && cr[i as usize] >= 0 {
                    self.update(&mut sr, &mut sc, self.r[c as usize][cr[i as usize] as usize], -1);
                }
                let mut tmp = 9i8;
                for r2 in cr[i as usize] + 1 .. 9 {
                    if sr[self.r[c as usize][r2 as usize] as usize] == 0 {
                        tmp = r2;
                        break;
                    }
                }
                if tmp < 9 {
                    cand = self.update(&mut sr, &mut sc, self.r[c as usize][tmp as usize], 1);
                    cr[i as usize] = tmp;
                    dir = Dir::Right;
                    i += 1;
                } else {
                    cr[i as usize] = -1;
                    dir = Dir::Left;
                    i -= 1;
                }
            }
            if i < 0 { break; }
            for j in 0 .. i {
                let r = self.r[cc[j as usize] as usize][cr[j as usize] as usize];
                s8[r as usize / 9] = (r % 9 + u16::from(b'1')) as u8;
            }
            ret.push(String::from_utf8(s8.to_vec()).unwrap());
            i -= 1;
            dir = Dir::Left;
        }
        ret
    }
}

fn main() {
    use std::io::{ stdin, BufRead };

    let s = Sudoku::new();
    let stdin = stdin();

    for line in stdin.lock().lines() {
        for l in s.solve(&line.unwrap()).iter() {
            println!("{}", l);
        }
        println!("");
    }
}

This Rust code is 166 lines long and it contains 30 casts (I’ve managed to remove some of them, they were more before), 25 casts are “as usize” for array/vector/slice indexing. (The types u8/u16 aren’t always necessary, but they useful to make the code faster, and if you want the port C code to Rust you often need to keep the original types, otherwise sometimes the ported code doesn’t work correctly).

This is a lot. And in my precedent post I’ve shown Rustc and Servo have a large amount of “as usize” hard casts despite they are not translations from C code.

So, if possible, I suggest to let Rust array/vector/slices accept indexes of type u8/u16/u32/usize, performing implicit safe type conversions.

This special cast rule allows to remove a ton of true casts from the code, making Rust code shorter, more readable and safer. This way all the other (five) casts stand out better from the code, and you can audit their safety better.


I have also a smaller enhancement request. Now you can perform some safe numerical conversions with into() and from():

fn main() {
    let x: u32 = 10;
    let y: u64 = x.into();
    let z: u64 = u64::from(x);
}

I’d like the following program to give a warning (using -W trivial-casts or -W trivial-numeric-casts or something similar):

fn main() {
    let x: u32 = 10;
    let y: u64 = x as u64;
}

The warning should suggest the use of into() or from() instead of “as”.


#2

That sounds like a good lint to add to rust-clippy.


#3

But I think none or nearly none of the hard casts in the code ported from C that I’ve shown can be replaced with into/from (I have put a from() already). So that lint solves only a small percentage of the problem.


#4

Oh, I agree, and conversions to usize would help a lot. See PR29220, which is closed, but it’s still an open discussion AFAIK.


#5

Ah, thank you, so there’s life :slight_smile: I think this is what you are talking about:

fn main() {
    let i1: u8 = 1;
    let i2: u16 = 1;
    let i3: u32 = 1;
    let i4: usize = 1;

    let data1 = [0; 10];
    data1[i1.into()] = 5;
    data1[i2.into()] = 5;
    data1[i3.into()] = 5;
    data1[i4.into()] = 5;

    let data2 = &data2;
    data2[i1.into()] = 5;
    data2[i2.into()] = 5;
    data2[i3.into()] = 5;
    data2[i4.into()] = 5;

    let data3 = vec![0; 10];
    data3[i1.into()] = 5;
    data3[i2.into()] = 5;
    data3[i3.into()] = 5;
    data3[i4.into()] = 5;
}

It’s safer than “as” hard casts and a bit shoter than “as usize”, but it’s still noisy. If you are on a 16 bit CPU and usize means u16, then the “i3” index in the code above is a problem (and I guess this is why from/into currently don’t handle usize/isize).

But what I’m suggesting is this code to compile:

fn main() {
    let i1: u8 = 1;
    let i2: u16 = 1;
    let i3: u32 = 1;
    let i4: usize = 1;

    let data1 = [0; 10];
    data1[i1] = 5;
    data1[i2] = 5;
    data1[i3] = 5;
    data1[i4] = 5;

    let data2 = &data2;
    data2[i1] = 5;
    data2[i2] = 5;
    data2[i3] = 5;
    data2[i4] = 5;

    let data3 = vec![0; 10];
    data3[i1] = 5;
    data3[i2] = 5;
    data3[i3] = 5;
    data3[i4] = 5;
}

I don’t know if this is possible, and I guess solving this problem with library arrays like Vec is less easy.


#6

It is possible, you just have to implement Index for the different integer types. It might even be possible to do something generic like:

impl<T, I: Into<usize>> Index<I> for [T] {...}

and then we’re back to advocating PR29220.

If that’s done, then should we also do this for all the other index-like methods which take usize?
insert, remove, etc. etc.

Lots of APIs could be more generic, but it’s not clear that they should be…


#7

I did a quick analysis of both Rustc and Servo source code, and I saw that most “as usize” casts are for array indexing. So I think adding a special case for array/vector/slice indexing is enough to remove hundreds of useless casts.

For all the the other methods, using the other type conversion solutions (like into/from) seems acceptable.


#8

One big problem (noted in this comment) is that it makes type inference problematic. AIUI, in an expression like foo[0] the compiler would no longer have a definite inference for the type of 0, so it would default to 0i32, which of course doesn’t actually work here.


#9

I gathered some statistics about this a year ago and according to it only 17-18% of as usize casts were performed directly in indexing context. The things could change though.


#10

Thank you for more hard data, to make this discussion a bit more solid :slight_smile:


#11

Supporting i32 doesn’t make sense for silent conversions to usize, but there’s no problem supporting it when directly implementing Index on smaller integers - there would be a separate impl for signed integers that treats negative indices like any other out of range value (i.e. usually panicking, but not with methods that return Option like Vec::get).

In practice, I guess this would be best done with a trait containing a method like fn into_index(&self) -> Option<usize>.


#12

I like that. When indexing an array/vec you have to think about panics anyway, so an additional source of panic wouldn’t outweigh the convenience gained.


#13

First, it needs to be proven, that convenience is indeed gained in practice. There’s a good chance that you are trying to solve a non-problem.

I strongly doubt that indexing with anything except for u8/16/u32 and usize/isize is common enough to require ergonomics improvements (there are even some data in a message above).
Moreover, I don’t want v[index] to silently accept integer types other than usize (and maybe u8/16/32 with a stretch), I’d better change the type of index to usize in this situation (I agree, there are exceptions). The loss of explicitness is much worse than modest gains in convenience.

(I think i write this for the tenth time or so in various threads)


#14

I would like to point out that if one is frequently indexing with other integers than usize AND considers using as or .into() a problem, then one can create a new type (over the slice) which Deref to the slice (to gain all its methods) and then implement Index<u8>, Index<u16> etc… as appropriate for its newtype.