Bounds checking at initialization


#1

I really like the philosophy behind Rust and feel excited to use it in future projects. However, bounds checking when randomly accessing arrays/slices can be a real performance hog. As has been pointed out before, this is not really a solvable problem: if one wants memory-safe code, bounds must be checked. And one can use iterators when access is sequential, and unsafe code for the truly critical parts, to speed things up.

But I figured that there must be some middle ground: some type of optimization which ensures within-bounds access but keeps checking to a minimum. In the spirit of Rust, this would need to happen entirely at compile time, lead to guaranteed safe code, and lead to higher performance. The idea builds on the observation that bounds checking an immutable index into a vector-type object whose length is fixed at run time is only needed once. As variables are immutable by default in Rust, and arrays/slices are fixed-size once initialized, this should be a quite common situation and therefore lead to some performance improvements.

I’ll explain the idea with some examples. Consider the following code:

fn main() {
    let mut data = vec![1, 2, 3]; // user supplied
    let req = 2; // user supplied
            
    let slice = &mut data;
    let i = req; 
    for n in 0..10 {
        slice[i] = n; // Bounds check i in slice
        println!("Value: {}", slice[i]); // BC i in slice
    }
}

At compile time it is impossible to know the length of slice or the value of i, so bounds must be checked. However, the compiler can infer that the length of slice and the value of i will never change after they are initialized. There is no need to bounds check at each iteration of the loop. It can instead be done at the first point where both slice and i exist, and then never again after that. I.e.:

fn main() {
    let mut data = vec![1, 2, 3]; // user supplied
    let req = 2; // user supplied
            
    let slice = &mut data;
    let i = req; // BC i in slice
    for n in 0..10 {
        slice[i] = n; // No BC
        println!("Value: {}", slice[i]); // No BC
    }
}

We have reduced the number of checks from 20 to just one! “Bounds safety” could easily be inherited if the new object is also immutable:

fn main() {
    let mut data = vec![1, 2, 3]; // user supplied
            
    let slice = &mut data;
    let i = 2; // BC i in slice
    let j = i; // j inherits i's bounds safety for slice
    let mut k = i; // mutable, no inheritance
    for n in 0..10 {
        slice[i] = n; // No BC
        println!("Value: {}", slice[j]); // No BC
        k = n;
        println!("Value: {}", slice[k]); // Do BC for k in slice
    }
}

Bounds safety could also be inherited through function calls if parameters are immutable:

// Compiler infers that `index' will be
// bounds checked in `data_slice' and requires that
// input is checked before supplied to function.
fn print_data(data_slice: &mut [i32], index: usize, set: i32) {
    data_slice[index] = set; // No BC
    println!("Value: {}", data_slice[index]); // No BC
}

fn main() {
    let mut data = vec![1, 2, 3]; // user supplied

    let i = 1;
    let slice = &mut data; // do BC i in slice

    for n in 1..10 {
        // `print_data()' requires bounds checking 
        // Mark `i' as needing bounds checking for `slice'.
        print_data(slice, i, n); // No BC
    }
}

This could of course apply across several nested levels of functions. If there is some subroutine which does a lot of random access over many iterations, the checking could be done only once, when the subroutine is called. All without compromising safety, needing unsafe code, or requiring any extra effort from the coder.

With scalar indices and simple control flow, inference is quite simple: one can easily infer at compile time whether a variable needs bounds checking. In more complex settings this might not be possible. Then there is a trade-off: bounds check early to eliminate all future checks, at the cost of possibly unnecessary checks; or check as close to the access as possible, thereby re-checking often but never checking unnecessarily. Note that safety is never compromised either way; it is purely a question of performance. Consider for example:

fn main() {
    let data = vec![1, 2, 3, 4, 5]; // user supplied
    let index = 1; // user supplied
    let b1 = true; // user supplied
    let b2 = true; // user supplied

    let d_slice = &data;
    let i = index; // Bounds check `i' in d_slice here?

    if b1 {
        println!("Value: {}", d_slice[i]); // Or bounds check here?
    }
    if b2 {
        println!("Value: {}", d_slice[i]); // And possibly again here?
    }
}

If i is bounds checked at initialization then there will be an unnecessary check when b1 = b2 = false. However, if bounds are checked on access then they will be checked twice when b1 = b2 = true.

There are probably many smart ways the compiler can infer the likely flow and make best guesses. In a few exceptional cases it could also introduce conditional checks, if early checking is deemed very beneficial in some settings but very bad in others. There could also be keywords so the coder can indicate code that likely needs early or late bounds checking:

0: fn main() {
1:     let mut data = vec![1, 2, 3, 4, 5]; // user supplied
2:     let indices = vec![0, 2, 3]; // user supplied
3:     let order = vec![1, 0, 2]; // user supplied
4: 
5:     let d_slice = &mut data;
6:     let i_slice = &indices;
7:     let o_slice = &order;
8: 
9:     bounds_check(i_slice, d_slice);
A:     for &o in o_slice.iter() {
B:         let i = i_slice[o];
C:         d_slice[i] = 2;
D:     }
E:     ...
   }

In this program there will never be any bounds checking on line B (where o accesses i_slice). The compiler can infer that the loop will iterate over the elements of o_slice, so all elements of o_slice need bounds checking against i_slice, and this can be done as early as possible, i.e., at the first state where both i_slice and o_slice exist: at line 7. As a benefit, if elements of o_slice are used to access i_slice beyond line E, this can be done safely without bounds checking.

However, the problem here is line C. Should all of i_slice be checked against d_slice at line 6, or should each single access be checked at line C? If indices is shorter than order the first alternative is best, otherwise the second is. But when compiling this is not known! In this case the coder has added bounds_check(i_slice, d_slice); to indicate that bounds checking the complete i_slice is preferable, so the compiler will add checking at line 6 and the accesses at line C (and beyond) are free.

If the inference mechanism is good and coders keep to simple and immutable data structures when possible (which I guess is good style), it feels like this type of optimization could lead to large improvements in many cases. Without unsafe code! But I’m quite new to Rust and not yet completely familiar with its internals, so I hope this idea is not incompatible with some of the inner workings, and that I haven’t re-invented a wheel that is already implemented. So, does the idea have any bearing?


#2

This is similar to the way Ada does it. You declare an index type with a range, and an array type with the index type as index. The array is then automatically of the size expressible by the index type. Since any value assigned to a variable of the index type is bounds checked during that assignment, you do not need to check when you index into the array. Since Rust already gets integer overflow checking, the step to arbitrary-range integer checking isn’t too big. Probably just a question of syntax and some implementation details (infinite numeric types instead of ~10).
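A rough sketch of the Ada pattern in today’s Rust, with all names (Idx, Table, LEN) made up for illustration: the only bounds check happens in the index type’s constructor, and indexing through the type needs none.

```rust
use std::ops::Index;

const LEN: usize = 8;

// Made-up index type: the invariant 0 <= Idx.0 < LEN is established
// once, in the constructor, mimicking Ada's ranged index types.
#[derive(Clone, Copy)]
struct Idx(usize);

impl Idx {
    fn new(i: usize) -> Option<Idx> {
        if i < LEN { Some(Idx(i)) } else { None }
    }
}

struct Table([i32; LEN]);

impl Index<Idx> for Table {
    type Output = i32;
    fn index(&self, i: Idx) -> &i32 {
        // No check needed here: Idx::new already established i.0 < LEN.
        unsafe { self.0.get_unchecked(i.0) }
    }
}

fn main() {
    let t = Table([10, 11, 12, 13, 14, 15, 16, 17]);
    let i = Idx::new(3).expect("in range"); // the single bounds check
    println!("t[3] = {}", t[i]);            // check-free access
}
```

The assignment-time check corresponds to Ada’s index-subtype check; out-of-range values simply cannot be constructed.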


#3

This seems like a pretty feasible approach. There’s just one problem: LLVM already does that! And it doesn’t need to know anything about slices being immutable, or about bounds checking, to do so. It just sees checks of the form i < length and, with its usual, widely applicable set of optimizations and analyses, notices that the condition is always true. Unless you can give an example program where LLVM can’t figure it out (and rustc can’t help it by generating the proper metadata/attributes), but a high-level Rust-specific optimization could, there is no incentive to duplicate the work LLVM already does.

“Inheriting through functions” seems like something LLVM can’t do if it can’t inline the function. However, I have doubts whether this is a realistic problem (the function would have to be called in a loop but be inline(never) or rather large, probably among other conditions) and whether it is worth adding a complicated, error-prone interprocedural analysis.

See for example this playpen, adapted from your first “one bounds check instead of 20” example (check the LLVM IR; I use a made-up extern fn instead of println! to simplify the generated code). Note that there are zero bounds checks, though only because LLVM sees both the creation of the Vec and the value of req. This playpen eliminates the latter, and indeed LLVM hoists the bounds check out of the loop.
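The hoisting can also be encouraged by hand on stable Rust: an assert! before the loop states the fact i < data.len() once, and the optimizer can then typically drop the per-iteration checks. This is a sketch of that well-known pattern; the function and values are made up.

```rust
// Made-up example: one explicit up-front check, after which the
// optimizer can usually prove the indexing in the loop is in bounds.
fn fill(data: &mut [i32], i: usize) {
    assert!(i < data.len()); // single early check
    for n in 0..10 {
        data[i] = n; // provably in bounds after the assert
    }
}

fn main() {
    let mut data = vec![1, 2, 3];
    fill(&mut data, 2);
    println!("{:?}", data); // slot 2 ends at 9
}
```

Whether the checks are actually eliminated depends on the optimization level and the surrounding code, so inspecting the generated IR (as in the playpens) remains the way to verify it.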


#4

Optimizations aside, you can do some of this as a library using invariant lifetimes. I was starting to write it up, then I remembered that someone already did so:

As described in the comments, that implementation is slightly wrong (unsafe), but it can be fixed by having the initial function that turns a slice into an ‘indexer’ and ‘proxy’ pair do it using a closure, i.e. something like fn split<T, R, Cb: for<'a> FnOnce(Indexer<'a>, Proxy<'a, T>) -> R>(slice: &mut [T], cb: Cb) -> R
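A minimal sketch of that closure-based fix, using the usual invariant-lifetime (“branding”) trick. The names Indexer, Proxy, and split follow the post, but the details here are my own assumptions, not the linked implementation:

```rust
use std::marker::PhantomData;

// 'id is made invariant via the fn-pointer PhantomData, so each call to
// `split` mints a fresh brand tying its Indexer and Proxy together.
type Brand<'id> = PhantomData<fn(&'id ()) -> &'id ()>;

#[derive(Clone, Copy)]
pub struct Index<'id>(usize, Brand<'id>);

pub struct Indexer<'id> {
    len: usize,
    _brand: Brand<'id>,
}

pub struct Proxy<'id, 'a, T> {
    slice: &'a mut [T],
    _brand: Brand<'id>,
}

impl<'id> Indexer<'id> {
    // The only place a bounds check happens.
    pub fn check(&self, i: usize) -> Option<Index<'id>> {
        if i < self.len { Some(Index(i, PhantomData)) } else { None }
    }
}

impl<'id, 'a, T> Proxy<'id, 'a, T> {
    // No check: an Index<'id> can only come from this slice's Indexer.
    pub fn get(&self, i: Index<'id>) -> &T {
        unsafe { self.slice.get_unchecked(i.0) }
    }
    pub fn set(&mut self, i: Index<'id>, v: T) {
        unsafe { *self.slice.get_unchecked_mut(i.0) = v; }
    }
}

// The higher-ranked closure forces 'id to be fresh and unnameable,
// so indices cannot leak to a differently sized slice.
pub fn split<'a, T, R>(
    slice: &'a mut [T],
    cb: impl for<'id> FnOnce(Indexer<'id>, Proxy<'id, 'a, T>) -> R,
) -> R {
    let len = slice.len();
    cb(
        Indexer { len, _brand: PhantomData },
        Proxy { slice, _brand: PhantomData },
    )
}

fn main() {
    let mut data = vec![10, 20, 30];
    let seen = split(&mut data, |ixr, mut px| {
        let i = ixr.check(1).expect("in bounds"); // one check
        px.set(i, 99);                            // no check
        *px.get(i)                                // no check
    });
    println!("seen = {}, data = {:?}", seen, data);
}
```

The unsafe blocks are justified only by the brand: an Index<'id> proves the check already happened against this exact slice.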


#5

+1 for leaving it to LLVM. I am always impressed at how rarely get_unchecked actually yields me any performance improvement.


#6

This doesn’t only have optimization benefits (which are irrelevant due to LLVM). It also has safety benefits. If you explicitly define the ranges of your integral types, you can prevent getting numbers that don’t make sense. Use cases:

  1. When you read a number from a string, you need not add checks for its range, and you can’t forget the checks if you later add a way to read your number from some other source. The usual candidates like FromStr could also return a Result that encodes what went wrong.
  2. FFI: You get stuff from C that will probably try to bite you at every corner. If you forget a check, your type won’t accept the number and will panic instead of allowing you to continue to work with a number that is out of range.
  3. Invariants! You always make sure that the functions on your own types uphold your invariants. There is no such thing for numbers. It can be done with newtypes, but that’s just horrible.
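Use case 1 can already be approximated with a newtype today, which also illustrates the “horrible” part of point 3: all the ceremony below exists just to enforce one range. The Percent type and its error enum are made up for illustration.

```rust
use std::str::FromStr;

// Made-up newtype enforcing 0..=100; the range check lives in one place.
#[derive(Debug, PartialEq)]
struct Percent(u8);

#[derive(Debug, PartialEq)]
enum PercentError {
    NotANumber,
    OutOfRange(i64),
}

impl FromStr for Percent {
    type Err = PercentError;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // The Result encodes what went wrong, as suggested above.
        let n: i64 = s.parse().map_err(|_| PercentError::NotANumber)?;
        if (0..=100).contains(&n) {
            Ok(Percent(n as u8))
        } else {
            Err(PercentError::OutOfRange(n))
        }
    }
}

fn main() {
    println!("{:?}", "42".parse::<Percent>());  // Ok(Percent(42))
    println!("{:?}", "142".parse::<Percent>()); // Err(OutOfRange(142))
    println!("{:?}", "abc".parse::<Percent>()); // Err(NotANumber)
}
```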

#7

@ker Using special types would be an interesting way to solve this (and many other things, as you point out). A similar idea was discussed on the mailing lists a while ago. While it’s elegant, would it not also be a bit cumbersome to enforce everywhere? Using compiler optimizations has the advantage that there would be no need for changes to the language design (apart from the compiler command asking for early checking). Then @comex’s suggestion of a library feels better, as it would be cumbersome only when needed.

@rkruppe You are of course right. LLVM seems to pick up most of the straightforward cases – but not all of them. As I’m new to the project, I don’t have the overview to know where (if anywhere) this fits in. But for the sake of the argument I will hold my ground and argue for doing it at the Rust level :slight_smile: For two reasons: immutability is a central concept in Rust and bounds checking is a core feature – neither is true for LLVM. Thus there is a lot of Rust-level information, useful when optimizing, which is missing at the LLVM level.

Take this example. It is easy to infer (in Rust at least) that there is no need for bounds checking in the second loop (line 10), but if you look at the LLVM IR it checks in both loops. Of course, bounds checking is necessary at lines 4 and 7. I suspect that it would be far easier to optimize this in Rust than to hack away at the LLVM optimizer.

In addition, one aspect that I missed in my original post speaks for Rust-level optimization. In cases with ambiguous control flow, early bounds checking could break code that is valid. Take this example. When b1 = b2 = false, this code is perfectly fine for any i (it might be unwise to allow such states, but it is hardly the job of the compiler to enforce that). If one does early bounds checking by default (i.e., uncommenting the assert at line 6 and making unchecked calls later), this could erroneously break the program when b1 = b2 = false.

While this speaks for a compiler that is very conservative in applying early checking, it also speaks for having it at the Rust level. The compiler will have little knowledge of the expected or typical control flow, and LLVM even less so. It would therefore be advantageous for the coder to be able to tell the compiler to do early bounds checking when it would be beneficial – and I guess that only fits in at the Rust level.

Borrowing my old example: if one knows that i_slice typically has a length around 10, o_slice a length in the millions, and all elements in i_slice are valid indices of d_slice, early bounds checking would be a very good choice. The fictive compiler command bounds_check would tell the compiler to check the whole i_slice once before the loop (in this case, ~10 checks) and save one check at each iteration of the loop (i.e., millions of checks). And this type of case does not feel that uncommon.

You might be right that “function bounds checking inheritance” could be too error-prone to be worth it. But it feels like it should not be too hard, and could probably be done with a quite simple algorithm. If the compiler infers (either automatically when the flow is clear, or by command of the coder) that a function parameter needs bounds checking, the corresponding variable used to call that function is marked as needing bounds checking. If, in the calling branch, one can do early checking, all is nice and good and one has saved some checks. If one cannot do early checking (due to mutability, unclear branching, or whatever), the variable is checked just before being passed to the function and nothing is lost.

Concerning your “doubts whether this is a realistic problem”: of course, I agree that one should not complicate things, especially if LLVM optimizes them away and the use cases which would benefit from the improvements are few. However, I’m unsure whether this really is as uncommon as you seem to suggest.

As I’ve understood it, the biggest performance hit of Rust compared to unsafe languages such as C is bounds checking. When discussing Rust, people’s view seems to be “safe but slow,” so I suspect putting effort into this part could narrow the performance gap (obviously not close it) and hopefully lead to wider adoption. Particularly so because it fits nicely into the Rust doctrine: it is a compiler-level improvement that leads to fast and safe code. Of course, in toy examples or non-heavy-duty applications it does not matter that much. But I suspect Rust will be used for some demanding stuff as well.

For example, I came to think about this since I’ve worked a bit with compressed sparse matrices lately. Here bounds checking would be useful, but there is often a host of operations on the matrices, so one would like to avoid it for each value extraction. As every extraction involves at least three nested array accesses for compressed matrices, checking can really add up even in relatively simple cases. Checking as much as possible preemptively could really speed things up.
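To make the cost concrete, here is a rough CSR (compressed sparse row) lookup; the layout and names are assumed for illustration. Every extraction touches three arrays, and each access is normally bounds checked:

```rust
// Assumed CSR layout: row_ptr[r]..row_ptr[r + 1] gives the range in
// col_idx/values that holds row r's stored entries.
struct Csr {
    row_ptr: Vec<usize>,
    col_idx: Vec<usize>,
    values: Vec<f64>,
}

impl Csr {
    fn get(&self, r: usize, c: usize) -> f64 {
        let start = self.row_ptr[r];    // bounds check 1
        let end = self.row_ptr[r + 1];  // bounds check 2
        for k in start..end {
            if self.col_idx[k] == c {   // bounds check 3, per entry
                return self.values[k];  // bounds check 4
            }
        }
        0.0 // structurally zero entry
    }
}

fn main() {
    // 2x3 matrix: [[0, 5, 0], [7, 0, 9]]
    let m = Csr {
        row_ptr: vec![0, 1, 3],
        col_idx: vec![1, 0, 2],
        values: vec![5.0, 7.0, 9.0],
    };
    println!("{} {} {}", m.get(0, 1), m.get(1, 2), m.get(0, 0));
}
```

Checking once up front that every entry of col_idx is below the column count, and that row_ptr values are below col_idx.len(), would cover all later extractions.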

But, again, I lack the overview to know where this best fits in. If Rust is not intended to cater to heavy-duty numerical work, it might be better to refer sparse matrices and the like to C/C++, unsafe code, or @comex’s suggestion.


#8

Well… there’s certainly a lot of room for improvement short of the extreme way of Ada.

There was an RFC for proving overflow semantics at compile time. It didn’t get accepted (mostly because no one could remember the details for more than a few minutes). Something along those lines could help make everything more usable.

Basically you’d have a dynamic range type attached to every numeric variable. When you constrain the values (e.g. a variable x with range 1..100, via if x < 5 {}), the dynamic type of x is 1..5 inside the block, and 1..100 again afterwards.

If you do x = x + 1 then the range of x is 2..101.
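The narrowing and widening described above can be mimicked at run time with a toy interval type. This is purely an illustration of the analysis (with inclusive bounds, so 1..5 becomes lo = 1, hi = 4), not a proposed language feature:

```rust
// Toy interval with inclusive bounds, mimicking the dynamic range type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Range {
    lo: i64,
    hi: i64,
}

impl Range {
    // Narrow under the assumption `x < bound` (inside the then-branch).
    fn assume_lt(self, bound: i64) -> Range {
        Range { lo: self.lo, hi: self.hi.min(bound - 1) }
    }
    // Widen after `x = x + k`.
    fn add(self, k: i64) -> Range {
        Range { lo: self.lo + k, hi: self.hi + k }
    }
}

fn main() {
    let x = Range { lo: 1, hi: 100 }; // x has range 1..=100
    let inside = x.assume_lt(5);      // inside `if x < 5`: 1..=4
    let after = x.add(1);             // after `x = x + 1`: 2..=101
    println!("{:?} {:?}", inside, after);
}
```

A real implementation would run this on the compiler’s control-flow graph and re-widen ranges at join points; the sketch only shows the per-operation transfer functions.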


#9

I believe LiquidHaskell does something akin to this: types with additional predicates, like “value != 0” for safe division or “value >= 0 && value < 100” for safe range indexing. They are called “refinement types”. http://goto.ucsd.edu/~rjhala/liquid/haskell/blog/blog/2013/01/01/refinement-types-101.lhs/

There’s something about those in the RFC repo too, tagged as A-wishlist:


#10

Well, special types are very interesting. I like the idea, as it imposes rigor by design, but I doubt it would fit into the Rust language:

  • It is probably a bit late in the process for a complete remake of the type system. Isn’t 1.0.0 just around the corner?
  • Ranged types would not solve the bounds checking problem; see for example this example. Instead, special index types as @ker mentioned would be needed, and I cannot see how one would get around the cumbersomeness problem then.
  • It would not lead to increased performance – rather the opposite, with overflow/range checks and the like. Many of those checks feel hard to optimize away at compile time as well.
  • Abstracting away from the bare metal to that degree does not feel like the sweet spot in the performance/safety trade-off that I believe Rust wants to hit.
  • It would be another large departure from the C/C++ paradigm from which many potential users will come, and would thus likely reduce adoption.

The fact that more complex types have been discussed several times before without being adopted should be indicative of these problems. Making bounds checking more intelligent via early checking has, as mentioned, the advantages that no design changes are needed, safety is not compromised, and the code gets faster. The only cost is complicating the compiler a bit. But those are just my two cents. :smile: