To reduce the number of true casts


#1

I’ve been learning Rust in my free time for less than two months, and I am still not sure that such strict typing is a good thing (compared to the type strictness of D, which is stricter than C++ but less strict than Rust). Take this as a warning too: the ideas below come from a Rust newbie.

The type strictness of Rust has some advantages: it catches some of my mistakes and avoids the ugly “type soup” I sometimes see in C/D code, where int/size_t are mixed all the time, etc. I prefer a tidy language and tidy code. But a disadvantage is the number of “as” casts I’m seeing in the many small Rust programs I’ve written so far.

Coding in D I’ve seen that casts are dangerous, they are a bad code smell, and it’s better to minimize their number. Casts are dangerous because they sometimes lose information and sometimes they don’t. They only specify the target type, so if you later refactor or modify the code and change the source type, the compiler keeps compiling the code, even if the cast is now meaningless, dangerous or wrong. A well designed language should try to minimize their number (removing all of them from a systems language is not my purpose).
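A minimal Rust sketch of that refactoring hazard. The function name and the values are made up for illustration; the assumption is that the parameter was once a u16 and was later widened to u32 while the cast was left in place:

```rust
// Hypothetical helper. When `len` was a u16 the cast below was a no-op.
// After widening the parameter to u32 the code still compiles unchanged,
// but the cast now silently truncates large values.
fn store_len(len: u32) -> u16 {
    len as u16 // keeps only the low 16 bits
}

fn main() {
    // Small values survive the cast.
    assert_eq!(store_len(300), 300);
    // 70_000 = 65_536 + 4_464, so the high bits are dropped.
    assert_eq!(store_len(70_000), 4_464);
}
```

The compiler gives no diagnostic here, which is the point: the cast names only the target type.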


I’ve done a quick analysis on the Servo browser source code, looking for casts (some of them are false positives, inside comments):

" as " 3207 occurrences in 301 files
"as usize" 1551 occurrences in 112 files
"as f32" 105 occurrences in 20 files
"as u32" 103 occurrences in 44 files
"as f64" 80 occurrences in 21 files
"as i32" 74 occurrences in 32 files
"as &VirtualMethods" 68 occurrences in 35 files
"as AzFloat" 67 occurrences in 7 files
"as isize" 39 occurrences in 13 files
"as u64" 30 occurrences in 16 files
"as u8" 29 occurrences in 14 files

Both the Rust language designers and the Servo developers should try to reduce the number of those casts in Servo code.

Sometimes removing those casts in Servo code is easy, like here:

components\net\data_loader.rs(80): let bytes = bytes.into_iter().filter(|&b| b != ' ' as u8).collect::<Vec<u8>>();

You can remove that cast with:

.filter(|&b| b != b' ')
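A small self-contained check of that replacement. The function name strip_spaces is only illustrative and is not taken from the Servo source:

```rust
// Illustrative stand-in for the filtering step in the Servo line above.
fn strip_spaces(bytes: Vec<u8>) -> Vec<u8> {
    // b' ' is a byte literal of type u8, so no cast is needed.
    bytes.into_iter().filter(|&b| b != b' ').collect()
}

fn main() {
    // The byte literal and the cast produce the same value.
    assert_eq!(b' ', ' ' as u8);
    assert_eq!(strip_spaces(b"a b c".to_vec()), b"abc".to_vec());
}
```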


The same analysis for Rust compiler source code:

" as " 7625 occurrences in 1545 files
"as usize" 780 occurrences in 254 files
"as u32" 363 occurrences in 134 files
" as *mut " 362 occurrences in 117 files
"as u64" 357 occurrences in 94 files
"as isize" 274 occurrences in 89 files
"as u8" 256 occurrences in 82 files
"as i32" 159 occurrences in 55 files
"as f64" 115 occurrences in 38 files
"as f32" 82 occurrences in 25 files
"as c_uint" 77 occurrences in 20 files

In the last few years the D language has introduced various language/library features that allow many casts to be removed from user code. In D there are various ways to cast:

1) The most explicit and strong way to cast is to use cast(), which is a built-in. The “cast” keyword makes it simpler to perform a textual search for casts in your code, compared to a C-style cast. This is similar to the “as” of Rust (but the D cast() also performs a dynamic cast where appropriate on class references). The usage of such “hard casts” should be minimized. A usage example:

long x = 100;
int y = cast(int)x;

2) Implicit casts:

Generally D allows implicit integral type conversions only if there’s no loss of information (this rule has two exceptions, for uint/int, to avoid some common hard casts). So this is allowed without casts:

int x = 10;
long y = x;

(An i64 is able to contain all values of an i32, so here D doesn’t require a hard cast).

3) Implicit casts based on Value Range Analysis:

D language computes statically the range of expressions and uses this approximate information to allow some safe implicit casts:

void main(in string[] args) {
    import std.conv;
    // A run-time value from command line.
    immutable uint x = args[1].to!int;
    ubyte y = x % 100;
}

Here the D compiler knows the range of the immutable value x is 0 … uint.max, so if you take that modulus the result must be a value 0 … 99, so the D compiler requires no cast() here. This is an awesome feature that helps remove some bug-prone casts, avoids noise, and keeps the code safe and correct.

Note: D compiler simplifies that analysis a lot, the results of the analysis are kept only for the current expression, unless they are immutable values (or they are manifest constants). In future this D feature will probably be improved, making the analysis more powerful (it was recently improved, for the immutables). A small amount of flow analysis will also allow code like this (not yet allowed in D):

void main(in string[] args) {
    import std.conv;
    immutable uint x = args[1].to!int;
    if (x < 100) {
        ubyte y = x;
    }
}

4) Explicit safe conversions:

char x = 'a';
ubyte y = ubyte(x);

Such conversions are allowed only for the types that allow implicit casts. This syntax lets you change the type of a value without assigning it to an intermediate variable of the desired type. It is handy to return the right type, to pass the right type to functions, etc.

5) to!T safe conversions:

The standard library has a templated function “to” that converts the value to the specified type:

long x = 100;
int y = x.to!int;
string x2 = "100";
int y2 = x2.to!int;

Unlike cast() this is safe, if the given value can’t be converted (like because the string is wrong or the value is outside the bounds for the target type), it raises an exception (there is also a way to return something like an Option instead).

6) assumeUTF(), representation() and similar functions that perform specialized casts in a more documented, less bug-prone way (and sometimes they also perform some sanity tests when the program is compiled in debug mode). Example: representation() converts from a string of chars (that are represented with 1 byte in D) to a dynamic array of bytes. A similar function is already present for Rust strings.

7) unqual! and other templates that change the attributes of a type, like removing its const/immutableness, etc, without actually changing the underlying type. Using this is safer, more handy and less bug-prone than using raw casts (and it’s also smarter and more flexible).
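For comparison with the to!T conversions of point 5, here is a rough Rust sketch of checked conversions. It assumes a current Rust toolchain, where std::convert::TryFrom and str::parse are available in the standard library; the helper name to_i32 is made up for illustration:

```rust
use std::convert::TryFrom;

// Hypothetical helper: a checked narrowing conversion that returns None
// instead of truncating when the value does not fit.
fn to_i32(x: i64) -> Option<i32> {
    i32::try_from(x).ok()
}

fn main() {
    // In range: the value is preserved.
    assert_eq!(to_i32(100), Some(100));
    // Out of range: reported as a failure rather than wrapped.
    assert_eq!(to_i32(i64::MAX), None);

    // Parsing from a string is also checked.
    assert_eq!("100".parse::<i32>(), Ok(100));
    assert!("abc".parse::<i32>().is_err());
}
```

Unlike D’s to!T, these return a Result/Option instead of throwing, so the caller has to decide what to do on failure.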


So some suggestions/ideas for the Rust language:

  1. If not already present (or already planned) I suggest adding a syntax or some short easy UFCS function (like into()) that performs “safe” casts that don’t cause loss of information between numeric values. I can call this a “light cast”.

  2. Add Value Range Analysis to Rust, that allows more cases to be handled with the “light cast”, like the situation with “%100” above. Value Range Analysis gets even more useful in Rust because in Rust most variables are immutable, and this makes that analysis simpler. Even a simple VRA like in D is going to be quite useful. (I also suggest to add to Rust a related feature: a Slice Length Range Analysis, that keeps track statically of size bounds of slices, and helps remove some run-time tests, to catch some bugs statically, and helps in conversions between slices and fixed-size arrays. I’ll discuss this better in another post).

  3. If not already present, introduce in the standard library a safe conversion function like “to”. (This is even more important in Rust because generally the compiler catches integer overflows in debug mode. So using “as” is a very blunt tool that goes against such higher safety).

  4. A lot of the casts I’ve found in my Rust code are “as usize” for array/vector indexing (and the analysis of Servo/Rustc shows they are indeed really common also in code written by rustaceans far more experienced than me). Example: I have an array of indexes that I have defined as Vec<u32> to halve the memory used, because I am not going to need more than 4 billion items. Every time I use such numbers as array/vector indexes I have to cast them to usize. This is awful! So is it possible, and a good idea, for Rust arrays/vectors to make an exception to the general rule of strict typing and accept u8/u16/u32 indexes too (and u64 too on 64 bit systems)? I think this change would slice away a lot of those 1551 occurrences of “as usize” in Servo code.

  5. Perhaps in non-generic code it is a good idea to give a warning in situations where you add a useless cast:

    fn main() {
        let x = "12".parse::<i64>().unwrap();
        let y: i64 = 10;
        let z = x as i64 + y; // Should raise a "useless cast" warning?
    }
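To make the indexing case of suggestion 4 concrete, here is a sketch of how u32 indexes can be used today without an “as” cast. The helper get_by_u32 is hypothetical (it is not a standard library function), and the sketch assumes a toolchain where usize::try_from is available:

```rust
use std::convert::TryFrom;

// Hypothetical helper: index a slice with a u32 without writing `as usize`.
// The conversion is checked, so it also stays correct on targets where
// usize is narrower than 32 bits.
fn get_by_u32<T>(items: &[T], idx: u32) -> Option<&T> {
    let i = usize::try_from(idx).ok()?;
    items.get(i)
}

fn main() {
    let data = ["a", "b", "c"];
    // Indexes stored compactly as u32, as in the example above.
    let indexes: Vec<u32> = vec![2, 0];
    let picked: Vec<&str> = indexes
        .iter()
        .filter_map(|&i| get_by_u32(&data, i).copied())
        .collect();
    assert_eq!(picked, ["c", "a"]);
    // Out-of-bounds indexes give None instead of panicking.
    assert_eq!(get_by_u32(&data, 7), None);
}
```

This hides the conversion in one place, but it is still more verbose than direct indexing, which is what the suggestion is about.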


I’ve also done two other small searches in the Servo browser source code:

".unwrap()" 2633 occurrences in 255 files
".clone()" 1114 occurrences in 172 files

The number of unwrap() calls should also be minimized, with pattern matching, etc.


#2

Lossless From is mostly implemented now, which also gives you into(). See PR28921 for integers and PR29129 for floats. Both left out potential usize/isize implementations though.
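A short sketch of what those lossless impls allow. The function name widen is made up for illustration:

```rust
// Illustrative helper: u8 -> u16 can never lose information,
// so From/Into is implemented for this pair.
fn widen(a: u8) -> u16 {
    a.into()
}

fn main() {
    let b = u16::from(10u8);
    let c: f64 = 1.5f32.into();
    assert_eq!(b, 10);
    assert_eq!(c, 1.5);
    assert_eq!(widen(255), 255u16);

    // The narrowing direction is rejected at compile time,
    // because there is no From<u16> for u8:
    // let d: u8 = b.into();
}
```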


#3

There are lints for trivial-casts and trivial-numeric-casts, which are allowed by default, but you could choose to warn or deny them.
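A sketch of opting in to one of those lints for a single function. The function name add_parsed is hypothetical; the lint name in the attribute is the underscore form that rustc accepts:

```rust
// Turn the allowed-by-default lint into a warning for this function only.
#[warn(trivial_numeric_casts)]
fn add_parsed(s: &str, y: i64) -> i64 {
    let x = s.parse::<i64>().unwrap();
    // `x` is already an i64, so this cast is reported as trivial.
    x as i64 + y
}

fn main() {
    assert_eq!(add_parsed("12", 10), 22);
}
```

The code still compiles and runs; the lint only adds a warning at the cast.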


#4

Just for reference, the conv crate has a few possibly relevant traits: ValueFrom, ApproxFrom and TryFrom (plus their *Into duals).


#5

There’s an RFC about this:

With that RFC and existing From/Into impls you could write lossless and lossy casts as

let a = 10u8;
let b = a.into(): u16; // Lossless
let c = b.cast(): u8; // Lossy, catches overflows in debug mode

Unfortunately, From/Into impls for usize and isize met some resistance and haven’t been implemented yet.

(Speaking of D, its parentheses elision is great! With it these casts would look like b.cast: u8)


#6

Oh, I didn’t see those lints, nice. Why aren’t such warnings active by default?

The conv crate looks a bit over-engineered, but it seems complete and usable. This needs to be used by everybody in every program, so it’s something for the Rust standard library.

Parentheses elision is a bad feature to copy from D. It has caused endless problems and discussions (including the -property switch). Rust syntax is a bit longer, but more uniform, and I prefer it over all those troubles. From D I would prefer you to copy Value Range Analysis (and less strict typing for array/vector indexes).

Edit: regarding parentheses, a shortening syntax that has caused no troubles in D is the single argument template instantiation syntax. So you can write:

"5".to!int

Instead of:

"5".to!(int)

Unfortunately I think you can’t do something like that with the Rust <> syntax.


#7

They were added in PR23630 as warnings, then changed to allow by default in PR23776. I only found a little more context in the 2015-03-31 meeting minutes.


#8

IIRC, the reasoning was that in cross-platform code, such “trivial” casts were often needed on some platforms but not others. This meant that the lint had a very large number of false positives (that is, warnings on code that was correct) and was judged more annoying than useful.
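One example of the cross-platform situation, as a sketch: std::os::raw::c_long is an i64 on 64-bit Linux but an i32 on Windows, so the same cast is trivial on one target and required on the other. The function names are made up for illustration:

```rust
use std::os::raw::c_long;

// Hypothetical helper: on 64-bit Linux this cast is i64 -> i64 (trivial),
// on Windows it is i32 -> i64 (a real widening).
fn to_i64_cast(v: c_long) -> i64 {
    v as i64
}

// The same conversion without `as`: From is implemented both for
// i64 -> i64 and for i32 -> i64, so this compiles on either target.
fn to_i64_from(v: c_long) -> i64 {
    i64::from(v)
}

fn main() {
    assert_eq!(to_i64_cast(5), 5);
    assert_eq!(to_i64_from(-7), -7);
}
```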


#9

It also triggered on some places that actually required casts in current Rust - &T to *const T maybe?


#10

Not that one, or at least not anymore:

fn with_ptr<T>(_: *const T) -> ! { unimplemented!() }
fn main() { with_ptr(&"test"); }

#11

Those will coerce, but some contexts require casts: a as *const _ == b as *const _ to compare for pointer equality for example.
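A sketch of that pointer-equality comparison. The function name same_object is made up; the last assertion uses std::ptr::eq, which exists in current Rust and does the same comparison without explicit casts:

```rust
// Illustrative helper: compare two references by address, not by value.
fn same_object<T>(a: &T, b: &T) -> bool {
    a as *const T == b as *const T
}

fn main() {
    let x = 5;
    let y = 5;
    // Same variable: same address.
    assert!(same_object(&x, &x));
    // Equal values, but two distinct live variables.
    assert!(!same_object(&x, &y));
    // std::ptr::eq coerces the references to raw pointers for us.
    assert!(std::ptr::eq(&x, &x));
}
```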