I’ve been learning Rust in my free time for less than two months, and I am still not sure that such strict typing is a good thing (compared to the type strictness of D, which is stricter than C++ but less strict than Rust). A warning: the ideas below come from a Rust newbie.
The type strictness of Rust has some advantages: it catches some of my mistakes and avoids the ugly “type soup” I sometimes see in C/D code, where int/size_t are mixed all the time, etc. I prefer a tidy language and tidy code. But a disadvantage is the number of “as” casts I’m seeing in the many small Rust programs I’ve written so far.
Coding in D I’ve seen that casts are dangerous; they are a code smell, and it’s better to minimize their number. Casts are dangerous because sometimes they lose information and sometimes they don’t. They specify only the target type, so if you later refactor or modify the code and change the source type, the compiler keeps compiling the code even if the cast is now meaningless, dangerous or wrong. A well-designed language should try to minimize their number (removing all of them from a systems language is not my goal).
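To make the information-loss point concrete, here is a minimal Rust sketch. The helper name `narrow` is hypothetical and used only for illustration. The cast names only the target type, so the same line keeps compiling whatever integer type the source value has.

```rust
// Hypothetical helper: narrows a u32 to a u8 with a plain `as` cast.
fn narrow(value: u32) -> u8 {
    // `as` silently keeps only the low 8 bits: no error, no panic.
    value as u8
}

fn main() {
    // Fits in a u8, so the cast is harmless.
    println!("{}", narrow(200)); // prints 200
    // Does not fit: 300 = 256 + 44, so the result wraps to 44.
    println!("{}", narrow(300)); // prints 44
    // A negative i32 cast to u32 becomes a very large value.
    let signed: i32 = -1;
    println!("{}", signed as u32); // prints 4294967295
}
```

None of these lines produces a warning, which is exactly why the cast survives a refactoring of the source type unnoticed.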
I’ve done a quick analysis on the Servo browser source code, looking for casts (some of them are false positives, inside comments):
" as " 3207 occurrences in 301 files
"as usize" 1551 occurrences in 112 files
"as f32" 105 occurrences in 20 files
"as u32" 103 occurrences in 44 files
"as f64" 80 occurrences in 21 files
"as i32" 74 occurrences in 32 files
"as &VirtualMethods" 68 occurrences in 35 files
"as AzFloat" 67 occurrences in 7 files
"as isize" 39 occurrences in 13 files
"as u64" 30 occurrences in 16 files
"as u8" 29 occurrences in 14 files
Both the Rust language designers and the Servo developers should try to reduce the number of those casts in Servo code.
Sometimes removing those casts in Servo code is easy, like here:
components\net\data_loader.rs(80): let bytes = bytes.into_iter().filter(|&b| b != ' ' as u8).collect::<Vec<u8>>();
You can remove that cast with:
.filter(|&b| b != b' ')
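As a self-contained sketch of that replacement (the function name `strip_spaces` is hypothetical, it is not part of Servo), the byte literal already has type u8, so the comparison needs no cast:

```rust
// Hypothetical helper mirroring the Servo line: drop ASCII spaces from a byte vector.
fn strip_spaces(bytes: Vec<u8>) -> Vec<u8> {
    // b' ' is a u8 literal, so no `as u8` cast is needed here.
    bytes.into_iter().filter(|&b| b != b' ').collect()
}

fn main() {
    let cleaned = strip_spaces(b"a b c".to_vec());
    assert_eq!(cleaned, b"abc".to_vec());
}
```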
The same analysis for Rust compiler source code:
" as " 7625 occurrences in 1545 files
"as usize" 780 occurrences in 254 files
"as u32" 363 occurrences in 134 files
" as *mut " 362 occurrences in 117 files
"as u64" 357 occurrences in 94 files
"as isize" 274 occurrences in 89 files
"as u8" 256 occurrences in 82 files
"as i32" 159 occurrences in 55 files
"as f64" 115 occurrences in 38 files
"as f32" 82 occurrences in 25 files
"as c_uint" 77 occurrences in 20 files
In recent years the D language has introduced various language and library features that make it possible to remove many casts from user code. In D there are various ways to cast:
1) The most explicit and strong way to cast is to use cast(), which is a built-in. The “cast” keyword makes it simpler to perform a textual search for casts in your code, compared to a C-style cast. This is similar to the “as” of Rust (but the D cast() also performs a dynamic cast where appropriate on class references). The use of such “hard casts” should be minimized. A usage example:
long x = 100;
int y = cast(int)x;
2) Implicit casts:
Generally D allows implicit integral type conversions only if there’s no loss of information (this rule has two exceptions, for uint/int, to avoid some common hard casts). So this is allowed without casts:
int x = 10;
long y = x;
(An i64 is able to contain all values of an i32, so here D doesn’t require a hard cast.)
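For comparison, here is a minimal sketch of the same widening in current Rust. The conversion is not implicit, but as far as I can tell the standard `From`/`Into` traits cover the lossless cases without `as`. The helper name `widen` is hypothetical.

```rust
// Hypothetical helper: widen an i32 to an i64 without `as`.
fn widen(x: i32) -> i64 {
    // From<i32> is implemented for i64 because every i32 fits in an i64.
    i64::from(x)
}

fn main() {
    let x: i32 = 10;
    // The same conversion written through Into.
    let y: i64 = x.into();
    assert_eq!(y, widen(x));
    // The narrowing direction has no From impl, so this would not compile:
    // let z: i32 = i32::from(y);
}
```

Unlike `as`, this stops compiling if a later refactoring changes the source type to one that no longer fits in the target.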
3) Implicit casts based on Value Range Analysis:
The D compiler statically computes the range of expressions and uses this approximate information to allow some safe implicit casts:
void main(in string[] args) {
import std.conv;
// A run-time value from command line.
immutable uint x = args[1].to!int;
ubyte y = x % 100;
}
Here the D compiler knows the range of the immutable value x is 0 … uint.max, so if you take that modulus the result must be a value 0 … 99, so the D compiler requires no cast() here. This is an awesome feature that helps remove some bug-prone casts, avoids noise, and keeps the code safe and correct.
Note: the D compiler simplifies that analysis a lot. The results are kept only for the current expression, unless the values are immutable (or manifest constants). In the future this D feature will probably be improved, making the analysis more powerful (it was recently improved for immutables). A small amount of flow analysis would also allow code like this (not yet allowed in D):
void main(in string[] args) {
import std.conv;
immutable uint x = args[1].to!int;
if (x < 100) {
ubyte y = x;
}
}
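Rust has no Value Range Analysis, so both D snippets above need an explicit conversion there. A hedged sketch of how they might look with the standard `TryFrom` trait instead of `as` follows; the helper names `last_two_digits` and `small_or_none` are hypothetical.

```rust
use std::convert::TryFrom;

// Hypothetical helper: the `% 100` case from the D example.
fn last_two_digits(x: u32) -> u8 {
    // The result is always in 0..=99, but rustc does not track that range,
    // so a checked conversion both documents and verifies the assumption.
    u8::try_from(x % 100).expect("x % 100 always fits in a u8")
}

// Hypothetical helper: the `if x < 100` case from the D example.
fn small_or_none(x: u32) -> Option<u8> {
    if x < 100 {
        // An explicit conversion is still required; try_from re-checks the range.
        u8::try_from(x).ok()
    } else {
        None
    }
}

fn main() {
    assert_eq!(last_two_digits(12345), 45);
    assert_eq!(small_or_none(99), Some(99));
    assert_eq!(small_or_none(100), None);
}
```

The run-time check is redundant in both functions, which is the cost that a static range analysis would remove.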
4) Explicit safe conversions:
char x = 'a';
ubyte y = ubyte(x);
Such conversions are allowed only for the types that allow implicit casts. This syntax lets you change the type of a value without assigning it to an intermediate variable of the desired type, which is handy for returning the right type, passing the right type to a function, and so on.
5) to!T safe conversions:
The standard library has a templated function “to” that converts the value to the specified type:
long x = 100;
int y = x.to!int;
string x2 = "100";
int y2 = x2.to!int;
Unlike cast() this is safe: if the given value can’t be converted (for example because the string is malformed or the value is out of range for the target type), it throws an exception (there is also a way to return something like an Option instead).
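A rough Rust counterpart of the two `to!int` calls, sketched with the standard `TryFrom` trait and `str::parse`. Both return a Result instead of throwing. The helper names `to_i32` and `parse_i32` are hypothetical.

```rust
use std::convert::TryFrom;
use std::num::{ParseIntError, TryFromIntError};

// Hypothetical helper: checked narrowing, like D's `x.to!int` on a long.
fn to_i32(x: i64) -> Result<i32, TryFromIntError> {
    i32::try_from(x)
}

// Hypothetical helper: checked parsing, like D's `x2.to!int` on a string.
fn parse_i32(s: &str) -> Result<i32, ParseIntError> {
    s.parse::<i32>()
}

fn main() {
    // In range: the conversion succeeds.
    assert_eq!(to_i32(100).ok(), Some(100));
    // Out of range for i32: an Err, not a silently truncated value.
    assert!(to_i32(5_000_000_000).is_err());
    assert_eq!(parse_i32("100").ok(), Some(100));
    assert!(parse_i32("one hundred").is_err());
}
```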
6) assumeUTF(), representation() and similar functions perform specialized casts in a more documented, less bug-prone way (and sometimes they also perform some sanity tests when the program is compiled in debug mode). Example: representation() converts from a string of chars (which are represented with 1 byte in D) to a dynamic array of bytes. A similar function is already present for Rust strings.
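On the Rust side, I assume the similar function meant here is `str::as_bytes`, with `std::str::from_utf8` as the checked way back. A small sketch (the helper name `count_spaces` is hypothetical):

```rust
// Hypothetical helper: inspect a string's UTF-8 bytes without copying or casting.
fn count_spaces(text: &str) -> usize {
    text.as_bytes().iter().filter(|&&b| b == b' ').count()
}

fn main() {
    let text = "hello big world";
    assert_eq!(count_spaces(text), 2);

    // Going back from bytes to a string is checked: invalid UTF-8 gives an Err.
    let bytes: &[u8] = text.as_bytes();
    assert_eq!(std::str::from_utf8(bytes).ok(), Some("hello big world"));
    assert!(std::str::from_utf8(&[0xff, 0xfe]).is_err());
}
```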
7) Unqual! and other templates change the attributes of a type, like removing its const/immutableness, etc., without actually changing the underlying type. Using them is safer, handier and less bug-prone than using raw casts (and it’s also smarter and more flexible).
So some suggestions/ideas for the Rust language:
- If not already present (or already planned), I suggest adding a syntax or a short, easy UFCS function (like into()) that performs “safe” casts between numeric values that can’t lose information. I’ll call this a “light cast”.
- Add Value Range Analysis to Rust, allowing more cases to be handled with the “light cast”, like the “% 100” situation above. Value Range Analysis is even more useful in Rust because most variables are immutable, which makes the analysis simpler. Even a simple VRA like the one in D would be quite useful. (I also suggest adding a related feature to Rust: a Slice Length Range Analysis that statically keeps track of the size bounds of slices. It would help remove some run-time tests, catch some bugs statically, and convert between slices and fixed-size arrays. I’ll discuss this in more detail in another post.)
- If not already present, introduce in the standard library a safe conversion function like “to”. (This is even more important in Rust because the compiler generally catches integer overflows in debug mode, so “as” is a very blunt tool that goes against that higher level of safety.)
- A lot of the casts I’ve found in my Rust code are “as usize” for array/vector indexing (and the analysis of Servo/Rustc shows they are indeed really common also in code written by rustaceans far more experienced than me). Example: I have an array of indexes that I have defined as Vec<u32> to save half the memory, because I am not going to need more than 4 billion items. Every time I use such numbers as array/vector indexes I have to cast them to usize. This is awful! So is it possible, and a good idea, for Rust arrays/vectors to make an exception to Rust’s general type strictness and accept u8/u16/u32 indexes too (and u64 too on 64-bit systems)? I think this change would slice away a lot of those 1551 occurrences of “as usize” in Servo code.
- Perhaps in non-generic code it is a good idea to give a warning when you add a useless cast:
fn main() {
    let x = "12".parse::<i64>().unwrap();
    let y: i64 = 10;
    let z = x as i64 + y; // Should raise a “useless cast” warning?
}
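Until something like the indexing exception exists, one possible workaround is a wrapper type that keeps the cast in a single place. This is only a sketch under my own assumptions: the `U32Indexed` newtype is hypothetical, and the cast inside it is lossless only on targets where usize is at least 32 bits wide.

```rust
use std::ops::Index;

// Hypothetical newtype: a vector that accepts u32 indexes directly.
struct U32Indexed<T>(Vec<T>);

impl<T> Index<u32> for U32Indexed<T> {
    type Output = T;

    fn index(&self, i: u32) -> &T {
        // The only `as usize` in the program lives here.
        // Assumption: usize is at least 32 bits, so the cast cannot truncate.
        &self.0[i as usize]
    }
}

fn main() {
    let values = U32Indexed(vec!['a', 'b', 'c']);
    // Indexes stored compactly as u32, as in the example above.
    let indexes: Vec<u32> = vec![2, 0, 1];
    let picked: String = indexes.iter().map(|&i| values[i]).collect();
    assert_eq!(picked, "cab");
}
```

The call sites stay free of casts, at the price of wrapping the vector and re-exposing whichever Vec methods the code needs.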
I’ve also done two other small searches in the Servo browser source code:
".unwrap()" 2633 occurrences in 255 files
".clone()" 1114 occurrences in 172 files
The number of unwrap() calls should also be minimized, with pattern matching, etc.
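Two common ways to avoid an unwrap() call, as a minimal sketch: propagate the error with the `?` operator, or handle it locally with a match. The helper names `parse_and_double` and `parse_or_zero` are hypothetical.

```rust
use std::num::ParseIntError;

// Hypothetical helper: propagate the parse error to the caller instead of panicking.
fn parse_and_double(s: &str) -> Result<i64, ParseIntError> {
    let n = s.parse::<i64>()?;
    Ok(n * 2)
}

// Hypothetical helper: handle the failure locally with pattern matching and a default.
fn parse_or_zero(s: &str) -> i64 {
    match s.parse::<i64>() {
        Ok(n) => n,
        Err(_) => 0,
    }
}

fn main() {
    assert_eq!(parse_and_double("21").ok(), Some(42));
    assert!(parse_and_double("twenty-one").is_err());
    assert_eq!(parse_or_zero("12"), 12);
    assert_eq!(parse_or_zero("twelve"), 0);
}
```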