So I’m a newbie with Rust, but quite an experienced programmer otherwise (most experienced in C++).
I really like what I’ve seen about Rust, and have been doing some exploration.
I was looking through rust_doom for a starting point for getting some actual concrete stuff up and running with Rust, and came across a comment in this source file which made me stop and think.
// Sorry for the messy/unsafe code, but the array bounds checks in this
// tight loop make it 6x slower.
The thing is that a big trend for me in my work has been to move away from iterators in data representations and use simple indices instead.
The context for this is PathEngine, where we do a lot of stuff with mesh data structures. And I’ve written a bit about this subject on my blog, in this series of posts.
Many years ago our meshes started out as being ‘list-like’, with pointers for access to individual mesh elements, and these pointers then wrapped up in iterators.
Since then we’ve moved from list-like to vector-like, with contiguous buffers replacing lots of small allocations (and the associated memory fragmentation), but also with simple indices replacing iterators.
Some advantages of simple integer indices for this kind of stuff are:
- data structures are relocatable, and can be saved and loaded directly to/from memory (or moved around memory) very easily
- integer indices are more compact than iterators
- integer indices can easily refer to more than one data structure in parallel (so we can have multiple, independent, data channels for stuff like mesh face attributes, and use the same integer indices across these multiple data channels)
As well as looking up mesh-related data, indices are also used for traversing around the mesh, e.g. looking up the edges in a face and the neighbouring faces across those edges, and this has to be very fast.
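To make that concrete, here is a minimal sketch (hypothetical Rust, not PathEngine’s actual layout, just my rough translation of the idea) of index-based mesh storage: the topology and the attribute channels are all plain Vecs, and the same integer face index is used both to read attributes and to walk to a neighbouring face.

struct Mesh {
    // Topology: each face stores the indices of its three edges,
    // and each edge stores the face on its far side (or u32::MAX at a border).
    face_edges: Vec<[u32; 3]>,
    edge_neighbour_face: Vec<u32>,
    // Independent attribute channels, all addressed by the same face index.
    face_cost: Vec<f32>,
    face_flags: Vec<u8>,
}

impl Mesh {
    // Walk across edge number `edge_in_face` of `face` to the neighbouring face.
    fn neighbour(&self, face: u32, edge_in_face: usize) -> u32 {
        let edge = self.face_edges[face as usize][edge_in_face];
        self.edge_neighbour_face[edge as usize]
    }

    // The same face index addresses any attribute channel directly.
    fn cost(&self, face: u32) -> f32 {
        self.face_cost[face as usize]
    }
}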
Adding bounds checking to all of this would be a huge performance hit. So as soon as I saw that comment, this came up straight away as a big red flag for me!
I saw some previous discussion about this here: https://www.marshut.net/iswyph/compiling-with-no-bounds-checking-for-vectors.html. And actually, I agree with the ‘compiler flag for unsafe goes against the spirit of rust’ argument. It is really cool, I think, to have some kind of full safeness guarantees, and this is part of why I am interested in Rust.
So I’m thinking about the whole issue of array bounds checking, and safety, and it occurred to me: what if, instead of array accesses being bounds checked and out-of-bounds cases being an error, the indices were instead forced into the valid range?
So, for example, indices could be clamped to the valid range, or a modulo operator applied.
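Just to make the semantics concrete, here’s roughly what the two variants could look like, written as made-up helper functions over a slice (not a proposed std API, and saying nothing about how the compiler would actually implement it):

// Clamp the index to the last valid element (assumes a non-empty slice).
fn get_clamped<T>(data: &[T], index: usize) -> &T {
    &data[index.min(data.len() - 1)]
}

// Wrap the index around with a modulo (also assumes a non-empty slice).
fn get_wrapped<T>(data: &[T], index: usize) -> &T {
    &data[index % data.len()]
}

fn main() {
    let v = [10, 20, 30, 40];
    assert_eq!(*get_clamped(&v, 99), 40); // out of range clamps to the last element
    assert_eq!(*get_wrapped(&v, 5), 20);  // 5 % 4 == 1
}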
This may seem kind of odd, in that it would silently cover up program errors that you would prefer to know about, and result in some strange and (initially) unintuitive behaviour. But what I’m looking for very specifically is:
- safeness guarantees of the kind ‘your program cannot write or read outside of allocated memory’ and ‘your program will not crash or result in undefined behaviour’
- as close to the metal as possible performance
So, personally, I’m willing to accept a bit of unintuitiveness for this.
The other side of it is then: what kind of difference would this actually make to performance? Clamping would seem to require conditional execution, while modulo can be done without any conditional execution, for what that’s worth (although modulo is still quite an expensive operation, I think).
Taking this idea a bit further, if we knew that every array lay inside some power-of-2 sized range owned by the program, perhaps this could be reduced to a very fast bitmask? (But that would be a much more complicated and intrusive change than just inserting a modulo operation, if it’s even possible!)
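For what it’s worth, that last variant would look something like this at the library level (purely hypothetical, with made-up names; the point is just that the ‘force into range’ step collapses to a single AND once the backing buffer is rounded up to a power-of-2 length):

struct MaskedBuffer<T> {
    data: Vec<T>, // length is always a power of two
    mask: usize,  // data.len() - 1
}

impl<T: Clone> MaskedBuffer<T> {
    fn from_elem(len: usize, fill: T) -> MaskedBuffer<T> {
        let cap = len.next_power_of_two();
        MaskedBuffer { data: vec![fill; cap], mask: cap - 1 }
    }

    fn get(&self, index: usize) -> &T {
        // Any index is forced into [0, data.len()) with one bitwise AND.
        &self.data[index & self.mask]
    }
}

Of course, written as library code like this, the indexing on that last line still carries the normal bounds check unless the optimiser can prove it away; for it to be genuinely free it would have to be something the compiler itself does where it currently inserts the check, which is why I say it would be a much more intrusive change.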
What are people’s thoughts on this? If there’s a reasonable chance of this actually getting into the language, I’m willing to have a go at patching this in and setting up some kind of benchmark to get an idea of what kind of difference this would actually make, if any…