I brought up a similar example in an earlier post, and the reaction was that an out-of-bounds `get_unchecked` should simply be UB, regardless of what it ends up accessing. (This is easy to check in a sanitizing mode.)
Granted, UB can have arbitrarily bad effects, but this isn't about malicious code; it's about honest code making a contract with the compiler. With "unchecked", you (the code author) would promise never to request an out-of-bounds index, and in return the compiler would guarantee that no other kind of UB can occur.
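To make that contract concrete, here is a minimal sketch in current Rust. `sum_at` is a hypothetical function made up for illustration, and the `debug_assert!` is only a stand-in for the hypothetical sanitizing mode (it runs in debug builds and compiles away in release); it is not the mechanism being proposed.

```rust
/// Sums `data[i]` for each `i` in `indices`.
///
/// # Safety
/// Every element of `indices` must be `< data.len()`. This is the caller's
/// side of the "unchecked" contract: an out-of-bounds index is UB, full stop.
unsafe fn sum_at(data: &[u64], indices: &[usize]) -> u64 {
    let mut total: u64 = 0;
    for &i in indices {
        // Stand-in for a sanitizing mode: checked in debug builds only.
        debug_assert!(i < data.len(), "index {i} out of bounds");
        // SAFETY: the caller promised `i < data.len()`.
        total += unsafe { *data.get_unchecked(i) };
    }
    total
}

fn main() {
    let data: [u64; 4] = [10, 20, 30, 40];
    // In bounds, so the contract holds: 10 + 30 + 40.
    let total = unsafe { sum_at(&data, &[0, 2, 3]) };
    println!("{total}"); // 80
}
```

The point of the split is that the only promise the author makes here is about indices; everything else in the function keeps its usual guarantees.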
So the split is viable, but I still find it fairly unsatisfying, since it only helps a subset of use cases. For the others, the dilemma remains. Either:
- Using, e.g., FFI (or other non-unchecked unsafe stuff) anywhere in a module penalizes the entire module, which could be the whole codebase for small programs.
or
- Refactoring a function by moving some code into a separate function can create undefined behavior where there was none before.
They're both spooky action at a distance, affecting performance and correctness, respectively.
But... I'm really starting to lean towards a relatively UB-maximalist policy, personally. A lot of the original Tootsie Pop discussion boiled down to what code people are likely to write. In that regard, I suggest a simple principle.
Real code rarely commits aliasing violations conditionally.
That is, if the faulty code is reached, chances are it'll always commit a violation, rather than only doing so with certain inputs. This is very different from other types of UB, like signed integer overflow in C, which are usually input-dependent.
There may be some exceptions. I'll give you one up front: suppose some code takes `array: &[Foo], idx1: usize, idx2: usize`; accesses the `idx1`'th and `idx2`'th items of `array`; and on each, calls `get` on an `UnsafeCell` and converts the result to `&mut`, keeping both references alive at the same time. Then an aliasing violation will only occur if `idx1 == idx2`, obviously. But (a) deoptimizing unsafe code won't necessarily fix that case, e.g. if the code passes the two `&mut`s as parameters to some unrelated safe function; and (b) most cases are not like that.
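For concreteness, here is a minimal sketch of that exception in today's Rust. `Foo`, `add_into`, `combine`, and `demo` are hypothetical names chosen for illustration; `add_into` plays the role of the unrelated safe function. The call with distinct indices is well-defined; the same call with equal indices is left as a comment because it would be UB.

```rust
use std::cell::UnsafeCell;

// Hypothetical element type with interior mutability.
struct Foo {
    value: UnsafeCell<u32>,
}

// Hypothetical "unrelated safe function". Because it is safe, it (and the
// optimizer) may assume `a` and `b` never alias.
fn add_into(a: &mut u32, b: &mut u32) {
    *a += *b;
    *b = 0;
}

/// # Safety
/// Requires `idx1 != idx2`: with equal indices, both `&mut` point at the same
/// `u32` while alive at the same time, which is an aliasing violation.
unsafe fn combine(array: &[Foo], idx1: usize, idx2: usize) {
    // SAFETY: the caller guarantees the indices differ, so `a` and `b` refer
    // to distinct cells, and nothing else references those cells right now.
    let a: &mut u32 = unsafe { &mut *array[idx1].value.get() };
    let b: &mut u32 = unsafe { &mut *array[idx2].value.get() };
    add_into(a, b); // both references are alive here
}

// Hypothetical driver: returns the cell contents after one call.
fn demo() -> Vec<u32> {
    let array = [
        Foo { value: UnsafeCell::new(1) },
        Foo { value: UnsafeCell::new(2) },
    ];
    // Distinct indices: well-defined.
    unsafe { combine(&array, 0, 1) };
    // `unsafe { combine(&array, 0, 0) }` would be UB, but only for this input.
    array
        .iter()
        .map(|f| unsafe { *f.value.get() }) // no `&mut` is alive at this point
        .collect()
}

fn main() {
    println!("{:?}", demo()); // [3, 0]
}
```

Note that the aliasing assumption that breaks in the `0, 0` case lives in the safe `add_into`, which is why deoptimizing only the unsafe code would not rescue it.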
Therefore, with respect to aliasing, the sanitizer is almost as good as a static compiler check. We accept (empirically) counterintuitive aliasing rules when the borrow checker can enforce them; why not when the sanitizer can enforce them?
What's more important is ensuring that people actually run the sanitizer, and heed the output. Suggestions:
- Take performance seriously. I have a lot to say about potential implementation strategies, but it's off topic here.
- Have `cargo test` default to using it (when unsafe code is present?)
- From experience with C, people are always tempted to dismiss sanitizer warnings as "false alarms" or "harmless". Minimize their motivation to do so:
  - Make the warnings kinda scary.
  - Ensure there's always a trivial patch that will make the errors go away, even in cases where a 'proper' fix would require more refactoring. For example, have a block attribute to just disable all aliasing assumptions in the block.