I realized the error here. I think what I am really looking for is some general discussion of our priorities and goals. For example, here are some goals of mine:
- easy for end-users to know if code is right or wrong
  - as a rule of thumb, imagine you have some (unsafe) Rust code, and you know that, if it were compiled in a naive and non-optimized way, the resulting assembly would execute and do the right thing; then it should be simple to decide whether the Rust code itself is correct (i.e., that it obeys the additional correctness rules); see the sketch after this list
  - put another way, once we have an elaborated set of rules, we should be able to take all the E-code-example issues and very quickly categorize them as “legal” or “illegal”
- optimizable
  - in safe code, we should be able to take advantage of all the extended aliasing information that Rust’s type system offers us
  - in unsafe code, we should be able to easily inform the compiler about aliasing, to enable users of unsafely impl’d abstractions (esp. things like Vec and HashMap) to achieve full performance
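To make that rule of thumb concrete, here is a minimal sketch (my own illustrative example, not a claim about what the eventual rules will say). Compiled naively, this obviously does the right thing; whether the source itself is legal turns on how much aliasing information (e.g. treating `&mut` as noalias) we want the compiler to be able to exploit:

```rust
fn main() {
    let mut x = 0i32;

    // A second, untracked path to `x`, created through a raw pointer.
    let p = &mut x as *mut i32;
    let r = unsafe { &mut *p };

    *r = 1;
    x = 2;

    // Compiled naively, `*r` is just another name for `x`, so this prints 2.
    // But if the rules let the compiler treat `r` as a unique (noalias)
    // reference, it could assume the write `x = 2` cannot affect `*r` and
    // print 1 instead. Whether this code is "correct" therefore depends
    // entirely on which aliasing rules we adopt, and end-users should be
    // able to reach that verdict quickly.
    println!("{}", *r);
}
```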
Of course, achieving both of these simultaneously may not be possible. When in conflict I probably lean towards safety, not performance. But here are some wildcards that can change the equation:
- testable, at least dynamically, but maybe statically with lints and extended annotations
  - I’m starting to come around to the idea that we ought to only accept rules that we can test for dynamically, even if that test comes at high overhead; this might make relatively complex rules less problematic (see the test sketch after this list)
- complexity opt-in
  - if we can design rules that contain a very simple subset, that might let people start out with something straightforward, then add aliasing info progressively
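To illustrate what I mean by dynamic testing, here is a sketch; the `pair_mut` helper is a hypothetical toy abstraction I made up for illustration. The idea is that you write ordinary `#[test]`s against an unsafely-implemented API and run them under an interpreter-style checker; Miri (via `cargo miri test`) is one existing tool of that shape, though whether it checks whatever rules we eventually adopt is of course part of the open question:

```rust
/// Hypothetical toy example of an unsafely-implemented abstraction:
/// mutable references to two distinct elements of a slice.
fn pair_mut<T>(xs: &mut [T], i: usize, j: usize) -> (&mut T, &mut T) {
    assert!(i != j && i < xs.len() && j < xs.len());
    let ptr = xs.as_mut_ptr();
    // Safety argument: `i != j`, so the two references never overlap.
    unsafe { (&mut *ptr.add(i), &mut *ptr.add(j)) }
}

#[test]
fn pair_mut_gives_disjoint_references() {
    let mut v = vec![1, 2, 3];
    let (a, c) = pair_mut(&mut v, 0, 2);
    *a += 10;
    *c += 20;
    // Compiled normally this just passes; run under a dynamic checker, the
    // same test would also report any violation of the aliasing rules inside
    // `pair_mut`, even if that check costs a large slowdown.
    assert_eq!(v, [11, 2, 23]);
}
```

If every rule we adopt can be checked this way, “is my unsafe code legal?” becomes an empirical question you can answer by running your existing test suite under the checker.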
Finally, some things I might be willing to sacrifice:
- compatibility with existing unsafe code
  - I don’t think we’ll be able to accept all unsafe code (I mean, it’ll compile, but it might change behavior as we add optimizations to the compiler)
  - If we are breaking commonly used patterns, I get worried
  - But I think if our overall rules are simple enough, we can get the word out to make code comport with them, particularly if automated testing is possible
A complication that has been troubling me lately:
- C itself has undesirable semantics that are then copied over by LLVM
- for example, pointer comparisons have rich semantics that are not just “cast to integers, compare those integers”, as you might expect
- this might imply that to get the safety and simplicity I want, we ought to compile a comparison between two pointers as a comparison between integers (see the sketch after this list)
- this might in turn inhibit LLVM’s optimizations, of course
- from what I can tell, there hasn’t been a thorough analysis of the undefined behavior in C and how vital it is for optimization, but we need to start cataloging what is out there (I know there have been some references sent along…)
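To make the pointer-comparison point concrete, here is a small sketch (two one-byte arrays are just my illustrative setup) of where “compare the pointers” and “compare the addresses as integers” can come apart:

```rust
fn main() {
    let a = [0u8; 1];
    let b = [0u8; 1];

    // One-past-the-end of `a`, and the start of `b`. At run time these two
    // addresses may or may not coincide, depending on how the compiler lays
    // out the locals.
    let p: *const u8 = a.as_ptr().wrapping_add(1);
    let q: *const u8 = b.as_ptr();

    // Comparing the pointers directly is where the C/LLVM subtlety lives:
    // the compiler may reason about which allocation each pointer belongs
    // to, so this is not necessarily the same as comparing the addresses.
    println!("as pointers: {}", p == q);

    // Casting to integers first pins the meaning down to "compare the
    // addresses", the simple semantics I would like end-users to be able
    // to assume; the likely cost is inhibiting some LLVM optimizations.
    println!("as integers: {}", p as usize == q as usize);
}
```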