Discussing high-level direction


#1

So I’ve been thinking about how to approach this unsafe code guidelines discussion. I think it’s great that we’re focusing on particular issues on the repo (and I’ve got a bunch of those discussions to catch up on…), but the conversation generally feels a bit “in the weeds” to me right now. Basically, I can’t see the forest for the trees. I think we need to spend some time trying to “tile the space” of the “philosophy” we will use to judge whether code examples and optimizations should be considered legal.

To that end, I wanted to propose a couple of prongs of investigation:

  • First, documenting and proposing higher-level models. For example, I wrote up a K-model (K for “Kind” of issue) describing the Tootsie Pop model and some of its known shortcomings (mostly I just referenced my blog post). I would like to see @arielb1/@ubsan write up a summary of the model that they floated here; I’m also happy to give it a shot, based on my conversations with @arielb1. I have some thoughts about another possible approach that I will try to write-up (don’t have a cutesy name for it yet).
  • Second, continued exploration of unsafe code patterns that are used in practice, by which I think we will judge models. I created issue #18 to help focus this exploration.

Finally, I think there is some place for discussions of invariants or high-level principles. At the moment, though, I have to run, so I can’t devote a lot of energy to figuring out a good starting point for that discussion. =) One example that immediately leaps to mind is whether the presence of an unsafe keyword can affect whether code is legal. It may be, though, that it’s more valuable to discuss these in the context of the higher-level models.

Thoughts?


#2

I realized the error here. I think what I am really looking for is some general discussion of our priorities and goals. For example, here are some goals of mine:

  • easy for end-users to know if code is right or wrong
    • as a rule of thumb, imagine you have some (unsafe) Rust code, and you know that – if it were compiled in a naive and non-optimized way – the resulting assembly would execute and do the right thing; then it should be simple to decide if the Rust code itself is correct (i.e., that it obeys the additional correctness rules)
    • put another way, once we have an elaborated set of rules, we should be able to take all the E-code-example issues and very quickly categorize them as “legal” or “illegal”
  • optimizable
    • in safe code, we should be able to take advantage of all the extended aliasing information that Rust’s type system offers us;
    • in unsafe code, we should be able to easily inform the compiler about aliasing, to enable users of unsafely impl’d abstractions (esp. things like Vec and HashMap) to achieve full performance
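As a rough sketch of what the “extended aliasing information” in safe code means, here are two hypothetical functions (names are illustrative only) with identical bodies. In the reference version the borrow checker rules out `a` and `b` pointing to the same `i32`, which is the kind of fact an optimizer could exploit; in the raw-pointer version nothing rules that out, so the compiler has to preserve the aliased behavior.

```rust
/// Safe version: `a` is `&mut` and `b` is `&`, so the borrow checker
/// guarantees they cannot point to the same `i32`. An optimizer could
/// use that fact, e.g. to load `*b` only once.
fn add_twice_refs(a: &mut i32, b: &i32) {
    *a += *b;
    *a += *b;
}

/// Raw-pointer version with the same body: a caller may pass the same
/// address twice, so `*b` must be re-read after the first write through `a`.
///
/// # Safety
/// Both pointers must be valid for reads, and `a` must be valid for writes.
unsafe fn add_twice_raw(a: *mut i32, b: *const i32) {
    // SAFETY: validity of both pointers is upheld by the caller.
    unsafe {
        *a += *b;
        *a += *b;
    }
}
```

With `x = 1, y = 2`, `add_twice_refs(&mut x, &y)` leaves `x == 5`. Passing the same pointer twice to `add_twice_raw` on a value of `1` yields `4`, because the second read of `*b` sees the first write. The equivalent call `add_twice_refs(&mut z, &z)` is rejected by the borrow checker, which is why only the safe version can afford the stronger assumption.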

Of course, achieving both of these simultaneously may not be possible. When in conflict I probably lean towards safety, not performance. But here are some wildcards that can change the equation:

  • testable, at least dynamically, but maybe statically with lints and extended annotations
    • I’m starting to get converted over to the idea that we ought to only accept rules that we can test for dynamically, even if that test comes at high overhead; this might make relatively complex rules less problematic
  • complexity opt-in
    • if we can design rules that contain a very simple subset, that might let people start out with something straightforward, then add aliasing info progressively

Finally, some things I might be willing to sacrifice:

  • compatibility with existing unsafe code
    • I don’t think we’ll be able to accept all unsafe code (I mean, it’ll compile, but it might change behavior as we add optimizations to the compiler)
    • If we are breaking commonly used patterns, I get worried
    • But I think if our overall rules are simple enough, we can get the word out to make code comport with them, particularly if automated testing is possible

A complication has been troubling me lately:

  • C itself has undesirable semantics that are then copied over by LLVM
    • for example, pointer comparisons have rich semantics that are not just “cast to integers, compare those integers”, as you might expect
    • this might imply that to get the safety and simplicity I want, we ought to compile a comparison between two pointers as a comparison between integers
    • this might in turn inhibit LLVM’s optimizations, of course
    • from what I can tell, there hasn’t been a thorough analysis of the undefined behavior in C and how vital it is for optimization, but we need to start cataloging what is out there (I know there have been some references sent along…)
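To make the pointer-comparison point concrete, here is a sketch with hypothetical helper names. It shows the two ways a comparison could be expressed in Rust and the classic tricky case: a one-past-the-end pointer of one array compared against the start of another. Whether the arrays happen to be adjacent is up to the compiler, and whether the two results must always agree is exactly the open question, so no particular output is claimed for that case.

```rust
/// "Naive" comparison: cast both pointers to integers and compare those.
fn eq_as_integers<T>(p: *const T, q: *const T) -> bool {
    p as usize == q as usize
}

/// Built-in `==` on raw pointers.
fn eq_as_pointers<T>(p: *const T, q: *const T) -> bool {
    p == q
}

/// One-past-the-end of `a` versus the start of `b`. Returns
/// (result of pointer `==`, result of integer comparison).
fn one_past_end_case(a: &[u8; 4], b: &[u8; 4]) -> (bool, bool) {
    // `wrapping_add` forms the one-past-the-end pointer without `unsafe`;
    // it must never be dereferenced.
    let end_of_a = a.as_ptr().wrapping_add(a.len());
    let start_of_b = b.as_ptr();
    (
        eq_as_pointers(end_of_a, start_of_b),
        eq_as_integers(end_of_a, start_of_b),
    )
}
```

For pointers into the same array the two helpers agree trivially; compiling every pointer comparison as `eq_as_integers` would make that agreement hold everywhere, at the possible cost of the LLVM optimizations mentioned above.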

#3

Here are my ideas of some high-level principles:

  • Does using pointers instead of integers change semantics? (I think this is obviously yes.)
  • Does using references instead of raw pointers change semantics? (I think it is desirable, but I am not sure how to do so.)
  • Does naming lifetimes change semantics? (I am not sure; is there a convincing example here?)
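For the third question, a baseline from safe Rust (the function names are hypothetical): lifetime elision is pure shorthand, so the two functions below have the same signature and both coerce to one higher-ranked function-pointer type. This does not settle whether choosing or naming lifetimes could matter once unsafe code is involved; it only shows that, in the type system, the name itself adds nothing here.

```rust
/// Lifetime left implicit; the elision rules tie the output to the input.
fn first_elided(xs: &[i32]) -> &i32 {
    &xs[0]
}

/// The same signature with the lifetime spelled out.
fn first_named<'a>(xs: &'a [i32]) -> &'a i32 {
    &xs[0]
}

/// Both function items coerce to the same function-pointer type.
fn both_as_fn_pointers() -> [for<'a> fn(&'a [i32]) -> &'a i32; 2] {
    [first_elided, first_named]
}
```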

#4

You mean aliasing-wise, not the lack of nullability and the guarantee of one valid value behind the pointer, right?
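To illustrate the distinction being drawn: setting aliasing aside, references and raw pointers already differ in their validity invariants. One observable consequence in today’s Rust is layout. Because a reference is never null, `Option<&T>` can use null as `None` and stay pointer-sized (this is documented for `Option`). A raw pointer may be null, so `Option<*const T>` needs a separate tag. The helper names are hypothetical.

```rust
use std::mem::size_of;

/// References are non-null, so `None` can be represented by the null value.
fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&i32>>() == size_of::<&i32>()
}

/// Raw pointers may be null, so `Option` needs extra space for its tag.
fn option_raw_needs_tag() -> bool {
    size_of::<Option<*const i32>>() > size_of::<*const i32>()
}
```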


#5

Actually, some versions of my model treat raw pointers and integers identically: after all, you can use comparison operations on pointers to “launder” things, and you can create pointers “out of thin air” using ptr::read.
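A sketch of the two routes from integers back to pointers mentioned here, with hypothetical function names. One uses `as` casts. The other uses `ptr::read` to reinterpret the bytes of a plain `usize` as a `*const i32`, with no pointer-typed value involved. Both yield the original address. Whether either result may be dereferenced like the original is exactly what a model has to decide, so the sketch only compares addresses. It assumes `usize` and `*const i32` share size and alignment, which is checked at compile time.

```rust
use std::mem::{align_of, size_of};
use std::ptr;

// Compile-time check of the layout assumption `round_trip_read` relies on.
const _: () = assert!(
    size_of::<usize>() == size_of::<*const i32>()
        && align_of::<usize>() == align_of::<*const i32>()
);

/// Pointer -> integer -> pointer via `as` casts.
fn round_trip_cast(p: *const i32) -> *const i32 {
    let addr = p as usize;
    addr as *const i32
}

/// A pointer "out of thin air": `ptr::read` reinterprets a `usize`'s bytes.
fn round_trip_read(p: *const i32) -> *const i32 {
    let addr: usize = p as usize;
    // SAFETY: `&addr` is valid and suitably aligned for reading a
    // `*const i32` (checked above). The result is only compared, never
    // dereferenced.
    unsafe { ptr::read(&addr as *const usize as *const *const i32) }
}
```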


#6

Yes, these are good questions. Speaking personally I think I am not ready to say either way for a lot of these sort of “invariants”. They are basically somewhat abstracted code examples, and I think often I could get behind any answer, as long as we think the overall story is simple enough. But I like the idea of trying to distill the code examples into one-line questions of this kind.