Process for unsafe code guidelines


#1

RFC 1643 lays out a plan for creating guidelines for what is and is not legal unsafe code. The idea is to develop the guidelines over time, with a specific “strike team” charged with driving the process along, in conjunction with the lang team – as is typical for Rust, though, the discussion will be open to all, and the role of the strike team is primarily to help steer the discussion towards consensus. The RFC has not yet been accepted but I expect it will be.

I expect that the first goal will be to try to explore the space of possible rules at a high level. This will require identifying interesting examples of optimizations as well as code that ought or ought not to be legal, and then evaluating how different strategies might fare on these examples.

What I was hoping to do here is to discuss how and where to have these discussions. For example, should we do this via threads on Discuss, or on GitHub? I’ve generally found it much easier to keep up with Discuss (its notifications are better, it allows quoting, etc.), but I had the thought that it might be very nice to create a specialized GitHub repository for the entire unsafe code guidelines process (something like https://github.com/nikomatsakis/rust-memory-model).

If we were to try that, I imagine then that we could use files in the repo to represent both examples and other information, as well as proposals. We can open issues to discuss specific aspects of various proposals, or PRs for alterations we are considering, and perhaps adopt an FCP-like process to help drive decision-making.

One thing I had considered is that for larger discussions we can just open a thread on discuss and link to it from GitHub (perhaps closing the issue), so that we can use GH to organize the conversation, but Discuss to help track where replies are needed. That might be overkill though.

Another option is just creating a custom category on discuss – or doing that in addition to the stuff above.

Thoughts?


#2

I agree that Discourse is much better suited for discussions than GitHub. Recently I have frequently been frustrated trying to reply to various points on GitHub, in particular because of its almost total lack of support for quoting. Maybe we can have a category here to mark all threads related to unsafe code guidelines / memory model?

I guess a GitHub repo would be useful for collecting results / decisions (like RFCs) in the repo itself, but with issues disabled and people redirected here for discussion.


#3

Yes I can easily make a category specific to this discussion.

Hmm. I certainly like the policy of redirecting discussion from the repo over to Discuss. I imagine that we can just have PRs as normal etc., but they would mostly contain links to Discuss threads or something. That part is a bit vague to me. I don’t want to disable issues altogether, because I think they are a useful way of tracking outstanding problems and questions and so forth.


#4

Here are some of my own thoughts:

  • Do not mark anything as UB unless one can provide a real program (not a microbenchmark) on which the optimization that it invalidates makes a real, substantial difference. C and C++ violate this rule, badly, which forces programmers to spend time appeasing the compiler instead of on source-level optimizations that are much more effective.
  • Especially do not mark anything as UB if it forces programs to be made slower than they would be if the behavior were well-defined. A good example is strict aliasing in C and C++: strict aliasing requires that unsigned char buffers be copied(!) before being passed to functions like strlen that take char* parameters. The copy (and the heap allocation that might be needed to hold it) is almost certainly much more expensive in execution time, let alone programmer time, than passing -fno-strict-aliasing to the compiler (which, along with -fwrapv -fno-delete-null-pointer-checks, is in my CFLAGS and CXXFLAGS).
  • Do not mark something as UB if it forces contorted and difficult workarounds. The aforementioned example (strict aliasing requiring a copy) is one. Another would be disallowing a *mut pointer aliasing an &mut or & pointer, even if the raw pointer is never written to (or, in the &mut case, read from) while the reference is in scope. Other examples probably arise in memory allocators, kernels, and garbage collectors.
  • Finally, make sure that anything that needs to be done can be done without invoking undefined behavior. That includes such dangerous behavior as reading an integer from a file, casting it to a function pointer, and jumping to the result! Yes, under normal circumstances this is a dangerous security vulnerability – but for a dynamic linker it might be exactly what is required. A more mundane example is jumping to JIT-generated machine code. It also includes various types being treated as essentially an array of raw bytes or machine words – see many garbage collectors.
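Two of the patterns above, written out in Rust as a rough sketch. Whether each is (or should be) allowed is exactly what the guidelines need to settle; the function names `raw_alias_demo` and `call_through_integer` are made up for illustration, and the integer round trip assumes a platform where `usize` and function pointers have the same size.

```rust
// Pattern 1: a *mut pointer aliasing a & reference, where the raw pointer
// is not written to while the reference is live.
fn raw_alias_demo() -> u32 {
    let mut value: u32 = 7;
    // Raw pointer derived from a mutable borrow of `value`.
    let raw: *mut u32 = &mut value;
    let seen = {
        // Shared reference derived from the raw pointer. While `shared`
        // is in use, nothing writes through `raw`.
        let shared: &u32 = unsafe { &*raw };
        *shared
    };
    // `shared` is no longer used, so writing through `raw` is fine here.
    unsafe {
        *raw = seen + 1;
    }
    value
}

// Pattern 2: turning an integer back into a function pointer and calling it,
// as a dynamic linker or JIT might. Here the integer comes from a real
// function rather than from a file or generated machine code.
fn answer() -> u32 {
    42
}

fn call_through_integer() -> u32 {
    let addr = answer as usize;
    // Assumes size_of::<usize>() == size_of::<fn() -> u32>().
    let f: fn() -> u32 = unsafe { std::mem::transmute::<usize, fn() -> u32>(addr) };
    f()
}

fn main() {
    println!("{}", raw_alias_demo());
    println!("{}", call_through_integer());
}
```

In a real dynamic linker or JIT the address would come from outside the program, which is what makes it hard to specify; the sketch only shows the shape of the code.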

#5

Could you start a new topic for discussing specific implications of the memory model? This thread is discussing the process, not any details. (If you have specific examples you think are important to preserve, it would be helpful if you could write them out in Rust and post them; we’re still gathering examples to guide the discussion.)


#6

So I made this category, for now: