Next steps for the unsafe code guidelines

It’s no secret that the “unsafe code guidelines” process has been kind of stalled out lately! I’d like to try and get things going. Honestly I think what we need most right now is a certain amount of legwork – or, more specifically, research work.

Before anything else in this message, then, I want to propose that we organize a synchronous chat to try and get organized (either over IRC or as a voice/video chat). Specifically, I want to try to avoid wrangling about details of the rules, and focus on how we can start to make slow but steady work on the research that lies before us.

As I once proposed in an earlier thread, I think some synchronous meetings may be helpful. To that end, I have a doodle poll with a few possible times next week; please fill it out if you are interested in participating. I don’t feel this has to be limited to people on the unsafe code guidelines strike team necessarily. However, if you do sign up, I view it as a (tentative) commitment to be involved and meet on a regular basis (but not every week or anything like that).

What do I mean by research?

It seems to me that we still have a fair amount of basic questions that we haven’t really answered, though we’ve investigated a good deal:

  • What kinds of optimizations does the compiler currently do?
  • What kinds of unsafe code do people write in practice?
  • What kinds of optimizations do we want to enable and what do they require?

This is what we have been trying to do with the rust-memory-model repository, and there is indeed a lot of good information on there. But I think there are some flaws, too:

  • The issue format is very disjointed. It’s too hard to go over and read those threads and extract information.
  • There is no “greater narrative”; no way to easily get up to speed on some of the crucial questions and tradeoffs.

To some extent, this is the usual problem we always have. We need people to “summarize”. I feel like a good format for the repository might be that issues exist to track open questions, or requests for information, and that we use files committed in the repo to track summaries and collect results. There are existing files that are along these lines; we could maybe build on that format, or find another. Ideally, we can have files for individual questions, and then larger reports that summarize the summaries and highlight the most important questions.

So I see a lot of work to organize here:

  • Making sense of the existing content on the rust-memory-model repository
  • Investigating real-world code
  • Documenting the cases in which the compiler will currently supply annotations and hints to LLVM and our best guess at what they actually mean
  • Documenting some of the common needs of optimizations (I’d probably start by trying to mine various research papers or articles here)

Deciding on principles

This is somewhat in constrast with the previous point, but it is probably reasonable to try and put forward and debate some “guiding principles” (though I’d rather do this a bit more async, I suspect, so as to have a broader and more reasoned discussion). I feel like we’ve never really tried do this. I wouldn’t consider any agreement here to be truly meaningful, but it may be helpful for guiding other work.

For example, I am still pretty enamored of trying to achieve an executable specification for when “undefined behavior” occurs, at least to the extent that we can (this is presuming of course that we want to have the concept of UB, but it seems like that ship has sailed, to some extent, thanks to us using LLVM and so forth). To that end, I’d love it if the examples and things we come up when doing research can be fully formed Rust programs that execute to produce (or not produce, as the case may be) undefined behavior. As a bonus, once we are making progress towards an actual validator, this would allow us to test the models by running them across the tests.

But there are some other big questions before us that I would like to make agreement on:

  • Should we distinguish “safe” vs “unsafe” code in the rules in any way?
  • Well, I’m not sure what others just now, but probably more. =)

That’s it.

OK, those are my thoughts. What do people think? Is this the right way to make progress? Wrong-headed?

3 Likes

One of the things I’d most like to see would be what are the operations which require the unsafe keyword to be used. And ideally, what are the minimum semantic guarantees that need to be provided by each to have them be useful at all.

The reason for this is that I don’t view Rust as having unsafe ‘code’ as much as I view Rust as having operations that require being within an unsafe block. Code that happens to be inside an unsafe block which can occur outside an unsafe block, isn’t unsafe code… it’s safe code that just happens to be inside an unsafe block. I’m not suggesting that everyone has to view it this way, but this is the lowest common denominator viewpoint upon which everyone can agree, I think.

Based on the doodle poll, it seems like Tuesday @ 10am is the best time. The only thing is that I’d like really like Robert Krebbers to be able to participate, but perhaps I will catch him up at some other time. For those of you who don’t know him, Robert Krebbers is the author of a formalization of the C standard in Coq and quite familiar with the standard as well as this problem space. He’s taken an interest in Rust and I’m trying to draft him into helping us along. =)

1 Like

Unfortunately, more work just turned up that has to be done on Tuesday… I will definitely attend, but right now I am hoping this is not going to take longer than an hour. Sorry for this, this week is just really bad timing.

Hmm, you’re the second to tell me such a thing. I wonder if we should just defer to next week?

I would be very much in favor.

OK, didn’t get a lot of feedback here, but let’s try to defer to next week, as I might reach out to a few more people to see if they want to participate. Here is a new doodle poll. I added more days and options this time.

http://doodle.com/poll/pshrgupntzv8fkcc

cc @eternaleye @ubsan @asajeffrey --- still lots of people who have not signed up here

OK let’s do Thursday March 9th at 2pm Boston time (EST). It’s the only time that works for everyone who signed up, it seems. =)

Probably google hangouts is the best “mass communication” medium here?

Is it possible to join a Hangout without a Google account?

never mind, let’s use this: https://meet.jit.si/rust-unsafe-code-guidelines

be there in 2mins :slight_smile:

because I am wildly optimistic about web tech, let’s try https://talky.io/unsafe-code-guidelines

2 Likes

So, after numerous attempts, we wound up falling back to IRC. How shall we talk in the future? IRC is not that bad, but it’s also got some limitations.

In any case, here are some vague notes in an etherpad. We discussed the fate of the rust-memory-model repository. I think the consensus was that we should move towards a set of executable .rs tests (with a main() function) that demonstrate UB or lack thereof for various patterns. Each test would have copious comments and be organized in some sensible and TBD way.

I proposed toward the end that we should try to get started by attempting to canvas “real world” unsafe code in the wild. It seemed that there was general agreement. Therefore, I plan to try to assign to each participant (and anyone else who wants!) a crate or two to look at. There are some notes in the etherpad about what to look for, but here they are reproduced (and any additional comments would be fine):

  • Things to watch out for:
    • “escaping” – e.g. functions like fn foo(x: &u32) { .. }, where the memory at *x is somehow used after foo returns
    • aliasing between raw pointers and references – e.g., a *mut u32 and &mut u32 both pointing at same memory, used in intermingled ways
    • new capabilities that could not be modeled in safe code (e.g., mutex, rayon)
    • hidden assumptions in unsafe code (e.g., rayon assumes dtors of local variables execute)
    • type-based aliasing violations
    • “too strong” types – as in RefCell, declaring a &T when in fact the value is not always safe to use for the lifetime
    • “uninitialized memory” – how does it get used?
    • transmutes of all kinds

The idea would be to gather up notes and then meet to discuss what we found. I think this could be helpful in terms of helping us to restructure the repo too.

Thanks everyone! When should we meet again? This time slot isn’t great for me as a repeating thing (it overlaps with a regular meeting I have), but maybe we could do something somewhat earlier? Also, how frequently should we meet?

4 Likes

OK, I failed to get organized this week, but here are some initial assignments:

Anyone else who wants to participate, pick a crate from issue 18 and leave a comment here.

Maybe we can meet again on March 30th with some observations? (1 week)

I was planning on talking about Servo’s SM bindings, since that kicks up all kinds of fun.

er, that’s what I meant. Sounds good.

I recently proved Rc sound in my model, so it’d probably make sense for me to take that one. I think in terms of “things they do with borrows and ownership”, Rc and Arc are fairly similar anyway?

There is a subtle detail about the implementation of Arc: decrementing the reference count is effectively equivalent to freeing the allocated object (since another thread might come in immediately afterwards and do just that). However, this decrement is done while holding a &AtomicUsize reference, which in theory is supposed to remain valid after the decrement.

Since we are largely ignoring multiple threads to start, seems reasonable, though @Amanieu raises an interesting point.