Proposal: eliminate the wording "memory safety" and "thread safety"

Actually, I am not a big fan of this t-shirt: it makes UB much more "mysterious" than it has to be. Also see this talk. (Well, truth be told, I like the shirt, just not the message. :wink:)

Basically, there is a contract between you and the compiler when you write code in a language (no different from the contract you have when using a library API or a network protocol), and UB is just another word for a violation of that contract.
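To make the analogy concrete in Rust terms, here is a minimal sketch built on the standard slice API: `get` checks the contract for you and reports a violation as `None`, while the unsafe `get_unchecked` hands the contract to the caller, and breaking it is UB. The wrapper names `third_checked`, `third_unchecked` and `third_or_zero` are hypothetical, made up only for this illustration.

```rust
/// Safe API: the contract ("index 2 exists") is checked at runtime,
/// and a violation is reported as `None` instead of being UB.
pub fn third_checked(xs: &[u32]) -> Option<u32> {
    xs.get(2).copied()
}

/// Unsafe API: checking the contract is the caller's job.
/// Calling this with a short slice is a contract violation, i.e. UB.
///
/// # Safety
/// The caller must guarantee `xs.len() > 2`.
pub unsafe fn third_unchecked(xs: &[u32]) -> u32 {
    // SAFETY: forwarded to the caller through this function's contract.
    unsafe { *xs.get_unchecked(2) }
}

/// A caller that upholds the contract before using the unchecked API.
pub fn third_or_zero(xs: &[u32]) -> u32 {
    if xs.len() > 2 {
        // SAFETY: we just checked that index 2 is in bounds.
        unsafe { third_unchecked(xs) }
    } else {
        0
    }
}
```

The unsafe function's `# Safety` section plays the same role as a protocol spec: it is the written-down contract, and UB is what you get when the caller does not honor it.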

I don't think this is correct. Rust without Rc and mem::forget actually does not leak memory. Well, there are probably a bunch of containers that are not panic-safe in this model, i.e., they skip destructors on panic, but that can be fixed. In fact, let's just drop all containers except Vec, and let's reduce that to its core API (e.g., no drain, which I believe can leak on panic). This is a leak-free language. Then we can implement a Turing machine in that language to show that it is Turing-complete.
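A small sketch of the two escape hatches named above, assuming a hypothetical `Tracked` type that counts its own drops: dropping a `Vec` runs the destructor, while both `mem::forget` and an `Rc` cycle skip it, and all of this is safe code.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many `Tracked` values have actually been dropped.
static DROPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical type whose only job is to record that its destructor ran.
pub struct Tracked;

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// A node that can point at another node, which makes cycles possible.
pub struct Node {
    _payload: Tracked,
    next: RefCell<Option<Rc<Node>>>,
}

pub fn drops() -> usize {
    DROPS.load(Ordering::SeqCst)
}

/// Plain ownership through a `Vec`: the destructor runs.
pub fn vec_only() {
    let v = vec![Tracked];
    drop(v);
}

/// Leak #1: `mem::forget` is a safe function, and it skips the destructor.
pub fn leak_with_forget() {
    std::mem::forget(Tracked);
}

/// Leak #2: two `Rc`s that point at each other keep each other alive.
pub fn leak_with_rc_cycle() {
    let a = Rc::new(Node { _payload: Tracked, next: RefCell::new(None) });
    let b = Rc::new(Node { _payload: Tracked, next: RefCell::new(Some(a.clone())) });
    *a.next.borrow_mut() = Some(b.clone());
    // `a` and `b` go out of scope here, but each strong count only drops
    // from 2 to 1, so neither node (nor its payload) is ever dropped.
}
```

Removing `mem::forget` and `Rc` (plus the non-panic-safe container APIs) removes exactly these two paths, which is the "leak-free subset" argument.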

Arguing from the halting problem makes no sense. Deciding whether a program dereferences a NULL pointer is also equivalent to the halting problem, and that doesn't stop us from preventing it with static checks. We are simply willing to have rustc also reject some correct programs. Similarly, our type system does not have to characterize leak-free programs exactly.
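A sketch of both halves of that argument (the function names are hypothetical): safe Rust rules out null dereferences by construction, and the price is that some correct programs get rejected. The rejected example is kept in comments because it does not compile.

```rust
/// In safe Rust a reference is never null; "maybe absent" must be spelled
/// `Option<&T>`, and the compiler refuses to let you use it unchecked.
pub fn len_or_zero(s: Option<&str>) -> usize {
    match s {
        Some(s) => s.len(),
        None => 0, // deleting this arm is a compile error, not a runtime crash
    }
}

// The check is conservative. This function is correct (`x` is only read
// when `flag` is true, and then it has been assigned), but rustc rejects
// it with E0381 because it does not track that both `if`s test the same flag:
//
// fn rejected(flag: bool) -> i32 {
//     let x: i32;
//     if flag { x = 1; }
//     if flag { x } else { 0 }
// }

/// An accepted rewrite makes the invariant visible to the type checker.
pub fn accepted(flag: bool) -> i32 {
    let x: Option<i32> = if flag { Some(1) } else { None };
    x.unwrap_or(0)
}
```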

This is turning into a purely semantic argument. We both know that any application, whether it’s implemented in a GC’d language or in a language with a relevant type system, can have a bug that causes its memory usage to grow without bound. The only question is whether we count scenarios in which memory is reachable but not actually needed as memory leaks. I obviously can’t prove that my definition is truer than yours, but I can point out that Wikipedia counts deregistering object listeners and setting GC’d pointers to null as techniques for avoiding memory leaks in GC’d languages, and that the Gmail team documents the “three snapshot” technique for detecting memory leaks in JavaScript. So I’m not the only one using a definition of memory leak that includes reachable but unused memory.
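A sketch of the “reachable but unused” kind of leak, using a hypothetical `EventBus` type (not from any real crate): nothing here ever becomes unreachable, so neither destructors nor a GC would reclaim the listeners, yet memory grows with every view that forgets to deregister.

```rust
/// Hypothetical event bus, sketched only to illustrate the
/// "reachable but no longer needed" kind of leak.
pub struct EventBus {
    listeners: Vec<(u64, Box<dyn Fn(&str)>)>,
    next_id: u64,
}

impl EventBus {
    pub fn new() -> Self {
        EventBus { listeners: Vec::new(), next_id: 0 }
    }

    /// Registers a listener and returns an id that can deregister it.
    pub fn subscribe(&mut self, f: Box<dyn Fn(&str)>) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.listeners.push((id, f));
        id
    }

    /// Forgetting to call this is the bug: the closure stays reachable
    /// from the bus forever, so nothing would ever free it.
    pub fn unsubscribe(&mut self, id: u64) {
        self.listeners.retain(|(lid, _)| *lid != id);
    }

    pub fn emit(&self, msg: &str) {
        for (_, f) in &self.listeners {
            f(msg);
        }
    }

    pub fn listener_count(&self) -> usize {
        self.listeners.len()
    }
}

/// Simulates a "view" that is opened and closed `n` times.
pub fn open_and_close_views(bus: &mut EventBus, n: usize, deregister: bool) {
    for _ in 0..n {
        let id = bus.subscribe(Box::new(|_msg: &str| {}));
        if deregister {
            bus.unsubscribe(id);
        }
    }
}
```

With `deregister == false` the listener list grows by one entry per view, even though every entry is still reachable and therefore not a leak under the stricter definition.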

That talk looks interesting, and I will definitely watch it! I wear the shirt as a sarcastic response to attitudes exemplified by the "nasal demons" saying, and by conversations like this one, in which the community expresses outright hostility to the suggestion of trying to figure out what actually happens for a particular instance of UB.

More specifically, it is a violation of the contract set out by the language's operational semantics, and it results in your program having no defined semantics.

The talk is pretty good, and there are many valuable things for us to take from it, but I find Carruth a bit too reassuring at some points. A compiler may actually delete your files when UB happens, not because some compiler developer thought it would be funny, but because:

if some_condition {
    loop {} // let's pretend it's UB to have a non-productive infinite loop...
//  ^------ Compiler: Well, this is UB, I'll remove this since it cannot happen.
}

delete_file(location); // Ooops.
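For what it's worth, in Rust an infinite `loop {}` is not UB, so the premise above is hypothetical. The closest real-Rust way to write the same shape of program is `std::hint::unreachable_unchecked`, which tells the optimizer that a branch can never be taken. A minimal sketch, where `delete_file` is a hypothetical stand-in that only bumps a counter instead of touching the file system:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-in for `delete_file`: it only counts calls.
static DELETES: AtomicUsize = AtomicUsize::new(0);

fn delete_file() {
    DELETES.fetch_add(1, Ordering::SeqCst);
}

pub fn deletes() -> usize {
    DELETES.load(Ordering::SeqCst)
}

/// # Safety
/// The caller must never pass `some_condition == true`.
pub unsafe fn maybe_delete(some_condition: bool) {
    if some_condition {
        // Promise to the optimizer: this branch is never taken. If the
        // promise is broken, the behavior is undefined, and the compiler
        // is entitled to compile this function as an unconditional call
        // to `delete_file` (or to do anything else).
        unsafe { std::hint::unreachable_unchecked() }
    }
    delete_file();
}
```

Only `maybe_delete(false)` is a valid call; the point is that the UB in the dead branch is what licenses the compiler to delete the `if` entirely.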

That said, your compiler obviously wouldn't manifest black holes or unicorns because your computer physically cannot.

It's fine to consider what a compiler may do to your program when it has no defined semantics, but you should also be aware that the result may be specific to that implementation (for Rust, this may show up once we have different backends, e.g. LLVM and Cranelift).

… or in an updated release of the same backend, e.g. LLVM 7.1 → LLVM 8.

Having watched the talk, I think you and I are mostly on the same page about UB; again, the shirt is sarcastic. (Have you read the original blog post that goes with it? It cites Linus arguing, quite sanely, that behavior guaranteed by the implementation should not be considered "undefined" by the programmer, regardless of what the standard says.)

I don't think the speaker gives enough thought to the idea that the standard's definition of "undefined behavior" is... well... flat-out insane, and that "unspecified behavior" could be used more often in the standard, even for conditions that should be considered programmer error.

I think the C++ definition of UB ought to be more similar to what I thought it was when I wrote this (wildly controversial) question. Of particular interest in that thread are the comments (on the question itself and on quite a few answers) by user supercat.

TBH that doesn't sound like we are mostly on the same page. :wink: I think UB meaning "anything can happen" is a reasonable approach, you just need a precise definition of which conditions lead to UB. Also see my blog post on the topic.

Do you mean this one? Thanks for the pointer! I may have seen it before, but clearly I need to re-read it; I'll go through it more carefully later.

If by "guaranteed by the implementation" he means "the current implementation happens to do this", then I find myself very strongly disagreeing with Linus.
If he means "the documentation says that this is and always will be guaranteed", then sure, relying on it is fine. At that point you are writing "GCC C" instead of "ISO C", but there is nothing inherently wrong with that (as long as you don't intend to ever switch compilers...).

I do agree with this one. Rust's approach to integer overflow is a good example for this.
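For reference, a sketch of what that looks like in practice (the function name `overflow_demo` is made up): plain `+` on overflow either panics or wraps depending on whether overflow checks are enabled (on by default in debug builds, off in release builds), but it is never UB, and the explicit methods let you pick the behavior you actually want.

```rust
/// Returns the result of adding `a + b` under each explicit overflow policy.
pub fn overflow_demo(a: u8, b: u8) -> (u8, Option<u8>, u8, (u8, bool)) {
    (
        a.wrapping_add(b),    // modular arithmetic: 250 + 10 == 4
        a.checked_add(b),     // `None` on overflow
        a.saturating_add(b),  // clamps at `u8::MAX`
        a.overflowing_add(b), // wrapped result plus an "overflowed" flag
    )
}
```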

That said, I sometimes think we should also have unsafe integer arithmetic operations that do assume the absence of overflow, because there are cases where that can help the optimizer a lot. In a language that relies on the optimizer as heavily as Rust does, having a good interface for the programmer to communicate assumptions to the optimizer is crucial. Making such assumptions the default for all arithmetic is indeed "flat-out insane". But merely having such an interface means you need to say things like "violating these assumptions is UB", and that UB must be of the form "literally anything may happen" (aka "you just assumed false" in logic); otherwise it is not useful.
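A minimal sketch of such an interface, assuming you want it on stable Rust without new language features: the "assume no overflow" add below is hand-rolled from `checked_add` plus `std::hint::unreachable_unchecked`. Recent Rust releases also provide `unchecked_add` on the integer types for this purpose. The function names are hypothetical.

```rust
/// Adds two values while telling the optimizer the sum cannot overflow.
///
/// # Safety
/// The caller must guarantee that `a + b` does not overflow `usize`.
pub unsafe fn add_assume_no_overflow(a: usize, b: usize) -> usize {
    match a.checked_add(b) {
        Some(sum) => sum,
        // SAFETY: the caller promised this case cannot happen. Breaking
        // that promise is "you just assumed false": anything may happen.
        None => unsafe { std::hint::unreachable_unchecked() },
    }
}

/// A safe wrapper that establishes the precondition itself.
pub fn sum_of_lengths(xs: &[u8], ys: &[u8]) -> usize {
    // SAFETY: a `[u8]` slice is at most `isize::MAX` bytes long, so the sum
    // of two such lengths is at most `usize::MAX - 1` and cannot overflow.
    unsafe { add_assume_no_overflow(xs.len(), ys.len()) }
}
```

The useful property is that the assumption is opt-in and local: ordinary `+` keeps its defined behavior, and only code that explicitly writes `unsafe` takes on the "anything can happen" risk.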

So maybe we actually agree more than I thought: many cases of UB in C++ ought not to allow anything to happen. But instead of weakening what programs with UB can do, I'd simply stop calling these UB (and call them "relying on implementation-specific behavior", or some such), because we also do need a kind of UB that allows anything to happen, and "UB" is what the entire world calls that, so I'd say let's stick to that terminology.
In Rust, IMO, saner choices were made regarding what is and is not UB, so for the things that are UB, we really do want "anything can happen"-style UB.

Any suggestions for good examples of places to use these? I'd be happy to pick up this PR again to add them...

Well, that made me smile!

I had actually read that some time back, but forgotten that you had written it. I think this is the key point on which Chandler Carruth, you, and I seem to agree with each other but disagree with the (rest of the) C & C++ ISO committees:

As far as I can tell, this seems very different from the way the committee (or, indeed, the community) thinks about UB. I seem to remember someone (John Regehr, maybe, or perhaps someone he cited?) doing a survey on various types of UB to determine if software developers were aware of them and/or whether they recognized them as dangerous patterns; the responses seemed to show a surprising lack of interest in tools for helping find and fix UB.

I suppose the thing I really find nonsensical about the discussion around UB is that it seems to be promoted as a tool for compilers to infer preconditions in ways that are surprising to the programmer. Yes, eliminating null checks can make your code faster, but UB is a terrible way for the programmer to communicate to the compiler that a condition cannot occur, because we lack the tooling to ensure that programmers only ever do this intentionally.
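For contrast, a sketch of how Rust lets the programmer state "this cannot be null" explicitly in the type, instead of having the compiler infer it from a dereference (the function names are hypothetical): a `&i32` is never null, so no check is needed, and a nullable pointer has to be spelled `Option<&i32>`, which the standard library guarantees is the same size as `&i32`.

```rust
/// Nullable pointer: a correct function has to handle the `None` case.
/// (In C, dereferencing before the null check lets the compiler delete
/// the check; here the type makes that ordering impossible to write.)
pub fn read_or_default(p: Option<&i32>) -> i32 {
    match p {
        Some(v) => *v,
        None => 0,
    }
}

/// Non-null is part of the type: a `&i32` can never be null, so no
/// check is needed in the first place.
pub fn read(p: &i32) -> i32 {
    *p
}

/// The explicit annotation costs nothing at runtime: `Option<&T>` uses
/// the null value to represent `None`, so it is pointer-sized.
pub fn option_ref_is_pointer_sized() -> bool {
    std::mem::size_of::<Option<&i32>>() == std::mem::size_of::<&i32>()
}
```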

That was what I understood him to mean. He brought up the GCC manual and the fact that Linux only targets the GCC toolchain (and makes use of other extensions).

Chandler has one in his talk.
