[Pre-RFC] Another take at clarifying `unsafe` semantics

Doesn’t feel ready for an RFC. Still a little unclear on what will be accomplished for the amount of churn.

1 Like

Unfortunately, there is no difference between being called from inside an unsafe block and returning inside an unsafe block. In both cases, the call to the function must be wrapped in an unsafe block (unless you are proposing something like this?):

unsafe { let x } = fn( zzzz );

Where ‘fn’ is an unsafe(post) function? This would seem radically unhelpful and confusing (to me), though.

1 Like

This I like. I agree that mixing up declaring an unsafe contract with an unsafe block of code is a mistake in the current implementation.

It is not unsafe to call an unsafe(post) function. All this annotation tells you is that you can rely on some properties of the result (described in the function’s documentation, like all unsafe contracts) for memory safety. If you are not doing unsafe things, then you can safely ignore the annotation.

In this sense, unsafe(post) is like const fn: a contract which sets requirements on the implementation, which the user may or may not rely on.

This is very different from unsafe(pre), which sets requirements on the user, which the implementation may or may not rely on.
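
To illustrate the const fn side of the analogy, here is a minimal sketch (standard Rust, hypothetical names):

const fn is a promise made by the implementation that callers may or may not rely on:

// const fn is a promise made by the implementation: "this can be evaluated
// at compile time". Callers may choose to rely on that promise...
const fn answer() -> usize {
    42
}

// ...for example by using it in a const context...
const LUT_SIZE: usize = answer();

// ...or simply ignore it and call the function at run time.
fn at_runtime() -> usize {
    answer() + LUT_SIZE
}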

1 Like

Is this true? Unsafe code is specifically not permitted to rely upon safe code being implemented correctly for correct behavior (lack of UB). Should unsafe code be allowed to rely upon other (far distantly implemented) unsafe code to have been implemented correctly in order for it to not have UB? This doesn’t “FEEL” right. I think this needs some exploration.

1 Like

Then, why:

1 Like

Send and Sync are two examples of prior art. Unsafe code is allowed to rely on Send types being safely sendable across threads, and on Sync types being safely shareable between threads, in spite of the Send and Sync implementations for the type being defined kilometers of code away.

It is not unsafe to call an unsafe(post) function, but it is unsafe to implement one, because a bad implementation of an unsafe(post) function will cause memory/type/thread unsafety. This is the same reason why implementing Send and Sync is already unsafe today.
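
A minimal sketch of that asymmetry (the Wrapper type is of course hypothetical):

use std::thread;

// A hypothetical type containing a raw pointer, which is not Send by default.
struct Wrapper(*const u8);

// Implementing the marker is the unsafe step: a wrong `unsafe impl` here
// breaks thread safety for every user, however far away their code lives.
unsafe impl Send for Wrapper {}

// Relying on the marker is the safe step: no unsafe block is needed to move
// a Send type to another thread.
fn ship(w: Wrapper) {
    let handle = thread::spawn(move || {
        let _w = w;
    });
    handle.join().unwrap();
}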

The rule I found to formalize this requirement in a way that is symmetrical with unsafe(pre) is to mandate that these functions’ return statements be within an unsafe block, and thus be characterized as potentially unsafe operations, like this:

unsafe(post) fn trust_me() -> usize {
    /* Do something inconsequential */

    // This is the part where it gets dangerous
    unsafe { 42 }
}

The rationale is that return statements in unsafe(post) functions propagate safety-critical information to the outside, and are thus unsafe operations. But I’m not 100% satisfied with this rule, and am open to bikeshedding it. As an even more minimal proposal, perhaps the unsafe(post) annotation on the function implementation is enough?

1 Like

OK, I think I’m now able to more clearly articulate my discomfort with this proposal and offer a possible enhancement that I believe could make all of this more worthy of the churn it imposes. I can best sum it up as: “Spell out the contract!”. What I mean by this is to include some meta-language for specifying pre-conditions/post-conditions, which would serve 2 purposes:

  1. Clearly document the contract requirements in a way that can be consumed by documentation generation in a standard fashion, AND
  2. Have the POTENTIAL to at some point be statically solved and/or optionally enforced at run time (perhaps debug/test only)

A quick straw-man syntax/idea might be something like:

unsafe ( pre: assert ( ...some predicate DSL... )
             post: assert ( ...some predicate DSL.... ) )
fn ( .... ) -> Result

The “some predicate DSL” could be a mini, extensible language that allows expression conditions about the inputs and outputs. It could even allow an “Escape Hatch” like clause that is effectively, “We have no way of expressing this condition in the DSL, but, here is some normal human language that describes the necessary contract”.

That way, we can gradually evolve this DSL to support more and more of the kinds of necessary predicates and eventually work towards linting or otherwise statically analyzing whether or not they have been met.
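
The run-time half of this can already be crudely approximated today with debug assertions. A sketch with hypothetical function names, where the DSL predicate degrades to a debug_assert! plus prose:

/// UNSAFE PRECONDITION (prose "escape hatch"): `index` must be in bounds
/// for `slice`.
unsafe fn read_unchecked(slice: &[u8], index: usize) -> u8 {
    // Debug/test-only runtime stand-in for the machine-checkable predicate.
    debug_assert!(index < slice.len(), "precondition violated: index out of bounds");
    *slice.get_unchecked(index)
}

fn caller() -> u8 {
    let data = [1u8, 2, 3];
    // The predicate cannot (yet) be verified statically, so the caller
    // still owes the compiler an unsafe block.
    unsafe { read_unchecked(&data, 2) }
}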

Again, this is all very, very straw-man at this point, and I realize I would have to put in significant effort to clarify it further. I’m not an expert on this sort of thing, but if I had my druthers, this is the direction I’d like to see pursued.

Perhaps this is an entirely orthogonal concern to what you are proposing though.

With this DSL idea, the compiler could parse the DSL predicates and determine (at some future point) whether it has the potential to statically solve them. If it CAN solve the predicate, and in the context of the call to the function it is able to verify that the predicate has been met, an “unsafe” block around the call isn’t required. However, if it can’t (yet) verify the predicate, it could emit an error if no unsafe block is present and display the predicate that it can’t verify. If the predicate includes the “Escape Hatch” clause, then the compiler obviously can’t verify it and will always output the predicate along with an error if the unsafe block isn’t present.

If it can’t verify the predicate, but an unsafe block is present, it would simply output a NOTE/REMINDER (like a WARNING, but not classified as one) that reiterates the necessary predicates.

If the call is inside an unsafe block, and the predicates CAN be verified, it could warn about an unnecessary unsafe block.

Why this last? Because then you will get a compile error if you break some code elsewhere that messes up the previously verifiable predicate.

To clarify further, with these predicates there are several possibilities for the compiler to find at the site of a call to an unsafe function (similar argument applies to unsafe returns):

  • predicates not specified - unsafe block required / lint on the function definition advising to add predicates
  • predicates specified, but not statically solvable (compiler support lacking, or the escape-hatch predicate used) - unsafe block required for the call
  • predicates specified and statically solvable, with one of the following possibilities:
    • compiler can prove at the call-site that the predicate is not met - COMPILER ERROR
    • compiler can prove at the call-site that the predicate is met - UNSAFE block not required, and using one draws a NOTE/REMINDER (on second thought, this should probably be a compiler error as well – I’ll explain below)
    • compiler cannot say for sure whether or not the predicate is met - UNSAFE block REQUIRED, with output of a NOTE/REMINDER of what predicate(s) are being unsafely relied upon

I think the middle sub-bullet point above is the interesting one. Because the compiler can prove statically that you’ve met the predicate, it should be a compiler error to wrap in unsafe block. Why? Because that way, if you later change the code to break the predicate, you’ll get a new compiler error.

1 Like

Ah…Yes. That makes more sense. I misunderstood.

EDIT: That being said, I agree with @jmst: “What’s the point?” What function would you not want to guarantee that its post-conditions are met?

I think this could be a possible backwards-compatible extension to this proposal, that could be added to the proposal’s list of future possibilities. It is not clear to me if it would be needed on day one, as long as we plan ahead for it.

I don’t think “unsafe(post)” on non-trait functions makes any sense.

Unsafe code could want to rely on the advertised behavior of ANY function defined in ANY crate, and there is no way to predict whether a given function would want to be used by unsafe code.

For instance, what parts of the standard library should be tagged as “unsafe(post)”? (the answer of course is all of it, which makes it pointless)

1 Like

That seems to me like it stems from an orthogonal issue. Let me elaborate:

Danger is inherent to unsafe code. (Well, that statement is a bit redundant, isn’t it.) However you twist the syntax and/or the semantics of the language, writing unsafe code (viewed either from the “consumer of unsafe abstractions” or “author of unsafe abstractions” side) is always going to require more thinking and vigilance than usual. Yes, this can be true even for very basic concepts and operations. But I’m sure that is not a unique feature of programming in Rust. If you have studied some sort of scientific discipline at a (say) intermediate level, you probably know very well the feeling of sitting on a one-line maths or physics problem for a couple of hours or days before being able to grasp an otherwise basic idea.

I don’t want to get too philosophical or generally off-topic here, but the problem is that our monkey brains are just not designed to do what we are doing with them. :slight_smile: So at the end of the day, implying that intrinsically hard problems are only hard because we are not approaching them ergonomically enough and not because they are hard looks like a fallacy to me.

the amount of people who are constantly confused about what is perhaps the most dangerous feature of the Rust language worries me.

I understand and agree with that; what I am saying is that this is an inherent issue, and not something that can be mitigated completely or nearly completely by improved language design. Either interpretation of unsafe means “give up some safety guarantees”. That’s not going to be pretty either way for the reasons described above, and at the same time, the complexity introduced by the language changes laid out in this pre-RFC does not, in my opinion, offer a good enough return on investment. I’m not saying that ergonomic improvements to unsafe would be completely useless – I’m just trying to argue that the advantages they would bring to the table wouldn’t be worth the increased complexity.

In the world of life-threatening systems, the more dangerous a tool is, the harder you make it to use it incorrectly. Knives have a handle shaped in such a way that it is pretty clear how you are supposed to pick them up.

I absolutely agree with all of this too; in fact I’m generally the biggest fan of designing systems so that they have the least chance to be used incorrectly. However, and I don’t want to repeat myself over and over again, but the very purpose and nature of unsafe code renders attempts at making it safer very hard to achieve (I would even say contradictory).

Basically, what @gbutler wrote:

You made three other very good points:

Unsafe Rust should never trust Safe Rust’s abstractions, yet there is no safeguard in place to lint against reaching for these.

I’m always in for more compiler warnings and lints! This, I think, is probably the easiest issue to address. Adding more rules/patterns to be recognized by the compiler or e.g. Clippy is an easy and in my opinion uncontroversial action, and we should go for it. If it’s likely wrong – warn about it!

As you can see, I’m in favor of these sorts of improvements as long as they don’t tilt the balance towards just adding more stuff to the language in an unhealthy manner. A core language addition is a step which always requires extremely careful consideration, and in 99.9% of the cases, the right thing to do with a new feature is to omit it. Compiler warnings and linter rules are nothing like that; they are basically free and completely safe (“safe” here meaning “does not break earlier code/concepts/assumptions or introduce bugs”).

Yet the method’s entire implementation is automatically turned into an unsafe block, silently enabling all kind of unsafe operations without any compiler warning/lint.

It’s not exactly silent, because it has a big honkin’ unsafe at the beginning. Still, I understand your concerns, I do think they are valid, and indeed, it would absolutely make sense to be able to mark a function unsafe without turning the entire body into an unsafe block. This is yet another change that likely wouldn’t need any sort of core language addition. It’s merely that the compiler should consider the body block safe rather than unsafe. If I recall correctly, that literally means flipping a one-bit flag in the compiler. (Maybe in several places, but you get the idea.)
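
In other words, a sketch (the second function shows the suggested, currently hypothetical behavior):

// Today: the entire body of an unsafe fn is implicitly one big unsafe block,
// so the raw dereference needs no visible marker inside the function.
unsafe fn today(p: *const u8) -> u8 {
    *p
}

// Under the suggested change, the body would be treated as safe code, and the
// explicit inner block (already legal, merely redundant today) would become
// mandatory around the dangerous operation.
unsafe fn suggested(p: *const u8) -> u8 {
    unsafe { *p }
}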

TrustedLen is another example of unsafe abstraction design gone wrong. […]

That’s right too! I’m in favor of replacing TrustedLen with something more explicit and harder-to-forget/abuse. I still strongly doubt that this can only be done using, or that it is best achieved by, additions to the core language.

Finally:

People who, in addition, try to build unsafe abstractions which other developers (including future maintainers of the same crate) can rely upon.

Indeed, that puts the issue in a somewhat different perspective, although my comments about how more syntax wouldn’t help much still apply.

2 Likes

This is my feeling as well. No matter how much I WANT there to be a special syntax or keywords that makes all of this clear, nothing seems to fit the bill.

1 Like

It will take me a while to fully process and answer the comments from last night (thanks everyone for your interest and feedback!), but in the meantime, I thought I would throw in a possible alternative to this pre-RFC that sprang to mind while reading @H2CO3’s post, on which any thoughts would be appreciated.

It seems to me that one common theme in this discussion is that the proposed semantic clarifications are often evaluated to be useful, but not enough to justify a churn-inducing language addition at this point in time.

The obvious way to answer this kind of feedback is to provide a library-based implementation of the proposed concept, which allows experimenting with it today and gaining experience of it, to make a better future decision about whether a future language integration would be worthwhile.

So far, since this is touching rather deep language semantics I have struggled to come up with such an implementation. But I think I just had an idea which might be heading in the right direction. Like all library-level implementations of something which really should be implemented at the language level, it is pretty ugly-looking and limited in some ways, but it might still be usable enough to be viable for real-world projects.

Tentative library-level translation

First unsafe(pre) fn sketch

Unsafe function preconditions are often, though not always, about parameters. Often, as discussed in the static analysis section of the pre-RFC, they even target a subset of the function’s parameters.

As it turns out, unlike tagging whole functions as unsafe(pre), tagging individual function parameters as unsafe(pre) can be quite easily done without help from the language. All we need is a suitably designed wrapper type for the parameters:

struct UnsafeData<T>(T);

impl<T> UnsafeData<T> {
    // Need an unsafe block to attest that the contract is respected
    unsafe fn new(inner: T) -> Self { Self(inner) }

    // Accessor methods are unimportant and may be freely bikeshedded,
    // a realistic implementation will probably want to use Deref, or
    // even make the whole type pub.
    fn unpack(self) -> T { self.0 }
    fn get(&self) -> &T { &self.0 }
    fn get_mut(&mut self) -> &mut T { &mut self.0 }
}

// UNSAFE PRECONDITION: "dangerous" must be equal to 42.
fn unsafe_pre(dangerous: UnsafeData<usize>, safe: isize) {
    // Some boilerplate is needed on the callee side because this is
    // a type system hack rather than a language feature. But most
    // importantly, the body of the function is not implicitly unsafe.
    let dangerous = dangerous.unpack();

    // Do something with the parameters, with or without using the
    // unsafe precondition that the input UnsafeData provides
}

fn call_unsafe_pre() {
    // I hereby testify that I understood the contract of unsafe_pre()
    let dangerous = unsafe { UnsafeData::new(42) };
    unsafe_pre(dangerous, 64);
}

If a precondition is not about a single function parameter, but about a relationship between multiple function parameters, then we can state this by putting the parameters together in a tuple or struct and giving an UnsafeData<ParameterPack> to the function.

// UNSAFE PRECONDITION: The provided index must be in range for the
// provided slice for this code to be safe.
fn unsafe_coupling(coupled: UnsafeData<(&[u8], usize)>) -> u8 {
    let coupled = coupled.unpack();
    unsafe { *coupled.0.get_unchecked(coupled.1) }
}

Then there is the problem of unsafe preconditions which are about “ambient state” (the hardware, the operating system, global variables…) rather than unsafe parameters. We can encode them in the type system by using a marker type which actually contains nothing, but is unsafe to create nonetheless:

struct UnsafeContext();

impl UnsafeContext {
    // Again, we need unsafe to attest that the precondition is upheld
    unsafe fn new() -> Self { Self() }
}

// UNSAFE PRECONDITION: Do not use this function outside of April 1st
fn april_fools(joke: &str, _date_checked: UnsafeContext) {
    println!("{}", joke);
}

fn call_april_fools() {
    // Yes, I am sure that it is April 1st today
    let date_checked = unsafe { UnsafeContext::new() };
    april_fools("Rust is getting merged into C++20", date_checked);
}

First unsafe(post) fn sketch, and a problem

These functions promise safety-critical guarantees about either their result or the ambient state at the end of their execution. So we use the same strategy here of using a suitable wrapper type on the result side. As it turns out, we can do this using the same UnsafeData and UnsafeContext notions that we have introduced above:

// UNSAFE POSTCONDITION: Will return "42"
fn unsafe_post() -> UnsafeData<usize> {
    /* Perfectly safe work */

    // I'm sure that this is 42, I have double-checked it
    unsafe { UnsafeData::new(42) }
}

// UNSAFE POSTCONDITION: Only returns on April 1st
fn wait_april_1st() -> UnsafeContext {
    /* Safely wait until the day is right */

    // I have made sure that it is time now
    unsafe { UnsafeContext::new() }
}

fn call_unsafe_post() {
    let result = unsafe_post();
    wait_april_1st();

    // Again, some _safe_ boilerplate is needed on the caller side
    let result = result.unpack();
}

Although they introduce some boilerplate at the boundary between unsafe and safe code, these library-level implementations of unsafe(pre) and unsafe(post) compose nicely with each other…

fn compose() {
    april_fools("Rust 2018 will introduce mandatory garbage collection", wait_april_1st());
    unsafe_pre(unsafe_post(), 64);
}

…however, there also lies a major problem with this first type-system encoding of unsafe preconditions and postconditions: it is excessively permissive. We can plug unsafe data from any function into any other function that expects unsafe data of the same type, even if the associated unsafe preconditions and postconditions are wholly unrelated, and all of this happens without a single unsafe block. This is obviously not good.
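
For instance, nothing stops us from feeding the output of unsafe_post() (contract: “equals 42”) into a hypothetical function whose precondition is entirely different, using the UnsafeData type sketched above:

static DATA: [u8; 4] = [1, 2, 3, 4];

// UNSAFE PRECONDITION: "index" must be in bounds for DATA.
fn read_data(index: UnsafeData<usize>) -> u8 {
    unsafe { *DATA.get_unchecked(index.unpack()) }
}

fn oops() -> u8 {
    // Type-checks with no unsafe block in sight, yet the contract upheld by
    // unsafe_post() ("the value is 42") has nothing to do with the contract
    // required by read_data() ("index < 4"): this reads out of bounds.
    read_data(unsafe_post())
}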

Take two: encoding the contract

To address this major safety issue, we need to make the underlying contract part of the UnsafeData or UnsafeContext type. A first rough implementation could use simple marker types:

// Unsafe types
struct UnsafeData<T, Contract>(T, Contract);
struct UnsafeContext<Contract>(Contract);

// Marker types representing contracts
struct DataIs42();
struct DayIsApril1st();

// Shorthands to avoid insanity
type DataWithContract = UnsafeData<usize, DataIs42>;
type ContextWithContract = UnsafeContext<DayIsApril1st>;

By adding suitable trait bounds on the Contract type and modifications to the UnsafeData and UnsafeContext implementation, one could also later extend this implementation into a poor man’s design-by-contract tool, where contracts which are expressible in code are expressed in code, as @gbutler had in mind earlier.
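
A minimal sketch of what the contract-tagged constructor could look like. This is just one possible variant: PhantomData is used here (instead of storing the marker by value) so that callers never have to build the marker themselves:

use std::marker::PhantomData;

struct UnsafeData<T, Contract>(T, PhantomData<Contract>);

impl<T, Contract> UnsafeData<T, Contract> {
    // Still unsafe to create, but now the caller attests that *this
    // particular* contract holds for the wrapped value.
    unsafe fn new(inner: T) -> Self {
        Self(inner, PhantomData)
    }

    fn unpack(self) -> T {
        self.0
    }
}

struct DataIs42;
type DataWithContract = UnsafeData<usize, DataIs42>;

// UNSAFE POSTCONDITION: will return 42.
fn unsafe_post_tagged() -> DataWithContract {
    // I have double-checked that this really is 42
    unsafe { UnsafeData::new(42) }
}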

As before, unsafe associated trait methods would be handled with the same formalism as free unsafe functions, with the added “every implementation must provide this” caveat.

What a library-based proposal cannot address

  • As stated before, like any type system hack, this is much more boilerplate-intensive than language-level concept integration.
  • A library cannot deprecate the old unsafe abstraction vocabulary, leading to confusing coexistence of the old and new approach (some will see this as an advantage)
  • There is as far as I can tell no way to express unsafe(post) traits at the type system level in today’s Rust, so they would need to stick with the current syntax
  • Unsafe invariants cannot be expressed as function inputs and outputs, and therefore cannot be expressed at the type system level. They can be again approximated with UnsafeContext, but the approximation is even more detached from the actual contract.
  • With a library-level implementation, some forms of static analysis such as linting of dangerous existing usage of unsafe code become much more difficult.
2 Likes

A similar concern already exists today with const fn. So far, the answer has basically been "mark fns as const if someone has a need for it and you see yourself upholding this contract forever in the future". I think this reasoning also applies to unsafe(post), but even more so.

unsafe(post) is a strong commitment from the implementor of a function. When you mark a function as unsafe(post), you are not merely stating "I think that this function is correct and have quickly checked it in my unit tests", as with normal functions. What you are now saying is "I am so convinced that this function is correct, for any set of inputs and on every hardware architecture which Rust will ever run on, that I am ready to bet the Rust language's type/memory/thread-safety on it, and to promise that it will remain a safe bet forever in the future".

Effectively, an unsafe(post) function is a piece of unsafe code like any other (which is why the proposed rules force use of an unsafe block to implement it), and as such it must be subjected to the same level of scrutiny and quality standards as other unsafe code, which in sane Rust projects is much higher than that of regular code.

This is why I do not expect every function out there to become unsafe(post), but only carefully selected code which is specifically intended to be used in unsafe codebases, and ready to undergo the stringent level of QA that elicits the hard-earned confidence of experienced Rust programmers in unsafe code.

1 Like

What a great comment! I find myself in agreement with much of what you are saying here, so this will be mostly about clarifying a bit my position and what I am trying to achieve here.

A tale of complexity and complication

I understand that some difficulties of unsafe emerge from essential complexity (i.e. stuff that is fundamentally hard, like proving any kind of theorem about arbitrarily aliased pointers), and that other difficulties emerge from accidental complication (i.e. stuff that is only hard because of usability issues, like unsafe Rust’s “do not trust safe abstractions” rule). Here, I am aiming for the latter kind of issue, and agree that the former kind can only be overcome through teaching, experience, and obsessive amounts of QA.

I do not think that either of us really knows how these two kinds of difficulties are distributed. Is it mostly the former? Mostly the latter? A bit of both? You seem to be partial towards the first interpretation, whereas I seem to be partial towards the second interpretation. In defense of the latter, my experience of software ergonomics has always been that software engineers tend to have a strong psychological bias towards avoiding the “bad ergonomics” explanation, instead preferring to blame either their users or the essential complexity of their product. I know that I am no different, so I consciously attempt to correct this bias by forcing my mind in the opposite direction a bit. Maybe I am pushing it too far, though, you tell me :slight_smile:

Best-case usability matters too!

The hypothesis which I build upon, rightly or wrongly, is that there definitely is a non-negligible ergonomic component to the difficulty of writing correct unsafe code. That there are things which we could do to hint our users in the right direction, and which would come at a relatively low cost when writing correct unsafe code, but which we do not do yet.

I have provided a couple of examples of what I perceive to be low-usability areas in Rust’s unsafe abstraction design, which you have agreed with, so it seems to me that without further proof, we can at least agree that this hypothesis is not completely ridiculous. But you do raise a very good point here, which is that the best answer to a usability issue does not simply make it harder to do the wrong thing. It also keeps it almost as easy, or even makes it easier, to do the right thing.

An unfortunately widespread failure to understand this fact is what gives real-life safety interlocks a bad reputation: many of these do make it harder to do the wrong thing, but at the cost of also making it harder to do the right thing, which is not usually perceived to be a good compromise even when the harm that is prevented is much greater than the harm which is inflicted.

In contrast, well-designed safety systems hint the user away from the error before it has even occurred, ideally going as far as to make the error impossible. Like the little knife design cues which tell you which side of the blade is the sharp one, or the “auto-ignite” gas appliance mechanisms that make sure that you cannot turn on the gas flow without lighting it up.

As you say, Unsafe Rust cannot, by its very nature, reach the “no error possible” usability nirvana. It exists so that if it is written correctly, Safe Rust will be able to. So our role model for Unsafe Rust ergonomics should not be an auto-ignite gas appliance, but a good knife: still dangerous, yet so easy to grasp that you do not need that much training to use it safely. More exactly, it should be a sharp knife: as everyone who regularly uses knives knows, dull knives are more dangerous than sharp ones in spite of being theoretically less so, because the frustration of cutting anything with them will lead you to start doing stupid things with your hands. Annoying safeties are worse than no safety at all.

Reducing mental footprint

I think that clarifying the semantics of unsafe abstractions would be a step in the right direction, because from my understanding it actually would make it easier, rather than harder, to implement them correctly.

Today, devising an unsafe abstraction involves a small set of complex decisions. Like “where do I need to write the unsafe keyword?”, or “should I mark this function as unsafe?”. These decisions are few because each use of the “unsafe” keyword has a tremendous amount of power in today’s Rust. But they are also complex because analyzing all this expressive power comes at the cost of a higher mental footprint.

Whenever you write “unsafe” in your code, you need to pause and think “hey, why am I doing that?”. Is it because you want to do something unsafe? Because you assume something that is potentially critical to your code’s safety, or that of someone else’s code? Because you guarantee something that is critical to memory-safety? Or maybe a combination of several of these things? Are you sure that you have thought about all of it? Have you documented it somewhere?

And then, because unsafe does not mandate documentation of the underlying contract (another thing which I wish static analyzers like clippy or rustc’s warnings would lint on), there are chances that other people who will need to maintain that code much later in the future, including yourself, will need to go through this complex thought process all over again. What does each unsafe keyword in this code mean? What is it about? Am I sure that I can fully understand it? Can I trust the comments to tell me everything?

All this mental complexity leads to mistakes, which in the case of unsafe means safety bugs. So I wish we could do away with some of this complexity, by doing what humans always do when encountering big chunks of complexity: breaking it down into smaller pieces.

What this proposal aims at

And this is, in a nutshell, what this pre-RFC is about: cutting the big scary blob of unsafe abstraction semantics into smaller pieces which the human brain can more easily digest, using the opportunity to also augment the semantics of “unsafe” with notions that we have been talking about forever in the Rust community, like unsafe contracts, without yet being able to express them in code.

By making “unsafe” less monolithic, we turn the process of designing an unsafe abstraction from a set of few complex decisions into a set of many simple ones. We are no longer marking methods as unsafe and reverse-engineering what that means after the fact; instead we…

  • …start by putting an unsafe block inside of them (“Rust, please remove the safety net!”)
  • …then realize that we need to make an assumption inside of that block (“Hmmm, I really need that slice index to be in range…”)
  • …then encode that assumption as an unsafe precondition.
  • At that point, ideally, our favorite static analyzer starts linting us that we need to document that precondition. Ah, yes, this is true, we need to do that.
  • Then some development passes, and some time later, we realize that we need to make the same assumption many times, and that we always guarantee it in the same way. So we extract the code which offers that guarantee in a dedicated function, or perhaps a trait if we want to allow for other implementations in the future.
  • “But well”, our helpful static analyzer then starts wondering, “how am I, or your future self and collaborators, to figure out that the contents of this function are critical to memory safety?”. Of course, the analyzer is right. So we mark the function’s output as featuring an unsafe postcondition, and in order to avoid another static analyzer lint, we immediately document what the postcondition is about.

In this model, developing unsafe abstractions has moved from a complex exercise in reverse-engineering the semantics of the language and of your code, into a more iterative process that naturally evolves from the unsafe block, which is the root of all unsafe development workflows, via a stream of small, simple, and self-contained abstraction design decisions.

Certainly, the price to pay for making each of these decisions easier is that there are more of them along the way. But overall, I think that when dealing with limited human brains, this is almost always the right trade-off.

This will not resolve anything!

…nor does it have to. It is intended as a step forward, not as a silver bullet.

The process of making Unsafe Rust easier to understand and use began a long time ago, with the first visible milestone being perhaps the publication of the Nomicon. It has gone through a large number of intermediary steps, involving many blog posts, the Rust Belt project, and the ongoing unsafe Rust guidelines and memory model efforts.

Not every step was a direct success. Some RFCs were rejected, some std APIs like scoped threads had to be discarded in a backwards-incompatible way because even the best Rust devs could not fully tell what is safe from what isn’t. But even these apparent failures turned out to be mere learning experiences, which the community could use to move forward and avoid making the same mistake twice. Now we have sound scoped threads. Well, as far as we can tell, anyway :stuck_out_tongue:

This pre-RFC is intended to be a stepping stone towards a future of easier unsafe development, which of course may never materialize:

  • By clarifying what unsafe Rust is about, it allows Rust’s famed static analysis tools (including rustc’s warnings and clippy) to go further in their analysis than was ever possible before, and hopefully to ultimately help unsafe code developers more at their task instead of less (since, by common agreement, they are the Rust developers who need most help).
  • By being designed for minimalism today and extensibility tomorrow (all the way to much more complex and tantalizing features like @gbutler’s statically checked contracts proposal), it makes gradual evolution of unsafe semantics a viable endeavour, rather than imposing from the start a daunting and impossibly large compatibility-breaking change.

Maybe this pre-RFC will ultimately be accepted, possibly after an indefinitely long library-based probation period. Maybe it will be hard-rejected. Maybe it will be left in limbo forever. Maybe it will be superseded by a better idea. But whatever happens, I hope that this thread will be remembered as a useful contribution to the continuous Rust improvement process of figuring out what Unsafe Rust is, what are its pitfalls, and how we can best avoid them…

But to go back to the core topic, yes, even if accepted, this pre-RFC won’t be enough to kill today’s TrustedLen. All it does is to introduce new abstraction vocabulary that can be used to devise its eventual replacement. And to clearly differentiate safe/unsafe abstractions and lint against unsafe code relying on safe abstractions. And to gradually phase out the “unsafe precondition means unsafe body” footgun, which we cannot just turn off in the compiler right away due to backwards compatibility promises.

2 Likes

Thanks for your detailed response. I think we still disagree on several fundamental aspects, but I see where you are coming from.

Sad but true. However, I am somewhat of an ergonomics enthusiast myself. I cringe when I have to write out a <'lifetime> explicitly. :stuck_out_tongue: Seriously though — I have seen and learned about many pieces of bad language design and ergonomics issues causing bugs. I’ve helped people debug their JavaScript containing == instead of ===, Python being mis-indented, C++ double-freeing through a shared_ptr because the base class wasn’t marked enable_shared_from_this… and so on and so forth. Over the years, I’ve come to appreciate good language ergonomics and despise the lack of it greatly.

However, we just seem to have different thresholds of sensitivity here, and therefore disagree on what needs changing. I think that the current syntax and semantics of unsafe in Rust is just fine (maybe with the exception of automatically marking unsafe functions’ bodies as unsafe), whereas you think it needs improvement.

We also seem to disagree on the issue of granularity. You insist that finer granularity would help learners understand the issues around unsafe code, whereas I assert this is not the case. Why do I think this? Well, here’s my take (it is my own experience mixed with some thoughts by which I try to be more systematic about explaining this):

  1. If you understand the workings of safe Rust, and are aware of the halting problem, you will not be surprised why certain constructs need to be marked as unsafe.
  2. You just ought to master safe Rust before diving deep into unsafe code. I accept no excuses. :slight_smile:
  3. Safe Rust is a very well-designed, self-consistent language with a powerful, orthogonal, crystal clear type system, and a compiler which guides you towards writing the right code.
  4. Therefore, if you know Safe Rust, you will recognize the patterns in it, and it will come to you naturally why and when unsafe code was/had to be/will be used. Of course, for the fine details of the how, you may need to consult the Rustonomicon, but my point is, the language doesn’t really contain unpleasant surprises, so even without explicit syntactic differences between the different kinds of unsafe, it’s pretty clear what they do. (More on this later.)

Today, devising an unsafe abstraction involves a small set of complex decisions. Like “where do I need to write the unsafe keyword?”

Excuse me, but I have to disagree with that point too. It is very clear in which situations you have to use unsafe:

  • First of all, you probably don’t. This should be the default answer 99.9% of the time.
  • If you still do, however, then the compiler (in theory, modulo bugs) enforces the use of unsafe around every piece of code that might introduce a soundness hole. That is the whole point in the concept of differentiating between safe and unsafe. The code won’t compile if you don’t mark something unsafe as unsafe. So, in theory you could almost devise a minimally-unsafe implementation by starting out with not using unsafe at all, then placing it around the smallest possible scope and before the least number of impls, until the code compiles. Of course, nobody does this in practice, because we already – hopefully – know which abstractions are unsafe, so we can put most of the unsafes in upfront, but I’m sure you get the point.

Actually, that is the right question. In today’s Rust, the distinction already exists. The unsafe keyword is, grammatically speaking, context-sensitive. It can mean slightly different things based on where it is in the code:

  • Before a block, it means “allow me to use unsafe abstractions here” (e.g. calling an unsafe function or dereferencing a raw pointer).
  • Before a function, it means “this function does not enforce all guarantees of safe Rust, you need to be extra careful when calling it!”. If I understand correctly, the proposed unsafe(pre) and unsafe(post) annotations would further break down this scenario into two different cases.
  • Before a trait impl, it means “I am marking this type as having an invariant which the compiler couldn’t prove”.

And so on and so forth. Ouch, that context-sensitivity might sound really bad to some people! I can understand that it is a valid concern. Yet, I found that in practice, since the distinction is already clear syntactically (you don’t confuse impl, fn and {, do you?), it works really-really well once you actually start using it, although it might not seem ideal or even good from the outside.
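
Spelled out, the three positions look like this in today’s Rust (a minimal, contrived sketch):

// Before a trait impl: "I assert that the compiler-uncheckable invariant
// holds for this type."
struct Token(*mut u8);
unsafe impl Send for Token {}

// Before a function: "callers must uphold extra, unchecked preconditions."
unsafe fn deref(p: *const u8) -> u8 {
    *p
}

// Before a block: "let me use unsafe abstractions here; I have checked
// their contracts myself."
fn read_first(bytes: &[u8]) -> Option<u8> {
    if bytes.is_empty() {
        return None;
    }
    Some(unsafe { deref(bytes.as_ptr()) })
}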

Isn’t that already how we ought to design unsafe abstractions? I think this sounds much more of an education problem. If, for example, people start by marking functions unsafe instead of trying to hide a piece of unsafe code inside the abstraction, that’s pretty bad regardless of how unsafe is formulated syntactically or semantically. I think this is something that needs to be in the culture, not in the language.

Of course, today we can’t do what you suggested in the 3rd bulletpoint:

because this would turn the entire function body into unsafe. But then again, that’s ~trivial to achieve by one (unfortunately, breaking) change, if unsafe fn just stops automatically considering the entire body to be an unsafe block. I would absolutely welcome this change, because I tend to try minimizing the scope of unsafe blocks myself, in order to reduce the number of expressions I have to reason about without the compiler’s aid.

I beg to differ here, too… I think that is somewhat of an exaggeration; I do not think that the unsafe situation is that bad, or that it’s bad at all. I have seen a lot of bad unsafe code in the wild. (I always grep for unsafe in any crate I plan to depend on, because I have trust issues.) I still think most of it didn’t have the “hard to write and read” problem. I don’t really remember asking myself “why is this unsafe?”. Instead, they had the “it shouldn’t even have been unsafe and could have been written differently” problem.

In other words, Rust programmers should already approach unsafe code gradually, instead of just jumping right in, and it is already possible with today’s unsafe syntax and semantics. I think a much more significant issue around unsafe is, again, that of documentation and culture. I am a former C and C++ programmer myself, and I see several other C and C++ programmers coming to Rust, then using unsafe on every second line in order to avoid having to learn the type system, or for “optimizations” which really should be written in a different, safe way. I often feel an urge to yell “That’s not what unsafe is for!” at them. (I usually don’t.) This is something that might be improved by better communication and docs, but ultimately that wouldn’t completely solve the problem either, as there will always be users who just abuse the language, as it is powerful enough to be abused.

Finally, I’d like to provide two alternatives to the concrete syntax you proposed.

First of all, I do like the library-based prototyping approach. I strongly prefer implementing features in libraries instead of incorporating them right into the core. My biggest fear about adding to the core language is that it shifts the entire language and ecosystem towards a very domain-specific paradigm. Hyperbolically speaking, if we add unsafe(pre) and unsafe(post) today, we will eventually end up being a contract-based, dependently-typed, pure functional language which is only suitable for automated theorem proving, and has the quintessence of Coq, Agda, Idris, PROLOG, and Midori C# embedded in its macro system. :stuck_out_tongue: In all seriousness, everyone’s favorite niche feature/improvement could be in Rust, but then it wouldn’t solve the C++ bloat and complexity issue anymore…

Having said that, the first alternative is, I believe, a subset of your proposal. I suggest we do not touch the unsafe keyword, but we add the UnsafeData type and then use it in return position only. In this model, unsafe fn would be the equivalent of unsafe(pre) fn, and fn foo() -> UnsafeData would be the equivalent of unsafe(post) fn.
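
In signature form, the mapping would be roughly the following (re-stating a minimal UnsafeData from the earlier library sketch; the function names are just placeholders):

// Minimal re-statement of the wrapper from the library sketch above.
struct UnsafeData<T>(T);

impl<T> UnsafeData<T> {
    unsafe fn new(inner: T) -> Self {
        Self(inner)
    }
}

// Equivalent of unsafe(pre): the *caller* must uphold a contract.
unsafe fn with_precondition(_index: usize) -> u8 {
    0
}

// Equivalent of unsafe(post): the *implementation* promises a contract about
// the result, which unsafe code elsewhere may choose to rely on.
fn with_postcondition() -> UnsafeData<usize> {
    // double-checked: the value really is 42
    unsafe { UnsafeData::new(42) }
}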

So, why is this good? Well, today many programmers associate unsafe fn with “unsafe to call because it does unsafe things”. This is a good first mental model, but not exactly accurate — it is partial towards assumed/unsafe preconditions. In particular, the intrinsic feeling about unsafe { some_unsafe_fn() } is that the unsafety is over once the function returns. The thinking could go like this:

  1. Oops, look, an unsafe function! I need to be prepared to call it correctly! *deep breath*
  2. Call!
  3. *phew* Finally, it’s over! I don’t have to think about it ever again!

However, this doesn’t take into account effects which can manifest after the function has returned. If the function instead returned an UnsafeData<T>, this would be somewhat clearer.

(Eventually, if deemed really-really necessary, or a lot superior, fn foo() -> UnsafeData<T> could even be turned into fn foo() -> unsafe T or something, for syntactical symmetry…)

The idea for the second alternative comes from the safe/design-by-contract supporting variant of C# that I just mentioned, the implementation language of the Midori OS. If I recall correctly, instead of core language syntax, they extended annotations in order to denote preconditions and postconditions. Rust could do the same! Instead of adding more keywords (which also look really ugly in my humble opinion, by the way, but I digress), why don’t we just add two attributes as well? Something along the lines of:

#[precnd]
unsafe fn foo(arg: T) {
    …
}

#[postcnd]
unsafe fn foo() -> T {
    …
}

This would remain 100% machine-readable as well as human-readable, and since it’s an attribute, it would be much easier to extend in the future without requiring further radical changes to the core language. It could even be used to encode additional pre- and postconditions which are unrelated to unsafety, so I believe it is also much more general.
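
For instance, the attributes could later grow arguments carrying the actual predicate. This is purely hypothetical syntax in the spirit of @gbutler’s DSL idea; today it would have to be a custom attribute consumed by a proc macro or an external tool:

// Hypothetical attribute arguments; nothing here is implemented today.
#[precnd(unsafe, "index < slice.len()")]
unsafe fn read_unchecked(slice: &[u8], index: usize) -> u8 {
    *slice.get_unchecked(index)
}

// A postcondition unrelated to unsafety, showing the more general use.
#[postcnd("result is sorted in ascending order")]
fn sorted(mut v: Vec<u8>) -> Vec<u8> {
    v.sort();
    v
}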

It also has the additional advantage of containing the good parts of both worlds, I think. Namely, I love how today’s simple one-keyword unsafe catches the reader’s attention, but doesn’t look uglier than e.g. pub fn — it is clear but not disturbing. (Compare with Java’s public static final volatile synchronized iAlreadyLostTrackOfTheFuncName()… brrr!)

With this attribute-based approach, there would be a hierarchy: functions would be unsafe, that would be the first level; and the second level would be the attribute, which refines how they are unsafe. (This would also mean that it could be easier to implement in a backward-compatible manner. Once custom attributes and related improvements to the attribute system land, a compiler which doesn’t understand the attribute could just ignore it, or maybe have it “backported” as a no-op (?). But I went off on a tangent with this too much already…)

What do you think about the alternatives?

1 Like

Yes, I think that using attributes to denote the pre/post thing is a much more useful, extensible, and less disruptive direction.

2 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.