List of major changes since this was first published:
- 2018-03-14 @ 22:06 CET -- Integrated feedback from @mark-i-m and @eternaleye
- 2018-03-15 @ 21:37 CET -- Integrated some of the feedback from @CAD97.
The many meanings of the "unsafe" keyword in Rust unnecessarily reduce its ergonomics and learnability. That point is regularly brought up, two recent examples on this forum being "Ambiguity of “unsafe” causes confusion in understanding" and "What does unsafe mean". Other threads of discussion also exist in other avenues, including RFC discussions and users forum posts.
So far, most discussions of this problem have stalled, and my understanding is that this is because they ended up with proposals which introduce excessive complexity or churn in the Rust ecosystem compared to their perceived usefulness. For this reason, I would like to propose here a purposely minimal and iterative syntax evolution, which I think would better fit Rust's stability-focused evolution process while still fulfilling the intended goal of clarifying unsafe abstraction semantics.
Current uses of the unsafe keyword
The unsafe keyword is used in several different circumstances, which I believe can all be narrowed down to two fundamental use cases:
- Unsafe blocks are regions of code where one can take actions that violate memory/type/thread safety, and becomes responsible for maintaining these properties.
- Unsafe abstractions are abstractions that rely on and/or provide safety-critical guarantees which are not (and usually cannot be) verified by the Rust compiler.
I believe that of those, unsafe blocks are by far the best-understood use case of unsafe in Rust, and can safely (pun intended) remain the way they are, whereas unsafe abstractions are an area that could use improvement. In the remainder of this post, I will thus focus the discussion on unsafe abstractions.
Much has been written about unsafe abstractions in the past, by illustrious minds such as @nikomatsakis and @withoutboats for example. I will not fully repeat these discussions here, but it is sufficient to state that they all revolve around the idea of an unsafe contract which is essential to memory, type and/or thread safety but cannot be checked by the compiler.
More specifically, I think that two ingredients from the theory of design-by-contract are critical to the correct building and understanding of unsafe abstractions:
- Preconditions are properties which the user of an abstraction must guarantee in order for the abstraction to behave as intended.
- Postconditions are properties which are guaranteed by the abstraction, and can be relied upon by the user of said abstraction.
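To ground these two notions in today's Rust, here is a minimal sketch built on existing standard library APIs (the wrapper functions are illustrative only):
// <[T]>::get_unchecked has an unsafe precondition: the caller must
// guarantee that the index is in bounds.
fn first_byte(data: &[u8]) -> Option<u8> {
    if data.is_empty() {
        None
    } else {
        // Precondition upheld: data.len() > 0, so index 0 is valid.
        Some(unsafe { *data.get_unchecked(0) })
    }
}

// str::as_bytes has a safety-critical postcondition: the returned bytes are
// valid UTF-8, which unsafe code may rely on to skip re-validation.
fn roundtrip(s: &str) -> &str {
    unsafe { std::str::from_utf8_unchecked(s.as_bytes()) }
}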
The theory of design-by-contract also introduces the third notion of an invariant, which is not currently covered by this proposal, but can be introduced in the future if needed. This notion, and the motivation for not including it in this proposal, will be discussed further in the "Possible extensions" section.
High-level syntax proposal and transition plan
Coming from the above world view about the dual semantics of unsafe, two shortcomings of the existing unsafe abstraction syntax emerge immediately:
- It fails to strongly expose the notion of an underlying software contract, by being syntactically too close to an unsafe block, whose semantics are merely "trust me".
- It does not mark a clear distinction between preconditions and postconditions, whose roles in an unsafe codebase are fundamentally different.
The way I propose to address these two shortcomings is to add two modifiers to the "unsafe" keyword, "pre" and "post", which can be used to clarify that the use of "unsafe" is about introducing unsafe preconditions and postconditions in abstractions. Using functions as an example, these modifiers could use the following syntax:
// This function is only safe if some input preconditions are met
unsafe(pre) fn i_trust_you(x: In) { /* ... */ }
// This function guarantees some safety-critical output postconditions
unsafe(post) fn you_trust_me() -> Out { /* ... */ }
// This function has both unsafe preconditions and unsafe postconditions
unsafe(pre, post) fn we_are_bound_contractually(x: In) -> Out { /* ... */ }
As I will detail further down this post, these modified versions of unsafe aimed at abstraction design will have slightly different semantics with respect to the current "unsafe-for-abstractions" syntax, leveraging the added pre-/postcondition clarification in order to more closely reflect the abstraction author's intent.
If that functionality is found to be useful, accepted and implemented, then a progressive rollout of the new syntax can easily be envisioned:
- Start by allowing usage of the modified forms of unsafe in abstractions
- In the next Rust edition, lint against usage of un-modified "unsafe" in abstractions
- In the Rust edition after that, make it an error to use un-modified "unsafe" in abstractions
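To make this rollout concrete, here is a hedged before/after sketch of what the migration could look like for an existing unsafe function (the function itself is illustrative):
// Today: the whole body is an implicit unsafe block.
unsafe fn deref_raw(p: *const u32) -> u32 {
    *p
}

// After migration: the precondition is spelled out and, as detailed in the
// design below, the body is no longer implicitly unsafe, so an explicit
// block is needed.
unsafe(pre) fn deref_raw(p: *const u32) -> u32 {
    unsafe { *p }
}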
Detailed design
Unsafe functions
As mentioned above, unsafe functions and methods can have unsafe preconditions, postconditions, or both. Adding these modifiers alters their semantics as follows:
- unsafe(pre) functions can cause unsafety if some manually specified preconditions are not met. Therefore, they can only be called from unsafe code blocks. This is the most widespread use of unsafe functions today.
- unsafe(post) functions provide guarantees which can be relied upon by unsafe code. Therefore, they can only return to their caller inside of an unsafe code block. These functions can safely be used when collecting information that is to be passed to an unsafe(pre) function such as [T]::get_unchecked().
- unsafe(pre, post) functions combine both of these semantics: they can cause unsafety if mishandled, and provide output which can be relied upon by unsafe code. A possible example would be a function which receives as a parameter a *const Vec<T> which is assumed to be valid, and returns its length, which is then used for unchecked indexing.
Contrary to current unsafe fns, these modified forms of unsafe do not implicitly make the body of the function an unsafe block. The rationale for this is that only the author of the function knows which parts of the function, if any, are unsafe.
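Under the proposed semantics, these flavors could interact as in the following illustrative sketch (all function names are hypothetical):
// unsafe(post): promises that the returned value is a valid index into `v`.
unsafe(post) fn last_index(v: &[u8]) -> usize {
    assert!(!v.is_empty());
    v.len() - 1
}

// unsafe(pre): requires that `i` is a valid index into `v`.
unsafe(pre) fn get_at(v: &[u8], i: usize) -> u8 {
    // The body is not an implicit unsafe block, so the unchecked access
    // needs an explicit one.
    unsafe { *v.get_unchecked(i) }
}

fn last_byte(v: &[u8]) -> u8 {
    unsafe {
        let i = last_index(v); // unsafe(post): the result may feed unsafe code
        get_at(v, i)           // unsafe(pre): precondition upheld via last_index()
    }
}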
In fact, since it is an API-breaking change to add preconditions, we could well imagine abstraction designers proactively adding unsafe preconditions to their functions, but not using them right away, instead starting with a safe implementation with the intent of moving to an unsafe one later on. No "accidental unsafety" should result from this API choice, and removing the unsafe precondition later on if it does not prove useful should not be a breaking change. And conversely, adding unsafe postconditions to an existing function should not be a breaking change for users either, as it only makes the function safer to use.
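As an illustrative sketch of this pattern under the proposed semantics (the function is hypothetical):
unsafe(pre) fn sum_prefix(data: &[u32], n: usize) -> u32 {
    // Advertised precondition: n <= data.len().
    // The current body is fully checked and needs no unsafe block; a later
    // version may switch to unchecked indexing without breaking callers,
    // since the contract was published from the start.
    data[..n].iter().sum()
}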
These last two statements reflect an important rule of design-by-contract, which is the Liskov Substitution Principle. That principle can be spelled out in natural language as follows:
It is okay for an abstraction's implementation to require weaker preconditions than advertised on the outside, or to guarantee stronger postconditions than advertised on the outside. But requiring stronger preconditions or guaranteeing weaker postconditions than advertised is a violation of the abstraction's contract, and changing a published abstraction's contract in that direction is thus a breaking API change.
Unsafe traits
Unsafe traits go one step beyond unsafe functions by defining a contract which any implementation of that trait must uphold. Unsafe traits are matched with unsafe impls using the same unsafe modifiers, which are essentially a legally binding statement from the person implementing the trait:
I have read and understood the unsafe preconditions and postconditions of this trait. In accordance with the Liskov Substitution Principle, I promise not to rely on any further unsafe precondition, nor to violate any expected unsafe postcondition. I am allowed, however, to have weaker preconditions or stronger postconditions.
Like unsafe functions, implementations of an unsafe trait follow an unsafe contract: anyone who uses an implementation of an unsafe trait that has preconditions must follow its preconditions or face memory/type/thread unsafety. And anyone who uses an implementation of an unsafe trait that has postconditions can rely on them in safety-critical code.
However, this raises the immediate question of what "using a trait" means. As will be discussed in the "Unresolved questions" section, this nut actually proved harder to crack than expected. But after multiple iterations around this concept, the author of this Pre-RFC is now leaning towards the following definition:
A trait is used by adding that trait's name to a list of trait bounds.
This definition matches three prominent examples of unsafe traits in the Rust standard library, namely TrustedLen, Send, and Sync, along with a fourth trait which also has safety-critical postconditions, but is not unsafe because a Rust programmer is not allowed by the compiler to implement it incorrectly: Copy.
All of these traits have the following properties in common:
- They are all marker traits, which do not provide any methods, and are used by being added to a set of trait bounds in generic code.
- They all guarantee properties which unsafe code is allowed to rely upon, and which could be violated by a careless implementor, resulting in unsafety.
- It is always safe to use them; they do not have safety-critical preconditions.
From these observations, I would tentatively reach the following conclusions:
- All of these traits should be marked as unsafe(post).
- Implementing them should require an unsafe(post) or unsafe marker (see below), as a statement by the implementor that the trait's postconditions have been upheld.
- We do not need unsafe(pre) traits at this point in time, only unsafe(post) ones, and the first implementation of this pre-RFC should thus only provide the latter.
Here are some syntax mockups of what unsafe(post) traits could look like:
unsafe(post) trait SafetyCriticalProperty {}
// Syntax option 1: Impls also use the "post" modifier. The rationale
// for this syntax is that it is consistent with the
// trait's declaration.
unsafe(post) impl SafetyCriticalProperty for MySafeType {}
// Syntax option 2: Impls do not use the "post" modifier. The rationale
// for this syntax is that it is consistent with unsafe
// blocks, which are about _doing_ something unsafe, and
// we are indeed doing something unsafe by implementing
// an unsafe(post) trait for a type.
unsafe impl SafetyCriticalProperty for MySafeType {}
Finally, like any trait, unsafe(post) traits can have methods. These methods can rely on the host trait's postconditions, but if they play a critical role in ensuring said postconditions, then they should be marked as unsafe(post) as well to clarify this point to implementors and users. Therefore, and in contrast with previous versions of this proposal, I am no longer proposing that methods of an unsafe(post) trait be implicitly marked as unsafe(post).
Unsafe associated methods
Like all methods, a trait's associated methods can be unsafe(pre) or unsafe(post). In the context of trait definitions, this sets a precondition which every implementation of that method can rely on, or a postcondition which every implementation of that method must guarantee.
For example, let us assume that we lived in an alternate universe where the way in which TrustedLen modifies the semantics of Iterator::size_hint() by adding an unsafe postcondition to it had been frowned upon as "spooky unsafe action at a distance". How could we have handled that differently? One possibility would have been to explicitly add a new method which upholds the required unsafe contract:
// There is really nothing unsafe about implementing this trait...
trait TrustedLen: Iterator {
    // ...as long as this dangerous method is correctly implemented.
    // (Because it is dangerous, we need an unsafe block to do so)
    unsafe(post) fn len(&self) -> usize;

    // Unlike len(), this method promises nothing and is thus not unsafe.
    // Reimplementing it as (self.len() % 3) == 0, although evil, should thus not
    // mislead TrustedLen users into performing unsafe actions.
    fn len_is_even(&self) -> bool { (self.len() % 2) == 0 }
}
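A consumer of this alternate-universe TrustedLen could then rely on the unsafe(post) contract of len(), as in the following hedged sketch (per the rule proposed above, the call site must sit inside an unsafe block):
fn collect_trusted<I: TrustedLen<Item = u8>>(iter: I) -> Vec<u8> {
    // Query the trusted length, acknowledging that safety-critical code may
    // rely on its exactness.
    let n = unsafe { iter.len() };
    let mut out = Vec::with_capacity(n);
    out.extend(iter);
    // Unsafe code downstream may now rely on out.len() == n.
    debug_assert_eq!(out.len(), n);
    out
}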
Possible extensions
Unsafe invariants
In addition to preconditions and postconditions, design-by-contract introduces a third concept called invariants. Invariants are properties which are expected to remain true throughout the entire duration for which an abstraction is used.
Contrary to what was stated by a previous version of this proposal, invariants cannot be fully reduced to a precondition/postcondition pair which applies to each abstraction entry point, because such a reduction only makes a statement about properties which must hold true before and after the time where an abstraction is used, and not during that time. The difference between an invariant and a precondition/postcondition pair can be observed if an abstraction is concurrently used by multiple pieces of code, for example in the case where a Sync type with unsafe methods is shared by multiple threads.
I do not currently see a benefit in writing down the notion of an unsafe invariant in Rust code. It does not seem to introduce new constraints on the way rustc should process unsafe abstractions beyond those of unsafe(pre, post), because it makes a statement about properties which must be true during the time where an abstraction is used, and Rust has no syntax for that. So in my view, this can probably remain a concept that is solely exposed through manually written documentation, at least initially.
However, should I be proven wrong, the proposed syntax could naturally be extended to encompass this concept by adding an unsafe(invariant) modifier, which would imply unsafe(pre, post) and extend it with the additional notion that the unsafe property must remain valid throughout the entire duration for which the unsafe abstraction is used.
Unsafe types
Much like unsafe functions can exist in isolation, one could envision allowing for "unsafe types", which follow an unsafe contract like an implementation of an unsafe trait.
// No matter how the definition of this type changes in the future,
// we guarantee that it will be forever possible to transmute pointers
// to Sized types into this type.
unsafe(post) struct OpaqueSizedPtr(usize);
In this case, every method of an unsafe(pre|post) type should probably be implicitly declared unsafe(pre|post), because it is subjected to the type-wide preconditions and postconditions.
However, I would advise against requiring unsafe(pre|post) on the impl blocks of unsafe types for two reasons:
- Not every type definition comes with an impl block. For example, type A = B doesn't.
- Impl blocks are allowed to target multiple types in Rust, and authors of blanket impls should not need to consider whether their impls may cover unsafe types or not.
Because of this required syntax inconsistency, and because a need for unsafe types has not emerged so far in Rust's history, I would personally be against allowing for unsafe types unless a strong rationale for them is provided.
At the same time, however, I can see how they would "close the loop" of unsafe abstractions by allowing every Rust abstraction to have an unsafe contract attached to it. So I could be convinced that they are a worthy addition. Should they turn out to be wanted, the above mockup should be enough proof that the proposed syntax is flexible enough to allow for them.
unsafe(pre) traits
Although they could be considered an overall improvement to the proposal's consistency, this proposal does not suggest allowing for unsafe(pre) traits initially, for the following reasons:
- The need for them has never arisen so far.
- They would have remarkably confusing syntax and semantics.
- Their simplest use cases seem to be fulfilled by unsafe(post) trait bounds.
According to the definition that was given above, an unsafe(pre) trait would be a trait for which the very action of adding that trait as a trait bound to generic code could cause memory/type/thread unsafety if some safety-critical preconditions are not met.
It is very hard to envision a situation in which this would be the case. And in fact, we do not even have syntax for this as of today (unsafe markers in trait bounds, anyone?). What we do have today, however, is a way for a trait to rely on safety-critical preconditions from the underlying type, and it is actually simple: just add a supertrait bound on an unsafe(post) trait which guarantees said preconditions as a postcondition:
trait ImplMustBeSync: Sync {}
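For instance, in this hedged sketch the Sync requirement travels through the supertrait bound, so the compiler enforces it when the trait is implemented:
// SharedCounter is automatically Sync (AtomicUsize is Sync), so this impl is
// accepted; a non-Sync type would be rejected by the compiler instead of
// silently violating a safety-critical assumption.
struct SharedCounter(std::sync::atomic::AtomicUsize);
impl ImplMustBeSync for SharedCounter {}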
Given these considerations, I cannot think of any circumstance in which we would want to add unsafe(pre) traits to our unsafe abstraction vocabulary. But should they turn out to be actually needed someday, the syntax for them can again easily be made available.
Extended static analysis of unsafe contracts
The clarification of unsafe contracts that is proposed by this pre-RFC could be leveraged to build static code analyzers which lint some suspicious constructs in unsafe code.
The simplest form of analysis that could be done would be to assert that the documentation of an unsafe abstraction features a "Safety" section, in which abstraction authors are expected to precisely state the unsafe preconditions and postconditions which the abstraction builds upon.
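As a hedged illustration of the documentation convention such a lint could check for (the function is hypothetical):
/// Returns the byte at `index` without bounds checking.
///
/// # Safety
///
/// Precondition: `index` must be strictly less than `data.len()`.
unsafe(pre) fn byte_at(data: &[u8], index: usize) -> u8 {
    unsafe { *data.get_unchecked(index) }
}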
Going one step further, more elaborate static analysis could also be used to assert that any input which is fed into an unsafe(pre) abstraction emerges from an unsafe block or from an unsafe(post) abstraction. Following this discipline would arguably smooth out the interaction between unsafe blocks and unsafe(post) contracts. However, there are some roadblocks that would need to be overcome before this dream may come true:
- There should be a way to annotate individual inputs of unsafe abstractions in order to clarify which of these inputs are concerned by the unsafe preconditions and which aren't. Otherwise, false positives would emerge from incorrect linting of information that is passed into non-safety-critical inputs.
- Unsafe traits like TrustedLen which turn a non-unsafe(post) function into an unsafe(post) one would need to be deprecated, as these would require manual rule injections in the static analyzer in order to be properly handled.
Unresolved questions
Exact meaning of "using a trait"
During the development of this pre-RFC, pinning a precise meaning on the expression "using a trait" has turned out to be surprisingly difficult. Successive versions of this pre-RFC started by associating using a trait with using its public interfaces (associated fns/consts/types/fields...), then began to acknowledge the importance of trait bounds, before the current definition of "using a trait is adding it to a set of trait bounds" was proposed.
The difficulty of pinpointing this meaning should be considered by reviewers of this pre-RFC as a warning sign that this part of the pre-RFC is not very solid yet, and reviewers are thus strongly encouraged to investigate this matter on their own, come up with alternate definitions of this sentence, and propose them to the author. A more refined understanding of traits as an unsafe abstraction will likely result from this process.
Prior art
Design-by-contract has a long history. Its notions were first popularized by Eiffel, and then remained with us throughout the subsequent history of programming, being for example used in many modern formulations of the Liskov Substitution Principle.
One particular use of design-by-contract that was strongly influential on the author of this proposal was the integration of contracts in the Ada language as part of its Ada 2012 revision, and its justification in the accompanying design rationale document from John Barnes.
Formalizing Rust's unsafe code in terms of software contracts also has a long and fruitful history, of which the links above only identify the most recent events. Previous discussions have already led to proposals for introducing more contract-centric semantics to unsafe abstraction building blocks, though none of these have made it to the implementation stage so far.
Alternatives
- Do nothing. Unsafe abstractions will remain confusing to implement and review, making unsafe Rust hard to learn and causing memory, type and thread safety issues that could have been avoided by improving unsafe code ergonomics.
- Turn this minimal proposal into something bigger, such as a fully generic effects system or an in-depth integration of design-by-contract into the Rust language. Prior discussions have shown that this is not perceived to be a reasonable price for a clarification of Rust's unsafe abstraction semantics, and is likely to be rejected. Therefore, this path seems to eventually lead to the "do nothing" alternative.
Conclusion
This pre-RFC proposes some syntax and semantics additions which have, I think, the potential to make the "unsafe" keyword much less confusing when used for the purpose of building unsafe abstractions.
The proposed changes are designed to be minimal, and to allow for coexistence with existing code and iterative migration of said code. No existing semantics of "unsafe" are changed; all changes are purely additive and are only meant to be ultimately enforced via the progressive linting process that was also discussed in the context of dyn Trait.
By implementing these changes, we can cleanly separate the currently confusing use of unsafe into three orthogonal concepts:
- Unsafe code, which is allowed to break safety and responsible for not doing so
- Unsafe preconditions, which must be upheld to guarantee the safety of an abstraction
- Unsafe postconditions, which unsafe code is allowed to rely upon from an abstraction
Only the latter two uses of unsafe require (long-term) syntax and semantics changes in this proposal, and I believe that they are the minority when it comes to uses of unsafe in current code, with unsafe blocks dominating by far. Moreover, the proposed semantics changes generally go towards expressing concepts which unsafe abstraction authors could previously only state in more confusing ways. So I think that they should be received positively overall.
By clarifying the meaning of "unsafe" in an abstraction context, new forms of unsafe abstractions are naturally enabled (such as freestanding unsafe(post) functions), which should enrich the abstraction vocabulary of the unsafe Rust programmer.
So, what do you all think about this idea? Is there something important which I overlooked? A problem which I did not envision? Something that I should change? Or should I proceed towards turning this idea into an actual RFC?