- Feature Name: unsafe_reasons
- Start Date: 2024-12-30
- RFC PR: rust-lang/rfcs#0000
- Rust Issue: rust-lang/rust#0000
Summary
Add a way to check at compile time that all unsafe assumptions have been met.
Motivation
Today, unsafe code is clearly separated in Rust, which is a good thing. However, when writing unsafe code, there is no easy way to verify that you have not forgotten an assumption the code relies on to avoid undefined behavior. The only way to check the assumptions is to read the documentation of the unsafe functions you call. Reading it is always necessary, but it is still easy to overlook a condition.
Worse, if a function's preconditions change (for example, in a breaking dependency update), your code will still compile even though it may no longer meet them. The result can be undefined behavior at runtime, which is exactly what we want to prevent.
Guide-level explanation
When writing unsafe functions, the unsafe keyword can optionally accept a list of reasons. They are enclosed in parentheses, like the following:
unsafe(reason1, reason2) fn my_unsafe_fn() { /* ... */ }
When calling unsafe functions, you can optionally supply a list of reasons. The list of reasons must match the list on the function's definition. For example, you can call the above function like the following:
unsafe {
    my_unsafe_fn().unsafe(reason1, reason2);
}
If you miss a reason, the compiler produces an error:
error[E0000]: unsafe reason `reason2` does not appear in the call
 --> src/main.rs:4:5
  |
4 |     my_unsafe_fn().unsafe(reason1);
  |                          ^^^^^^^^^ provided reasons list
  |
  = note: missing unsafe reason may mean you forgot to handle a precondition of the function
  = note: consult the function's documentation for information on how to avoid undefined behavior
You can continue declaring and calling unsafe functions without reasons. If you have an unsafe function with reasons that you call without reasons, the compiler won't complain (it might lint/error in a future edition).
Unsafe reasons are supposed to help you make sure you don't forget any precondition. An unsafe function should include all preconditions it has in its reasons list. For example, the following function:
pub unsafe fn to_bool(v: u8) -> bool {
    unsafe { std::mem::transmute(v) }
}
should be written like the following (you can of course choose a different reason name):
pub unsafe(v_is_0_or_1) fn to_bool(v: u8) -> bool {
    unsafe { std::mem::transmute(v) }
}
Of course, transmute() has preconditions too, and will therefore eventually declare reasons of its own, so if you want to be safer you can specify them (the reasons below are imaginary; see Unresolved questions):
pub unsafe(v_is_0_or_1) fn to_bool(v: u8) -> bool {
    // SAFETY: `u8` doesn't have uninit bytes, and the only invariant of `bool` (besides no uninit)
    // is that it's 0 or 1, which is our precondition.
    unsafe { std::mem::transmute(v).unsafe(no_unmatching_uninit, invariants_kept) }
}
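For comparison, here is a sketch of the same function in today's Rust, without the proposed syntax, where the single precondition lives only in documentation and comments. The `checked_to_bool` wrapper is a hypothetical helper added for illustration; it is not part of this RFC.

```rust
/// Converts a byte to a `bool` without checking it.
///
/// # Safety
///
/// `v` must be 0 or 1 (the precondition this RFC would name `v_is_0_or_1`).
pub unsafe fn to_bool(v: u8) -> bool {
    // SAFETY: the caller guarantees `v` is 0 or 1, the only valid `bool` bit patterns.
    unsafe { std::mem::transmute::<u8, bool>(v) }
}

/// Hypothetical safe wrapper: discharges the precondition with a runtime check.
pub fn checked_to_bool(v: u8) -> Option<bool> {
    if v <= 1 {
        // SAFETY: `v` was just checked to be 0 or 1.
        Some(unsafe { to_bool(v) })
    } else {
        None
    }
}
```

Nothing in this version makes the compiler check that the caller acknowledged the precondition; that acknowledgement is what the proposed v_is_0_or_1 reason would add.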
For maximal safety, the list of reasons on a function declaration should closely match the unsafe function's safety documentation, and the justification for each reason at a call should be explained in the // SAFETY comment.
The compiler provides an optional lint, unsafe_without_reasons, that fires on calls to unsafe functions without reasons when the function has reasons in its declaration. The lint is allow-by-default (but this may change in an edition).
Reference-level explanation
The syntax of function declarations is extended like the following:
  Function :
     FunctionQualifiers fn IDENTIFIER GenericParams?
        ( FunctionParameters? )
        FunctionReturnType? WhereClause?
        ( BlockExpression | ; )
  FunctionQualifiers :
     const? async? ItemSafety? (extern Abi?)?
  ItemSafety :
-    safe | unsafe
+    safe | unsafe DeclarationUnsafeReasons?
+ DeclarationUnsafeReasons :
+    '(' IDENTIFIER ( , IDENTIFIER )* ')'
Call syntax is extended like the following:
+ CallUnsafeReasons :
+    '.' 'unsafe' '(' IDENTIFIER ( , IDENTIFIER )* ')'
  CallExpression :
-    Expression ( CallParams? )
+    Expression ( CallParams? ) CallUnsafeReasons?
  MethodCallExpression :
-    Expression . PathExprSegment ( CallParams? )
+    Expression . PathExprSegment ( CallParams? ) CallUnsafeReasons?
When calling a function that is not unsafe, it is an error to provide unsafe reasons.
When calling unsafe functions:
- If the function does not have reasons declared, calling with reasons will cause an error. This is done so adding reasons to an existing function that was without reasons won't be a breaking change.
- If the function does have reasons, and there is a reason from the declaration that is absent in the call, the compiler will emit an error.
- If there are unnecessary reasons, either in the declaration or in the call (duplicates, or reasons that are not declared in the function declaration), the compiler won't emit an error but will lint with the new lint unnecessary_unsafe_reasons.
- If the function is declared with reasons but no reasons are provided in the call, the compiler will emit an allow-by-default lint, unsafe_without_reasons. Users who want to make sure they have reasons coverage can turn this lint on. This is done so that adding reasons to a function that didn't have reasons won't be a breaking change. In a future edition this lint might be promoted.
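The call-side rules above can be modelled as a small function. This is only a sketch, not how rustc would implement the check: `Diagnostic`, `check_call`, and the convention that an empty slice means "no reasons written" are assumptions made for illustration (the grammar requires at least one identifier, so an empty list cannot be written). Duplicate reasons in a declaration are not modelled.

```rust
use std::collections::BTreeSet;

/// Hypothetical outcome of checking one call; names are illustrative only.
#[derive(Debug, PartialEq, Eq)]
pub enum Diagnostic {
    /// Error: reasons supplied when calling a function that is not `unsafe`.
    ReasonsOnSafeFn,
    /// Error: reasons supplied, but the declaration lists none.
    ReasonsOnUndeclared,
    /// Error: a declared reason is absent from the call.
    MissingReason(String),
    /// Lint `unnecessary_unsafe_reasons`: duplicate or undeclared reason in the call.
    UnnecessaryReason(String),
    /// Lint `unsafe_without_reasons` (allow-by-default).
    CallWithoutReasons,
}

/// An empty `declared` or `provided` slice stands for "no reasons list written".
pub fn check_call(callee_is_unsafe: bool, declared: &[&str], provided: &[&str]) -> Vec<Diagnostic> {
    if provided.is_empty() {
        // Plain call without a `.unsafe(...)` suffix.
        if callee_is_unsafe && !declared.is_empty() {
            return vec![Diagnostic::CallWithoutReasons];
        }
        return Vec::new();
    }
    if !callee_is_unsafe {
        return vec![Diagnostic::ReasonsOnSafeFn];
    }
    if declared.is_empty() {
        return vec![Diagnostic::ReasonsOnUndeclared];
    }

    let declared_set: BTreeSet<&str> = declared.iter().copied().collect();
    let provided_set: BTreeSet<&str> = provided.iter().copied().collect();
    let mut out = Vec::new();

    // Every declared reason must appear in the call (reported in sorted order).
    for reason in &declared_set {
        if !provided_set.contains(reason) {
            out.push(Diagnostic::MissingReason(reason.to_string()));
        }
    }
    // Undeclared reasons and repeated reasons only trigger the lint.
    let mut seen = BTreeSet::new();
    for &reason in provided {
        if !declared_set.contains(reason) || !seen.insert(reason) {
            out.push(Diagnostic::UnnecessaryReason(reason.to_string()));
        }
    }
    out
}
```

For the guide-level example, `check_call(true, &["reason1", "reason2"], &["reason1"])` yields a single `MissingReason` for `reason2`, which corresponds to the error shown earlier.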
Despite the fact that the reasons call syntax includes the unsafe keyword, it does not replace the role of an unsafe block. Calling an unsafe function without an unsafe block will remain an error, even if reasons are provided (but see Rationale and alternatives).
The type of an unsafe function with reasons (its FnDef) contains the reasons (so the compiler can track them where it's called), but function pointers do not carry reasons. This is based on the assumption that code using fn pointers deals with multiple unsafe functions (otherwise it wouldn't need fn pointers), whose reasons can be different enough that it will be hard for the compiler to unify them, and that the programmer already applies greater scrutiny to such calls with "general (or somewhat general) preconditions". So when FnDefs are coerced to fn pointers, the unsafe reasons are lost.
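The distinction between a function's own type and a fn pointer already exists in today's Rust, as the following sketch shows. `read_byte` and the two helper functions are hypothetical names used only for illustration. The fn item type is zero-sized and identifies one specific function, which is why reasons could be attached to it; the pointer type only records the signature.

```rust
use std::mem::size_of_val;

/// # Safety
///
/// `p` must be valid for reads and point to an initialized `u8`.
pub unsafe fn read_byte(p: *const u8) -> u8 {
    // SAFETY: guaranteed by the caller.
    unsafe { *p }
}

/// Returns (size of the fn item value, size of the fn pointer value).
pub fn item_and_pointer_sizes() -> (usize, usize) {
    // Fn item: a unique zero-sized type naming exactly `read_byte`.
    let item = read_byte;
    // Fn pointer: any unsafe function with this signature fits here.
    let pointer: unsafe fn(*const u8) -> u8 = read_byte;
    (size_of_val(&item), size_of_val(&pointer))
}

/// Calls the same function through both kinds of value.
pub fn call_through_both() -> (u8, u8) {
    let byte = 7u8;
    let item = read_byte;
    let pointer: unsafe fn(*const u8) -> u8 = read_byte;
    // SAFETY: `&byte` is a valid, aligned pointer to an initialized `u8`.
    unsafe { (item(&byte), pointer(&byte)) }
}
```

Under this RFC, only the call through `item` could be checked against declared reasons; the call through `pointer` would behave as it does today.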
Drawbacks
The first drawback is, as always, that it complicates the language.
The second is that this makes the syntax for calling unsafe functions heavier. While reasons are optional, we do encourage people to put reasons in their unsafe code. For that reason I believe that while reason names need to be clear, they also need to be short, and it's okay if what a reason means is only clear from the documentation: as said above, users are encouraged to accompany reasons with // SAFETY comments. Reasons are mostly there to check that preconditions weren't forgotten.
Rationale and alternatives
The most important question when it comes to alternatives is where we put the reasons at the call site. This RFC proposes to attach them to the call itself, but an alternative is to put them on the unsafe block (unsafe(reason1, reason2) { ... }). This has the major advantage of being less heavy, particularly because it does not require one to repeat the unsafe keyword and attach the reasons to each call. However, it also has major disadvantages. First, some users like to have big unsafe blocks that cover the entire area affected by the unsafe code (including safe code that the unsafe code relies on), and for them putting the reasons on the unsafe block means they will be far from the call. Second, when an unsafe block contains multiple calls, it may not be clear which reason belongs to which call. Worst of all, with this approach adding reasons to a function becomes a breaking change, because the user may already have reasons for other calls in the same unsafe block. Libraries would then be hesitant to adopt unsafe reasons, and even libstd would be forced either to add reasons to all of its functions at the moment unsafe reasons are stabilized, or never to add them.
Another alternative is to go in the opposite direction and make calls with reasons not need an unsafe block. I'm avoiding this change because the community has been split in the past about whether adding more fine-grained unsafe control is a good idea, but it's definitely something to consider.
Of course there is also the alternative of doing nothing, but I believe the advantages for unsafe code writers outweigh the disadvantages, especially since unsafe code is already niche and should already receive greater scrutiny. This also cannot be done by a macro, because it needs semantic information about the called function.
We could add fn pointer types with unsafe reasons (that coerce to ones without). The reason I do not propose this, as outlined in the Reference-level explanation, is that I assume that when people use unsafe fn pointers their preconditions are different enough that the compiler won't be able to unify them.
Prior art
There are a few languages that have a safe/unsafe model (e.g. C#), but I'm not aware of any language that implements the proposed feature.
Inside the Rust community I believe this idea has been floated before, but I don't think it was discussed much.
A somewhat related discussion is whether unsafe should be more granular - that was discussed e.g. here and here.
Unresolved questions
What should be the syntax for reasons on call?
How precise should reasons be? This applies both as general advice and to stdlib functions. For example, the documentation for std::ptr::read() requires that src is properly aligned and points to a properly initialized value, is not null, is dereferenceable, and is non-racing. Should they all be separate reasons, or should we group some, or perhaps omit some uncommon ones (e.g. that the read is non-racing)?
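To make the granularity question concrete, here is a sketch in today's Rust of a single std::ptr::read() call whose // SAFETY comment discharges each of those conditions separately. `read_copy` is a hypothetical helper, and how the conditions would map to reason names is exactly what this question leaves open.

```rust
use std::ptr;

/// Reads a `u32` through a raw pointer derived from a reference.
pub fn read_copy(value: &u32) -> u32 {
    let src: *const u32 = value;
    // SAFETY: `src` comes from a live `&u32`, so it is
    // - non-null and dereferenceable (valid for reads),
    // - properly aligned,
    // - pointing to an initialized `u32`,
    // - not written concurrently, because the shared borrow is held for the call.
    unsafe { ptr::read(src) }
}
```

With one reason per condition, this call would carry four or five identifiers; grouping them would shorten the call at the cost of less precise checking.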
Should we have a lint for functions declared without unsafe reasons? Some users may want to make sure they don't forget to add them, but on the other hand for some (like FFI, where everything is unsafe) reasons don't help much. It can be an allow-by-default lint, though.
Future possibilities
One future possibility mentioned above is to make the unsafe_without_reasons lint warn-by-default (or stronger) in an edition.