Note that those 2 things are not incompatible. You can have both:
A way to indicate operations with manually-proved requirements.
A way to indicate the scope that may be used to prove those requirements.
The first already exists: those are the unsafe operations, i.e. unsafe blocks and unsafe impls, possibly extended by this proposal with unsafe calls (and, I would hope, unsafe operations like *unsafe ptr for dereferencing a raw pointer and x.unsafe a for accessing a union field).
The second could simply be an item attribute #[unsafe(scope)] marking the item whose definitions unsafe code relies on. You would usually put this on a module, but it could also go on a function for local unsafe usage, or on the crate for more global ones. Today, this unsafe scope is implicit: it is simply the smallest public item containing the unsafe blocks. Making it explicit has no impact on the language semantics; it's really just documentation, expressed as an attribute so that tooling can use it.
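For instance, a sketch of how it could look on a module (the module and type here are made up for illustration):

#[unsafe(scope)]
mod ascii {
    /// Invariant: the byte is always < 128. Every mutation in this module
    /// preserves it, and the unsafe block below relies on it.
    pub struct Ascii(u8);

    impl Ascii {
        pub fn new(b: u8) -> Option<Ascii> {
            (b < 128).then_some(Ascii(b))
        }

        pub fn as_str(&self) -> &str {
            // SAFETY: a single byte < 128 is always valid UTF-8,
            // by the invariant maintained within this module.
            unsafe { std::str::from_utf8_unchecked(std::slice::from_ref(&self.0)) }
        }
    }
}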
You could even have this attribute on blocks.
#[unsafe(scope)]
{
    let x = 42;
    // SAFETY: foo expects to be called with 42 and x happens to be 42.
    unsafe { foo(x) };
}
The rationale for that is to encourage writing safe code rather than unsafe. But the proposal also encourages less unsafe code by making it easier to minimize the scope of unsafe code.
I don't think that's necessarily required. I think it's OK to have a single safety comment for two unsafe operations if their requirements are related:
// SAFETY: Both indexing operations are in range because n + 1 < length.
let a = slice.unsafe get_unchecked(n);
let b = slice.unsafe get_unchecked(n + 1);
Wouldn't this philosophy imply that any data structure that maintains invariants has to put all its mutating code in unsafe blocks? That's because somewhere you might rely on those invariants for soundness, now or in the future.
Suppose you have a "small number" data type:
/// A number smaller than 10.
pub struct Small(u8);
Now you have a bunch of public functions that maintain this invariant:
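Something like this sketch, say (the method names here are just made up):

impl Small {
    /// Trivially in range, so the invariant is maintained.
    pub fn zero(&mut self) {
        self.0 = 0;
    }

    /// Wraps the input into range, so the invariant is maintained for any input.
    pub fn set_mod_10(&mut self, x: u8) {
        self.0 = x % 10;
    }
}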
If you say "no, as long as you're not relying on this for soundness", then that doesn't really work, because the user might be relying on it for soundness; you can't know.
Or you might add a function in the future that relies on the invariant for soundness:
impl Small {
    pub fn index_into_array(&self, a: &[u32; 10]) -> u32 {
        let index = usize::from(self.0);
        // SAFETY: The number is smaller than 10 by the invariant of the type.
        *unsafe { a.get_unchecked(index) }
    }
}
After you add this function, do you now have to go and add unsafe to the implementations of all the mutating functions?
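Concretely, that would mean wrapping plain field assignments like the ones sketched above, something along these lines:

impl Small {
    pub fn set_mod_10(&mut self, x: u8) {
        // Under this philosophy the assignment is now soundness-critical,
        // because index_into_array relies on the invariant it maintains...
        unsafe {
            self.0 = x % 10;
        }
        // ...yet the block contains no unsafe operation, so it would also
        // trigger the unused_unsafe lint.
    }
}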
I don't see how to apply this philosophy consistently without making all code unsafe (or at least a large portion of it).
The other philosophy boils down to: unsafe is for calling and exposing external APIs with soundness conditions, not for internally maintaining invariants. That, to me, is clear and unambiguous; you just have to decide where the abstraction boundary is.
In my post, I was trying to be as general as possible. Also, these are my observations rather than my opinions.[1]
In such a simple case, yes, but if there are several operations between them, less so.
It depends on whether it's a soundness invariant or not. That should be decided and documented by the author and it is part of the API contract.
The standard library is full of cases like these:
A broken Ord implementation breaks a BTreeMap, but it will not cause UB. Thus it is not a soundness invariant.
It is a logic error for a key to be modified in such a way that the key’s ordering relative to any other key, as determined by the Ord trait, changes while it is in the map. This is normally only possible through Cell, RefCell, global state, I/O, or unsafe code. The behavior resulting from such a logic error is not specified, but will be encapsulated to the BTreeMap that observed the logic error and not result in undefined behavior. This could include panics, incorrect results, aborts, memory leaks, and non-termination.
A str being invalid UTF-8, on the other hand, is allowed to cause UB, so that is a soundness invariant. Indeed, functions that mutate string slices use unsafe.
Rust libraries may assume that string slices are always valid UTF-8.
Constructing a non-UTF-8 string slice is not immediate undefined behavior, but any function called on a string slice may assume that it is valid UTF-8, which means that a non-UTF-8 string slice can lead to undefined behavior down the road.
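For example, a small sketch of what mutating a string slice in place looks like: str::as_bytes_mut is an unsafe method precisely because the caller could break the UTF-8 invariant through it.

fn shout(s: &mut str) {
    // SAFETY: make_ascii_uppercase only rewrites bytes in b'a'..=b'z' to
    // their ASCII uppercase, so the slice remains valid UTF-8.
    for b in unsafe { s.as_bytes_mut() } {
        b.make_ascii_uppercase();
    }
}

(str::make_ascii_uppercase already exists as a safe method, of course; the point is only that touching the raw bytes forces you through unsafe.)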
...unsafe fields would make this decision more explicit, and they would indeed require unsafe in order to mutate the field.
I would like to have unsafe fields, because they do make this more consistent.
I'm not sure what you mean by internal and external here. I do mark internal (private) functions that need it unsafe too (I think everyone should). If by internally, you mean "in the same function", then I agree with you.
My opinion is closer to 1; I usually write my unsafe blocks to be as small as possible. ↩︎
The boundary is the module or crate that has access to the representation of some abstract entity vs the rest of the world that doesn't.
Example:
// Represents numbers smaller than 10.
pub struct Small(u8);

impl Small {
    // External API, no preconditions, no need for unsafe.
    pub fn assign_3(&mut self) {
        // This code maintains internal invariants.
        // No need for an unsafe block.
        self.0 = 3;
    }

    // External API.
    // Precondition: x < 10.
    // Must be marked unsafe so that external callers are forced to
    // check the precondition we are requesting of them.
    pub unsafe fn assign_unchecked(&mut self, x: u8) {
        // No need for an unsafe block here for the same reason as before.
        // We're satisfying an internal invariant.
        self.0 = x;
    }
}
In another crate:
fn foo(x: &mut Small) {
    // Must be an `unsafe` block because we're calling an external API
    // with a precondition.
    // SAFETY: 7 < 10.
    unsafe { x.assign_unchecked(7); }
}
That's inconsistent with philosophy 1. Functions private to the module that implements a safe abstraction never need to be marked unsafe.
I think it's important to look at why it's common.
My hypothesis would be that it's the only compiler-supported way to make it hard to accidentally introduce another unsafe operation without proving it safe; with a large block, adding more unsafe operations just keeps compiling, because they're already inside the block.
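A contrived sketch of what I mean:

fn read_both(p: *const u32, q: *const u32) -> (u32, u32) {
    unsafe {
        // SAFETY: reviewed when the block was written; p is valid and aligned.
        let a = *p;
        // Added later: this second dereference compiles with no new unsafe,
        // so nothing prompts the author (or a reviewer) to justify it.
        let b = *q;
        (a, b)
    }
}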
So I'd rather see something that addresses that problem than something that just helps make the blocks even smaller.
Philosophy 1. unsafe indicates API contracts (from both sides) whose violation leads to UB.
Philosophy 2. unsafe blocks delineate code whose correctness is required for soundness.
1 leads to small unsafe blocks and is supported by this proposal.
2 leads to large unsafe blocks.
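A rough sketch of the same function under each reading:

// Philosophy 1: the block covers only the operation with a contract.
fn first(v: &[u32]) -> u32 {
    assert!(!v.is_empty());
    // SAFETY: the assert above guarantees index 0 is in bounds.
    unsafe { *v.get_unchecked(0) }
}

// Philosophy 2: the block also covers the code the safety argument relies on.
fn first_wide(v: &[u32]) -> u32 {
    unsafe {
        assert!(!v.is_empty());
        *v.get_unchecked(0)
    }
}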
I've explained why I don't apply philosophy 2 myself: I don't see how it can be made consistent without putting a very large amount of code in unsafe blocks, which nobody does. Doing so would also trigger the unused_unsafe lint, which is warn-by-default. Bugs in a lot of code, including much library code, can potentially break soundness, so this doesn't seem like a useful criterion for what unsafe is supposed to mean.
This is Rust's philosophy for sure, and it shows in the many situations where an unsafe operation pays a syntactic tax. However, it might not have the intended effect: making low-level code noisy and harder to read may also make it harder to audit for safety bugs.
That's actually part of the appeal of Zig for some: if your code is mostly unsafe anyway, you may prefer to make it less cluttered, for the benefit of code reviewers.
Personally, I just wish that rust-analyzer highlighted unsafe operations differently. We could have coarser unsafe blocks, and inside those blocks have each individual unsafe operation rendered in bold or something like that.
As has been pointed out above, what is unsafe about calling an unsafe function is not the invocation itself but the computation of the function arguments: these need to uphold invariants that are inaccessible to the Rust compiler. Therefore, I find it counterproductive to focus on the syntax of function or method calls. Strictly speaking, the code that needs deeper scrutiny during review does not even include the call site itself.
But it does. If you're looking for potential UB, you start from all the unsafe calls and go from there to figure out what else needs to happen for the preconditions to be satisfied.
I feel like the "small unsafe blocks" vs. "large unsafe blocks" argument is becoming circular. However ...
... this seems like a legitimate problem that (I hope) everyone can agree is not well addressed by the current language. It seems to me that when an unsafe function takes a callback, it's more likely than not that the callback shouldn't be treated as within the unsafe block for the call to the unsafe function. But, in the current language, if you're using a closure for the callback, lexically, it is.
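For example (at least as I understand the current rules), the closure body below may dereference a raw pointer without any unsafe of its own, purely because it is written lexically inside an unsafe block:

fn main() {
    let x: u32 = 1;
    let p: *const u32 = &x;
    unsafe {
        // The closure is defined lexically inside the unsafe block, so its
        // body is also an unsafe context: this raw-pointer dereference
        // compiles with no further unsafe.
        let f = || *p;
        assert_eq!(f(), 1);
    }
}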
Like the other thread about "safe inside unsafe" and macros, I think I'd rather see this addressed by a change to the default semantics than by changing how people are expected to write code. What if we "just" changed things so that the effects of unsafe { ... } don't inherit into the body of any lambda expression inside the unsafe block? How much existing code would that break? It should be machine-fixable when it does break.