[Pre-RFC] Single function call `unsafe`

Motivation

The typical purpose of an unsafe block is to call a single unsafe function. There is no clean notation for this.

Wrapping the call in an unsafe block has the downside that it also makes the evaluation of parameters unsafe. So this is not a minimal unsafe block:

// SAFETY: foo is safe because ...
let x = unsafe { foo(expr1(), expr2()) };

What you typically really want is only for the single foo call to be unsafe, not expr1 and expr2.

Proposal

The unsafe keyword in front of a function call indicates the single function call is unsafe. Parameters are still evaluated safely.

// Only the `foo` call is unsafe.
let x = unsafe foo(expr1(), expr2());

It works with method calls and call chains:

let x = a
    .abc()
    .unsafe def()
    .xyz()
    .unsafe ghi();

This syntax is slightly ambiguous:

let x = unsafe foo()(); // which call is unsafe?

This can be resolved by precedence rules, i.e. only the first call is unsafe.

Alternative postfix notation

To avoid parsing ambiguity postfix notation could be used:

// foo is unsafe
let x = foo.unsafe(expr1(), expr2());

This also works with method calls and call chains:

let x = a
    .abc()
    .def.unsafe()
    .xyz()
    .ghi.unsafe();

Edit: Made the prefix syntax the primary proposal.

12 Likes

What about having the arguments be variables? The advantage is that it works today. The disadvantage is that you have to find good names. So far I've done this and I didn't feel the need for something else.

let arg1 = expr1();
let arg2 = expr2();
// SAFETY: foo is safe on arg1 and arg2
let x = unsafe { foo(arg1, arg2) };
1 Like

I've been using this workaround but it's annoying. In practice many other people don't bother because it's so verbose. One is forced to choose between conciseness and safety. This also doesn't work well with call chains. I feel that the addition of .unsafe is small enough to be worth it because it targets a common pattern.

Yes, I agree it can feel annoying. But the question is whether the pros outweigh the cons. For example, by explicitly naming the arguments, one can talk about them in the safety comment, which is actually convenient.

I would argue that if you have call chains, the safety comments might get messy. It's probably better to have at most one unsafe block (or call) per statement, such that safety comments apply to a given statement.

Couldn't this just be defined to be the innermost one, i.e. the unsafe _ () operator has higher precedence than the _ () operator?

Another option would be disallow it, since the two meanings can be expressed as (unsafe foo())() and unsafe (foo())().

Honestly to me this syntax looks much better than the proposed one.

Doing this with let has different semantics wrt temporaries, you'll have to use match expr1 { arg1 => ... } instead if you want to be fully generic, which is however much more verbose.

3 Likes

Good point.

Also, don't get me wrong, I think having unsafe on unsafe operations directly, the same as for unsafe impl, is definitely good. I'm just not a big fan of additional syntax. From the suggested one, the postfix is obviously better because it's on the application syntax node. It could also be foo unsafe(args, ...) without the dot. The same would be needed for unsafe operations like *unsafe ptr, etc. But I still believe that the problematic of safety comments will be present. If you have multiple unsafe operations in the same statement, the documentation might get messy.

The unsafe block already allows you to have multiple unsafe operations in a single statement. If you want to do that, the proposal makes it easier to mark both explicitly, which can only help documenting safety.

I personally prefer the prefixed unsafe foo. It’s almost as if the unsafe prefix became part of the name of the function, and reminds me of common practice in Haskell, where unsafe API is done (with no language support) by unsafe_ prefixing the function names.[1]

Regarding ambiguity, I don’t think foo()() is common enough to be of concern. To avoid confusion, one can simply enforce restrictions against it. Perhaps comparable to how x.foo() can’t be a call of an FnOnce()-typed field of x called foo, requiring (x.foo)() instead.

Conservatively, one could disallow unsafe foo()(); then (unsafe foo())() would be a valid alternative for explicitly calling the inner function unsafely, and the outer one needs not be supported at all; requiring the workaround of let f = foo(); unsafe f(); is fine for now.[2]

For chains, I see no problems at all, either. What’s wrong with the following syntax?

let x = a
    .abc()
    .unsafe def()
    .xyz()
    .unsafe ghi();

Regarding SAFETY comments, IMO that’s a separate/orthogonal concern – it’s also a question of style. SAFETY comments are underused, anyway; unsafe blocks often simultaneously too large and too small. Too small to contain a whole section of code where “all enforcement of invariants is contained within them” and too large to keep the effort of “find all the operations with preconditions” easily enough in review.


  1. I say “unsafe_” for the Rust audience, but there’s not really an underscore because lowerCamelCase is common practice ↩︎

  2. Allowing unsafe (foo())() is confusing IMO, because it’s too close to unsafe { foo() }() but with the opposite meaning! On the other hand, it’s of course still useful to explore this question a bit – eventually – in the context of discussing future possibilities. ↩︎

2 Likes

Yes, it's partly orthogonal. The proposal is to have the unsafe keyword on the syntax node that needs it, rather than being a scope over a syntax tree. It is already possible to follow this style today by moving the children of a syntax node outside its syntax tree (e.g. using let before and possibly drop after). This style is useful for safety comments, by having statements with exactly one operation, that operation being unsafe, and that statement being commented. But it's true that you don't need this style because of safety comments. So it goes only one way, hence partly orthogonal. I still believe that to motivate this change, the impact (ideally positive) on safety comments needs to be considered.

True.

I don't think unsafe blocks can be too small. That's actually the point of this proposal, to have them as small as possible, targeting exactly one syntax node.

The "enforcement of invariants" part is just the scope of unsafe. Unsafe blocks should not be used to define the scope of unsafe. It is implicitly defined by the set of public functions that cover the unsafe block.

Do note that this can become pretty bad with closure callbacks. If you have a foo(bar, |x| { …code… }) kind of API, it’s hard on readability to move the closure out of the call expression, and if the closure involves generic lifetimes, it might be impossible without significant additional effort to define the lifetimes relationships in a different manner.

Even without lifetimes, it’s just natural that users wouldn’t care to change their code like this; much realistically, you will end up with users that wrap the whole call in unsafe and call it a day. Working around this with … creative … API design is also a hassle, and somewhat unsatisfying. (E.g. in this discussion such an issue came up; and a possible workaround I’ve mentioned there would be an API in the style of foo(unsafe { foo_safety_marker() }, bar, |x| { …code… }) – but IMO unsafe foo(bar, |x| { …code… }) would be much, much nicer.)

3 Likes

This works OK today with clippy::multiple_unsafe_ops_per_block enabled. Although I see that not everyone uses that lint and having this feature would encourage minimal unsafe even by those who don't use the lint.

This is a great selling point IMO.

I have edited the proposal to make the prefix syntax the primary proposal because I also feel it looks much nicer than the postfix .unsafe notation, and I've been convinced the syntactic ambiguity can easily be resolved by precedence rules.

The postfix notation is still there as an alternative.

1 Like

You can talk about more complicated expressions in a safety comment:

// SAFETY: `n + 1` is in range because `n + 1 < ...`
let x = 2 * slice.unsafe get_unchecked(n + 1);

vs

let index = n + 1;
// SAFETY: `index` is in range because `index = n + 1 < ...`.
let x = 2 * unsafe { slice.get_unchecked(index) };

To me this is unnecessary, and even a move in a wrong direction.

It's very rare that a single call by itself encapsulates all of the unsafety. Unsafety usually depends on function arguments, on various preconditions, and may also need keeping an eye on destructors and panics. I would prefer unsafe blocks to be larger to cover all of these concerns together, because the entire context must be checked for safety.

If you're calling something like get_unchecked(index), computation of the index is the actual unsafe thing. I'd prefer index calculation to be added to the unsafe block, not the block made even slimmer and more incomplete.

4 Likes

Actually, I think the latter could lead to confusion. Unsafe function calls shouldn't support arbitrary expressions, only an identifier followed by parentheses:

// Allowed:
unsafe foo(..)
foo.unsafe bar(..)

// NOT allowed:
unsafe (foo())()
unsafe { foo }() // parses as a regular unsafe {} block instead
unsafe foo.bar()
unsafe foo?()
// etc.
1 Like

Agreed. What I want help with is making sure I satisfied all the preconditions, not just knowing which things have preconditions.

I'll re-use this previous post about it:

(If we can write proofs of predicates, then all these individual unsafes become unnecessary.)

There are multiple competing concerns being served by unsafe. The one you mention is potentially well served by just regular blocks and comments. I don't know how this will shake out, but I would like to be able to write code along the lines of:

// SAFETY: we're going to get all the right inputs for `super_unsafe`
{
    // We need to add 2 for reasons.
    let a = safe_thing() + 2;
    let b = also_safe(*p);
    let c = unsafe super_unsafe(a, b);
    more_safe_things(c);
}

To me a large unsafe block is making that harder to review, but structuring it and documenting it nicely certainly does help.

6 Likes

It's also common that soundness requirements extend beyond a single function, so clearly you can't expect that one unsafe block contains all the code that interacts with these requirements because blocks are restricted to a single function. "Unsafe blocks" and "safety abstraction boundary" are two different concepts by necessity. Often the whole module or the whole crate is the safety abstraction boundary.

Your preferred style may be valid, but surely the "minimal unsafe" style is also valid and common, so I'm trying to cater to it. It's definitely what I do in my code.

4 Likes

unsafe does not have the purpose of providing a nice syntax, quite the opposite: it is supposed to stick out, be an eyesore and inconvenient. Encountering unsafe during a review is supposed to stop you right there and go “what the heck is going on here?” The result hopefully is that after close scrutiny the response is “ah, right, okay.”

So, as a bystander on the peanut gallery this whole discussion feels like it started from the wrong premise.

2 Likes

Before we go much further into this discussion again, let me summarize the way I see this:

There seems to be two[1], quite contradictory approaches to unsafe blocks. Both of them has valid arguments for it, and there seems to be no consensus on which is the "official way".

  1. Make unsafe blocks as small as possible

    The point of unsafe is to mark each place where some operation has invariants that have to be upheld. Ideally, each unsafe block contains one, clearly identifiable, unsafe operation. The invariants of the unsafe operation are addressed in the safety comment, the analysis of how the invariants are upheld goes beyond the unsafe block.

    Advantages:

    • Each operation whose documentation needs to be checked is clearly visible
    • Adding a new unsafe operation also means adding an unsafe block, so it's harder to forget to check the invariants

    Limitations:

    • May require restructuring code so that only one unsafe operation ends up in the unsafe block

      Addressed by this proposal

    Disadvantages:

    • Requires multiple safety comments when performing multiple unsafe operations using the same set of invariants
    • Can hinder readability in complex, highly unsafe functions
  2. Make unsafe blocks encompass a set of operations that are key to upholding the invariants

    Bugs in safe code that is responsible for upholding invariants also counts as unsafe. Invariants specified in the safety comment are upheld by the block, those invariants can be used to do unsafe operations. Multiple unsafe operations in a single block may be allowed, as long as they share the invariants. Ideally, the analysis of how the invariants are upheld is contained inside the unsafe block that has the unsafe operations.

    Advantages:

    • Invariant reasoning is more delimited
    • unsafe blocks are less of an eyesore in complex, highly unsafe functions, allowing better readability

    Limitations:

    • Certain unsafe invariants are non-local, meaning they can not be encapsulated in a block

      unsafe fields would help in many cases, as they allow moving certain invariants into struct fields

    Disadvantages:

    • It is less obvious which operation(s) in an unsafe block contribute invariants, the documentation of all operations inside the block need to be checked
  3. It's All Unsafe Anyway™

    Wrap all the code in a function in a single unsafe block. Used by users of C or in code that interacts with the hardware/ffi where almost every line of code contains an unsafe operation. Usually a code smell that calls for smaller, more fine-grained abstractions.


  1. and one more, but that one's not great ↩︎

3 Likes