Suggestion: Target-focused `unsafe` call

chrefr · September 29, 2020, 12:51am

I hate unsafe blocks because they're too broad. Today I've met a chain of calls, where the last method was unsafe. Here's how it looks:

unsafe {
    x
        .method1(a, b, c)
        .method2(1, 2, 3)
        .method3(1 + 4, concat!("I", " ", "am cool"))
        .unsafe_method()
}

Now I can easily do an unsafe operation mistakenly inside, and the compiler will tell me nothing.

I think that a target-focused unsafe call syntax will be better. My suggestion is:

unsafe unsafe_fn(1, 2);
x.unsafe unsafe_method("Hi");
self.unsafe unsafe_method(unsafe_fn()); // Error!
// Correct:
self.unsafe unsafe_method(unsafe unsafe_fn());

Now the compiler can know that only the call is meant to be unsafe, and catch mistaken unfortunate other unsafe sections.

steffahn · September 29, 2020, 1:04am

Iʼve had the same idea myself. Unfortunately it is not particularly pretty, but I wouldn't know any better way and it seems immensely useful. More than the unintentional use of unsafe things, this also prohibits the situation where someone reading code has literally no idea where in that huge unsafe block the actual unsafety is happening and whether one has actually spotted every single unsafe thing there is or maybe missed one.

Another thing would be that there should also be syntax for unsafe language features that are not function or method calls, like e.g. dereferencing raw pointers. And of course unsafe functions should not have their bodies be unsafe blocks by default. Then one could eventually even have a tool convert unsafe blocks into explicit unsafe-on-every-use syntax to aid autiting overly unsafe-block reliant or old code.

A downside to this approach is that sometimes an unsafe block communicates which “safe” code is relevant to the unsafe operations and what things belong together. On the other hand there is literally always also safe code that's relevant to the soundness of unsafe code but not inside of the unsafe block, so relying too much on such implicit extra information is dangerous anyways.

Lastly, of course it's more verbose. For a sequence of statements with lots of unsafety the extra verbosity could make readability harder. For a potential alternative, I was thinking of perhaps there should be syntax highlighting from the IDE that (very clearly) points out exactly where in an unsafe block the unsafe methods, functions, pointer derefs, etc happen. But it feels like a bit of a weak language design if the only way to really understand Rust code is by looking at it thorough an IDE.

chrefr · September 29, 2020, 1:10am

I don't think we have any another syntax that needs that. The idiom of Rust "everything is an expression" allows you to write code like unsafe { *p }.method() which I'm really conformable with. I don't know any other syntax of Rust expect calls that suffers from this problem.

The problem with calls is double: all calls cannot be marked unsafe without their parameters being unsafe too (i.e. unsafe { f(g()) } executes g() in an unsafe context, too), and methods are even more problematic since they cannot be marked unsafe without their receiver (i.e. unsafe { g().f() } executes g() in an unsafe context in addition to f()). The methods' problem can be solved using a syntax like g().unsafe { f() }, but that still has a new syntax while not solving all problems (so my suggestion is preferred over), and for me at least, this syntax doesn't intuitively ascribes f() to g()'s return value.

This is already on the roadmap: RFC 2585. The RFC says that the plan is to eventually make it warn-by-default then err-by-default (because this is a breaking change).

I strongly prefer this verbosity over the cons. And if you're doing something really unsafe, you can still wrap the whole code with an unsafe block. Your choice.

I think Rust should not rely on IDEs, and that still doesn't solve all problems, only the readability, which I think it's less important than the potential unsafety.

steffahn · September 29, 2020, 1:42am

If p is a more complicated expression, the same problems apply that unsafe { *p } will allow unsafe things to be called inside of p.

chrefr · September 29, 2020, 2:21am

This may be an issue, you're right, but I don't worry so much about that. Mainly because pointer dereferencing is rare and extracting to a variable is easy in this case, but also because I haven't met this (yet)

Edit: Also, I don't have an idea for an appropriate syntax in this case.

Edit 2: Maybe unsafe(*)p or unsafe{*}p?

josh · September 29, 2020, 2:33am

Without necessarily trying to argue against this, note that the current best available alternative would be something like the following:

    let y = x
        .method1(a, b, c)
        .method2(1, 2, 3)
        .method3(1 + 4, concat!("I", " ", "am cool"));
    unsafe { y.unsafe_method(); }

Or, if the methods just chain by returning &mut self:

    x
        .method1(a, b, c)
        .method2(1, 2, 3)
        .method3(1 + 4, concat!("I", " ", "am cool"));
    unsafe { x.unsafe_method(); }

Either of those seems relatively reasonable as an alternative. That's the baseline we need to weigh a potential syntax change against.

I do tend to try to make unsafe as narrow as possible, and I've felt the resulting awkwardness as well when doing so requires introducing new temporaries. I feel like the syntax proposals thus far, though, are less self-explanatory and harder to mentally parse.

Aloso · September 29, 2020, 4:07am

Note that rust-analyzer can already do that, using semantic highlighting.

That said, I don't think introducing a temporary variable is too much to ask. The current syntax is straightforward and easy to understand. This is important when dealing with unsafe (and therefore security-critical) code.

chrefr · September 29, 2020, 4:28am

@Aloso and @josh,

First of all, extracting into variable solves the methods problem, but it's not so good for arguments evaluation. Although we can pull them out, this gets unreasonable quickly.

And about methods: you're right when we're talking about self or something like the builder pattern, but for instance in my case, I'm dealing with a chain of tightly coupled operators. Something like get a stack stored in a field - then get the last item inside it - then unwrap. Naming operands will result in meaningless names such as this_scope_before_unwrap. I can just call it this_scope then let this_scope = this_scope.unwrap(), I know, but outside of noisy and boilerplate code, I also think this does not fully express the value stored inside the variable.

H2CO3 · September 29, 2020, 7:17am

What do you mean? You can put and omit unsafe blocks separately for each argument, because they are distinct expressions. foo(bar, unsafe { baz() }) works just fine.

By the way, did you consider why the syntax is the way it is? unsafe applies to an expression. It need not be a function call. However, .some_unsafe_method() is not a fully-fledged expression by itself. It is the right-hand-side of a method call expression, and it's useless without the self on the LHS.

This syntax change would suggest to alter the semantics of unsafe so as to apply to parts of expressions or incomplete expressions. Working with partial constructs in a highly regular language is always a headache to readers of the code and compiler writers, and is consequently not a good idea in general.

Especially when the solution is literally trivial – why would you even consider creating a single variable such a burden as to warrant completely new syntax? That seems unreasonable to me. The bar for features should be much, much higher than "it annoys me sometimes", because new features are expensive. You should not be taking this stuff lightly.

chrefr · September 29, 2020, 11:41am

No, I mean unsafe { foo(bar, baz()) }. You can do let b = baz(); unsafe { foo(bar, b); } but function arguments are not just one (like the receiver) and usually it's not so easy to name them.

scottmcm · September 29, 2020, 5:00pm

Hopefully one day we'll have some form of tuple splatting so one can always do something like

let args = (bar, baz());
unsafe { foo(...args) };

if naming all the arguments individually is unappealing.

josh · September 29, 2020, 6:53pm

As one potential idea: what about a safe block?

unsafe { foo(bar, safe { baz() }) }

We could even have an (optional, off-by-default) lint that warns you if you have a non-trivial expression inside an unsafe block that doesn't need unsafe, and a suggestion to use safe for that.

Similarly:

unsafe { safe { x.chain1().chain2() }.chain_unsafe() }

I don't think all code should be written this way, but it'd be an option for people who want to use it.

atagunov · September 30, 2020, 1:21pm

It might be possible to mark just a particular function invocation:

let a = f{unsafe}(a.safe1(), b.safe2(), c.g{unsafe}());

Here {unsafe} "binds" to the following parenthesis. This means that we're okay with this being an invocation of an unsafe function in a regular safe context

H2CO3 · September 30, 2020, 1:47pm

That looks ugly: it's not obvious which part it binds to, and it's backwards compared to the current place of unsafe (which is prefix w.r.t. the curlies but infix here), as well as inconsistent because it is no longer the scope of the curlies that is being marked as unsafe.

kornel · September 30, 2020, 3:33pm

I'm not sure if this is a problem at all. Over-reaching unsafe doesn't disable any safety features in safe code, so it doesn't cause any direct harm to the code. It increases the scope of code that needs to be carefully reviewed, so it's a bit of a waste of time/attention.

Once you start bending the syntax to micromanage every bit of every expression, that syntax itself will add more noise and complicate code review more than the slightly too big unsafe block.

scottmcm · September 30, 2020, 3:33pm

It's not obvious to me that the unsafe token is the whole problem here. I feel like any code that's trying to be this focused about the location of the blocks probably also wants to be writing // SAFETY: comments on all of them, which is so much more syntactic overhead than an extra let or two.

I wonder if instead there'd be a good way for the compiler to help you record which of the unsafe obligations you've met, both to help you make sure you've checked all of them and so that adding a new call needing new things in an unsafe block would be able to warn you.

Of course, at some point that's called "make a newtype that promises it"...

dhm · September 30, 2020, 4:47pm

mcy · October 1, 2020, 8:17pm

The bit about SAFETY reminds me of this silly macro from a few weeks ago: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=7e11bfcdd85e83c16cbcef7ace2b8848

chrefr · October 1, 2020, 8:25pm

Why? This is really nice!

mcy · October 1, 2020, 8:32pm

Specifically about compiler-checks to make sure that each and every unsafe operation has an associated safety comment. I'm not a fan of the version I posted, since it fails to do it on a per-operation basis.

Topic		Replies	Views
Unnecessary `unsafe` block… language design	22	3010	May 18, 2022
[Pre-RFC] Single function call `unsafe` language design	31	661	February 10, 2025
Half-baked idea: postfix/monadic `unsafe` language design	18	1758	August 19, 2019
Pre-RFC: making unsafe more safe to use language design	12	3263	March 25, 2019
Unsafe categories	18	1446	December 14, 2021

Suggestion: Target-focused `unsafe` call

Related topics