Ability to call unsafe functions without curly brackets

Currently stuff like this is disallowed:

unsafe fn unsafe_fn() {
    5 + 5;
}   

fn main() {
    unsafe unsafe_fn();
}

The call to unsafe_fn() has to be called via unsafe { } instead of just unsafe:

unsafe fn unsafe_fn() {
    5 + 5;
}   

fn main() {
    unsafe { unsafe_fn() };
}

I can't think of any reasons why it should be required by the language to have the curly brackets though. Is there any place where the parser could get confused on what I'm trying to do?

The parser wouldn't get confused. The human reader might.

4 Likes

In general, Rust uses {} for things that affect the whole scope, rather than a value, which is why discharging obligations uses braces.

2 Likes

I guess, but I can't really picture it being an issue at scale. At most I'd picture it being used across a function call and it's return arguments (i.e. something like unsafe func().method1().method2().

That's fair. I've got some cases where I believe it does make the code look better, even without making it less legible though.

My first case is with https://github.com/hwittenborn/husk/blob/ae7cdc338a49cb0f4f299e382b6ab9d2c13d6078/src/syntax.rs#L99-L102:

let ptr = unsafe { bindings::HuskSyntaxNewPos(offset, line, column) };
let offset = unsafe { bindings::HuskSyntaxPosOffset(ptr) };
let line = unsafe { bindings::HuskSyntaxPosLine(ptr) };
let col = unsafe { bindings::HuskSyntaxPosCol(ptr) };

In my opinion, this could be written much more elegantly like this:

let ptr = unsafe bindings::HuskSyntaxNewPos(offset, line, column);
let offset = unsafe bindings::HuskSyntaxPosOffset(ptr);
let line = unsafe bindings::HuskSyntaxPosLine(ptr);
let col = unsafe bindings::HuskSyntaxPosCol(ptr);

I know I could wrap all four of those calls in a single unsafe { .. } block, but then indentation can start to become an issue as the project size increases. https://github.com/hwittenborn/husk/blob/ae7cdc338a49cb0f4f299e382b6ab9d2c13d6078/src/syntax.rs#L168-L174 shows an example where indentation can start to make it look a bit clunky:

let ptr = unsafe {
    bindings::HuskSyntaxNewParser(
        keep_comments as u8,
        stop_at_ffi_ptr,
        lang_variant.to_ffi_int(),
    )
};

I feel like this would look much better to the eye if written like this:

let ptr = unsafe bindings::HuskSyntaxNewParser(
    keep_comments as u8,
    stop_at_ffi_ptr,
    lang_variant.to_ffi_int(),
);
2 Likes

That's actually a good counterexample, similar to why await became a postfix dotted keyword, because it's just difficult for humans to remember the binding rules for a prefix operator, and parentheses make things awkward and cannot be applied in the middle of method chains. (Hmm, maybe we should have .unsafe? :smile:)

15 Likes

One thing I'd like to bring up is my RFC for unsafe fields. Constructing a struct with unsafe fields would be itself unsafe, and people were concerned about having to place the entire constructor in an unsafe block. One idea was to do unsafe Foo { … }, which would allow constructing it without introducing an unsafe context to all field declarations.

It's important to keep concepts like this in mind imo. It is possible that this actually happens, and allowing unsafe <expr> would necessarily conflict with this and any other similar future extensions to the language.

5 Likes

Though, my assumption for a non-block-based unsafe keyword is that it wouldn't affect a whole scope, just the single operation it is applied to, so in

let ptr = unsafe bindings::HuskSyntaxNewParser(
    keep_comments as u8,
    stop_at_ffi_ptr,
    lang_variant.to_ffi_int(),
);

you would know that to_ffi_int is a safe function, without having to extract that out to a separate variable.

4 Likes

This is not true. unsafesuggests there is something beyond safe rust.

With unsafe foo().bar().baz(), the unsafe keyword both told the compiler such function is unsafe, and told reader this line contains unsafe calls.

For readers, there is no need to know which term is unsafe.

BTW, I prefer allow adding unsafe to a statement.

unsafe let x=foo().bar().baz(); // OK
let x=unsafe foo().bar().baz(); // OK
// for static mut X
unsafe x=X; // OK
x=unsafe X; // OK
// advantages to allow unsafe occurs before the original statement:
unsafe X=foo().bar().baz(); // OK

There is no need to know whether foo(), bar() or baz() is unsafe, since when people write such code, people would just write

unsafe{foo().bar().baz()}

rather than

unsafe{unsafe{unsafe{foo()}.bar()}.baz()}

Counterexample:

unsafe foo()
    .bar()
    .baz()

Syntactically now the .bar().baz() is further from the unsafe and might be interpeted as not being unsafe. With unsafe { foo().bar().baz() } there's no such misunderstanding because you just have to assume all of them might be unsafe.

6 Likes

How would it conflict? I can't see any reason the compiler would get tripped up, considering it should have access to type information already (i.e. what's a function, what's a struct, etc).

I think that'd make the most sense to me. With a block you know everything inside the block is affected, but with the non-bracketed syntax I feel like it leans more towards only appearing to affect the immediate function being called. Even (potentially) with stuff like this:

let ptr = unsafe bindings::HuskSyntaxNewParser(
    unsafe keep_comments.unsafe_fn(),
    unsafe stop_at_ffi_ptr.unsafe_fn(),
    unsafe lang_variant.to_ffi_int_unsafe(),
);

I still feel like the code is pretty clean.

2 Likes

I think as long as it was written that the unsafe applied to the end of the current statement, then it'd be more than fine.

You'd assume something like this would all be part of one statement:

let x = 5 
  + 5;

So I don't see any reason why having an unsafe block apply to the end of a statement should cause much confusion.

1 Like

I think this thread has shown that it can cause confusion.

6 Likes

I'd definitely like some way to declare one operation as unsafe without an entire scope.

In my code, I use // SAFETY: comments to annotate unsafe { ... } blocks with an argument for why the code in that block is safe. I might have something like this setup:

/// A safe function, might become unsafe later
fn foo(x: usize) -> usize { .. }

/// An unsafe function.
///
/// # Safety:
/// `x` must be a prime number.
unsafe fn bar(x: usize, y: usize) -> usize { .. }

/// Part of the external API of this crate
pub fn baz() {
    // SAFETY:
    // `bar` must be called with the first argument prime, which 5 is prime.
    let x = unsafe { bar(5, foo(7)) };
}

If I change foo to be unsafe, then the current unsafe block effectively hides the potential unsoundness of calling foo(7) inside baz, and might result in UB sneaking into my program on accident. I try to code defensively against it by instead writing:

pub fn baz() {
    let intermediate_value = foo(7);
    // SAFETY: ...
    let x = unsafe { bar(5, intermediate_value) };
}

But it'd often be more readable to not have a separate statement assigning some variable to the value and then plugging it in below (especially since, in less contrived cases, the soundness of calling bar might depend on some property of foo, and the indirection hides where that second value came from).

I'd love some way of annotating just the outermost level of an expression as unsafe and not infecting the inside, but I don't know how to do it without any confusion. The only idea I can think of is to only allow it to be placed on expressions where it's unambiguous (i.e. unsafe foo(bar(), baz()) is okay and annotates foo, and foo(unsafe bar(), baz()) is also allowed and annotates bar, but unsafe foo(bar()).baz() would error as ambiguous). This would majorly limit where it can be used, and at this point it might be so rarely applicable that it's not even worth adding anymore, but I can't think of anything better.

It seems like postfix unsafe would cover this case.

pub fn baz() {
    // SAFETY: Only `bar` is unsafe.
    let x = bar(5, foo(7)).unsafe;
}

This and similar ideas have been discussed before, for instance subcategories of unsafe e.g. unsafe("union_access") { ... } would disallow other forms of unsafety within the block. I'd love to see some coherent collected ideas for what makes real life audits easier and more self contained, I personally don't use enough unsafe to really comment.

1 Like

Note that, as much as I love postfix, I don't think discharging safety obligations is a good use of postfix. Other postfix things -- like .await or ? -- work on a value because they're about things being in dataflow order, and can be done later by putting things in a variable.

Whereas

let x = something_unsafe();
info!("your log here");
let y = x.unsafe;

just doesn't make sense.

7 Likes

I've seen a couple of macros that wrapped an unsafe API that were accidentally unsound because they were putting user tokens as arguments of an unsafe function call. This would help avoid those mistakes.

1 Like

Why postfix unsafe, wouldn't an unary prefix unsafe works better?

pub fn quux() {
    // SAFETY: Only `bar` is unsafe.
    let x = unsafe.bar(5, foo(7)).baz();
}

Is there some grammar for that

let x=foo(); // safe
unsafe {x.bar()} // unsafe
.baz() //safe

If there is no grammar for that, the scope of unsafe should be the whole expression, or the whole sentences.

Unsafe does not like any other grammar, extend the unsafe scope would yield no ambiguous. even if the whole block is marked as unsafe, the program would generate same results in case compiler does not refuse to compile it.

Thus an unsafe x().y().z() should be regard as the whole expression is unsafe.

what's more, if we restrict the unsafe that must occur before a stmt/block, there would be no ambiguous.

unsafe let mut x=2; // legal
let mut x=unsafe 2; // this is not legal
// X is static mut
unsafe X=2; // legal

This might help for avoid your "Syntactically".

IMHO verbosity of unsafe is a feature. The single boring syntax for it is unambiguous.

Misused unsafe code can break way more than its own expression, so marking individual expressions as safe or not isn't really marking what's risky and what isn't. For example, you may have "safe" let offset = calculate_offset() and unsafe { get_unchecked(offset) }, but the first function is the one that actually can cause unsafety.

6 Likes

As I see it, the value of narrower unsafes isn't in identifying the code that needs to be correct to be sound (that's potentially the entire function, or even module, as you note); it's in identifying which operations being called have safety requirements that need to be upheld by this code. It's saying "if you're reviewing this, check it matches the documented requirements of those functions".

For example, it's potentially surprising that pointer arithmetic operations have their own safety requirements; such a call might go overlooked next to a more obvious unsafe system call. So if we had a per-call-unsafe syntax, rather than the current “this entire block and all its subexpressions”, it might be easier to review:

⟦unsafe sys::do_something_with⟧(self.cursor.⟦unsafe offset_from⟧(self.head));

than

unsafe {
    sys::do_something_with(self.cursor.offset_from(self.head));
}

which could be naĂŻvely read as having only one set of safety requirements to meet.

Of course, it needs to be easy to spot unsafes when reading, and there needs to be a good conventional place for a // SAFETY comment. So I don't mean to claim that a syntax like I sketched above is a good idea in itself; only that it has some value that unsafe {} doesn't, particularly in being “shallow” rather than applying to the entire expression and all its subexpressions.

9 Likes