Ability to call unsafe functions without curly brackets

I'm not sure what you mean, this is how unsafe blocks work today.

Do you mean the scope of unsafe as a block or unsafe not-a-block? And why should it be the whole expression? I think this thread has shown that this is something that might be confusing. Not sure what you mean by "sentences" though.

We are not arguing about ambiguity for the compiler, but for the humans who will read the code. Even if some syntax is well defined in the compiler/reference and not ambiguous at all for the parser, if it isn't intuitive for humans it is bad syntax and should not be included.

There is also an argument to make that currently it is not verbose enough. See @JarredAllen's comment about including only the function call in the unsafe scope, not its arguments as well.
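To make the scoping question concrete, here is a minimal sketch of how it plays out in today's Rust. `read_u32` and `pointer_to` are hypothetical stand-ins for the unsafe callee and its (safe) argument expression; the point is only that everything inside the braces, arguments included, is in unsafe context.

```rust
/// Hypothetical unsafe function standing in for `myunsafefunc`.
///
/// # Safety
/// `p` must be non-null, aligned, and valid for reads of a `u32`.
unsafe fn read_u32(p: *const u32) -> u32 {
    // SAFETY: forwarded unchanged from this function's own contract.
    unsafe { p.read() }
}

/// Hypothetical safe helper that computes the argument.
fn pointer_to(x: &u32) -> *const u32 {
    x as *const u32
}

/// Today: the block also covers `pointer_to(x)`. If that helper were itself
/// an `unsafe fn`, this would still compile with no visible hint.
fn wide(x: &u32) -> u32 {
    // SAFETY: the pointer comes from a live `&u32`.
    unsafe { read_u32(pointer_to(x)) }
}

/// The usual workaround: hoist the argument into a local so that only the
/// call itself sits inside the unsafe block.
fn narrow(x: &u32) -> u32 {
    let p = pointer_to(x);
    // SAFETY: `p` was derived from a live `&u32`, so it is aligned and readable.
    unsafe { read_u32(p) }
}
```

Both functions behave identically; the difference is only in what the compiler would let slip through unannotated.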

I think this is really important. The current trend of "granular unsafe" is a mitigation for not having it, but it's really not a great solution.

I keep imagining something where the requirements are spelled out separately, like

#[safety_requires(aligned(ptr), readable(ptr))]
unsafe fn read<T>(ptr: *const T);

Where the compiler doesn't actually know what aligned and readable mean, but when you use the function, you do something like

#[hold_my_beer(aligned(p), readable(p), aligned(q), readable(q))]
unsafe {
    read(p) + read(q)
}

And some tool would check that you listed all the requirements for the variable, doing the name substitution as needed.

So you'd get

#[hold_my_beer(aligned(p), readable(p), aligned(q))]
unsafe {
    read(p) + read(q) // ERROR: `readable(q)` is required
}

but you could also do

#[hold_my_beer(aligned(p), readable(p))]
unsafe {
    read(p) + read(p)
}

because using the same fact twice is fine.
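A minimal sketch of the check such a tool might perform, assuming facts are plain `(predicate, variable)` pairs and that the tool already knows which arguments each call received. `Contract`, `facts`, `missing_facts` and `render` are hypothetical names that simply mirror the placeholder attributes above.

```rust
use std::collections::{HashMap, HashSet};

/// A fact such as `aligned(p)`: (predicate name, variable name).
type Fact = (String, String);

/// Hypothetical model of `#[safety_requires(...)]` on one function:
/// its parameter names and the predicates each parameter must satisfy.
struct Contract {
    params: Vec<&'static str>,
    requires: Vec<(&'static str, &'static str)>, // (predicate, parameter)
}

/// Builds the set of facts listed in a `#[hold_my_beer(...)]` attribute.
fn facts(list: &[(&str, &str)]) -> HashSet<Fact> {
    list.iter().map(|&(p, v)| (p.to_string(), v.to_string())).collect()
}

/// Facts that one call needs but the block did not list. Each parameter
/// name is replaced by the argument used at the call site (e.g. `ptr` -> `q`).
/// Panics if the contract mentions a parameter that is not in `params`.
fn missing_facts(contract: &Contract, args: &[&str], held: &HashSet<Fact>) -> Vec<Fact> {
    let subst: HashMap<&str, &str> = contract
        .params
        .iter()
        .copied()
        .zip(args.iter().copied())
        .collect();
    contract
        .requires
        .iter()
        .map(|&(pred, param)| (pred.to_string(), subst[param].to_string()))
        .filter(|fact| !held.contains(fact))
        .collect()
}

/// Renders a fact the way the error message in the sketch does.
fn render(fact: &Fact) -> String {
    format!("{}({})", fact.0, fact.1)
}
```

Because `held` is a set, listing `aligned(p)` once covers any number of `read(p)` calls, matching the "using the same fact twice is fine" case.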

(This is all just a sketch; details and keywords and structure and everything are all placeholders.)


I really like this approach to being more explicit about safety requirements, but you're limiting it to only direct properties of variables assigned before the block begins. This doesn't help much with my desire to not extract intermediaries into a separately-named value.

As for how to address that, I'm torn on if it'd be better to annotate the function producing the value:

#[promises(aligned, readable)]
fn make_readable_ptr() -> *const u32 { .. }
// No `#[hold_my_beer(..)]` because `make_readable_ptr` promises its output meets the requirements
unsafe { read(make_readable_ptr()) }

or if it'd be better to do inline annotations on expressions:

#[promises(aligned, readable)]
unsafe { read( #[hold_my_beer(aligned, readable)] make_readable_ptr()) }

The former would allow automatically matching a function whose output is guaranteed to meet common safety criteria (a lot of functions produce aligned and readable and/or writable pointers), but it doesn't play so nicely if different people use the attributes to mean different things, or if a library author doesn't use the annotations I care about in my code. The latter is more verbose, especially since I'd feel the need to add a // SAFETY: comment to each #[hold_my_beer] attribute.

And of course, neither of those help with safety requirements that mix multiple inputs together, like I imagine std::slice::from_raw_parts would be annotated:

#[safety_requires(aligned(data), readable_for_count(data, len))]
pub const unsafe fn from_raw_parts<'a, T>(data: *const T, len: usize) -> &'a [T] { .. }

So there's probably room for an even better annotation interface than what I thought of.
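For comparison, this is roughly how a multi-input requirement is discharged today: a single `// SAFETY:` comment has to argue about `data` and `len` together. `first_half` is a hypothetical example; safe code would just write `&v[..v.len() / 2]`.

```rust
/// Returns the first half of `v` by rebuilding it from a raw pointer and a length.
fn first_half(v: &[u32]) -> &[u32] {
    let data = v.as_ptr();
    let len = v.len() / 2;
    // SAFETY: `data` comes from a live slice, so it is non-null and aligned.
    // It is valid for reads of `v.len()` elements and `len <= v.len()`, so it
    // is valid for `len` elements too. The returned lifetime is tied to `v`,
    // so the memory cannot be mutated or freed while the result is alive.
    unsafe { std::slice::from_raw_parts(data, len) }
}
```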

Yeah, there's a bunch of further expansions to this -- like if I'm writing an unsafe fn that requires the pointer be aligned already, I wouldn't have to list it again in the body because the parameter would already have it. Avoiding the "// SAFETY: The caller promised it" comments would be really nice.
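For reference, the pattern being described looks roughly like this today: the requirements live in a `# Safety` doc section, and every unsafe operation in the body repeats a "the caller promised it" comment. Since the 2024 edition warns by default about unsafe operations in an `unsafe fn` body that lack their own `unsafe` block, those comments tend to multiply. `read_twice` is a hypothetical example.

```rust
/// Reads the same location twice and adds the values.
///
/// # Safety
/// `ptr` must be non-null, properly aligned, and valid for reads of a `u32`.
unsafe fn read_twice(ptr: *const u32) -> u32 {
    debug_assert!(!ptr.is_null());
    debug_assert!((ptr as usize) % std::mem::align_of::<u32>() == 0);
    // SAFETY: the caller promised `ptr` is aligned and readable.
    let a = unsafe { ptr.read() };
    // SAFETY: the caller promised it (same fact, used a second time).
    let b = unsafe { ptr.read() };
    a.wrapping_add(b)
}
```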

If you want to be more explicit about safety requirements you can just create a newtype wrapper.

#[promises(aligned, readable)]
fn make_readable_ptr() -> *const u32 { .. }
// No `#[hold_my_beer(..)]` because `make_readable_ptr` promises its output meets the requirements
unsafe { read(make_readable_ptr()) }

Can be rewritten as:

pub struct ReadablePtr(/* invariant: this ptr must be safe to read from */ *const u32);

pub fn make_readable_ptr() -> ReadablePtr { .. }

pub fn read_readable_ptr(ptr: ReadablePtr) -> u32 {
    unsafe { ptr.0.read() }
}
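One subtlety with the newtype approach: a raw pointer does not carry a lifetime, so `ReadablePtr` as written could outlive the value it points to. A sketch that closes that gap, assuming the pointer is only ever built from a shared reference (`from_ref` and the `PhantomData` field are additions, not part of the suggestion above):

```rust
mod readable {
    use std::marker::PhantomData;

    /// Invariant: the pointer is non-null, aligned, and valid for reads of a
    /// `u32` for the whole lifetime `'a`. The fields are private, so code
    /// outside this module cannot build a `ReadablePtr` that breaks it.
    #[derive(Clone, Copy)]
    pub struct ReadablePtr<'a>(*const u32, PhantomData<&'a u32>);

    impl<'a> ReadablePtr<'a> {
        /// A shared reference is always aligned and readable, which
        /// establishes the invariant.
        pub fn from_ref(r: &'a u32) -> Self {
            ReadablePtr(r, PhantomData)
        }
    }

    /// Safe to call: the type itself carries the "aligned + readable" promise.
    pub fn read_readable_ptr(ptr: ReadablePtr<'_>) -> u32 {
        // SAFETY: guaranteed by the `ReadablePtr` invariant.
        unsafe { ptr.0.read() }
    }
}
```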

Unsafe code is not only unsafe in itself: if its safety contract is violated, it can make safe code dangerous too. Example:

pub struct Bomb { _private: () } // private field: only `Bomb::new` can build one
pub struct Brick;
impl Bomb {
    /// Making bombs is highly unsafe! Never move a Bomb near fire.
    ///
    /// # Safety
    /// Never call `move_to_campfire` on a Bomb.
    pub unsafe fn new() -> Self { Bomb { _private: () } }
}
trait Moveable {
    fn move_to_campfire(&mut self) {}
    fn move_to_quarry(&mut self) {}
}
impl Moveable for Bomb {
    /// You should never move a bomb to the campfire.
    fn move_to_campfire(&mut self) { panic!("Boom!") } // suppose this is really unsafe...
}

In this case,

let mut x: Box<dyn Moveable> = Box::new(unsafe { Bomb::new() });
// ...
x.move_to_campfire(); // boom!

The unsafe code seems safe; it is the safe code that violates the safety contract.

Thus, if you wrote

unsafe let a=foo().bar().baz();

You should at least check whether the whole statement is safe.

Even if only bar is unsafe, you should ensure both that foo() yields safe values and that calling baz() on the output of bar() is valid.

I don't see why this is an argument for making the inline unsafe apply to the whole statement, the x.move_to_campfire() is a whole different statement so it will fall outside the unsafe scope in any case.

Why only the whole statement? Why not the full function? Or the full module? After all, the safety of something could very easily depend on all the code that has visibility access to it (see for example Vec's len field).

If anything, yours is an argument for putting everything in a single giant unsafe block.
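The Vec point can be sketched in a few lines. `MyVec` is a hypothetical type; its `get` is only sound because every method in the module, including ones with no `unsafe` in them, keeps `len <= data.len()`.

```rust
mod my_vec {
    /// Invariant: `len <= data.len()`. `get` relies on it for soundness.
    pub struct MyVec {
        data: Vec<u32>,
        len: usize,
    }

    impl MyVec {
        pub fn new(data: Vec<u32>) -> Self {
            let len = data.len();
            MyVec { data, len }
        }

        pub fn get(&self, i: usize) -> Option<u32> {
            if i < self.len {
                // SAFETY: `i < self.len <= self.data.len()` by the invariant.
                Some(unsafe { *self.data.get_unchecked(i) })
            } else {
                None
            }
        }

        /// Contains no `unsafe`, yet it must uphold the invariant: writing
        /// `self.len = n` here instead would let `get` read out of bounds.
        pub fn shrink_to(&mut self, n: usize) {
            self.len = self.len.min(n);
        }
    }
}
```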


I'm using a simple example to show how an unsafe statement ruins safe code.

This is the point. If the program does not violate any SAFETY rules, the unsafe labels bother neither the reader nor the compiler: since there is no bug, we could simply ignore them all.

But when a bug occurs, the unsafe labels point to all the SAFETY rules. Users then have to check every rule, possibly the whole function, or even the whole program.


Actually, I think you are ignoring the safety rules. unsafe without a safety check is not welcome.

unsafe without rules:

unsafe let a=foo().bar().baz(); // no safety rules, just an unsafe keyword.

Rust's suggestion:

// SAFETY: init() must be called before bar() is called.
// This function is called after init(), thus it is safe.
unsafe let a=foo().bar().baz();

A bare unsafe with no SAFETY comment is not what Rust recommends.
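For comparison, a sketch of how the same "init() before bar()" requirement is usually written in current Rust, with the SAFETY comment attached to the smallest unsafe block. The names mirror the thread but are hypothetical, and they are free functions rather than a method chain for brevity.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static INITIALIZED: AtomicBool = AtomicBool::new(false);

/// Hypothetical setup routine standing in for `init()`.
fn init() {
    INITIALIZED.store(true, Ordering::Release);
}

fn foo() -> u32 {
    41
}

/// # Safety
/// `init()` must have been called before this function.
unsafe fn bar(x: u32) -> u32 {
    debug_assert!(INITIALIZED.load(Ordering::Acquire));
    x + 1
}

fn baz(x: u32) -> u32 {
    x * 2
}

fn compute() -> u32 {
    init();
    let t = foo();
    // SAFETY: `init()` was called at the top of this function.
    let t = unsafe { bar(t) };
    baz(t)
}
```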

What's more, SAFETY comments allow the programmer to mark the unsafe part precisely:

unsafe let a=foo()
    // SAFETY: init() must be called before bar() is called.
    // This function is called after init(), thus it is safe.
    .bar()
    .baz();

It is obvious that only bar() is unsafe.

You could assign those arguments to local variables first, to avoid any computation there. But what about allowing syntax like this:

unsafe {myunsafefunc}(mysafefunc()) 

It is verbose. It would not apply to the arguments.

How about this syntax?

let a = foo().unsafe bar(baz()).xyz(); // unsafe only applies to bar
let a = unsafe foo(bar()); // unsafe only applies to foo, not bar

I think this works well because it's maximally conservative.

If somebody gets confused and thinks bar would also be marked unsafe in the second example, they will just get a compile error.
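To pin down what the proposed `let a = foo().unsafe bar(baz()).xyz();` would mean, here is a sketch of an equivalent in today's syntax. `Builder` and its methods are hypothetical. The receiver and the argument are evaluated first, in the order Rust already uses (receiver, then arguments), and only the call to `bar` ends up inside the block.

```rust
/// Hypothetical type standing in for the thread's method chain.
struct Builder(u32);

impl Builder {
    /// # Safety
    /// Hypothetical contract: `extra` must be non-zero.
    unsafe fn bar(self, extra: u32) -> Builder {
        debug_assert!(extra != 0);
        Builder(self.0 + extra)
    }

    fn xyz(self) -> u32 {
        self.0 * 10
    }
}

fn foo() -> Builder {
    Builder(1)
}

fn baz() -> u32 {
    2
}

/// Today's spelling of `let a = foo().unsafe bar(baz()).xyz();`
fn desugared() -> u32 {
    let receiver = foo();
    let arg = baz();
    // SAFETY: `baz()` returns 2, which is non-zero.
    let tmp = unsafe { receiver.bar(arg) };
    tmp.xyz()
}
```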

The problem is the lack of SAFETY checks. If both foo and bar are unsafe, you would have to write unsafe twice, or just fall back to an unsafe block.

I propose that unsafe applies to the whole statement; it is the SAFETY comments that indicate where the unsafe calls are.

unsafe let a=foo()
    // SAFETY: bar meets ... thus it is safe
    .bar()
    .baz();

Such an unsafe statement could only be used in a safe function, so a SAFETY comment is needed.

I don't understand what you mean by that. If you mean "lack of SAFETY comments", I don't see how my proposal prevents you from writing those comments.

So again, why should inline unsafe apply to the whole statement/expression? You're basically saying "it doesn't matter since you need SAFETY comments anyway", but this is not an argument in favor of a specific choice.

No, it's only obvious that the writer of this code documented that calling .bar() is ok, but you have no way to know if foo() or .baz() are really unsafe too. You have to either look at their source code or hope the writer of this code didn't make a mistake (which is what you should be checking in the first place though!)

I see two problems with this syntax:

  • it is already valid, although only if myunsafefunc has type ()

  • intuitively, what's unsafe is naming/referencing myunsafefunc, not calling it (notice how the (...) are outside the unsafe block)
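The first bullet can be checked directly. In this sketch, `myunsafefunc` is deliberately a local of type `()` (a hypothetical stand-in). Because a block-like expression in statement position ends the statement, the trailing `(...)` is parsed as a separate expression rather than as a call.

```rust
fn mysafefunc() -> i32 {
    1
}

#[allow(unused_unsafe, unused_parens)]
fn demo() -> i32 {
    // Hypothetical: a unit value that happens to be named `myunsafefunc`.
    let myunsafefunc = ();
    // Parsed as two things: the statement `unsafe { myunsafefunc }` (which
    // must have type `()`), then the tail expression `(mysafefunc())`.
    unsafe { myunsafefunc }(mysafefunc())
}
```

If `myunsafefunc` were a real `unsafe fn`, the same line would be rejected with a type mismatch instead. And in expression position, where `let v = unsafe { f }(x);` does parse as a call, the call still happens outside the block, so the compiler asks for an unsafe block anyway: naming an `unsafe fn` is not itself an unsafe operation.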

The space between unsafe and bar in the first line looks kinda bad (it "splits" the expression) but I don't think we can do much better.

Why are you assuming people won't write SAFETY comments if inline unsafe had a reduced scope?


unsafe usually means "you should check the safety rules"; one of them per statement is enough.

Your grammar is more flexible, but it requires an additional unsafe when multiple unsafe functions are called:

// mine:
unsafe let a = foo()
    // SAFETY: bar meets ... thus it is safe
    .bar()
    // SAFETY: baz meets ... thus it is safe
    .baz();
// another version:
// SAFETY: bar meets ... thus it is safe
// SAFETY: baz meets ... thus it is safe
unsafe let a = foo()
    .bar()
    .baz();

In this case, yours has to write unsafe twice:

let a = foo()
    // SAFETY: bar meets ... thus it is safe
    .unsafe bar()
    // SAFETY: baz meets ... thus it is safe
    .unsafe baz();

If more unsafe and safe functions are mixed, you might as well use a plain unsafe block.

We are talking about grammar, and a precise grammar is better than a complex one.

That's a feature, not a problem. If you're using two unsafe functions in a chain of calls, it's good that you would point them both out.


Fair points. What about:

unsafe {myunsafefunc(}mysafefunc()) 
  • Definitely not valid currently.
  • The open parenthesis being in the unsafe block indicates a call.
  • Ugly though. Really ugly.

This doesn't work with how macros deal with token trees -- they require matching parentheses and braces.

Maybe slightly bad, but spaces in expressions are already possible:

let x = 7 + if true { 3u32 } else { 6u32 }.trailing_zeros();

Currently, in practice, we might write

let a=unsafe {foo().bar().baz()};

even if baz is safe. To check whether foo and baz are safe, we have to split the calls up manually:

let a=foo();
let a=a.bar(); // Error, bar is unsafe.
let a=a.baz();

If bar is unsafe, the current unsafe block also covers foo(), making it difficult to figure out whether foo is safe.

This is the current status quo, but it doesn't mean we can't improve over it.

That's why I used

let a = foo().unsafe.bar(baz()).xyz(); // unsafe only applies to bar
let a = unsafe.foo(bar()); // unsafe only applies to foo, not bar

But I don't think syntax is important at the moment, since we should be more focused on:

  • is an unsafe statement useful? (I don't think so)
  • is an unary unsafe useful? (I personally do)
  • is the current unsafe syntax enough?