Explicitly marking unsafe macro expressions

I am concerned about the behavior of the unsafe keyword with respect to macro expressions, as it allows unsafe code to be used implicitly in a codebase.

I originally submitted a request for a Clippy lint (https://github.com/rust-lang/rust-clippy/issues/3738) to detect macros which evaluate expressions in an unsafe block, like the following:

unsafe fn unsafe_operation() -> u8 {
    1
}

macro_rules! value_equals_unsafe_operation {
    ($a:expr) => (
        unsafe {
            $a == unsafe_operation()
        }
    );
}

The above macro could then be used to hide unsafe behavior in a program like so:

fn main() {
    let a: [u8; 1] = [0; 1];
    
    let x = value_equals_unsafe_operation!(*a.get_unchecked(100));
    
    println!("{}", x);
}

However, after some more thought, I decided to raise this issue with the Rust internals forum as I believe this could be solved better via a change to the Rust compiler itself.

The idea would be that for macros which evaluate an expression in an unsafe block, the unsafe keyword would only be applied to the code inside the macro itself, and not to any code passed in as an expression.

In this way, any code using the macro without any additional unsafe code in its expressions would be fine:

fn main() {
    let x = value_equals_unsafe_operation!(0);
    
    println!("{}", x);
}

However, if the expression itself requires any unsafe code, a second unsafe keyword would need to be added to the macro invocation, like so:

fn main() {
    let a: [u8; 1] = [0; 1];
    
    let x = value_equals_unsafe_operation!(unsafe { *a.get_unchecked(100) });
    
    println!("{}", x);
}

I believe that requiring unsafe blocks in both places more accurately reflects that unsafe code is used both inside the macro and inside the expression for this particular invocation of the macro.

10 Likes

Consider

macro_rules! value_equals_unsafe_operation {
  ($a:expr) => {
    unsafe {
      (move || $a)() == unsafe_operation()
    }
  }
}

Whether this is the sane default is up for discussion. I think the correct thing here is to not use unsafe blocks in macros at all, and require

unsafe { value_equals_unsafe_operation!(..); }
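
For concreteness, here is a minimal sketch of that alternative (reusing the unsafe_operation from the original post): the macro itself contains no unsafe block, so the safety obligation visibly lands on the caller.

unsafe fn unsafe_operation() -> u8 {
    1
}

// No unsafe inside the macro: the expansion calls an unsafe fn, so the
// caller must wrap the invocation in an unsafe block for it to compile.
macro_rules! value_equals_unsafe_operation {
    ($a:expr) => (
        $a == unsafe_operation()
    );
}

fn main() {
    let x = unsafe { value_equals_unsafe_operation!(0) };
    println!("{}", x);
}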

If you’re invoking unsafe blocks through an interface not marked as unsafe, it is the callee’s, not the caller’s, responsibility to make sure that every possible call is safe. Since the whole point of unsafe is “static analysis can’t prove this is safe”, I doubt static analysis can meaningfully capture this sort of unsafe hygiene, nor do I think it is the compiler’s responsibility to do so.

@mcy I don’t think this will work; unsafe “bleeds into” closures.

@chertl I think what you are basically asking for is “unsafety hygiene”, right? Yeah that sounds like something that we would want.

1 Like

How is this being in a macro any worse than it being in a function?

Compare the macro example with this:

unsafe fn unsafe_operation() -> u8 {
    1
}

fn value_equals_unsafe_operation<T: PartialEq<u8>>(a: T) -> bool {
    unsafe {
        a == unsafe_operation()
    }
}

So overall this feels more like “I want to know if crates I’m using use unsafe”, not something specific to macros.

1 Like

The above post echoes my thoughts on the subject. The length function for a slice uses unsafe. Does that mean that we would have to write

let length = unsafe { vec[..].len() };

and such like?

Edit: Corrected method syntax

This would only be equivalent if the inner unsafe {} allowed T to come from an unsafe source.

The critical point is

let x = value_equals_unsafe_operation!(*a.get_unchecked(100));

which has an unsafe expression without an unsafe block or context.

A more similar comparison would be if closures could contain unsafe operations without an unsafe block just because they are evaluated inside unsafe {}.

One solution might be a (clippy?) lint requiring macros using expr inside unsafe to be prefixed with unsafe_.

7 Likes

But, uh, they can?

fn main() {
    unsafe {
        let x = || std::ptr::null::<i32>().offset(4);
    }
}
1 Like

Evaluated, not constructed.

2 Likes

Yuck.

For what it's worth, I think you can just hoist let a = $a;...

There are multiple things to pay attention to here:

  • Should a macro using unsafe become unsafe itself? I don’t think this is desirable. See ::std::await! or ::pin_utils::pin_mut!: safe constructions using underlying unsafe. It is the same as a non-unsafe function wrapping unsafe behavior in a (hopefully) safe manner, thus not needing to be unsafe itself.

    On the other hand, if a macro expansion cannot be proven to be always sound, then the macro has to be defined without unsafe in it, thus forcing the caller to explicitly use an unsafe scope. Sadly, the “unsafeness” of the macro would not be visible at the “header level”; it could just be documented (and maybe also with a naming convention, like unsafe_foo!), which means that the error message raised might not be great.

  • unsafe hygiene, i.e., a macro taking an input argument $macro_arg:expr should not be allowed to “unsafe-ly evaluate” it. For example, feeding *::std::ptr::null() as $macro_arg:expr should raise an error, since evaluating it requires unsafe, but feeding unsafe { *::std::ptr::null() } as $macro_arg:expr should be allowed (unsafe being the keyword that makes all the evaluations in its scope be automagically tagged as safe, hence its unsafety). I’m not being very rigorous, but you get the idea.

    In the meantime, the macro author needs to be careful never to “evaluate an expression argument” (that is, use a $macro_arg:expr) inside an unsafe block, for instance by using the following pattern:

unsafe fn get_foo () -> u8 { 0 }

macro_rules! eq_to_foo {(
  $macro_arg:expr
) => (
  match $macro_arg { macro_arg => unsafe {
    // Look, no metavariables within the unsafe block !
    macro_arg == get_foo()
  }}
)}

// Note that for this very particular case, we could more simply write:
macro_rules! eq_to_foo {(
  $macro_arg:expr
) => (
  $macro_arg == unsafe { get_foo() }
)}
  • unsafe detection: there has been a lot of discussion about being able, as a library user, to detect, warn on, and even forbid unsafe usage. The problem is that current “solutions” do not detect unsafe usage through macro invocations. For instance:
    • ::cargo-geiger, a tool that walks the dependency graph and counts the number of unsafe usages encountered in each crate (imho counting is not the best metric, but that is off topic here):
      • Unsafe code inside macros is not detected; detecting it would require macro expansion.
    • Using #![forbid(unsafe_code)] (or RUSTFLAGS=-Funsafe_code):
      • It detects the unsafe if the macro was defined within the same crate, but a macro from a different crate, even one in the same workspace, will sneak its way past the forbid(unsafe_code) restriction (See here, look at the difference between 1m48 and the end).
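
A rough sketch of that bypass, split across two crates (the crate name dep is hypothetical, so this is not a single compilable file): the expanded unsafe block is attributed to the defining crate, so the downstream forbid does not fire.

// In a dependency crate `dep` (hypothetical name):
#[macro_export]
macro_rules! zeroed {
    () => ( unsafe { ::std::mem::zeroed() } );
}

// In the downstream crate, with #![forbid(unsafe_code)] at its crate root:
fn main() {
    // Compiles despite the forbid: the `unsafe` block comes from `dep`'s
    // macro expansion, not from this crate's source.
    let x: u8 = dep::zeroed!();
    println!("{}", x);
}
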
3 Likes

Wouldn't it still detect the unsafe in the crate that defines the macro?

I think this is already the desirable behavior anyway, because of your first bullet point: Either the unsafe keyword belongs inside the macro, or it belongs outside the macro, based on whose job it is to uphold whatever the uncheckable invariants are. The crate whose job it is to uphold the invariants ought to be the one that gets flagged by tools like cargo geiger, so if geiger "blames the wrong crate" then the macro probably needs changing.

I... have no idea what's going on in this part. Sorry. Could you maybe try re-explaining that with complete examples?

Is this just about making sure that "you're missing an unsafe keyword" errors only refer to code you wrote, rather than the internals of some macro you called?

Yeah, I clearly need to rephrase that part :sweat_smile:

macro_rules! identity {( $expr:expr ) => ( unsafe { $expr } )}

fn main ()
{
  // look, no unsafe !
  identity!(
    ::std::mem::transmute::<(), ()>(())
  );
}

Imho, if someone wanted to create such a macro (e.g. unsafe_identity!), then a "metavariable type" other than $...:expr should have to be used; this would allow the above code to be forbidden (since the transmute... expression would not have been valid if it weren't for the macro), as a safety guard against code like the one in the OP. Maybe a new "metavariable type" could be created, e.g. $...:unsafe_expr?
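
To illustrate (purely hypothetical syntax; no such fragment specifier exists today), the idea might look something like this:

// Hypothetical `unsafe_expr` fragment: the macro explicitly opts in to
// evaluating its argument inside an unsafe context, so plain `$...:expr`
// arguments could no longer be silently wrapped in unsafe.
macro_rules! unsafe_identity {( $expr:unsafe_expr ) => ( unsafe { $expr } )}

fn main ()
{
  // The caller knowingly hands over an expression to be evaluated unsafely:
  let x: u8 = unsafe_identity!(::std::mem::transmute::<i8, u8>(-1));
  println!("{}", x);
}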

2 Likes

I think you can just use let:

macro_rules! eq_to_foo {(
  $macro_arg:expr
) => ({
  let macro_arg = $macro_arg;
  unsafe {
    // Look, no metavariables within the unsafe block !
    macro_arg == get_foo()
  }
})
}

We have macro hygiene for identifiers, meaning names declared inside the macro don't leak into code that is passed in at the macro use site. Ideally, we should have the same for unsafety: to decide whether code is considered to be in an unsafe block, we should be aware of macros, so that in your identity example, transmute is not considered to be written inside an unsafe block, because the code was written outside the macro body while the unsafe block was declared inside the macro body.
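
For comparison, here is a small sketch (my own example) of the identifier hygiene we already have:

macro_rules! hygiene_demo {
    // The `shadow` declared here is hygienically distinct from any `shadow`
    // at the call site, so it cannot capture names used inside `$e`.
    ( $e:expr ) => ({ let shadow = 0; $e + shadow });
}

fn main() {
    let shadow = 10;
    // `$e` still refers to the caller's `shadow` (10), not the macro's (0).
    assert_eq!(hygiene_demo!(shadow), 10);
}

Unsafe hygiene would apply the same principle to unsafe blocks: the block declared inside the macro body would not cover tokens supplied by the caller.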

4 Likes

As an aside, I generally don’t advocate using unsafe in a macro, as it can be very hard to reason about how it may be (ab)used. Consider, for example, this contrived library for reading from a 2D array: the library initially appears safe when looked at in isolation, but if the macro is passed custom types that overload the arithmetic operators, all assumptions are broken and the macro can read out of bounds.
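
Here is a sketch of that kind of failure mode (my own minimal reconstruction, flattened to a one-dimensional index; not the original example): the macro checks the index once, then evaluates the index expression again for the unchecked access, so an impure overloaded operator defeats the check.

use std::cell::Cell;
use std::ops::Add;

// An index type whose `+` is not pure: it yields a small value the first
// time it is evaluated and a large one the second time.
struct Sneaky(Cell<bool>);

impl<'a> Add<usize> for &'a Sneaky {
    type Output = usize;
    fn add(self, rhs: usize) -> usize {
        if self.0.replace(true) { 100 + rhs } else { rhs }
    }
}

macro_rules! get_at {
    ( $grid:expr, $idx:expr ) => ({
        // BUG: `$idx` is evaluated twice, once for the check and once for
        // the access, so the check proves nothing about the access.
        assert!($idx < $grid.len());
        unsafe { *$grid.get_unchecked($idx) }
    });
}

fn main() {
    let grid = [0u8; 4];
    let sneaky = Sneaky(Cell::new(false));
    // The first evaluation of `&sneaky + 0` yields 0, so the assert passes;
    // the second yields 100: an out-of-bounds read (undefined behavior)
    // behind a call site that contains no `unsafe` at all.
    let _oops = get_at!(grid, &sneaky + 0);
}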

However, this post aims to propose a more sensible default behavior for interacting with macros intending to create safe constructions which require underlying unsafe.

To clarify, the problem is that unsafe is an implementation detail of this macro, and shouldn’t affect how external safe code interacts with it, but in my example it does:

let x = value_equals_unsafe_operation!(*a.get_unchecked(100));

It’s valid to pass in additional unsafe code (dereferencing an out of bounds pointer) as an expression to this macro without an error.

I don’t think this current behavior is a preferable default, as it is not obvious to the macro author that this is possible, and the suggestions for preventing it (matching on the expression, or assigning the expression to a temporary variable) are error-prone (easy to forget when writing or refactoring macros) and not aesthetically pleasing.

I think if code does intend for this behavior (creating wrappers for unsafe, to allow hiding external implicit unsafe code in expressions) it is bad hygiene, and shouldn’t be allowed by default:

macro_rules! identity {( $expr:expr ) => ( unsafe { $expr } )}

fn main ()
{
  // look, no unsafe !
  identity!(
    ::std::mem::transmute::<(), ()>(())
  );
}

The suggestion of creating a new unsafe_expr to retain the current behavior would work, but I really think any use cases that depend on this are bad hygiene and should prefer always being explicit when their expressions require unsafe:

macro_rules! identity {( $expr:expr ) => ($expr)}

fn main ()
{
  identity!(
    unsafe { ::std::mem::transmute::<(), ()>(()) }
  );
}
1 Like

Thanks, “unsafe hygiene” makes sense to me now.

If genuine unsafe hygiene is a long way off, maybe we could have a clippy lint against using potentially complex matchers like $item, $expr and $tt inside an unsafe block? A macro with $ident in unsafe seems much harder to abuse.
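
For example (my own sketch), an identifier fragment gives the caller almost nothing to smuggle in:

unsafe fn get_foo() -> u8 { 0 }

macro_rules! ident_eq_to_foo {
    // `$a:ident` only accepts a bare name, so the caller cannot pass an
    // arbitrary expression (such as a raw-pointer dereference) to be
    // evaluated inside the macro's unsafe block.
    ( $a:ident ) => ( unsafe { $a == get_foo() } );
}

fn main() {
    let a: u8 = 0;
    // ident_eq_to_foo!(*some_ptr) would not even parse.
    println!("{}", ident_eq_to_foo!(a));
}

(An identifier can still name something like a static mut, so this is "harder to abuse" rather than impossible.)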

1 Like

Yep, that is what I meant.

But ideally, we should still allow some way to circumvent it (the problem is accidentally wrapping a user-provided expression inside an unsafe context before it gets evaluated; but we should still allow macro creators to do that on purpose, hence my "special metavar type" suggestion).

There can be a lifetime-related issue regarding temporary variables within the expression, so the ::std::dbg! macro uses the "match with a single irrefutable pattern" trick to get around that. I thus think using match instead of let should give the macro slightly better usability.
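
As a sketch of the kind of lifetime issue meant here (my own minimal example, not the one alluded to): temporaries created while evaluating a match scrutinee live until the end of the match expression, whereas temporaries in a let initializer are dropped at the end of that statement.

#[allow(unused_macros)]
macro_rules! len_via_let {
    ( $e:expr ) => ({ let tmp = $e; tmp.len() });
}

macro_rules! len_via_match {
    ( $e:expr ) => ( match $e { tmp => tmp.len() } );
}

fn main() {
    // `String::from("hi").as_str()` borrows from a temporary `String`.
    // With `match`, that temporary lives for the whole match expression:
    let n = len_via_match!(String::from("hi").as_str());

    // With `let`, the temporary is dropped at the end of the inner `let`
    // statement, so `tmp` would dangle and this fails to compile (E0716):
    // let n = len_via_let!(String::from("hi").as_str());

    println!("{}", n);
}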

You can, as @RalfJung pointed out, use a simple let instead of the ugly match; it won't change usability 99% of the time (see above).

It is not that error-prone with a single rule of thumb: no metavariables (macro vars) within an unsafe scope (unless intended, of course).

PS: on mobile, so I apologize for typos and for not linking to the lifetime issue; will do so later.

The compiler already has support for this. We'd just need to emit it at the appropriate sites, but that would cause a breaking change. I think the ideal solution is to create an unstable safe block feature that the macro expansion code can wrap around all expression arguments passed to a macro. Instead of emitting a hard error, we'd emit a future incompatibility lint.

2 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.