Hidden unsafe due to unintentionally abusable macros and include!

Disagree - the author intended the macro to be safe to use, since they didn't expect new unsafe code to be insertable. They wouldn't have marked the macro as unsafe even if that were a supported feature, because they never intended it to be used that way. If the language's default were, as I'm proposing, that a new unsafe block is required here, then the macro would have been safe:

alloc_pages!(unsafe { *(0x41414141 as *mut usize) } );

Agree with the general sentiment here. I've previously raised that using unsafe in a macro is difficult to reason about because of things like overloading basic traits like arithmetic and comparisons. Generics is an interesting point to consider too.

1 Like

I'm not disagreeing with the proposal to be able to mark macros as unsafe, like we can do with functions; I think this is a good idea too.

Specifically, I think it's worth recognizing that expression arguments are not just traditional function inputs that can potentially tweak the function's code to be unsound; they are standalone code, and as code they should require their own unsafe block if they want to use the unsafe dialect.

1 Like

I would have in byteorder. And if I didn't, I would consider it an oversight. It would be unsound and thus a bug to fix.

1 Like

Another workaround I use is to include unsafe_ in the name of the macro, so for example the byteorder macro would be called unsafe_write_num_bytes. I find this especially useful when I want to use macros in different modules.

1 Like

Let's take it back to the mmap example. The point is that this is only unsound because you can insert new unsafe code into the $length expression. The right fix is to evaluate the expression outside the unsafe block, so the macro behaves as the author intended - not to mark the whole macro as unsafe:

#[macro_export]
macro_rules! alloc_pages {
    ($length:expr) => ({
        // Evaluate the caller's expression outside the unsafe block.
        let x = $length;
        unsafe {
            mmap(0 as *mut c_void, x, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0)
        }
    });
}
12 Likes

Oh, I've missed that detail before. I see your point now.

1 Like

Yes, that's a good idea. Done: impl: prefix unsafe macros with 'unsafe_' · BurntSushi/byteorder@5d9d038 · GitHub

4 Likes

Ahhhh, I see now. Yes, I also missed your point. So the problem is the scope of unsafe itself being applied to the expansion of the macro. I agree that in such a case, it wouldn't make sense to mark the macro as unsafe. In byteorder's case, it's not as easy to make the macro safe for all inputs as it is with your alloc_pages macro.

I'm not quite sure what to do about it though. This might just be one of those things that you have to be careful about when opening an unsafe block inside of a macro.

2 Likes

There are ways safety hygiene could be implemented. Maybe it's only a macros 2.0 thing. Maybe editions could be used (though I think it would have to be based on the caller's edition, not the macro's crate edition). CAD97's response was a reasonable starting point, IMO.

Should this be forked into a new topic? We're at >30 replies now, most of which have been based on misunderstanding the OP. I think it would be nice to start with a fresh topic focused on macro safety hygiene and exploring possible ways it could be implemented without breaking the universe.

1 Like

Regardless of what other steps we take, implementing a lint (in Clippy?) as suggested by toc above would be useful. While a lint doesn't stop malicious crates from using this pattern deliberately, it would help non-malicious macro authors and code reviewers detect and work around the problem today, and would also smooth the migration to a new edition (or new syntax) where some form of hygiene is used to solve the problem.

2 Likes

Summarizing what was said above, there are two aspects to this problem that need to be solved separately:

  • For cases where the macro is supposed to be always-safe-to-use but makes use of an unsafe block, we need unsafety hygiene (or carefully pulling code outside of unsafe blocks in the macro, as with let x = $length).
  • For cases where the macro is not supposed to be always-safe-to-use, we need a notion of "unsafe macros", or to establish a convention that such macros should have a name starting with "unsafe" (or similar). This has come up before in the plutonium advisory discussion, where the question was "what does soundness mean for macros".
14 Likes

Sure, but this has nothing to do with what the OP is asking for. Whatever level of unsafety unsafe denotes, allowing it to "leak" is not great, because that defeats its purpose. This doesn't only apply if unsafe means "potential security hazard/boundary" (which it indeed doesn't); it works just fine with its actual meaning, "memory safety boundary".

Re #[link_section] is safe: it shouldn't be, because it can cause arbitrary memory corruption. The corresponding issue is officially l-unsound.

1 Like

Fantastic! If anyone else is interested in this sort of thing, I opened a tracking issue to try to put together a list of such attributes and track progress on adding similar lints:

1 Like

For the case of macros that are supposed to be unsafe to call, I don't think the unsafe_ prefix is good, because it isn't the unsafe keyword. I would prefer the macro to force the use of unsafe, for example:

macro_rules! write_num_bytes {
    (unsafe $ty:ty, $size:expr, $n:expr, $dst:expr, $which:ident) => {
        ...
    };
}

meant to be used as

write_num_bytes!(unsafe u16, 2, n, buf, to_be)

That way the unsafe keyword is present.

That's what I did in the macro of the cpp crate: cpp::cpp_class - Rust

2 Likes

Should there be a corresponding safe { ... }¹ construct that can be used inside an unsafe block to temporarily disallow unsafe operations? It would make it a lot easier to manually fix macros with this issue, at least.


¹ This is likely a bad keyword choice; that can be bikeshedded later if this idea goes anywhere.

1 Like

And a lot easier to automatically fix them, as well. (For example, “safe blocks” could be used to easily provide a suggested fix for macros that trigger the proposed lint.)

I've seen this proposed before, IIRC for the same reason... it would be a "low-tech" alternative to proper unsafety hygiene. (I could also imagine that some macro authors do not want unsafety hygiene, they might want their macro to act like an unsafe block.)

3 Likes

It seems difficult to make automatic unsafety hygiene perfect: how should tt munchers that build expressions be handled, for instance? Other forms of code generation, like proc macros and build scripts, might also benefit from an easy way to provide unsafety hygiene. One path forward could look something like this:

  1. Implement an explicitly-safe block, probably something like #[safe] { ... } to avoid a new keyword for something that'll only be used in generated code.
  2. Add a new annotation (#[hygienic_safety]?) for macro_rules that changes how argument parsing is performed. If the macro is expanded in a safe context, expr and block arguments get automatically wrapped in #[safe] { ... }. Add unsafe_expr and unsafe_block argument types that retain the old behavior.
  3. Let proc macros know whether they're being expanded in a safe or unsafe context, so they can implement similar hygiene behavior as macro_rules.
  4. At an edition boundary, apply #[hygienic_safety] to all macro definitions. Anything broken by this can be fixed by replacing expr and block arguments with their unsafe equivalents.

This means that any effects of the hygiene system show up explicitly in the macro's expansion, and #[safe] { ... } can be used to explicitly opt-in to hygiene for any code not covered by these simple rules.

I was able to produce a working Playground demo for the include! idea - gaining access to unsound macros that are not supposed to be exposed, and then abusing them to introduce unsafe behavior (accessing a global mutable static without our own unsafe block). The macro was only using ident instead of expr, so we couldn't introduce arbitrary code, but I think it's still interesting.

1 Like

Unsafety hygiene would work the same way that standard hygiene (name lookup and edition) works today.

Essentially, each token is tagged with whether it was lexically within an unsafe block when written. When an identifier that refers to an unsafe operation is used, the safety check is done not by checking whether we're currently in an unsafe block, but whether the token was tagged as lexically within one.

If there's a case where this doesn't work, I'd love to hear it. If you can break hygiene with that specification you can probably break the existing hygiene :grimacing:

5 Likes