Unsafe code annotations


#1

I had a random 3am idea about unsafe code blocks. Unlike most of my such ideas, it might just not be terrible, so I’m sharing here.

The basic purpose of unsafe is that the compiler can’t prove what’s inside is safe, but we as programmers can promise that it’s safe, based on additional reasoning. I think it would be good to have a structured annotation in which the programmer can describe their reasoning as to why they believe it’s safe.

This morning, I read The Secure Code WG’s 2019 goals post. Under the “Use of unsafe code” heading, a very important point is made:

Many widely used libraries use unsafe code where it’s not strictly necessary. Typically this is done for performance reasons, i.e. there are currently no safe abstractions to achieve the goal safely and efficiently.

The goal here is to reduce or eliminate the use of unsafe code throughout the ecosystem where it is not strictly necessary without regressing correctness or performance. The action items for that include:

  1. Investigate why exactly people resort to unsafe code on a case-by-case basis. Compile a list of case studies so that we can identify missing safe abstractions or idioms.

This is a good goal, and made for a timely collision of ideas.

By “structured”, above, I mean:

  • something that allows describing the human reasoning why the author believes the block is safe,
  • with enough convention and structure that, it can capture and classify code blocks as above, and support this effort, ideally with some statistical data collection
  • I was thinking of something like regular doc comments, as an annotation on the unsafe block, with a template and suitable headings, tags, and keywords - much like there are conventions already for describing when a function panics.
  • However these are probably not user-facing documentation, so there might need to be some additional tweak to make the comments hidden by default in rustdoc output, at least when the docs are generated for public consumption.
  • It could also be a mix of comments and other attributes for the more structured data (similar to the way deprecations and feature gates are annotated).

The initial goal (as in the blog post above) is discovery and classification:

  • Once an author (or reviewer) has looked at an unsafe block and made some kind of decision about why it’s there or whether it’s of concern, we should be able to remember that
  • An annotation and classification gets added, by the author or via PR
  • Everyone can then spend effort on either analysing others, or prioritise fixing/removing ones more likely to be problematic.

Eventually:

  • Once some conventions, classification tags/nomenclature, etc are established we could consider a lint that warns authors about unsafe blocks lacking the relevant annotations, to prompt additional distributed work to aid the effort and tracking.
  • If this whole effort is wildly successful and reduces the number of unnecessary or unknown unsafe blocks, the remainder might be worth some extra visibility and we can think about presentation and discovery for end users about what kinds of unsafe is in their dependency tree.

Note: this is not the same as unsafe fn, where the author should already be describing why the function is publicly unsafe, but that case might borrow some of the same annotation conventions.


#2

Relevant:


#3

If someone feels like writing a tool to manage this, adding the macro is as simple as a no-op:

// note: I'm simplifying the structure a little bit here
#[proc_macro_attribute]
fn safe(input: TokenStream) -> TokenStream {
    let block: ExprUnsafe = parse(input)?;
    block.attrs = block.attrs
        .into_iter()
        .filter(|attr| attr.path != parse_quote!(safe))
        .collect();
    block.into_token_stream()
}

Usage works so long as you put a ; after the unsafe block (which may require changing the macro though):

fn main(){
    #[safe(true)]
    unsafe {
        println!("hi");
    };
}

I could definitely see a tool that would scrape for unsafe blocks and their annotated #[safe] being used. A lot of high-quality codebases already put comments on unsafe blocks to explain what they’re doing (it’s good practice!) and there’s defnitely benefit to be had to structure it.

There’s also been suggestions to tie it to reviews. The possible metadata I see a #[safe] wanting, currently:

#[safe(
    reason = "Description of what this unsafe is guaranteeing",
    review = "link to some review of the code",
    author = "person to blame for unsafety",
    hash = "hash of the unsafe block contents, to prevent staling",
)]

But I still think as far as the “plugin” part of the macro goes, I think it definitely should no-op and let the external scraping tool do the real work.

Maybe @dpc would be interested after crev is production-ready, or otherwise the Secure Code WG could “sponsor” a tool for this.

(Unfortunately this is far from what I’m good at and I can’t really commit time to it, though it does seem both useful and interesting.)


#4

reading through that RFC, the emphasis was on the obvious case of explaining code that has to be unsafe. That’s important, but I think the vital realisation is that a lot of unsafe could be eliminated “if only there was a [missing thing]” to let it be done safely.

It’s not just “safe because I said so” but also “unsafe because I can’t X”.

We would like to track both the demand for X, and also where to go back and update code if X eventually appears. So the metadata should include X. If X is already an rfc or a nightly-only feature, it can obviously just be a direct reference to that named item. There might also be a safe variant of the code right next door, in a conditional cfg block on that feature. Otherwise an id/link to a tracking issue, or some well-recognised tag name for wished-for features.


#5

We shouldn’t trust humans.


#6

A lot of times the reason is “unsafe because Y is slow” which makes me wonder whether there would be any use to having a way to also include a safe (but slow) version of the code inside an unsafe block.


#7

That’s my most common reason as well, either for my own use of unsafe or for using other better-vetted fns that themselves use unsafe.


#8

Just as an idea to throw in the pot:

#[safe(
    reason = "Data has trivial destructor which is not optimized out",
    alternative = { vec.truncate(0) }
)] unsafe { vec.set_len(0) };

And the #[safe] attribute will, when RUST_SAFETY_OVER_SPEED or some other environment variable is set, substitute the alternative block for the unsafe block. If people used it, that would allow for some good data on what can be improved. (And you could even do automated testing to see if the unsafe performance is actually helping anymore, as well!)


#9

Ostensibly this could be made through #[cfg(safe)] or something like that? (i.e. do you need the runtime switch? ISTM that a compile time switch is sufficient?)

I agree with the general sentiment of #[safe(...)] (bikeshed TODO…) and denoting manual “proofs” in a deliberate manner tho.


#10

To make sure the prime real-estate safe as a crate name doesn’t go to waste, I took the liberty to reserve the crate safe here. Let’s explore the attribute together out of tree in this repo?


#11

Two quick thoughts:

  1. In order for this to be meaningfully different from a comment, there must be a tool that reads these and does something with them, right? Maybe an extension to cargo crev, since you’d presumably want a whole-crate review in addition to a review of any specific unsafe block.

  2. safe is a uniquely problematic name, even for a strawman, because if “safe unsafe code” is a legitimate thing to say, that makes the meaning of unsafe even more subtle than it already is. We definitely don’t need a serious bikeshed right now, but I’d like to suggest “sound” / #[sound()] for code that doesn’t cause UB and “unsound” for code that does cause UB (so “all safe code is sound”, “unsafe code may be sound or unsound”, “unsound code is always bad”, “unsafe code is fine if you can prove it’s sound” would all be true, which seems reasonable to me).


#12

The intent was for RUST_SAFETY_OVER_SPEED to be checked at compile time. E.g. to compile that to something along the lines of

if cfg!("RUST_SAFETY_OVER_SPEED") {
    { vec.truncate(0) }
} else {
    unsafe { vec.set_len(0) }
};

On naming bikeshed: I like safe as the crate name (that’s what it’s about, safety of the code), but #[sound] for the attribute, to avoid accidentally making the safe/unsafe split even more subtle. (Unfortunately the sound crate is taken.)

So just to bikeshed a starting naming: library attribute safe::sound, and the tool is the safe package binary (probably exposed through cargo safety or something) and/or integrated with cargo crev.


#13

I assume “something along the lines of” was intended to look more like

#[cfg(RUST_SAFETY_OVER_SPEED)]
    { vec.truncate(0) }
#[cfg(not(RUST_SAFETY_OVER_SPEED)]
    unsafe { vec.set_len(0) }

Which was what I had in mind when talking about cfg tests - but I was still thinking more about the discovery phase; that some putative tool could see a pattern like the above and infer it as an instance of the doc-comments pattern, with an unsafe block and a safe equivalent gated on some condition.

(Oh, perhaps your version is still compile-time because cfg! is constant and the compiler optimises it away, even though it looks like a run-time check?)


#14

This is true, the Rust compiler will optimize it away.


#15

Yes, it’ll optimize to straight-line one way or the other; the advantage of using if cfg! though is that both sides are typechecked.

And yes, the intention is for the #[sound(..)] to be written for the ease of discovery, and just expand to that to make a consistent pattern for utilizing the effort people put into providing two ways to do the same thing.


#16

Neat.

To be abundantly clear, in this portion I was talking about code that might already be written before whatever conventions are established: we could go searching for patterns like the above as additional automated information-gathering, beyond just finding all the unsafe {} in a big long undifferentiated list.

For establishing new conventions, I like #[sound(..)] a lot!