Unsafe categories

I have been tangentially involved in a trial which was meant to try to introduce Rust in an old and large project which has regular audits for certain components and code paths.

The trial was concluded recently with a "lessons learned" meeting, and the auditors wondered if it's possible to annotate unsafe blocks to constrain what can be done within it. There was a specific case in the trial where an unsafe block was used to access members in an union, and the question was if it's possible to put constraints on the code block to explicitly disallow any other types of unsafe features.

I occurred to me after the conversation that this might not even be as useful as one might think, since some unsafe features may potentially be expressable using other features, which would make the strict categorization (for the intended purpose here) much less useful.

In their hypothetical example they had put an unsafe_allowed = "enum_member_access" (or something along that line) attribute on an unsafe block. Is there already any such feature, are there any plans for one, or has it already been rejected?

I think they found the idea of constraining certain types of operations within unsafe-blocks to be appealing to their auditing process, but on sober reflection I can't help think it's a little odd -- these auditors are quite happy to tell you that you can never trust a compiler (or even hardware..) to do what you tell it, so I'm not sure how such an unsafe constraint would be trusted within that frame of reference.

1 Like

I personally feel like it would be nicer to just add the ability to use the unsafe keyword in a smaller scope than a whole block, namely, on each individual unsafe operation.

In terms of syntax, a possibility for each unsafe feature, i.e. each of

The following language level features cannot be used in the safe subset of Rust:

  • Dereferencing a raw pointer.
  • Reading or writing a mutable or external static variable.
  • Accessing a field of a union , other than to assign to it.
  • Calling an unsafe function (including an intrinsic or foreign function).
  • Implementing an unsafe trait.

could be

  • Dereferencing a raw pointer
    let x: *mut u8 = …;
    let old_value = unsafe *x;
    unsafe *x = new_value;
    
  • Reading/writing a mutable/external static
    unsafe FOO += 1;
    let foo = unsafe FOO.clone();
    
  • Accessing a field of a union
    let some_union = …;
    let f = some_union.unsafe field;
    
  • Calling an unsafe function or method
    let x = unsafe f(42);
    let y = foo.unsafe bar(baz);
    
  • Implementing an unsafe trait already has fine-grained unsafe annotations. You cannot suddenly dereference an unsafe pointer in an unsafe impl Trait for Foo {} without using unsafe another time.

Note for clarification: In the proposed syntax above, the unsafe always only allows the single operation it applies to. For parsing, consider unsafe to bind to a single succeeding token-tree, nothing more: Here’s some pseudo-code using parentheses to clarify precedence: (not valid Rust code under the proposal!)

let x: *mut u8 = …;
let old_value = (unsafe *)x;
(unsafe *)x = new_value;

unsafe FOO += 1;
let foo = (unsafe FOO).clone();

let some_union = …;
let f = some_union.(unsafe field);

let x = unsafe f(42);
let y = foo.(unsafe bar)(baz);

In a chain of unsafe methods, called on a mutable static, returning a pointer which is dereferenced, could thus write

let x = unsafe *unsafe FOO.unsafe bar().unsafe baz(unsafe qux());
// FOO is a mutable static
// the * dereferences a raw pointer
// bar, baz are unsafe methods
// qux is an unsafe fn

instead of

let x = unsafe { *FOO.bar().baz(qux()) };

If foo and bar are unsafe fn, then unsafe foo(bar()) won’t work, only unsafe foo(unsafe bar()) would. With this approach, there’s no accidental or non-obvious additional usage of unsafe language features like it’s possible in an unsafe {} block. I.e. a reader of unsafe { foo(bar()) } not immediately notice that there’s two unsafe functions being called here.


This feature would – of course – come with an opt-in/allow-by-default lint to disallow unsafe {} blocks (and I guess, that lint would/should also imply unsafe_op_in_unsafe_fn).

7 Likes

Personally I don't think such a feature would pull its weight, since the nature of each unsafe block (or operation) should be pretty obvious. Any unsafe block where this feature would possibly catch an unwanted operation is an unsafe block that is too large.

If anything, such an annotation would IMHO only potentially make sense on a much larger scale: files, modules or whole projects.

4 Likes

Wouldn't such fine-grained requirements in addition to annotating each unsafe usage with an explanation end up looking like this in practice?

let x =
    unsafe * // SAFETY: deref justification
    unsafe FOO // SAFETY: static mutables, yay
    .unsafe bar() // SAFETY: frobbed the latch above
    .unsafe baz( // SAFETY: we have a lock elsewhere
        unsafe qux(), // SAFETY: fun
    );
1 Like

I guess, if you like it, that’s an option. Note that today, unsafe {} blocks can be sub-expressions, too

let x = unsafe { foo() } + unsafe { bar() }; // where does the `SAFETY` go?

I guess, it would also be okay, to put the // SAFETY comment before the entire statement.

IMO still

// SAFETY: justify all 5 unsafe usages
// either one-by-one, or by providing an argument that explains how
// all of their safety conditions are met at once.
// (Or something in-between i.e. group a few unsafe operations, etc…)
let x = unsafe *unsafe FOO.unsafe bar().unsafe baz(unsafe qux());

is better than

// SAFETY: justify 3 or 4 unsafe things, forget about the rest,
// nobody will notice during review anyway, unless they look up
// every single function to double-check if it’s `unsafe`,
// and check every static if it’s mutable, and every pointer type
// that gets dereferenced for if it’s a raw pointer type
let x = unsafe { *FOO.bar().baz(qux()) };

You could even still use a block (just not an unsafe block) if you want to group stuff, e.g.

// SAFETY: …
{
    …some unsafe operation;
    …another safe operation;
    …some unsafe operation;
    …another safe operation;
}
1 Like

I'd like to affirm steffahn's perspective here - It has always seemed strange to me that unsafe applies to blocks, such that unsafe { foo(bar()) } gives no immediately-legible indication of whether bar is safe or not. (And if you want to be explicit about only doing a single operation inside the unsafe block, you have to say { let temp = bar(); unsafe { foo(temp) } }, which is dreadfully verbose.)

I don't have an opinion about whether it's worth the churn to fix, but it definitely seems like an unfortunate wrinkle in Rust's rigor about requiring unsafe things to be explicit.

3 Likes

I think your best bet for now is probably a linter like clippy. It can't do exactly what you want here, but it provides a number of lints relating to unsafe. An important one is always having a // SAFETY: ... comment to go with each usage of unsafe.

But perhaps you could also contribute some optional lints to clippy to constrain what you're allowed to do in unsafe code throughout your program. You could optionally allow some of the behaviors you're constraining in certain places with an annotation.

3 Likes

FWIW, IDE plugins such as rust-analyzer and IntelliJ Rust are capable of highlighting individual unsafe operations within an unsafe block. Not fully a replacement, but an available tool.

2 Likes

A related previous conversation:

2 Likes

One nice thing about rust-analyzer is that it highlights exactly which operations in an unsafe block are unsafe using semantic tokens. For example in case of unsafe function calls it highlights the function name. I am not sure if you need to configure a color for the unsafe token modifier or if it is red by default though.

Heh, I use IntelliJ Rust a lot and I've never noticed this, which I guess is because I never write unsafe code :sweat_smile:

Possibly related:

1 Like

Oh no, that looks absolutely egregious. That's way too much line noise, it doesn't help with clarity at all.

1 Like

And rightfully so! You shouldn’t ever actually be accessing a static mut with a chain of 2 unsafe methods and call an unsafe fn for a method argument and dereference a resulting pointer, all in a single line.

If e.g. just 1 or 2 of these operations were unsafe, it’d look much less noisy (and safer, so a nice correlation between more unsafety and more noise!):

(1 unsafe operation)

let x = unsafe *FOO.bar().baz(qux());

or

let x = *unsafe FOO.bar().baz(qux());

or

let x = *FOO.unsafe bar().baz(qux());

or

let x = *FOO.bar().unsafe baz(qux());

or

let x = *FOO.bar().baz(unsafe qux());

(2 unsafe operations)

let x = unsafe *unsafe FOO.bar().baz(qux());

or

let x = unsafe *FOO.unsafe bar().baz(qux());

or

etc… [click to show remaining possibilites]
let x = unsafe *FOO.bar().unsafe baz(qux());

or

let x = unsafe *FOO.bar().baz(unsafe qux());

or

let x = *unsafe FOO.unsafe bar().baz(qux());

or

let x = *unsafe FOO.bar().unsafe baz(qux());

or

let x = *unsafe FOO.bar().baz(unsafe qux());

or

let x = *FOO.unsafe bar().unsafe baz(qux());

or

let x = *FOO.unsafe bar().baz(unsafe qux());

or

let x = *FOO.bar().unsafe baz(unsafe qux());

Also, I’m not proposing to remove unsafe {} blocks from the language. If you’re really hitting a case where you can’t avoid lots of unsafety and the noise becomes too much, you can always fall back to unsafe {} blocks.

2 Likes

Then how is it supposed to back a proposal?

Anyway, I disagree that there shouldn't be more than one unsafe operations in one "line", and I don't agree that lines are a good unit of code to measure complexity with. They are a purely syntactic, lexical, extremely low-level property of code.

Your example could be made clearer even without additional syntax. Just declare a new variable for every unsafe operation. In that way, you can put exactly one unsafe operation per block if you insist, which is a way more well-defined and sensible unit of code. In fact I have been doing that in a C FFI wrapper project of mine. There's no need for an additional language feature in order to accomplish this sort of clarity or separation.

I didn’t say so. I’ve in-fact provided lots of (syntactical) examples above with 2 unsafe operations in one line.

are you referring to this statement?

What I mean here be “all in a single line” is nothing but “all in a very short span of code”. The emphasis of that statement was not supposed to be on the word “line”. I deliberately put an unrealistically high amount of unsafe operations in a small piece of code when first creating this example; among other things, with the intention to see that even in such an extreme setting, the syntax is still sensible and reasonably readyble.

In my opinion, if we’d be talking about “real”/practical Rust code, this code example

let x = unsafe *unsafe FOO.unsafe bar().unsafe baz(unsafe qux());

contains a ridiculous amount of unsafe operations. As does this code example

let x = unsafe { *FOO.bar().baz(qux()) };

but the former one at least also looks appropriately ridiculous for what’s the sequence of operations that it describes.


Now you’re proposing to put each unsafe operation into its own line. Something that I don’t necessarily agree with.

I don’t insist on that. My main point is that you currently cannot easily tell how many unsafe operations there are in an unsafe block.

Even if I’d want to put only one unsafe operation per unsafe block, I have got no tooling that helps me enforce this convention. And also, instead of restructuring my code this drastically (inserting extra (unsafe) blocks and extra variables for immediate values everywhere), I’d rather simply have a straightforward way to make visible where exactly the unsafe operations are.

3 Likes

Actually, maybe I don’t need it if I wouldn’t just

but instead actually declare up to two new variables for every unsafe operation.

Then

let result = foo(unsafe bar(baz()));

could be written as

let first = baz();
let intermediate = unsafe { bar(first) };
let result = foo(intermediate);

Edit: Ah, now I’ve focused too much on giving every operation its own line :sweat_smile:; of course

let first = baz();
let result = foo(unsafe { bar(first) });

works, too. So I guess just one new variable per unsafe operation is sufficient after all.

1 Like

and if it was let result = foo(unsafe bar(&baz()));, then to avoid changing the lifetime of the temporary, you'd have to write

let result = {
  let first = &baz();
  let intermediate = unsafe { bar(first) };
  foo(intermediate)
};

(Here's a playground illustrating the drop order.)

2 Likes