Detect and Fix Overscope unsafe Block

lwz · October 7, 2024, 9:03am

In actual development, I often see unsafe blocks of code that are abused. For example, in one version of the Smallvec library there is a function that looks like this:

pub fn grow(&mut self, new_cap: usize) {
        unsafe {
            let (ptr, &mut len, cap) = self.triple_mut();
            let unspilled = !self.spilled();
            assert!(new_cap >= len);
            if new_cap <= self.inline_size() {
                if unspilled {
                    return;
                }
                self.data = SmallVecData::from_inline(mem::uninitialized());
                ptr::copy_nonoverlapping(ptr, self.data.inline_mut().ptr_mut(), len);
            } else if new_cap != cap {
                let mut vec = Vec::with_capacity(new_cap);
                let new_alloc = vec.as_mut_ptr();
                mem::forget(vec);
                ptr::copy_nonoverlapping(ptr, new_alloc, len);
                self.data = SmallVecData::from_heap(new_alloc, len);
                self.capacity = new_cap;
                if unspilled {
                    return;
                }
            }
            deallocate(ptr, cap);
        }
    }

It obviously uses an unsafe block more than necessary, putting the entire function in an unsafe block. This can be disruptive to subsequent developers, who are unsure what part is truly unsafe. It would be better to use an unsafe block only to the smallest extent, using several small unsafe blocks of code, rather than an entire unsafe block that contains all functions. I want to try to implement a feature that can automatically detect (or, one step further, fix) such situations. Can rust-analyzer do this now, and is it necessary to add this feature?

tguichaoua · October 7, 2024, 11:36am

You have the clippy::multiple_unsafe_ops_per_block lint that can help you detect that.

afetisov · October 7, 2024, 12:26pm

IDEs can already trivially mark all unsafe operations in your code. That "unsure what part is truly unsafe" in a non-issue in practice, unless you don't use an IDE (in which case you'll have plenty of other issues, like non-obvious type inference).

Unsafe blocks should not be used to mark unsafe operations. They should be used to mark blocks of code which share specific safety preconditions and postconditions. Which can be as small as single operations, sure, but usually they are not. Typically several unsafe (and somtimes safe) operations are used as a tightly knit block which can't be meaningfully taken apart for reasoning purposes.

Which mean that you can't do this kind of transformation automatically. The only thing you can do is restrict unsafe blocks to single operations, which is useless in practice, since it doesn't tell you anything about the relations between the operations, duplicates already existing information, and actually breaks any specified relations between operations.

Sure, this specific block from SmallVec looks too big. For starters, the initial operations look totally safe, so there is no reason to put then inside of unsafe block. But that doesn't mean that just mechanically downscoping the blocks would somehow make the code more maintainable.

lwz · October 7, 2024, 12:29pm

Thanks, I read this lint. But I'm not sure if lint will work properly in cases where there is only one unsafe operation in an unsafe block of code. Consider the following code:

unsafe {
let value: i32 = 42;
let ptr: *const i32 = &value;
let double_value = value * 2;
println! ("Double value: {}", double_value);
let dereferenced_value = *ptr; //only one unsafe op here
println! ("Dereferenced value: {}", dereferenced_value);
}

I am not sure whether this lint is able to handle this situation.

rne · October 7, 2024, 12:34pm

I mostly agree with @afetisov (not regarding the IDE, though ).

The actual issue I see with this code is the missing // SAFETY: ... annotations, which would communicate to the novice developer what safety considerations have been made.

Vorpal · October 7, 2024, 1:21pm

There is a clippy lint for that: clippy::undocumented_unsafe_blocks. I wish it was on by default.

lwz · October 7, 2024, 1:57pm

I think what @afetisov said makes sense, perhaps it would make more sense to trace the parameters of each unsafe operation and put them into the same unsafe block as possible?

lwz · October 7, 2024, 2:02pm

I understand that //Safety may be useful in this situation, but //Safety can sometimes go wrong(In fact, the grow method itself is a Rust Bug-RUSTSEC-2019-0009,RUSTSEC-2019-0012). I have seen some cases where //Safety was written because the developer didn't think it through. So perhaps it is more maintainable to try to put both the pre - and post-conditions about the same unsafe operation into an unsafe code block.

afetisov · October 7, 2024, 4:31pm

I'd like to push back on the "novice developer" part. Safety annotations are needed just as much by experienced developers as by novice ones (novice developers should avoid unsafe Rust anyway). The compiler cannot verify the proof that your code is valid, so programmers must do it themselves. How would they do that if there is no written proof anywhere?

Even in simple cases safety annotations help to ensure that no condition was forgotten (which occasionally happens even for easy cases), and helps to review any future changes.

In fact, in my experience it's the expert devs which tend to obsessively annotate all unsafe blocks.

Not that it's important for the proposed feature, though.

tczajka · October 7, 2024, 4:42pm

I think Rust is supposed to be usable without an IDE. I often look at Rust code without an IDE. doc.rust-lang.org generates links to pure text source code on a website. Code reviews are often done on text diffs, not in an IDE.

I always prefer minimal unsafe blocks. I'd much prefer if individual function calls could be directly marked.

Every unsafe function call should be checked to see if the function's preconditions are satisfied as documented by the function.

afetisov · October 7, 2024, 4:48pm

docs.rust-lang.org have all necessary information to mark all unsafe operations in its code listing. It's purely an implementation issue. And I would never do a code review of a complex PR on a diff, and most stuff dealing with unsafe code falls in that bucket. Regardless whether unsafe operations are marked or not, there is just too much important context which is impossible to check in a simple diff.

Sure, that's an option. And I know some examples of code which does that. In practice, most of the unsafe code examples are more like the one quoted in the OP. You're lucky if the unsafe blocks are reasonably scoped and have any safety comments at all.

Merely mechanically splitting up unsafe blocks wouldn't improve that kind of code in any way.

rne · October 7, 2024, 5:06pm

Sorry, I worded that badly. I was referring to programmers that are new to the codebase.

tczajka · October 7, 2024, 5:15pm

If you believe that there should be safety annotations with proofs, then splitting up the unsafe block does help. For example, the already mentioned clippy::undocumented_unsafe_blocks will trigger for each block missing a safety comment. If you put a bunch of unsafe calls in a single block and forget to prove preconditions of one of those calls, the clippy lint won't notice the missing proof.

afetisov · October 7, 2024, 7:09pm

I think it's a non-sequitur. But let me rephrase it this way: the fact that an operation is safe doesn't mean that it doesn't have unsafe pre/postconditions or doesn't rely on external unchecked assumptions (e.g. the classic Vec::set_len, or the closure pinning debacle). With your approach, safe code should be unconditionally moved outside of unsafe { }, which actually inhibits properly documented reasoning.

Just as some safe operations would be better placed inside of unsafe { }, so should individual operations sometimes be merged in a single block.

tczajka · October 7, 2024, 7:12pm

I don't understand this example. Only unsafe operations have safety preconditions. Vec::set_let is an unsafe operation because it has safety preconditions (as documented in the docs).

afetisov · October 7, 2024, 7:17pm

Mutating the field itself isn't, though. If a bare field mutation is present in code, do you think it should be outside of unsafe { }? Or would you rather wrap all unsafe field accesses in getters and setters, even within the implementing module?

tczajka · October 7, 2024, 7:27pm

unsafe is an API concept. It defines contracts between abstraction boundaries. It only makes sense to use unsafe when calling some unsafe API from outside the abstraction boundary. Thus I wouldn't use unsafe for code that just internally maintains invariants without calling any external unsafe APIs. Vec::set_len is marked unsafe because it's an external API. If it was a private helper function it wouldn't need to be marked unsafe.

afetisov · October 7, 2024, 8:02pm

I.e. instead of some sort of separation between low-level unsafe plumbing and the high-level pretty interface, which usually can be implemented entirely in safe code, you propose to roll up the entire module, with possibly multiple submodules, into a single possibly-unsafe ball of mud? I can't see how that is a win. You're abandoning Rust's compile-time safety enforcement exactly when it's needed most: in the implementation of some complex component.

I guess you could still try to enforce some internal API boundaries, e.g. extracting the type's definition and some core methods into a private submodule, but that's not usually the way that code is structured.

Even so, your unsafe code is likely to depend crucially on the correctness and specific behaviour of the safe code that you call, like in the PollFn example I linked. Imho it's best when such critical safe operations are also included in the unsafe { } block, because an error in them can cause unsoundness or worse.

scottmcm · October 7, 2024, 9:33pm

TBH, I think we need tooling to better help computers help double-check that humans at least considered everything, even if it can't check the proof itself. (Though stuff like kani being able to check the proof would also be nice!)

And my hope is that in doing that, we do it in a way that does allow big-scoped unsafe blocks again, because it would allow "proving" the thing once, but in a smart way.

Like if you have a local let p: *mut SomethingCopy = ...;, then there shouldn't be a need for two unsafe blocks if you need to *p = 0; then *p = 1;, because the safety considerations are the same. But today there's no way to talk about *p's without also letting you *q, so I've been thinking we need some sort of symbolic thing where you could "pass" variables to it, and have those connected to the preconditions of whatever you're calling.

For example, we could have ptr::write have a writable(p) predicate that links back to the same rules as https://doc.rust-lang.org/std/ptr/index.html#safety, then if you blah.write(x); the tool could check that you had a #[safety(writable(blah), reason = "..."] on the unsafe block, and that'd let you write to that pointer in the block, but not do other unsafe things in the block until you also "proved" those things for the other things you try to do.

(And the tool wouldn't have to know what the writable predicate actually means, just match up the property and variable names from the call and the callee.)

tczajka · October 7, 2024, 9:42pm

I would have to see some example. I disagree that I proposed anything remotely close to what this sounds like. I'm in favor of the exact opposite.

If you want a "separation between low-level unsafe plumbing and the high-level pretty interface", then modules are exactly the right tool for the job. Modules are the Rust way of creating such abstractions. I don't understand why you wouldn't want to structure the code that way, that's exactly how I normally structure my code. Again, an example would be useful.

I'm not. Exactly the opposite. If you want compile-time enforcement for some abstraction, you have to put it in a module that only exposes the right APIs with unsafe markers where appropriate. If you don't put it in a separate module, you're not going to get compile-time safety enforcement.

Topic		Replies	Views
How rustc handle unsafe Rust Unsafe Code Guidelines	2	207	September 25, 2024
Allow safety checks inside unsafe functions Unsafe Code Guidelines	5	1201	December 22, 2024
Overflow checks and unsafe code Unsafe Code Guidelines	10	1819	March 25, 2019
Split keyword `unsafe` into `unsafe` / `checked` language design	8	767	June 2, 2024
[Pre-RFC] Fixed-capacity view of Vec libs	51	10540	April 27, 2019

Detect and Fix Overscope unsafe Block

Related topics