Extending the unsafe keyword

Oh damn, i totally overread that. Yes that would also be a way i'd be completely fine with, since it also eliminates what annoys me the most.

1 Like

I think you and @kornel are ascribing a different meaning to unsafe blocks from what they actually mean. To me they specifically mean "I request permission to invoke unsafe APIs here". They don't mean some vague information to the reader "warning, the correctness of this code is important for safety".

So to me, there is no grey area, no balancing act: putting safe code in unsafe blocks is similar to putting mut on variables that don't need to change: a useless permission request. It exposes your code to unnecessary risk.

A lot of safe code can be important for safety elsewhere. For instance, if HashMap had bugs, it would probably break somebody's unsafe assumptions.

4 Likes

The compiler doesn't need unsafe to exist to invoke unsafe things. This concept and this syntax exists solely for humans, so I am looking at it from perspective of how useful it is for humans, not just its raw implementation technicalities.

So you're overlooking a lot of code that can cause issues. Creation of pointers and pointer arithmetic is "safe" in Rust. Changes to vec.len are "safe" in the Vec's module. These are also dangerous constructs that need special attention when reviewing code for correctness.

We don't have a dedicated keyword for "technically this is not invoking the few special unsafe abilities directly, but it can break those things indirectly, so you have to pay special attention to this too", but extending the unsafe block to cover these things works well enough.

2 Likes

Moreover (and without searching for contemporaneous discussions over the reasoning), the language design currently (granted this is the topic of the present discussion) requires that humans do this by means of entering a block rather than escaping individual expressions, which supports the notion that more than just specific unsafe expressions should fall within it.

Yes, I'm broadly on the same page as you on this: it was, as I said, a bit of a fleeting thought (and I'm conscious it's getting a bit OT for this thread). However it need not be mutually exclusive: the module could be marked unsafe (to alert programmers to the fact that safe code within can violate memory safety and therefore greater care must be taken than normal) whilst still requiring that unsafe expressions appear within unsafe blocks.

Though now I think of it, perhaps a better approach would be to mark fields (eg Vec's len field from your earlier example) as unsafe such that any access of them must be within an unsafe block.

I'm not overlooking it, my whole point was about such code.

I claim that marking code that can cause issues if buggy but that doesn't use unsafe APIs is not the job of unsafe blocks. Using an unsafe block for this purpose would be actively misleading!

Instead, you can say something like:

// CAREFUL: We need to maintain invariant XYZ.

For example, it is very important for maintaining str invariants that str::from_utf8 actually validates that the bytes are UTF-8. Does this mean all this validation code should be in one big unsafe block? No! It doesn't need to be, it shouldn't be, and it isn't.

2 Likes

There are benefits to both tighter and looser scoped application of unsafe blocks, and neither side is meaningfully "more right" than the other; rehashing the argument isn't going to do anyone any benefit.

That said, I do believe both sides would agree that the better way to write from_utf8 is /* validate */; unsafe { from_utf8_unchecked(v) } and not unsafe { /* validate */; from_utf8_unchecked(v) }. The difference is primarily in how you handle oddball APIs that have not only unchecked preconditions but also postconditions, such as str::as_bytes_mut which requires that the accessed buffer is well-formed UTF-8 again by the time the borrow lifetime expires.

1 Like

The proposal doesn't call for eliminating unsafe blocks, so even if one wants to use larger than necessary unsafe scopes in oddball cases for some reason, the proposal doesn't prevent it.

This doesn't feel "unsafe", but something like #[safety_sensitive] might read better.

For the specific case of writing "C with Rust syntax" where there are a larger number of related unsafe properties being maintained, I find a larger unsafe scope is legitimately useful. For one routine I've written that initializes a slice in-place from an iterator, I found it nicer to write

fn bad_len() -> ! { /* â€Ļ */ }
move |ptr| unsafe {
    let mut this = Builder::new(len, ptr);
    for _ in 0..len {
        let Some(item) = items.next() else { bad_len() };
        this.push(item);
    }
    let None = items.next() else { bad_len() };
    this.finish()
}

than to use tighter unsafe, e.g.

fn bad_len() -> ! { /* â€Ļ */ }
move |ptr| {
    let mut this = unsafe { Builder::new(len, ptr) };
    for _ = 0..len {
        let Some(item) = items.next() else { bad_len() };
        unsafe { this.push(item) };
    }
    let None = items.next() else { bad_len() };
    unsafe { this.finish() }
}

and this is a nicely structured[1] routine using a local helper type to get full cleanup if the iterator panics. If the helpers aren't factored out like this, the difference between coarse "this block does unsafe things" and finer unsafe scopes is even bigger, because the unsafe cause shifts from logical[2] to incidental[3].

I bring up this example specifically because in more idiomatic Rust where there's less unsafe being used, tightening unsafe is genuinely beneficial. Organized like this with the helpers, this example is definitely on the edge of being better with the fine unsafe scopes; if Builder weren't an entirely local helper type I'd already prefer this form. With an "item unsafe" syntax, it could be

fn bad_len() -> ! { /* â€Ļ */ }
move |ptr| {
    let mut this = Builder::unsafe new(len, ptr);
    for _ = 0..len {
        let Some(item) = items.next() else { bad_len() };
        this.unsafe push(item);
    }
    let None = items.next() else { bad_len() };
    this.unsafe finish()
}

and I think that sugar pushes this example over into preferring the fine unsafe scope.


  1. To be fully fair, it was written before MaybeUninit got its more useful API helpers, so some of it might be able to be improved now. ↩ī¸Ž

  2. "ptr is correctly allocated," "push no more than len times," and "finish only after pushing exactly len times" ↩ī¸Ž

  3. using raw pointers ↩ī¸Ž

2 Likes

Assuming the general view that every unsafe block should be accompanied by a comment explaining why this specific is usage is safe, an unsafe without such a comment can be assumed to implicitly mean something equivalent to

// SAFETY: is left as an exercise to the reader
unsafe { ... }

With that interpretation, there is a clear difference between marking individual operations as unsafe versus using an unsafe block: The first invites the reader to consider the safety of each operation individually, and the latter instead draws attention to what the block as a whole is doing.

Surely the best approach depends on the specific case, since the proof of safety is, by definition, not understood by the compiler. If multiple unsafe operations rely on the same invariants (e.g. derefs of the same pointer), an unsafe block seems appropriate. If there are unrelated aspects of unsafe, that could be one reason to use a more fine-grained split.


I'm liking the idea of expanding the set of things that can be marked unsafe if the principle used is that it always applies to the one operation that follows (where a block is "an operation" composed of many, so is the only way to apply unsafe to more than one operation).

One usual argument against eliding braces is that it creates potential confusion about what the exact scope is, but that doesn't apply to unsafe since it (a) doesn't affect runtime sematics, and (b) applying to a smaller scope than intended can only add compiler errors, not remove them.

1 Like

I'm genuinely curious: why would anyone want to do that? It is not like you are new to the language (being a rustc developer!), so I expect you have a pretty good reason for it. But apart from in the bindings layers of FFI (or HAL for embedded) I can't see a use for this myself, and those bindings should be as small as possible for safety reasons (so I would argue tight unsafe is good there).

It's a more niche application, and most of the time you're better off sticking to a more idiomatic dialect, because it's easier and safer to work with and generally gives more then good enough results due to Rust's very predictable resource consumption. And "C with Rust syntax" itself exists on a sliding scale. Generally it's not as extreme as the moniker might suggest — I still use the various API design tools Rust has that C doesn't — but it's characterized by a lower level of abstractions and a willingness to deal in raw pointers.

To me, there's two main reasons I find myself using this. The first is perhaps more legitimate — using references means complying with all of the reference invalidation rules, and sticking to the world of raw pointers, despite being more nominally unsafe, is actually simpler to make sound. Additionally, if it isn't being made into public API surface, it can be easier to just do a bit of code duplication and keep a few more ambient invariants around rather than try to design a nicely factored safer API underneath the actual API surface you're building out.

The second is perhaps more abstract — I enjoy exploring the absolute edge of what's expressible in Rust, and that fundamentally means flirting with unsafe crimes. The snippet I provided above is in support of custom compound DST types, e.g. struct { head: Data, tail: [Item] }; Rust already supports working with these once they exist but creating them fundamentally requires manual allocation and piecewise initialization, which is a comparatively "C" concept.

Orthogonalizing any unsafe reasoning is always a goal I keep in mind, but sometimes for internal usage making it more safe is fundamentally much more complex than just doing it the unsafe way. Whenever possible I do try to debug_assert! any assumptions and/or write a safe proof of concept of an API (with tracking overhead) before going about eliminating that (borderline negligible) overhead. I even have a (yet unpublished) crate for "unchecked locks" whose entire purpose is to be UnsafeCell in release but use locks in debug mode to panic and report the unsoundness if the lock would ever block (including loom support) just to justify my sometimes brazen disregard for safety.

The joy in recreational Rust development for me is in taking nominally unsafe techniques and trying to design a reasonably usable safe and zero additional overhead API around them. In chasing "premature" space or time optimization opportunities at a low building block level where tiny wins have the possibility to be magnified by widespread application, even if that's never practically going to happen.

Nit: while in fairness I have landed code in rustc, it's been mostly obvious in terms of actual code changes and more heavy on the justification (e.g. the lint for unstable syntax before macro expansion was sitting effectively just commented out until I enabled it). My actual rust-lang/rust contributions have been mostly stdlib, design, and reporting issues found while abusing the edges of the language. (Although I have ~three potential pet initiatives I've considered attempting implementation for which would shift that balance[1].)

So I guess yes but I'm not quite willing to fully accept that label just yet, as my work has been adjacent to "proper" rustc dev work.


  1. ?DynSized, Unsize friendly ptr_metadata, and adjusting how macro_rules! metavar resolution works to make $$crate more useful and other predefined $keyword metavars possible (unfortunately absolutely no chance of making the edition2024 cutoff even given immediate lang team buyin). ↩ī¸Ž

3 Likes

Honestly between the 'mut' argument and this, it seems almost like what is wanted is that actually an unsafe block gives exactly one superpower: the ability to use postfix 'unsafe' to mark individual operations as checked.

That is probably too much boilerplate, but I think this view highlights the problem here: the block is being used for two things

  • To mark operations as "we checked the preconditions of this operation"
  • To mark blocks as "this is the block to check"

The fact that both are done at once creates the "accidentally used an extra unsafe function" risk of larger blocks.

In principle, with both required, making a previously safe function unsafe is itself perfectly safe. Everyone will have to update their calls and so have an opportunity to check the new precondition.

But that is substantial boilerplate.

Possibly, if things were being designed from scratch, it might be better to have something like "unsafe should always be in a block with a safety comment, but is syntactically setup to apply to exactly one operation".

1 Like

My understanding is that having a one-to-one mapping between unsafe operation and unsafe block is already possible with the multiple-unsafe-ops-per-block and unused-unsafe lints. The undocumented-unsafe-blocks is also the other aspect of what you were mentioning. All of this is documented in the unsafe mental model book in the recommended lints: Unsafe reviews - Mental model for unsafe in Rust

1 Like

specifically unsafe fn means "if you use this wrong you can get undefined behavior", whereas a safe function containing an unsafe block should never invoke UB for any usage from safe code. (unsafe code can always invoke UB)

This might be slightly off-topic, but are there any plans/discussions about extending unsafe in a way (or using a different keyword) to indicate that regardless of how you call this function it will not end up in undefined behavior or memory safety issues, but special care must be taken to use it correctly.

Something like

/// # Safety
/// Guide on how to use the functions this is used on correctly
// 
// Using a unit/marker struct is probably the easiest
pub struct SomeIdentifier {}

unsafe<SomeIdentifier> fn hello() {}

unsafe<SomeIdentifier> { hello() }

SomeIdentifier could either be any type, or something that implements a special trait. (unsafe<u32> wouldn't make much sense for example).

Maybe with a memory identifier making these two equivalent: unsafe = unsafe<memory>, thus allowing the following if the developer/auditor has to look at both memory safety and something else (like the call order between functions or whatever): unsafe<memory, SomeIdentifier>

Biggest issue I can see with that are:

  • the risk of over-use
  • perhaps problems with syntax parsing
  • Adding a bit more complexity to the language, which likely wouldn't be used that often but also isn't something everyone needs to know/understand. With this syntax it would be like making the unsafe keyword generic over an arbitrary amount of identifiers/types (though they fundamentally differ of course).
  • Searching for memory un-safety wouldn't just be a grep unsafe, as that also returns the other places you need to have a deeper look at (which I'd argue is a good thing)

I think this would cleanly solve the issue of people (ab)using unsafe for non-memory-related issues, effectively add the existence of a # Safety block in the docs into the function signature and make use of the existing unsafe functionality without breaking it, without adding too much overhead from what I can tell.

I don't know how to draw any line for this, really. Does reading a file need special care because it might be outside the directory you expected and leak information? Does HTTP need special care because you might be uploading your password to pastebin instead of your password manager? Does implementing Ord need special care because you might be binary searching for something you need to actually find?

The simple thing here is to make the function name longer for something that needs to be used carefully. Rather than unsafe<SomeIdentifier> fn foo you can fn foo_some_identifier. So it's not obvious to me that specific extra syntax is worth it overall.

Today's unsafe is distinguished because if violated you can't trust what you see -- telemetry might be wrong, checks that you see might not happen, etc. That's very different from correctness issues, where the code might be wrong but at least you can debug it, check logs, etc.

5 Likes

It seems to me what you want is perhaps a capability system? If so take a look at GitHub - bytecodealliance/cap-std: Capability-oriented version of the Rust standard library

But it isn't terribly clear what exact use cases you imagine for this. Some concrete problem statement or use case would help.

2 Likes

Maybe I would agree if unsafe were a sparse thing. In my experience, if a project needs unsafe, it usually needs a lot of it. I don't think it really stands out when unsafe is littered all over the place.

I don't really like this characterization I often see that unsafe should be verbose to stand out, or to deter people from using it. Sadly, I use unsafe because I have to, not because I want to. Unsafe code is too verbose, it is too hard to read, and that is not a good thing.

I've had one occasion (project internal only, not in a public library crate) while learning Rust where wanted something like that. I think I ended up rewriting it to a type-based state machine (probably for the better, but more annoying to use). At the time I wished for something like this, but today I'm not sure if it would've helped much.

The issue of over-use is certainly a problem, and it would be hard to decide when this should be used, true.