Private unsafe fields are a poorly motivated feature

The problem with that is that we know where it ends up: people saying "oh, I'll just use an unsafe block because it's fine even if it doesn't hold" and get into that habit so they still do that even when it isn't fine.

To me it's incredibly important to keep the strict interpretation because otherwise you get a mess.

The argument for "well we also need a danger_will_robinson system" I can agree with, but right now I think it's best done with a naming convention and debug_assert!s, not with unsafe.

3 Likes

Even if I'd prefer a wider scope for unsafe, I see your point. I'm not sure how strong it is, but it's a valid concern.

To give a bit more context, even if I use new-types to strongly enforce invariants, I usually avoid relying on them for anything unsafe and keep using checked constructors and checked methods / panicking methods. The strong invariant enforcement is still helpful to help with reasoning (you don't have to worry about invalid states once constructed) and forward compatibility in case the invariant starts to matter for performance. I feel that the rigor with my coworkers or comaintainers is such that they understand they should never reach for unsafe unless absolutely needed.

The risk of "trivializing" unsafe would be more likely if it was the only way to construct a value. This is not what I'm advocating for. I'm advocating for checked constructors as the default, with unsafe unchecked constructors as an optional extra affordance for advanced use-cases. I usually don't reach for unchecked constructors for manually written code, but use it occasionally when doing codegen and the risk of misuse is unlikely.

The most frequent situation where I reach for an unchecked constructor is when defining const values and the runtime check is not compatible with const. In particular, when defining a string newtype enforcing a regex pattern: the regex crate is not const-compatible, so a const unsafe unchecked constructor + #[test] is useful for defining constants for these types.

1 Like

I don't find this to be a real problem in practice. Do people often call u32::unchecked_add just because it exists and usually works fine? Do they often use Option::unwrap_unchecked just because it exists when they know the value is Some? I doubt it.

unsafe APIs are not usually more convenient than regular APIs, so I don't see why anybody would automatically gravitate to using them for no reason.

4 Likes

Do they call str::from_utf8_unchecked because they don't want the overhead and assume "I'm sure it's fine"? YES.

And the only way to fight that is to ensure that the answer to "but how could it be UB just because it's not UTF-8" is always "yes, it's depended upon over here", similarly for all unsafe.

4 Likes

Sounds like the correct answer shouldn't be "sorry you have to use from_utf8 and suffer from overhead" but rather: here is a better type for you. If they don't want UTF-8 validation for performance, they should be using a type without the UTF-8 invariant, such as Vec<u8> or BString.

3 Likes

The answer to this is - yes. (I also included Result::unwrap_unchecked, but that's basically the same as Option::unwrap_unchecked for this discussion)

I just searched unwrap_unchecked on Github, and found this (minimized so that this post doesn't get too long). A note about my list, I skipped anything related to embedded, code competitions/events (like AOC), or if it looked like there might be any invariants in place that the compiler couldn't infer.

In all cases, there are alternatives (like just calling unwrap) that I believe will have minimal to no perf difference compared to unwrap_unchecked. So they are unwrap_unchecked is just unsafe for the sake of "unsafe == fast".

I could add more links, and I saw a bunch more cases, but I think 5 cases here is enough to illustrate the point.

a lot of links to uses of `unwrap_unchecked` with descriptions and my own commentary
2 Likes

I completely agree -- I proposed Tracking Issue for `ascii::Char` (ACP 179) Ā· Issue #110998 Ā· rust-lang/rust Ā· GitHub for similar reasons -- but you still need "no, it really is UB on str" first to overcome the "I'll just use str because I don't want to pull in a crate" or "I'll just use str because that's what the other things I'm calling takes".


unimportant side note about one of those `unwrap_unchecked` finds

Well, for the

using unwrap_unchecked on the result of usize::try_from instead of just casting

one it's already in unsafe from the offset_from, so what they actually want is https://doc.rust-lang.org/std/primitive.pointer.html#method.offset_from_unsigned rather than needing the conversions -- after all, this is why I added that :stuck_out_tongue: Add `sub_ptr` on pointers (the `usize` version of `offset_from`) by scottmcm Ā· Pull Request #95837 Ā· rust-lang/rust Ā· GitHub

The unwrap_unchecked there is serving like a hint::unreachable_unchecked(signed_offset >= 0), so isn't completely unreasonable, though I do suspect that just as usize would have been completely fine too.

4 Likes

There is a lot of Rust code at this point. I get 6.7 million results for unwrap and 18.2k results for unwrap_unchecked (filtered on lang:rust). That is very approximate, not all results of either will be true hits. The former includes unwrap_or_else for example. Both includes a lot of definitions of functions with this name, not just uses.

Still, that is about 0.2 % of the search results that is unchecked. Not a lot.

4 Likes

Sorry for the essay :frowning: I couldn't figure out a way to condense it without losing info.

TLDR: unsafe is useful because it is rare, and making it less rare by applying it to general invariants/preconditions doesn't actually help ensure correctness.

That is true, however I only used unwrap_unchecked because @tczajka mentioned it.

The larger discussion here is about relaxing what Rust should endorse as correct usage of unsafe. @scottmcm has been arguing for unsafe being exactly only things which can lead to memory-unsafety. @tczajka and others have been arguing for relaxing this to allow more general invariants.

Since I mentioned @scottmcm, let's take a peek at his example (from_utf8_unchecked). Just raw numbers:

  • from_utf8_unchecked (search) - ~72k hits
  • from_utf8 (search) - ~470k hits

This is fairly damning. About 13% of all usage of both from_utf8_unchecked and from_utf8 is from_utf8_unchecked. This is why making unsafe about arbitrary invariants is so bad. For str there is good reason to make this invariant (just see the perf difference between str::chars and bstr::ByteSlice::chars), but even then we still have people abusing unsafe when from_utf8 would probably be fine.

So yes, it is true that unwrap_unchecked (link to github search, about 15k at the time of writing) is used very rarely compared to unwrap (link to github search, about 5.6M at the time of writing). But unwrap_unchecked was never really the problem here (and even then it's still being abused).

I would also like to point out that we have docs on unsafe in std (docs)

Code or interfaces whose memory safety cannot be verified by the type system.

This is very clear about how unsafe should be used. This is a guiding light for most people, and I think this is why unwrap_unchecked usage is so low


So with that said, I am sympathetic to wanting some way to guarnatee that the behavior you see is correct, and have a hard and fast rule on who you can blame for bugs (that's basically what unsafe does after all!). unsafe is the wrong tool, exactly because it should be rare to use.

Using the pain of UB to declare behaviors incorrect is exactly what C++ does. That's why signed integer overflow is UB in C++, same with ODR violations. This doesn't make code easier to reason about. It does the opposite, since with UB you can't even trust that what you see when debugging is actually what's happening.

We can see how Rust does this differently, instead of UB for integer overflow, we panic with overflow-checks turned on and wrap when their off. And we have a robust module/package system to avoid most cases of ODR violations (the only remaining case is unsafe with extern blocks/functions). In both cases, we've made the behavior predictable so that we can debug buggy code without the compiler hiding it's miscompliations.

So the key different that I see with Rust is that we don't need to declare an API unsafe in order to declare that some usage of it is wrong. We can simply declare that the usage is wrong, and design the APIs to prevent the bugs in the first place (module system). If we can't design the API to avoid the bugs, then we will try to detect the bugs at runtime (integer oveflow), and if even that's impossible, then we design types with weaker invariants (bstr).

4 Likes

Clearly UTF-8 is the exception since the validation is extremely expensive and very often unnecessary.

I think that something like unsafe should exist for other categories of dangerous code but that it should have another name. So I don't disagree with that aspect. What I disagreed with was the questionable evidence for "what the ecosystem does". Citing 5 examples (as happened originally) does not make for good statistics. Lack of rigorous statistics (or at least admitting the issues with the data) when making a claim is one of the things that really annoy me.

That said, in you most recent reply you provide much better numbers. What remains would be to try to estimate to what extent those numbers over/under-reports and see if those effects cancel out or not.

3 Likes

That is an interesting and plausible-sounding hypothesis, but two examples (unwrap and utf8) really isn't enough evidence to use the phrase "clearly" here.

It would be interesting to try to categorise the types of unsafe used in different segments of the ecosystem, but that would be fairly large work.

1 Like

It's very different from what C++ does because C++ doesn't have the unsafe keyword and doesn't require unsafe blocks for such unsafe APIs. How we like to use the unsafe keyword is what the discussion is all about.

This example demonstrates the opposite of what you're trying to say. In addition to well-defined safe addition, Rust does in fact also have unsafe signed addition APIs! See i32::unchecked_add.

Nobody is saying to replace safe APIs. We're talking about providing unsafe APIs in addition to regular safe APIs. Just like i32::unchecked_add exists in addition to + and i32::checked_add.

Moreover, note how i32::unchecked_add is unsafe not because the compiler wants to use some sort of hardware undefined behavior in the processor. Machine code addition is all safe code. It's purely to allow the compiler to make assumptions that certain properties hold.

It's exactly about the kind of properties I like to protect in my libraries, the only difference is that i32 is a compiler builtin and my type is not a builtin, and rather than the compiler using these properties for its internal logical deductions, it's my third party library using the properties for its own internal logical deductions.

The often-cited article "What The Hardware Does is not What Your Program Does" talks about this:

Of course, there is a reason for this undefined behavior; there is a reason the ā€œabstract machineā€ is defined the way it is. Compilers don’t just want to annoy programmers. Ruling out operations such as comparison on uninitialized data is useful, because it means the compiler does not have to ā€œrememberā€ which exact bit pattern an uninitialized variable has!

And very analogously I could write a very similar article talking about how "What The Base Rust Abstract Machine Does is not What Your Library Does". How the library is implemented should be completely irrelevant to the users.

My libraries don't just define some operations unsafe because the Rust Abstract Machine doesn't support some operations at a lower level. It's also not there just to annoy users. Ruling out such operations is useful because it means the library then does not have to deal with certain internal states it doesn't want to have to deal with.

And more importantly: it allows the library to make promises to its users it wouldn't otherwise be able to make. Just like Rust the language can make certain promises to its users thanks to certain behaviors being defined unsafe.

When thinking about Rust (or C, or C++), you have to think in terms of an ā€œabstract machineā€, not the real hardware you are using.

And similarly, every library defines its own little "abstract machine" in terms of its API, and it gets to decide what kind of operations the machine supports. What I'm saying is that these don't always have to correspond to what kind of operations are supported by the lower level operations that the library uses to implement its abstract machine.

Just like you don't have to explain Rust undefined behavior in terms of x86 undefined behavior, I don't think I have to explain Prime undefined behavior in terms of undefined behavior of Rust builtin types.

7 Likes

I think this is an idea that only seems reasonable if you concentrate on the core language and the compiler, but doesn't really work if you start thinking about higher level abstractions. Compiler people tend to think of the language as the abstraction. But users of higher level abstractions don't think in terms of "what is allowed in the core language", they think about "what is allowed in the high level types". That's why we're building higher level abstractions. Thinking only about language-level invariants require you to look behind abstraction boundaries and breaks encapsulation.

Circling back to unsafe fields, to me that essentially duplicates the notion of private fields. The reason we make fields private is that we don't want to think about what would happen when they are changed willy-nilly, breaking abstraction boundaries. The moment we say it's not unsafe to change them arbitrarily, we have to start thinking about all possible values they might have, not just the ones that the module sets them to. Same as what we'd have to think about if the field was public.

But let's say there is a weak_unsafe keyword. How would this work for the TwoDigit problem?

Do I declare TwoDigit::new_unchecked weak_unsafe?

pub struct TwoDigit(u8);

impl TwoDigit {
    pub fn new(n: u8) -> Option<Self> {
        if n < 100 { Some(Self(n)) } else { None }
    }

    /// Weak Safety: n must be < 100.
    pub weak_unsafe fn new_unchecked(n: u8) -> Self {
        Self(n)
    }

    /// Returns a digit in 0..10.
    pub fn top_digit(self) -> u8 {
        self.0 / 10
    }

    /// Returns a digit in 0..10.
    pub fn bottom_digit(self) -> u8 {
        self.0 % 10
    }
}

Again somebody might reasonably create this function in a separate crate:

pub fn is_top_digit_large(x: TwoDigit) -> bool {
    match x.top_digit() {
        0..5 => false,
        5..10 => true,
        10.. => {
            // Safety: TwoDigit::top_digit() says it returns a digit in 0..10.
            unsafe { unreachable_unchecked() }
        }
    }
}

A third crate:

/// Weak safety: x < 100
pub weak_unsafe fn is_large(x: u8) -> bool {
    // Weak safety: x < 100 so we can call new_unchecked.
    let t = weak_unsafe { TopDigit::new_unchecked(x) };
    is_top_digit_large(t)
}

But now we have messed up the unsafety levels. Calling is_large(200) is only weak_unsafe but causes language-level UB.

The proper solution would require fixing the documentation of TwoDigit::top_digit:

impl TwoDigit {
    /// If weak safety is maintained, returns a digit in 0..10.
    /// After weak safety is violated, returns a digit in 0..26.
    pub fn top_digit(self) -> u8 {
        self.0 / 10
    }
}

so that is_top_digit_large knows not to assume that the output is always in 0..10.

Now the library and its users always have to consider both scenarios: weak safety maintained and weak safety violated. Just because we used the keyword in one obscure function.

This would be a huge pain. The whole reason I'm making a TwoDigit type is to avoid having to consider numbers with more than two digits. Otherwise I'd just be using u8 not TwoDigit.

People have mentioned multiple times that Ord and BTreeSet do this. They have to consider both scenarios: Ord is a total order and Ord is not a total order. Sure, that's a choice they made[1]. Most other types don't do this sort of thing. It's a lot nicer not to have to do this when I can actually maintain an invariant I want, like I can in TwoDigit.


  1. BTW I think BTreeSet is really badly documented about what happens for non-total orders ā†©ļøŽ

5 Likes

I want to ensure, in my own private methods, that I remember to uphold the safety invariants I defined for myself. Right now, I’m on my own in that regard. In one case, I have a type that implements Sync even though a particular field does not, and the safety invariant for that field is that it is not read outside the destructor. Doing as much as #[derive(Debug)] could lead to UB there.

4 Likes

The point is that it's essentially always true that you have to remember some invariants you're trying to uphold, you can't just randomly be changing your fields, so not much changes. Maintaining struct invariants is pretty much always very important. Any part of your module or user modules might end up relying on them, including in unsafe blocks.

If, as has been suggested, we do separate a weaker weak_unsafe keyword for "correctness invariants" whose violation has to be tolerated everywhere, then pretty much all private fields would have to be marked either unsafe or weak_unsafe. This distinction doesn't exist yet [1], so in practice I'd have to be marking all the fields unsafe in most structs (such as TwoDigit) if I was going to use this feature consistently.


  1. and I don't think it ever will because I think the distinction is unworkable ā†©ļøŽ

If weak_unsafe were added, I would want it to denote ā€œthis code is being handled extremely carefullyā€, not solely ā€œit sure would cause a lot of problems if there were a bug in this codeā€.

If the invariant is so widespread that it’s a hassle to even mark it with a weak_unsafe keyword, then it should be an even greater hassle to verify that all that code doesn’t have bugs that violate the invariant.[1] That is, strictly handling invariants is a hard problem whether or not there’s a keyword for it.

To give more context on where my thoughts are coming from, I’m working on some open-source database software spanning ~40 kLOC, where a single bug could plausibly corrupt gigabytes of user data. However, no way am I making ā€œinvariants that must be upheld to not corrupt all your dataā€ into safety invariants that anyone’s unsafe code should ever rely on, since there’s far too much code for me to treat it all like fragile glass. I’d rather let bugs cause errors, corruption, or crashes and emit log messages instead of triggering UB.

Using something like db_corruption_unsafe might be ideal for that situation, but I wouldn’t actually do that unless I was treating it with nearly the same caution as I would unsafe. I wouldn’t spend the time to do that unless someone paid me for it. Instead, I have a bunch of (visibility-restricted) safe methods which, if misused, can corrupt a bunch of data. I have numerous ā€œsafeā€ correctness invariants on private fields, and I certainly try not to have any logic errors, but I’d rather put effort into testing my code instead of analyzing and annotating the code to show why it’s free of logic errors.[2]


  1. I’m the sort of person who writes thorough SAFETY comments for every unsafe operation, even the ones that are obviously sound, as to give minimal trust to past me and future me. If weak_unsafe or similar were added, I’d do the same. ā†©ļøŽ

  2. I don’t think ā€œall my test cases succeedā€ is sufficient to justify the soundness of unsafe code, no matter how extensive the tests, so I’d treat weak_unsafe similarly. However, extensive tests would presumably be sufficient to justify ā€œyou can feel safe using this database, the risk of it losing data is lowā€. ā†©ļøŽ

2 Likes

That is an interesting use-case actually. Even if unsafe fields did nothing except forbidding the use of derives, there would already be some use in that.

I think there is some misunderstanding going on here. I'm not advocating that anybody should trust the correctness of your DB when calling other unsafe code. It's up to your policy what types, crates, etc you may want to trust in your code. It makes sense not to trust everything. Your DB doesn't even have to trust its own coherence, it makes sense to program defensively.

Having your own unsafe APIs called incorrectly doesn't mean you automatically cause language level UB and destroy the logging. It's not any better if you mark those same APIs "safe" and they get called incorrectly: the end result is the same.

The danger of the DB causing language-level UB comes when your DB calls other unsafe APIs, not when it provides unsafe APIs to its users.

The important thing here is that the unsafe keyword on a function interface means that the callee trusts the caller. Not the other way around. The keyword doesn't modify the amount of trust in the other direction. So if your DB exposes an unsafe function, that doesn't mean the users have to trust you more when calling this function. It goes the other way: it means that your DB gets to trust the user.

Creating unpredictable mayhem is pretty much library-level UB, but the only things the DB can possibly damage are the things it actually touches. If your DB doesn't call any external unsafe APIs itself, nothing it can do (including inside its own unsafe functions) will cause language-level UB, destroy the logger, etc. Just marking a function unsafe doesn't increase this kind of risk.

Since you don't want to trust your DB's coherence, just make sure that whatever your DB does, it doesn't completely trust its own coherence, especially when calling outside unsafe functions, and then everything works the way you describe.

So I'd probably still mark the dangerous APIs with responsibilities on the caller unsafe. But I would have to look at a more specific examples to really understand what you're talking about.

1 Like

So what's the difference between this and what I've been calling "arbitrary safe misbehavior" that makes this unpredictability without unsafe (potentially) okay, but safe Prime::new_unchecked undesirable?

I understand that you'd want to look at more specific examples to decide, but can you think of a loose vibe for what situations you'd use unsafe for or not? Is it based on the simplicity of the function precondition? On the ease of accidentally relying on an invariant? On potential fallout from hitting a bad state? On the benefit of skipping checks? On the documentation burden of allowing bad state? All of the above? Something else?

aside

I apologize if asking this comes off as combative or trying to catch you out in a contradiction or similar in any way. That is not my intent, and I am genuinely asking this from a position of wanting to understand how the usage of terminology influences how people interpret the concepts being discussed. I simply do not know how to phrase the question in a less direct manner.

I definitely use unsafe for more than just the strict minimum. I also use it for type based invariants that are both "simple" enough that I can imagine a dev (likely me) using it in an unsafe proof later on and "complex" enough that bypassing the checks of the safe API is reasonably desirable[1]. But I'm also constantly aware that I use unsafe more freely than I probably should, debug_assert! what I can, and eagerly await stabilization of cfg(ub_checks).


  1. My rule of thumb is that using unsafe needs to reduce the asymptotic complexity to be meaningful if there's a safe alternative. (Exceptions apply.) ā†©ļøŽ

1 Like