I started to type out all the logic explicitly and realized that safe reads fit there too, so there is no benefit compared to the case where reads are unsafe as well.
(Assuming NonZero gives a safety guarantee and val gets marked as unsafe)
EDIT: Still, I have a strange feeling about the fact that a function without unsafe inside can give a safety guarantee, in the context of already having unsafe fields. My motivation was to bring unsafe there as a way of "exposing" the invariant. But it won't help, since you can just remove the unsafe read and type 0 instead.
Option<NonZero<T>> is guaranteed to be compatible with T, including in FFI.
Which rules out correctness guarantees: if that were merely a correctness guarantee, using it in FFI would be unsound.
Earlier in the documentation it also says:
This enables some memory layout optimization. For example, Option<NonZero<u32>> is the same size as u32:
Which also implies a soundness guarantee. So while the documentation doesn't state it explicitly, it implies that this is a soundness guarantee.
I don't follow how size and layout are related to the guarantees about behavior. One could imagine NonZero::new(42u32).get() returning 0 while still having Option<NonZero<u32>> be 4 bytes in size. If we know it doesn't return 0 it's not because of size.
It doesn't clarify whether that's a "correctness guarantee" on the return value, or a "soundness guarantee". If we are to take seriously the statements that every guarantee needs such a clarification, then well, almost nothing in std or any other crate does that.
I think the standard library (for types/functions in core or alloc) is a bit special in this regard: most guarantees, unless stated otherwise, are soundness guarantees, and we can rely on the documentation being correct for soundness. (This is the burden of being std, and a blessing for all of us users of std.)
If you don't accept that, then you can't even rely on <u8 as Mul>::mul(x, y) being the same as x * y for soundness. (Note that for primitives, multiplication doesn't go through the Mul trait; it is open-coded.)
But even here we have types like Vec, which explicitly document implementation details to give more guarantees to users, and allow types like typed-arena to be sound.
For instance, core::marker::UnwindSafe is explicitly not a soundness guarantee, but a correctness guarantee.
Note, however, that this is not an unsafe trait, so there is not a succinct contract that this trait is providing. Instead it is intended as more of a “speed bump” to alert users of catch_unwind that broken invariants may be witnessed and may need to be accounted for.
-- from docs of UnwindSafe
For things in std, but not core or alloc, things are a bit more murky since they deal with OSes and FFI, so things which Rust can't really make guarantees about.
I would say it's just correctness, and I would not make that new_unchecked function an unsafe fn.
Remember that even if your prime numbers were used in a security-critical way (say in RSA), that's not a memory-safety-soundness condition.
It's not UB to upload your bank password to pastebin. It's not UB to transfer all your bitcoin to a hacker. It's not even UB to give someone a fatal dose of radiation. Those are "just" correctness properties.
To step back a bit, the only reason to make something a memory-safety guarantee is because it's needed later in a proof of non-UB. It's u32: Div<NonZero<u32>> that wants the safety promise of non-zero-ness to be able to call unchecked_div in std::intrinsics - Rust, for example.
Unless you are guarding something that would be UB otherwise on the primality of the number, you don't need it to be a # Safety precondition, just correctness.
I find this a very strange, counter-productive recommendation. I would never not mark that function unsafe.
My reasons have nothing to do with memory safety (although it helps with that too -- somebody might need prime numbers for memory address computations in a hash table or whatever).
The reason I'd never do that is that by exposing a safe new_unchecked I've just thrown encapsulation and type-based correctness out of the window. As a user of the number theory crate, when I see a Prime, I want to know that no matter what I do (barring language or library UB, obviously), that is going to be a prime. Correct by construction. Make invalid states unrepresentable. I want there to be no way to create a Prime that is not a prime.
When a user of my crate shows me his binary that produces a Prime { p: 9 }, and I can see that his binary doesn't have any unsafe blocks, I want to instantly know: aha, obviously my crate has a bug.
By not marking new_unchecked unsafe, you've just gotten rid of this whole ability to perform type-based reasoning about correctness. To debug a Prime { p: 9 } I now have to study both the library crate and the user crate.
D'oh, good catch, I meant to switch the order of NonZeroU32 and NonZeroU64 here.
The larger point I wanted to make is that to understand whether a std type's invariant can be relied on for soundness or correctness, users have developed the muscle memory to hopelessly click through and navigate a tree of links. For better or worse, the contracts are not documented in standard/predictable locations, but are often diluted and need to be pieced together into a big picture. I suppose this isn't an argument against private unsafe fields per se, but just a reminder that they don't solve the problem of invariants being misunderstood.
That said, I agree that the majority of the time (if not always), the field invariant is going to be a subtype of the type invariant, so reads are safe.
That's a very good question. In a world where robustness is a thing, my suggestion is: "To avoid increasing the burden of unsafe reviews, it is important that items are not documented as robust unless it is known that a proof relies on them." Your two following questions depend on your answer to that first one.
The thing is, we don't live in a world where people document robustness (at least not at scale, since apparently @ddystopia does it). In practice, unsafe code relies on correctness guarantees (since those are the only documented guarantees outside unsafe traits). In doing so, they take the responsibility to make sure the thing (they did not write) is actually correct, either by reviewing it and pinning it, or by trusting the author (like those of the standard library).
Suppose I go with that and so don't claim that Prime is "robust". So you think that in that case I should just go ahead and publish Prime::new_unchecked(n) as a safe constructor?
This goes against all software engineering principles. It defeats the whole purpose of encapsulating types with private members in a module. The whole reason Prime has private members is so that the library can enforce the primality. The type Prime is supposed to wrap an important piece of knowledge: the library certifies that the number is prime.
The only reason I even publish new_unchecked is for the performance of some users who want to take the responsibility of verifying primality themselves. But by publishing the function I do not mean to destroy the guarantees that the type provides.
Publishing a safe new_unchecked would significantly increase the burden of safe code reviews (and of debugging) because I can no longer trust that a Prime is actually prime anywhere in the codebase.
when creating a software system, build it in a way that incorrect usage is impossible, or at least impractical
unsafe seems like a perfect tool for indicating "impractical" pre-conditions required for maintenance of private type invariants that can't be verified by the library. If you say there are two levels of pre-conditions, and unsafe is too strong for Prime::new_unchecked conditions, we'd need another "weaker" level of unsafe(weak). Otherwise I would have no way of enforcing type invariants other than just not providing new_unchecked at all, which I don't want to do since some users want it for performance (primality checks are very expensive!).
I think the argument that providing this unsafe function "increases the burden of code reviews" is bogus. Users can always use the safe version if they don't want to be burdened with reviewing unsafe blocks. Having to review an unsafe block may be a reasonable price to pay for performance, similar to how using [T]::get_unchecked may be a reasonable price to pay for performance.
Not necessarily. This is a semver question. It depends if you want the option of providing robustness within the same major version, or if you're fine bumping the major version.
I believe you're giving too much credit to unsafe. It is only meant for soundness purposes, not correctness purposes. I agree that some people (like me and you, apparently) value correctness as highly as soundness (one of the reasons I don't think the unsafe mental model is so useful).
It is legitimate to want new_unchecked() callers to know that they should only call it with prime numbers, however unsafe is not the tool for that. Not if it's only about correctness. If a caller violates this correctness requirement, then the library won't provide any correctness guarantees in return. And that's it. It's not the problem of the library author (of course as long as the library remains sound).
Suppose I see a Prime(91) come up and I'm trying to figure out how in the world that happened. Normally I'd either blame the number theory library or look for undefined behavior in user code, but now I have to manually figure out which functions have correctness preconditions in their docs and search the whole codebase to see if somebody called any of those functions, with no help from the compiler?
It's very valuable to know that a bool can only be true or false when I'm not messing with unsafe code, and I want the same kind of guarantees for Prime.
What you suggest seems to go against the long-standing consensus in Rust that unsafety is for memory safety risks only. We do not use unsafe to mark things that are "suspicious" or "logically inconsistent" or anything like that, we only use it to mark things that can cause Undefined Behavior. That's why, for instance, Ord is a safe trait. It also means that BTreeMap must be memory safe under arbitrary (safe) nonsensical Ord implementations. This approach has served us well, it avoids "watering down" the signal value of unsafety and it leads to code that is robust in the face of logical errors (see also e.g. this analysis).
The same as in every other language: just write a type that makes it impossible to construct illegal values. If for some reason you need a constructor that bypasses the check, give it a clear name.
True that other languages also don't have language mechanisms for this, but seems like a wasted opportunity in Rust given that it already has unsafe. Perhaps a solution could be to have different variants of unsafe: unsafe(memory), unsafe(primality), ...
If I were to follow this advice, what it would usually mean in practice is that I don't provide Prime::new_unchecked at all. Seems like a strictly worse outcome than providing an unsafe API that users are free to ignore.
Or I could come up with an excuse: some rare use case where primality is actually used for memory safety. This way I'd have an excuse to provide an unsafe API, and as a side effect improve usability by both having the API and having compiler-checkable primality guarantees for all other scenarios.
Say I define an enum:
#[repr(u32)]
pub enum Color { Red, Green, Blue }
For memory safety reasons Color can't be created with other values, so e.g. mem::transmute to Color is unsafe. But as a side effect of that, Color is much more usable in scenarios where I don't have unsafe code at all because I don't have to think about the possibility of invalid values (unlike in C++ or Python where I do have to consider that possibility).
In that case, you should make new_unchecked() unsafe. Because by not "thinking about the possibility of invalid values" in other places, it might be the case that it would be unsound to have a non-prime number. That said, those places should probably be marked unsafe too, so it should be possible to go over them without too much effort. But I can see how it makes sense to not bother and force users to use an unsafe block and put the blame on them.
I think you have reduced usability that way -- now everyone using this constructor has to ensure primality on penalty of Undefined Behavior. Overall, you made things more fragile.
I think this is a false parallel. For unsafe, we have a very clear and hard rule for what that mechanism is about. For the kind of invariant you are talking about, things are a lot more fuzzy, and it becomes much harder to say whether something should be "unsafe" or not. The amount of damage that can come from a violated logic invariant can differ wildly, and there's a big risk of ending up with a mismatch of having one side cause a logic invariant violation because that's not so bad and another side relying on the logic invariant for something super critical.
UB gives us a clear line here. The mechanism was never meant to be used for anything else. Maybe a similar mechanism could be useful for something else, but that's getting into the territory of new language design -- and I cannot remember seeing any language even try this, indicating that either it is very hard or it is not very useful.
To illustrate why a similar syntax to unsafe but for correctness instead of soundness would hurt more than help, one can look at u32::add():
/// Adds 2 numbers.
///
/// # Unspecified
///
/// If the addition would overflow, the behavior is defined to either panic
/// or wrap around, but which one is unspecified outside debug mode
/// (where it is defined to panic).
pub fn u32::add(x: u32, y: u32) -> u32;
This is a safe function (no unsafe keyword) but it has a correctness requirement. If this requirement is violated, then the function replaces its correctness guarantee (which is to return the sum) with a fallback (which is to panic or wrap around in a partially specified way).
If we were to have a special syntax to remind users that adding 2 numbers may overflow and they shouldn't do it, then it would be extremely painful to do very common things like i + 1.
(Note that the real u32::add is defined to either wrap or panic. The result is never an arbitrary number. I guess you are hypothesizing an alternative universe where the spec for that operation is a bit different, but this is easily confusing. That said, your general point still stands.)