"Tootsie Pop" model for unsafe code


#34

Out of curiosity, are there any other ‘action-at-a-distance’ (ish) examples like this in rust where code can affect the performance of completely unrelated code just by existing?

I wouldn’t be surprised if inlining thresholds were subject to such action-at-a-distance.


#35

Also, Rust is supposed to be a high-performance language, and unsafe blocks are supposed to be for people who know what they are doing. I wouldn’t want to stop optimizing whole modules to add a slight amount of hand-holding to unsafe blocks.

I disagree with this sentiment. People who know what they’re doing ought to know that unsafe blocks are a measure of last resort, and to appreciate the importance of abstractions in cordoning off unsafe blocks from outside influence. Meanwhile, the people who don’t know what they’re doing may very well be lured into believing that if they don’t modify code that is directly inside an unsafe block then they can’t cause any unsafety, but this is provably untrue if they’re modifying a module that contains an unsafe block elsewhere.

The only alternative would be to make it so that privacy is not required for enforcing unsafe boundaries (as per the unsafe fields RFC, though I don’t know if this would be fully sufficient), but in today’s Rust, modules are very much the boundaries of unsafe code, and have been since long before this blog post.


#36

But the idea that I can’t add a single unsafe block to my whole file without deoptimizing the entire file? Ouch.

I don’t see why this would be the case; a submodule should be enough to cordon off the unsafe block behind a confidence boundary.
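Something like this minimal sketch is what I have in mind (the names and body are made up for illustration): the unsafe block lives in its own small submodule behind a safe function, so under the model being discussed only that submodule would lose the aggressive aliasing assumptions.

mod fast_fill {
    // Safe wrapper around the unsafe block; the submodule is the
    // unsafety boundary, so the enclosing module keeps its guarantees.
    pub fn copy_into(dst: &mut [u8], src: &[u8]) {
        assert!(dst.len() >= src.len());
        unsafe {
            // A &mut [u8] and a &[u8] argument cannot overlap, so
            // copy_nonoverlapping's precondition is satisfied.
            std::ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), src.len());
        }
    }
}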


#37

Not only do proposals like “unsafe fields” correctly put the emphasis back on the type with fewer inhabitants, but they also conveniently make functions like copy_drop unsafe even in the module where those fields are in scope.

I agree that if we don’t want it to remain the case that modules represent the boundary of unsafe code, then we’ll need to reconsider the unsafe fields proposal. However, 1) I don’t know if unsafe fields alone are sufficient to revoke the importance of the module boundary wrt unsafety, and 2) until all unsafe Rust code is rewritten to use unsafe fields, we would need to be exactly as conservative as Niko’s proposal here when optimizing modules that contain unsafe code.


#38

So the rule would at least be “a mod which defines a struct with private fields and contains unsafe anywhere inside of it is an unsafe abstraction boundary”.

@glaebhoerl I’m definitely intrigued by the idea of finding a more precise formulation of the conditions that necessitate a module-level unsafety boundary. However, how does this rule deal with Niko’s “usize-transfer” example in the OP?


#39

One thing I’ll note is that consume_from_usize() is not only undefined behavior according to C, but is actually disallowed by hardware that has been built. This is noted in http://www.cis.upenn.edu/~stevez/papers/KHM+15.pdf :

While the language definition provides an integer type uintptr_t that may be legally cast to and from pointer types, it does not require anything of the resulting values [5, 7.20.1.4p1].

C standard, 7.20.1.4p1:

The following type designates a signed integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:

intptr_t

It does not require that the result be dereferenceable in any way - it’s certainly permissible to set a flag bit that makes dereferencing it trap, or to optimize comparisons away but turn dereferences into an unconditional error at compile time.
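To make the kind of round trip at issue concrete, here is a hedged sketch of the usize-transfer pattern (illustrative names and bodies, not Niko’s exact code): once the pointer has been laundered through an integer, neither the C rules above, the optimizer, nor capability hardware is obliged to let you dereference the result.

fn escape_to_usize(x: &mut i32) -> usize {
    x as *mut i32 as usize
}

unsafe fn consume_from_usize(addr: usize) {
    let p = addr as *mut i32;
    // Nothing in the types connects p back to the original &mut i32,
    // so this write is invisible to any aliasing reasoning.
    *p += 1;
}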

As for hardware, CHERI (a capability-extended MIPS built on the BERI implementation) is a capability-enhanced ISA designed according to the rules of C, on which conversion of an integer to a pointer would result in a capability to memory of zero length - one which can be compared to other capabilities, but not dereferenced to access any memory.

As a result, it seems to me that any unsafe boundary broader than the module would have the sole benefit of permitting behaviors that are very likely to be broken by lower levels of the compiler and high-security hardware anyway.


#40

One problem I have with this model is that it is formulated in terms of breaking and restoring invariants.

It is certainly the more interesting case of unsafe, but is it the common case of unsafe? I think the common case is not breaking invariants at all. Examples are FFI and unchecked indexing.

I consider “indexes must be within bounds” to be an invariant of Rust, though that just so happens to be an invariant that Rust can only enforce at runtime. As for FFI, it seems to me like we must assume that C code running in-process can do literally anything it wants, including breaking invariants, yes?
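To make the unchecked-indexing case concrete, here is a minimal sketch (illustrative code, not from the thread): the caller must uphold the bounds invariant even though no check enforces it.

fn sum_first_n(v: &[u64], n: usize) -> u64 {
    debug_assert!(n <= v.len()); // the invariant the caller promises
    let mut total = 0;
    for i in 0..n {
        // UB if i >= v.len(), even though nothing here "looks" broken.
        total += unsafe { *v.get_unchecked(i) };
    }
    total
}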


#41

The idea that calling my optimized assembly language function is going to, by default, de-optimize my Rust code makes zero sense in that scenario.

I don’t know how to feel about calling this “de-optimization”. Rust doesn’t want to repeat the insanity of C compilers wrt optimizations and UB, so obviously it can only make aliasing optimizations where it can actually prove that aliasing does not occur. If you use unsafe code, then Rust can no longer necessarily prove that pointers don’t alias. This is not the fault of the proposal being discussed here, it’s just the state of things on the ground here today.

This hasn’t mattered much until now because we’ve leveraged aliasing relatively little for optimization, but it’s going to matter more and more going forward. If you disagree with the proposal here for how to handle these future optimizations, then please propose an alternative, although I’m having a hard time thinking of one that has all of the following properties: it preserves correctness, retains backwards compatibility, and permits raw pointers to be assumed to be unaliased by default.
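As a rough illustration of the aliasing point (a sketch of mine, not from the original post): with references the compiler knows the two arguments are disjoint, but with raw pointers it has to assume they might overlap.

fn add_twice(a: &mut u32, b: &u32) {
    // a and b cannot alias, so *b may be loaded once and kept in a register.
    *a += *b;
    *a += *b;
}

unsafe fn add_twice_raw(a: *mut u32, b: *const u32) {
    // a and b might point at the same u32, so *b must be reloaded
    // after each write through a.
    *a += *b;
    *a += *b;
}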


#42

Most of the other uses I have for unsafe are all “I have to do something unsafe because that’s faster than the safe way” (e.g. using core::ptr::copy_nonoverlapping until clone_from_slice performance is fixed.) The final set of uses is “I’m actually trying to make things safer but the language won’t let me without unsafe,” e.g. coercing a slice to an array reference.

I don’t find these examples particularly compelling in this context. It’s perfectly justifiable if one has to resort to unsafe code to work around performance bugs in the standard library, but when it comes to discussing future language features we should assume that the performance bugs will be fixed, rather than hamstringing our future designs to account for past deficiencies.

(This is especially relevant in this discussion, since enabling greater aliasing optimizations in safe code will make it even less likely that your C code will be observably faster than your Rust code).


#43

The problem is that unsafe is overloaded to mean lots of different things. There seem to be at least two kinds of unsafety: one that makes aliasing analysis go wrong and one that doesn’t. If the Rust compiler cannot figure out which is which, then there should be two kinds of unsafe marker to differentiate them for the compiler.

In my experience, most of the code outside of libcore/libstd that uses unsafe seems to be the kind that doesn’t cause trouble for alias analysis, so having a way to tell the compiler that so that the compiler can continue to optimize aggressively seems important.

BTW, it really would be de-optimizing. Imagine that the Rust compiler’s optimizer improves as is planned. You write your code in “safe” Rust, which gets 100% optimized with the improved optimizer. Then you add your unsafe block in an attempt to implement a very localized optimization, which then causes your safe Rust code to be de-optimized.


#44

Most of the code in my project is actually written in assembly language (~50,000 lines of it), not C. (There is lots of C, but none of it is used for performance; I just haven’t replaced it with Rust code yet.) It would be many, many years before rustc could even hope to optimize the equivalent Rust code to the same level as that assembly language code, in general. (The best C compilers get nowhere close.) Thus, the idea that we won’t need to use the FFI (or asm!, which is also unsafe) for improving performance in the foreseeable future is wholly unrealistic.


#45

Out of curiosity, how does all this compare to Fortran?

I’m not versed in all the low-level details of this topic, but AFAIK pointer aliasing was one of C’s “mistakes” compared to Fortran, which does not allow it. Fortran compilers are supposed to produce better-optimized code because of this. Fortran integrates with assembly code too, so how does it ensure its invariants? Does it also have UB like C?


#46

If the Rust compiler cannot figure out which is which, then there should be two kinds of unsafe marker to differentiate them for the compiler.

I believe this is exactly the sort of thing that Niko is proposing at the end of his post, with ways to opt back in to aliasing optimizations if you are certain they apply.

You write your code in “safe” Rust, which gets 100% optimized with the improved optimizer. Then you add your unsafe block in an attempt to implement a very localized optimization, which then causes your safe Rust code to be de-optimized.

I still don’t consider this a de-optimization. You’re assuming that unsafe Rust should be expected to be uniformly faster than safe Rust, but I don’t see how that’s founded. Optimizers in general have always been able to perform better with the presence of restrictions, and removing those restrictions removes avenues for optimization. That unsafe Rust, which allows strictly more operations than safe Rust, should be inherently slower than safe Rust is both natural and intuitive to me.


#47

I agree with that too. In fact, I am a huge believer in that. But, consider:

unsafe {
     asm_function_that_is_4x_as_fast_as_rust_eqiv(a.as_mut_ptr(), a.len(), c);
}

My point is simply that code like this shouldn’t reduce the optimizations that are done elsewhere, just by its presence.

(Note that I’m not assuming asm_function_that_is_4x_as_fast_as_rust_eqiv is faster; I’ve actually measured it to be so.)


#48

If you pass a bad index to unchecked_get, then, yes, you are breaking Rust’s invariants. However, when people call unchecked_get, they at least expect to only pass a valid index.

If that expectation holds, all the invariants that hold in safe Rust hold in Rust code with unchecked_get, and so the optimizer does not need to take any extra care. This is different from e.g. creating multiple mutable pointers to the same value, which must not be UB by itself, but still can’t be done in safe Rust (and therefore requires the optimizer to take extra care).

FFI is the same - if your FFI is only interacting with Rust code through scalars and raw pointers, the optimizer does not need to know about it.
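A hedged sketch of that FFI case (the extern function is hypothetical): the foreign code only ever sees a raw pointer and a length, so as long as it does what the signature implies, the optimizer’s view of the surrounding safe code is unaffected.

extern "C" {
    // Hypothetical foreign routine: reads len bytes, writes nothing.
    fn sum_bytes(ptr: *const u8, len: usize) -> u64;
}

fn checksum(data: &[u8]) -> u64 {
    // Only scalars and a raw pointer cross the boundary.
    unsafe { sum_bytes(data.as_ptr(), data.len()) }
}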


#49

If you just stick that code in the middle of a function, the compiler would be forced to assume that your assembly code can possibly stash the pointer you gave it somewhere and access it at will, and therefore will be unable to perform some optimizations.

If you put the call behind a private, safe function, that problem would not exist.


#50

If you just stick that code in the middle of a function, the compiler would be forced to assume that your assembly code can possibly stash the pointer you gave it somewhere and access it at will, and therefore will be unable to perform some optimizations.

We should find some way to annotate the code to indicate that it doesn’t do that, because the vast majority of the time it doesn’t.

This call is in a safe function already. That’s why the unsafe is required. I guess you’re saying that if I create an otherwise-useless wrapper function that just calls the unsafe function, the compiler can somehow assume that the asm code doesn’t “stash the pointer.” But, surely it still can stash it, and the wrapper function doesn’t do anything to make that impossible.

Regardless, it would be better to find a simpler and more convenient way to annotate unsafe blocks as being aliasing-friendly than by creating these wrapper functions/modules.


#51

The problem is that pretty much every call that uses the result of as_mut_ptr requires the compiler to make additional assumptions about aliasing - in safe code, the compiler would be allowed to move accesses to just after the call to as_mut_ptr, as the call to your asm function could not possibly change it.

Because you want to forbid that while permitting other optimizations, you should wrap an interface rustc can understand around your assembly function:

fn asm_is_fast_on_this_one(a: &mut [u32], c: usize) {
    unsafe {
        asm_function_that_is_4x_as_fast_as_rust_eqiv(a.as_mut_ptr(), a.len(), c);
    }
}

OTOH, if we don’t want to pessimize functions that don’t use raw pointers, like callers of unchecked_get, we would need an additional strategy.


#52

Why not something simpler like this?:

    unsafe noalias {
        asm_function_that_is_4x_as_fast_as_rust_eqiv(a.as_mut_ptr(), a.len(), c);
    }

Where noalias (open to bike-shedding on the name) would mean that there’s no aliasing happening in the unsafe block.


#53

I think we see the same problem but want to solve it in different ways. I want the unsafe boundary to be the scope of the unsafe block, while you want it to be the module. However, most of your points seem to be in favour of my approach.

…people who don’t know what they’re doing may very well be lured into believing that if they don’t modify code that is directly in an unsafe block then they can’t cause any unsafety

This is my point. I want to constrain the unsafety to as small a region as possible.

but this is provably untrue if they’re modifying a module that contains an unsafe block elsewhere.

I don’t think that this is true at all. Try to prove that adding an unsafe {} to a module makes modifying other areas unsafe.

I see that there are cases where this can be true, but I think the correct approach is to attack those areas rather than giving up and allowing off-screen code to affect what is being worked on right now.

I would also argue that if you ensure that all invariants hold whenever you exit an unsafe block then you won’t be able to do unsafe things outside of unsafe blocks.
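Here is a minimal sketch of what I mean (illustrative code, not from the discussion): the Vec’s length is only out of sync with its initialized contents inside the unsafe block, and the invariant is restored before the block is left.

fn filled(n: usize, byte: u8) -> Vec<u8> {
    let mut v = Vec::with_capacity(n);
    unsafe {
        // Initialize the first n bytes of the buffer...
        std::ptr::write_bytes(v.as_mut_ptr(), byte, n);
        // ...and restore the "len <= initialized" invariant before exiting.
        v.set_len(n);
    }
    v
}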

The only hairy bit is calling safe code from unsafe blocks, but I don’t think that is solved by either proposal. However, since the root of the problem is in an unsafe block, I don’t think that is the worst problem to have. But to get back on topic…

I think that restraining unsafety to unsafe blocks is not a lost cause. While ensuring that this happens can’t be verified by the compiler, I think it fits perfectly with the “you’d better know what you are doing” nature of unsafe blocks.

Furthermore, keeping this boundary ensures that people not working inside unsafe don’t need to concern themselves with these problems, even without searching the entire module for the unsafe keyword. I also think this encourages keeping unsafe blocks to small regions of unsafety that need to be understood as a whole, not often requiring tying them together with nearby unsafe regions.