Add rustc flag to disable mutable no-aliasing optimizations?

Yes, I would strongly suggest that. And let's be real. Rust doesn't even have a reliable picture of its abstract machine now. ^^

Okay. Well take for example the skeleton support for Rust that was just added to the Linux kernel. Let me ask you something. Do you think that every mutable pointer passed into Rust code there will be unaliased? Or is it C, and it's going to have pointers to that chunk flying around everywhere? Even if they go through manually and ensure that none of them alias, some dude is going to introduce a commit on the C side that breaks this, and it will be a subtle bug that will torment users for a year and a half until someone figures it out. Do you expect them to use raw pointers for everything in kernel drivers? If so, most of the point of Rust is lost.

I'm sorry, but I disagree. It can and should. Let's step for a moment back into the void, back into C-land.

Say you compile some library with aggressive strict-aliasing optimizations (restrict pointers everywhere), but your own code uses -fno-strict-aliasing.

You have a sea of objects, with properties and data that could be pointing almost anywhere. Because you're not tracking the control flow (it's just too complicated; who knows where that pointer came from?), you don't notice that you're passing the same pointer twice to a function that takes two restrict pointers.

What do you think will happen? Nothing good. Maybe nothing, depending on the code. Maybe a subtle bug that tortures you for months.
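A minimal Rust sketch of the same shape of bug (the function name `add_twice` is made up for illustration): a `&mut` parameter is Rust's equivalent of a `restrict` pointer, so the compiler is entitled to assume the two arguments never alias.

```rust
// Hypothetical function, analogous to a C function taking two
// `restrict` pointers: rustc may assume `a` and `b` never alias.
fn add_twice(a: &mut i32, b: &mut i32) -> i32 {
    *a += 1;
    *b += 1;
    *a + *b
}

fn main() {
    let mut x = 0;
    let mut y = 0;
    // Fine: two distinct objects.
    assert_eq!(add_twice(&mut x, &mut y), 2);

    // UB: smuggling the *same* object in twice through a raw pointer
    // compiles, but violates the no-alias rule whether or not the
    // compiler currently exploits it:
    // let p = &mut x as *mut i32;
    // unsafe { add_twice(&mut *p, &mut *p) };
}
```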

It's the same here. Unsafe code already comes with a responsibility and an intrinsic risk. You can blow your whole leg off. If you're passing aliased &mut pointers into Rust crates, you deserve whatever you get.

But there's another, rather easy solution here that you're not considering.

The problem stems from crates interacting badly with different aliasing rules.

So, just patch cargo too. Require an explicit option in Cargo.toml, only valid for the top-level crate (error out if it's only detected in a dependency), and which enables -fno-strict-aliasing on every dependency for that crate, whether they want it or not. Problem solved!

Oh, you're worried about std and core? If it's a problem, maybe just unconditionally build them with -fno-strict-aliasing. I doubt any performance lost there will be enough to really care about. Or, or, just have cargo download and build std and core like every other crate if that option is detected. There's probably other solutions there.

What I'm trying to do is remove undefined behavior in unsafe code. I'm not trying to create more UB or confusion or bugs, I'm asking for a compiler switch that will prevent them. That is what Rust's real purpose is.

What's the point of the borrow checker? To prevent accidental human error. What's the point of this flag? To prevent accidental human error, by disabling the rule that makes it an error in the first place.

Even if the compiler doesn't optimize based on it, a large chunk of all code depends on &mut never aliasing. Take memcpy, for example. Even with -fno-strict-aliasing it is UB to pass aliasing pointers, as the memcpy implementation is free to copy bytes or larger chunks in whatever order it wants. If the source and destination alias, copying a chunk to the destination may overwrite the source in arbitrary ways.
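Rust's own raw-pointer primitives make this memcpy/memmove distinction explicit, which is one way to see the point in code:

```rust
use std::ptr;

fn main() {
    let mut buf = [1u8, 2, 3, 4, 5];
    let p = buf.as_mut_ptr();

    // `ptr::copy` is the memmove analogue: overlap is allowed, so
    // shifting a region within the same buffer is well-defined.
    unsafe { ptr::copy(p, p.add(1), 4) };
    assert_eq!(buf, [1, 1, 2, 3, 4]);

    // `ptr::copy_nonoverlapping` is the memcpy analogue: calling it
    // with these same overlapping regions would be UB, for exactly
    // the reason described above.
    // unsafe { ptr::copy_nonoverlapping(p, p.add(1), 4) }; // UB!
}
```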

Even without this rule inside the compiler, it is still an error. Crates depend on it, and as such you would still have to follow the rule. Having this flag increases the risk of accidental human error by making it more likely that someone will simply ignore the aliasing on the assumption that it's fine.

Disabling mutable no-alias optimizations does not and cannot fix your code most of the time. It will only hide the UB for the current version of the compiler, standard library, and your dependencies. A change in any of these may break your code at any point.

7 Likes

I'm sure some crates do depend on that. And again, if so, that's your bad for passing in an aliased mutable reference. That code will fail regardless of the option. But in your own code, where most of this stuff actually occurs, it could alleviate a source of worry. Maybe you could hold it wrong and it ends up getting into a dependency, but that's your fault, and it's part of the risks of unsafe code that you should have expected anyways. For the vast majority of cases where you would actually use it, -fno-strict-aliasing would prevent bugs, not create them.

You're misinterpreting what the actual rules are.

Initially, every pointer passed into Rust code from C starts out raw, and at the top of the borrow stack for its target memory location. Rust can then freely derive a &T or &mut T reference from it without invalidating anything, regardless of how mutable and/or aliased it is on the C side.

The only way creating such a reference can invalidate anything (and the subtle point that people are getting upset about) is when there are already other things above the pointer on the borrow stack, and those are what get invalidated. To hit this problem, you have to:

  1. Create a reference from a raw pointer, pushing a new entry onto the borrow stack.
  2. Keep that reference around (or other references or raw pointers derived from it).
  3. Create another, conflicting reference from the first raw pointer (or another one equivalent to it), invalidating the stuff from (2).

But this is nothing more than a special case of the normal &T/&mut T rules - that is, creating the new conflicting reference can invalidate raw pointers... but only if they were derived from a reference that just got invalidated!

3 Likes

You know, I've never been able to get a straight answer anywhere as to what the rules actually are. Everyone I talk to has a completely different idea of what the rules are. I have no idea who to believe. If that's true, that's good, but it still opens up a whole can of worms of potential UB. You can remove the problem from your own crate entirely by disabling the optimizations. But, currently, you can't. Rust doesn't have the option. It doesn't mean people want to create a ton of aliasing &mut everywhere. It's a safeguard in case you do, and don't notice. And frankly, with the rules the mess they are, I don't think it's a bad idea to have that insurance.

Yes you have, they have been stated several times above. @steffahn has laid out the RustBelt rules quite clearly.

You mean RustBelt, the competitor to Stacked Borrows? I thought Stacked Borrows was the one Rust used. You see the problem.

While I don't think Rust should have a -fno-strict-aliasing-style flag, I think the rules aren't that clear, and they're far, far too subtle. It's way too easy to shoot yourself in the foot with Rust, as demonstrated by issue #80682: slice::swap violates the aliasing rules, which tcsc linked to above. I think a lot of people will write code that accidentally hits the exact same issue. Unfortunately I don't have proof of this, as I haven't done a wide audit or investigation. But it's certainly something I've done in the past (and being honest, it's probably something I'll accidentally do again in the future).

3 Likes

RustBelt produced stacked borrows.

Sure, they should be written down somewhere more official if they haven't been already. But that doesn't mean that they haven't been clearly laid out in this thread.

Well, you're not wrong that the rules aren't fully defined yet. Rust is still young, so the process of defining them is still ongoing.

But on the other hand, steffahn has given pretty reasonable descriptions, and I linked you to a bunch of Stacked Borrows stuff, which is where that work is happening. In fact one of those links was to the Miri tool, which will automatically check your program for you! It's not hopeless.

I also don't really see any conflicting claims in answer to your questions, just people talking about different aspects of the same general set of rules. The confusion about RustBelt vs Stacked Borrows is an example of this: they're not competitors, they're just different research projects done by basically the same groups of people at different points in time.

I mean, this is the XY problem. The OP's proposed solution is wrong, but that doesn't mean they haven't identified a flaw in Rust as it stands today.

-fno-strict-aliasing exists because C's aliasing rules suck. There are all sorts of useful things that are literally impossible to do without violating them, such as making a local byte array and reinterpreting it as a struct type. Vendor extensions like __attribute__((may_alias)) lessen the problem, but they're not universally available and are somewhat broken to boot. And even if it's possible to follow the rules, they're very hard to understand, and there are no tools to validate that you're doing it correctly. No wonder people just want to opt out. Even at the cost of fragmenting the C ecosystem; but then, the C ecosystem doesn't practice code reuse nearly as much as Rust anyway.

Rust is in a better situation in some ways. Type-based alias rules don't exist at all, unless you count borrows only giving permission to a certain number of bytes depending on the type. And for the aliasing rules that do exist, there is an automatic checker.
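For example, the "local byte array reinterpreted as another type" case that's impossible in strict-aliasing C is expressible in entirely safe Rust:

```rust
fn main() {
    // In C, reading this byte array through a `uint32_t *` would be a
    // strict-aliasing violation; Rust has no type-based aliasing rule,
    // and the safe conversion makes the reinterpretation explicit.
    let bytes = [0x01u8, 0x00, 0x00, 0x00];
    let n = u32::from_le_bytes(bytes);
    assert_eq!(n, 1);
}
```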

But in other ways our situation is as bad as C:

  1. Miri doesn't catch all Stacked Borrows violations.
  2. Stacked Borrows is as hard to understand as C strict aliasing, or harder.
  3. Stacked Borrows makes some code patterns fairly difficult. Generally speaking, it requires increased use of raw pointers where they wouldn't be needed just to get the code to compile.
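As an illustration of point 3, here is a sketch of swapping two slice elements (the helper name `swap_elems` is made up). The "obvious" version creates two `&mut` into the same slice, which is the shape of the issue #80682 mentioned above; routing through raw pointers sidesteps it:

```rust
// Swap two elements of a slice without materializing two overlapping
// `&mut` borrows into it.
fn swap_elems<T>(s: &mut [T], a: usize, b: usize) {
    assert!(a < s.len() && b < s.len());
    let p = s.as_mut_ptr();
    // `ptr::swap` explicitly permits overlapping (even identical) pointers.
    unsafe { std::ptr::swap(p.add(a), p.add(b)) };
}

fn main() {
    let mut v = [1, 2, 3];
    swap_elems(&mut v, 0, 2);
    assert_eq!(v, [3, 2, 1]);
}
```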

Combining 1 and 2, just today, even as a semi-expert, I made a mistake and thought something illegal was legal because Miri allowed it. Or maybe I didn't make a mistake. Who knows?

Here are corresponding things we can do to improve the situation (none of which are new ideas, some of which are probably being worked on already):

  1. Keep improving Miri, with the goal of making it able to catch all Stacked Borrows violations (possibly only in a slower-than-default mode).

  2. Document Stacked Borrows better, with lots of examples.

  3. Add UnsafeAliasedCell or some way to mark specific types as allowing concurrent &mut references. Similarly, add a way to mark a struct as being able to have container_of soundly used on it.

    After all, this should give the OP effectively what they want: a way to opt out of certain aliasing assumptions for their code, without needing to create an ecosystem-splitting language dialect.
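For shared references, UnsafeCell already plays exactly this opt-out role today; the proposal above would extend the idea to &mut. A minimal sketch of the existing mechanism:

```rust
use std::cell::UnsafeCell;

fn main() {
    let c = UnsafeCell::new(0u32);
    // Two aliasing `*mut` pointers into the same location: allowed,
    // because `UnsafeCell` is the existing opt-out from the usual
    // aliasing assumptions for data reachable through `&T`.
    let p1 = c.get();
    let p2 = c.get();
    unsafe {
        *p1 = 1;
        *p2 += 1;
        assert_eq!(*p1, 2);
    }
}
```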

13 Likes

I must admit I still think that -fno-strict-aliasing is a good idea, but I'm clearly outnumbered.

It seems, however, that -Cllvm-args='--enable-scoped-noalias=false' does disable most of the problematic optimizations. I still have to make sure not to pass questionable references to anything, but this will have to do. Of course, I'll use UnsafeCell if I am deliberately trying to have shared mutability.

@tcsc You might find this flag useful.

Here are corresponding things we can do to improve the situation (none of which are new ideas, some of which are probably being worked on already):

I don't disagree with these, but I think there's a fourth option: requiring a simpler aliasing model.

I suggested a very rough one above, where references only invalidate the provenance of raw pointers while the reference exists (this should still allow a great amount of alias-based optimization, though probably not full llvm-noalias). It's essentially equivalent to the requirement we've historically told unsafe code it must uphold, e.g. the "if you have a &mut T or &T you must not access the data in any way that contradicts it during its lifetime" guideline that we've documented in the past.

In particular, I think the fact that -Zmiri-track-raw-pointers has to be disabled by default because too much code is broken under it should be a big red flag that the way Miri and SB handle raw pointers is far too restrictive.

I also think it should be considered a breaking change in the language to decide that widespread patterns are UB (and especially to optimize against that UB). This happened once in the past with things like mem::uninitialized::<[u8; N]>(), which used to be considered well-defined so long as you didn't read from it and is now considered UB, and I worry it will happen again for whatever aliasing model we choose, since it's likely to be stronger than what we've historically used.

Similarly, add a way to mark a struct as being able to have container_of soundly used on it.

This one in particular probably just has to be fixed by adjusting the model, since the most common offender here is stuff like allocators, which have to work on all types. Also, if you could modify the type, you could just put the other stuff at the start anyway. There's a lot of other stuff that's broken by forbidding this too.

Thankfully, AIUI even for a very strict aliasing model this is likely not fundamental; you can see some discussion about it here and in: Storing an object as &Header, but reading the data past the end of the header · Issue #256 · rust-lang/unsafe-code-guidelines · GitHub

s/prevent/hide and your statement is correct. -fno-strict-aliasing in gcc does not make it any less undefined to violate C or C++'s strict aliasing rules. It simply makes the compiler assign specific semantics. Your code is still fundamentally broken; you've just hidden that fact away and made it seem like your code has defined behaviour. If you forget that flag, or switch to a different compiler, you will still see the result. The same would be the case for such a switch in rustc. The problem still remains; you've just blinded yourself to it.

Also, consider that there are wider implications than merely preventing an optimization. With RFC 3016, implementations would be able to issue diagnostics for undefined behavior in const eval. You'd have to consider whether the flag would affect those diagnostics as well, and, if it ever proceeds to the originally proposed state, how that would be reconciled. In lccc (linked above), there would be a fundamental incompatibility between disabling &mut unique optimizations and issuing mandatory diagnostics for cxeval &mut unique violations (as both operate on the same IR, using the same information, specifically the pointer-validity attribute unique).

I never viewed code under -fno-strict-aliasing as UB so much as I view it as the compiler volunteering to give it defined behavior. You are then not getting the defined-ness from C at that point, but rather from the compiler. I don't know of any mainstream C or C++ compilers that don't support -fno-strict-aliasing; therefore, in practice, it is defined behavior, and will do what you expect, if you use -fno-strict-aliasing. I have both written code and seen others' code that blatantly violates aliasing rules and is better and/or more performant for it. I know these people don't see it as hidden bugs; they see it as a necessary evil.

1 Like

I like this a lot. If it would actually happen, I would fully support it. I am a firm believer that such aliasing models are much more trouble than they're worth, and should be opt-in rather than opt-out.

I mean, if people really cared about defined behavior and usefulness, there would be no aliasing model, but that 5% performance improvement is just too juicy for people to resist.

Be careful about making broad generalizations. I care very much about these things, and believe that at least some kind of nonaliasing guarantee is vitally important. Without it, I'd have a really hard time convincing myself that certain routines will work properly under all circumstances.

3 Likes

Can you give me an example? I have a hard time believing that. If you had to explicitly enable the aliasing model, that would be the best option I think. Then people like you who desire it could use it, and those of us who did not, would not be forced to.

Subsentient (April 8):
I never viewed code under -fno-strict-aliasing as UB so much as I view it as the compiler volunteering to give it defined behavior. You are then not getting the defined-ness from C at that point, but rather from the compiler. I don't know of any mainstream C or C++ compilers that don't support -fno-strict-aliasing; therefore, in practice, it is defined behavior, and will do what you expect, if you use -fno-strict-aliasing.

MSVC at least does not support that particular flag. IDK if it has an equivalent.

I have both written code and seen others' code that blatantly violates aliasing rules and is better and/or more performant for it. I know these people don't see it as hidden bugs, they see it as a necessary evil.

I see it as something I'm now forced to support because people refuse to do defined things (like union type punning in C, bit_cast in C++). It also limits what I can detect in cxeval, as it uses the same information for that. Thankfully, it's impossible to violate strict aliasing in C++'s constexpr through other means, but if it weren't, that would force me into a situation where I cannot reconcile the obligations to the standard, and the obligations to the user via the flags. This is the same as my second concern. How would you resolve that, given those diagnostics and optimizations use the same information?
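For reference, the Rust-side counterparts of those defined punning facilities look like this (a small sketch; `to_bits`/`from_bits` play roughly the role of C++20 `bit_cast` for floats, and `from_le_bytes` that of C union punning):

```rust
fn main() {
    // Defined type punning without any aliasing trickery.
    let bits = 1.0f32.to_bits();
    assert_eq!(bits, 0x3f80_0000); // IEEE 754 encoding of 1.0
    assert_eq!(f32::from_bits(bits), 1.0);
}
```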