Discussion: Allowing Unsafe to Bypass Visibility

I absolutely agree. I usually wind up submitting PRs for pretty much every program I use; every time I find any kind of bug, I submit an issue, and then immediately take a stab at fixing it myself. I'm not against forking projects in the slightest, and if anything I'm also not entirely pleased with how my solution would result in fewer PRs.

In this particular instance though, I'm doing something that quite frankly shouldn't be a PR. I don't think the items I need access to should be publicly accessible, since in the 99.9% of cases, having access to it would just cause issues and be more confusing for developers. It shouldn't be popping up in intellisense for anyone.

However, the problem is that it's blanket-blocked without any way to get around it, when there are still 1 in 1000 people who genuinely need that information and don't have any other option.

 

It's like if the Windows Registry were made inaccessible on principle. Yes, for literally 99.999% of users it will only cause trouble if they have access, but restricting access entirely makes 100% of the 0.001%'s needs completely impossible.

 

It shouldn't be easy, it shouldn't be promoted, and heck it shouldn't even be a "stable" feature. But preventing jailbreaking altogether without even the most obscure workaround is not the right move, and that's why I'm so passionate about this.

 

Thank you so much for explaining it! I didn't think about cargo doing silent updates; that explains so much about everyone's responses. I was thinking "just be careful when updating!", but no, that does actually make complete sense. Of course you can bind to a non-minorpatch subversion and have changes happen under your nose. That's my bad.

In that case, stressing that possibility to the user by adding a warning and confirmation pop-up on update would be the best idea.

Ideally, bypassing visibility would be a nightly-only, unsafe feature, enabling the feature would be explicitly stated in the docs to be an absolute last resort, and enabling it would cause warnings and confirmations in cargo when attempting to update due to the potential semver complications, and hell it should probably throw a linting warning on use too like unsafe operations inside unsafe functions. Those last two would further dissuade people from using it unnecessarily, since no one likes build warnings. Overall, this would serve to allow the possibility while making it annoying as hell to access, which is far better than being outright disallowed by the compiler in all circumstances without recourse.

 

I don't like the idea of permanently babysitting a terminal fork, since that adds several dozen layers onto maintaining the project and makes it significantly harder to keep your unsafe changes catalogued as well. I really don't like other declaration-based strategies for the same reason: I wouldn't want to accidentally use a formerly-private function somewhere I genuinely don't need it, but changing the source makes it pop up like any other instead of being a case-by-case basis. Ideally the best strategy would be usage-focused so you know where your changes are and what you need to pay special attention to.

1 Like

If you maintain a fork to make something pub then "all internal changes" on your side should merely consist of a few additions of pub, in which case just rebasing your patch on top of a new upstream should not be that much more work than reading the crate's code after a change, which you'll have to do anyway to make sure that your way of meddling with its internals is still ok.

No, you may end up getting silent corruption because the requirements of the surrounding code changed. Those may not even be documented because it's internal code.

What you'd be doing would be more unsafe than transmuting types of 3rd-party crates, since transmute cannot reach statics or functions, while visibility bypasses could.

6 Likes

Still, have you tried? It is entirely possible that they'll relax the visibility rules if you ask them. They could have thought that nobody could possibly need that information.

I don't know about capstone, but Rust bindings to C libraries are typically partitioned into two parts. The first would be called something like capstone-sys, and it would be a direct 1-to-1 translation of the provided C API, usually automatically generated via something like bindgen. These kinds of bindings typically don't have any visibility restrictions, because the original C headers don't. Their goal is to directly expose the C API to Rust code with minimal effort and minimal possiibility of API mismatch. If the privacy change happens in that kind of low-level bindings, I'd consider it a bug and file an issue. After all, it doesn't make sense to make low-level Rust code more restricted than C code. Anyone could just autogenerate the bindings themselves, or write a C/C++ module which uses the C API directly.

The second kind of bindings is idiomatic Rust bindings built on top of a -sys crate. These kinds of bindings usually utilize all possible Rust features, including visibility and undefined struct layout. If that's the kind of library you have issue with, changing privacy may indeed be undesirable.

If we're talking about raw C bindings and the upstream is uncooperative, you have a way to solve the issue locally. The layout of C structs is well-defined, so you can always just redefine them in your code. Since you imply that you need just a few fields, it shouldn't be a big maintenance burden, like a vendored patched library. You won't be able to safely convert the bindings' type to your redefined type, but you can always do a mem::transmute in both directions, and work with the fields of your local type as you see fit. To minimize error, it would be best to write specific conversion methods between the local and foreign types, and use transmute only there.

Important! If you go that way, make extra sure you understand the way mem::transmute works, and the way C APIs must be translated into Rust. I'd suggest running bindgen on the original headers, and copying the generated type definition. There are subtle details! Like padding, which needs to be explicitly defined in Rust, because Rust considers padding as uninitialized memory, but C doesn't.

The local type + transmute trick may work even for Rust-native types with undefined layout, but you'd be really walking on eggshells if you try to do that. The possibility of breakage is huge, you'd have to include careful tests for layout validation. Don't do that unless you absolutely certainly need to. Writing a few hundred lines of boilerplate, or maintaining a patch, is almost always much better than this solution.

7 Likes

Note that if all you need it for is debugging, you can just mark it pub in your local cargo cache and use it however you want. That'll obviously fail CI if you submit it, but that's for the best.

2 Likes

You can't. Cargo validates the hash of cached dependencies, and errors out if you try to edit them. The only way to modify the dependencies is to use the [patch] section of Cargo.toml.

I'm sure it validates the hashes at some point, but I just opened regex's lib.rs locally, changed it, and tried to compile. cargo build doesn't give a hash error; it compiles the modified code from the local cache:

4 Likes

If the project is already incrementally compiled it doesn't check dependencies. You'll need to cargo clean.

In an ideal world, the merge conflicts would be a feature. Because in an ideal world, merge conflicts would only happen when two changes modify the same piece of code in such a way that the way to reconcile them isn't obvious.

If your change consists of adding pub to some field, it should be obvious how to reconcile that unless there are major changes to the definition of the field… in which case the merge conflict would be a good thing, because it would alert you to the fact that the semantics might have changed.

Too bad that in the real world, merge algorithms are not nearly that smart. And even when there aren't merge conflicts, there's still some labor involved in running the Git commands.

What I'd like to find, myself, is some software capable of making the process of maintaining a downstream fork as automatic as possible. I'm thinking of features like:

  • Watch for upstream commits;
    • Try re-applying my patches on every single commit. If there's a merge conflict, give me an early warning immediately. Ideally I can handle conflicts individually whenever I have a chance, rather than having to wade through a pile of conflicts when it's time to update my dependencies.
    • Also run tests for every patched commit. If a test fails, check whether it also fails without the patches. If not, give an early warning.
  • Watch for upstream releases and automatically push corresponding tags to the downstream repo.
  • Try to use LLMs to resolve merge conflicts! I wouldn't trust an LLM to handle this fully unsupervised. But if I could get a notification saying "there was a conflict; the LLM resolved it like this; all tests are passing afterwards"… and I could just give it a quick look and click 'approve'… that would be great.

If anyone does know of software fitting this description, I'd love to hear about it.

8 Likes

To add one viewpoint, in a company, it's already now difficult to justify to your teammates why some best practices should be followed, when they've already opened a merge request not doing so, and the next deadline is pushing everyone to "just get it done". One of the reasons why I think Rust is a great tool is that it has fewer footguns than some of the alternative languages, and consequently, the codebases are generally nice to work with (at least relatively) even after years of rushed short-term thinking development.

With this feature, I feel Rust code would become less maintainable, since people would use it incorrectly as a shortcut to get around encapsulation rules slowing them from shipping features.

8 Likes

What if we allowed this only indirectly? We could allow just offset_of! to bypass visibility. And have a deny-by-default lint whenever it's used to bypass visibility.

1 Like

That would probably be better than my idea, actually. Distance it from the "memory-unsafe" meaning of unsafe and prevent it from accidentally being used inside unsafe blocks. It would make it easier to restrict the feature, too.

Yes, but that's not a problem of the feature, but with how it's used. If it's an opt-in feature that's not suggested and makes code harder to maintain, it won't be part of the conventions of any organization, and any code reviews should immediately strike it down if there's a better alternative. Just because you could accidentally glue your hands to your face isn't a reason to ban all glue for everyone, it's a reason to teach common sense in using it.

1 Like

Wouldn't smarter merge algorithms, perhaps something operating on AST nodes instead of line-by-line be a more direct solution than trying to have LLMs guess?

AST-aware merge would absolutely help. But sometimes you need to understand not just syntax but semantics, which is where an LLM could come in. Ideally both tools would be available and could be combined.

Incidentally, going back to the OP's use case, it would at least be a step forward if we had a semantic patch tool that could automatically perform edits like "mark foo::bar as pub". Even if such a tool required manual configuration, as opposed to being able to merge arbitrary changes, it could still be used as a more reliable way to keep downstream changes up-to-date. Did you know there's a work-in-progress version of Coccinelle for Rust? I'm not aware of any other such tools for Rust though.

2 Likes

I wish we had a special type of proc macro that operated on crates, receiving whole rust crates, and returning a new crate.

It could be called in the source code like this

#[mymacro]
extern crate mycrate;

(Note that this wouldn't literally receive the extern crate mycrate; tokens)

Or maybe even in Cargo.toml, like this

[dependencies]
mycrate = { package = "mymacro(mycrate)", version = ".." }

[dev-dependencies]
mymacro = { something here }
1 Like

Slightly of-topic, but mergiraf does a tree-sitter (AST) based merge instead of a line-by line merge.

From my experience, it has much, much fewer false positives (invalid merge conflict), but a few more false negative (no conflict found where we should have got one). Even with its flaw, it’s definitively a step in the right direction.

4 Likes

There is https://ast-grep.github.io/, which is generic over many languages (including Rust). I believe that Coccinelle is more advanced though (e.g. has type awareness, more natural ways to write the rules as diffs, better multi-node matching, ...). I have used ast-grep with mixed success at my dayjob on C++ for some large scale refactoring / custom lints.