And before anyone brings this up, I need to stress that visibility has nothing to do with data security. "Private" members are purely a label that tells developers that they shouldn't mess with them. If a malicious third-party already has code running on your machine, it doesn't matter how "private" the "password" field is labeled when they can just view the bytes directly in RAM.
With that being said, there are times when you need to access private members and functions, just like there are times when you need to perform unsafe operations. And just like with unsafe operations, bypassing visibility should be your absolute last resort, but it is necessary on occasion.
There are far too many times where we have to fork a crate entirely just to access a single private member or function, and requiring that as the official method is very dangerous, since it means any upstream bug fixes are completely inaccessible to your fork from then on.
This is extremely necessary, especially in a language where every field and function is private by default. Sometimes you just can't do anything without that access, and the workarounds are far more brittle than allowing a single visibility bypass. It's so necessary a feature that every other major language has some way of resolving this, from a hefty "reflection" API to simple manual jailbreaking. Rust desperately needs one too, and being an unsafe operation, the unsafe block seems most fitting.
Not forking is also very dangerous, because accessing private fields in a new version that may have arbitrarily changed the invariants and usage of that field is a recipe for incorrectness and unsoundness.
Unsafe code has the responsibility to be sound (to not produce undefined behavior unless its own unsafe fn safety contract is violated), and there is no way to make this feature sound.
pub struct Foo { bar: Bar } // private `bar` field
// In a function outside of the module (proposed behavior, rejected by today's compiler):
fn my_function(foo: Foo) {
    unsafe {
        let b = foo.bar; // access private member, bypass visibility checks entirely
    }
}
This implementation would be the safest possible way to go about it: instead of requiring brittle hardcoded jailbreak-structs, it uses the checks already present in the language. There would be zero difference between out-of-date code that bypasses visibility in this format and other out-of-date code depending on too new a module.
And in this way, you shift the onus on keeping your code up-to-date on your usage, exactly the same as every other piece of code.
It's just "when in an unsafe block, disregard visibility". It's a clean way to go about it that genuinely has zero problems whatsoever, because it only requires the same invariants that we subscribe to when writing standard Rust.
This would entirely destroy the concept of semver: cargo update would be quite likely to break since some crate somewhere depends on implementation details that are not stable. The ecosystem costs of your proposal are gigantic.
So, no. Visibility is a key concept of developing at scale, and trying to bypass it is very short-sighted. Even Java had to realize this, when after decades of letting people bypass visibility with reflection, they eventually introduced modules that can prevent reflection.
This is why it is unsafe: intentionally-private internal state is not to be mutated indiscriminately, because it can cause some serious problems. But making visibility an absolute is even worse, because it means you are not allowed to use the data you're given because of an arbitrary restriction of the language. This leads to extremely brittle reimplementation of existing functions and state that is even harder to maintain than just having direct references to the ones that already exist.
I'm not saying visibility should be done away with altogether, but I'm saying it needs to be able to be violated sometimes to prevent even worse maintainability concerns.
Making it unsafe does nothing to prevent the semver issues. unsafe is used to deal with memory safety problems; it has no bearing on semver concerns, and it is not in any way useful for dealing with them.
The cure you are proposing is a lot worse than the disease you want to fight with it.
This is very, very misguided. Privacy boundaries are one of the key pillars of creating sound interfaces to unsafe code. Violating that just doesn't make sense.
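To make the link between privacy and soundness concrete, here is a minimal sketch. The `ring::Buffer` type and its methods are made up for illustration. The safe method `last` contains an `unsafe` block whose justification rests entirely on the private `len` field only ever being changed by `push`.

```rust
mod ring {
    /// A fixed-capacity buffer whose soundness relies on a private invariant:
    /// `len <= data.len()` at all times.
    pub struct Buffer {
        data: [u8; 4],
        len: usize, // private: only code in this module can change it
    }

    impl Buffer {
        pub fn new() -> Self {
            Buffer { data: [0; 4], len: 0 }
        }

        /// Safe API: refuses to grow past capacity, preserving the invariant.
        pub fn push(&mut self, byte: u8) -> bool {
            if self.len == self.data.len() {
                return false;
            }
            self.data[self.len] = byte;
            self.len += 1;
            true
        }

        /// Safe to call because privacy guarantees `len <= data.len()`.
        pub fn last(&self) -> Option<u8> {
            if self.len == 0 {
                return None;
            }
            // SAFETY: `len - 1 < data.len()` holds because only `push` modifies `len`.
            Some(unsafe { *self.data.get_unchecked(self.len - 1) })
        }
    }
}

/// Pushes five bytes into a four-byte buffer and returns the last stored one.
fn fill_and_peek() -> Option<u8> {
    let mut buf = ring::Buffer::new();
    for b in [1u8, 2, 3, 4, 5] {
        buf.push(b); // the fifth push is rejected
    }
    // Writing `buf.len = 100;` here is a compile error today (private field).
    // If a bypass allowed it, the *safe* call below would read out of bounds.
    buf.last()
}
```

With a blanket bypass, the author of `Buffer` could no longer argue that `last` is sound by looking at the module alone, which is the point being made here.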
There is a "Frequently Requested Changes" entry which is somewhat related: Cross-function type inference. It could maybe be generalized to "weaken encapsulation".
Rust did not invent many of its most popular features (e.g. there were languages with borrow checking before), but it implemented them such that they enabled engineering of large-scale programs.
A core aspect of Rust that enables distributed engineering is modularity and strong interfaces. The interface is the contract, and the internals are private and encapsulated. Cross-function inference is rejected because it leaks implementation details not captured by the interface. Visibility bypass would also leak implementation details. As @RalfJung said above, this ties in with semver and evolution.
As you said, visibility is not a security boundary enforced at runtime and there are always escape hatches if you drop low-level enough. But security is not the point of visibility, the point is modularity/encapsulation.
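As an illustration of the "escape hatch if you drop low-level enough" point, here is a minimal sketch of reading a private field through raw pointers. The `vault::Account` type is hypothetical, and `#[repr(C)]` is an assumption added so that the field offsets are actually defined; for a default-layout struct the same trick would be guessing at the layout.

```rust
mod vault {
    /// `#[repr(C)]` is an assumption made for this sketch: it fixes the field
    /// order and offsets. Default-layout structs make no such promise.
    #[repr(C)]
    pub struct Account {
        id: u32,
        pin: u32, // private, but still just four bytes in memory
    }

    impl Account {
        pub fn new(id: u32, pin: u32) -> Self {
            Account { id, pin }
        }

        pub fn id(&self) -> u32 {
            self.id
        }
    }
}

/// Reads the private `pin` by stepping past `id` with raw pointers.
fn peek_pin(account: &vault::Account) -> u32 {
    let base = account as *const vault::Account as *const u32;
    // SAFETY: with `#[repr(C)]` the struct is exactly two `u32`s, so the second
    // `u32` is in bounds and initialized. This stops being true the moment the
    // library reorders, retypes, or removes a field.
    unsafe { base.add(1).read() }
}
```

Nothing in the type system ties `peek_pin` to the definition of `Account`, so a layout change upstream compiles cleanly and reads the wrong bytes. That is the modularity problem, independent of any security question.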
I strongly agree with:
The cure you are proposing is a lot worse than the disease you want to fight with it.
In particular, this proposal seems very brittle if we compare 3 approaches:
1. Contact upstream maintainers to increase visibility
2. Maintain a fork of the lib with patched visibility
3. Use bypasses and control their safety invariants based on usage in your whole dep-tree, and control it again any time the lib releases a new semver compatible version.
The first option works pretty well in practice, and if it's refused it's usually explained why. Now between the currently supported option 2 and your proposed option 3, I feel that option 2 is still less effort to handle.
There is a giant difference: your example code breaks when Foo is changed in a way that is considered backwards-compatible, such as renaming bar, or worse, reusing bar to mean something different.
Could you elaborate on when you need this? As in specific real world examples. Then perhaps instead of a dangerous blanket visibility bypass keyword we could find some other solution.
Not OP, but here's a situation I found myself in right now.
I am observing weird effects that point towards there being a bug in some_lib::foobar, which contains a nontrivial amount of code. (More specifically, it is changing some global state in a way that is incompatible with other code from another library.) To figure out what the problem is, I'd like to inline the implementation to the callsite in my crate, and proceed with further modification to find the underlying issue.
However, foobar uses various private types and functions within some_lib so the total amount of code I would need to copy into my crate would be unreasonable.
One alternative would be to change it to a path-dependency and edit some_lib::foobar locally. For patching a dependency, I've usually done just that, but to actually inline the implementation I would need to mark quite a few things public.
Wanting to access private items comes down to one of two cases:
1. You want too much from the library

   You just can't use a library for something it was not designed to do. Your options:
   - Use a different approach
   - Use a different library
   - Reimplement some functionality from the library in your crate
   - Fork the library and make changes to it
   - Make your own library
2. The library has a bug

   It is totally possible that a library should expose an item, but it did not. You can:
   - Submit a PR (you can use cargo's patching functionality until it is accepted)
   - Fork the library
   - Any other options from Case 1.
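For the "use cargo's patching functionality" route, a minimal sketch of what that looks like in `Cargo.toml`. The crate name `some_lib`, the version, the local path, and the commented-out fork URL and branch are all placeholders, not real projects.

```toml
# Cargo.toml of the crate you are building (placeholder names throughout)
[dependencies]
some_lib = "1.2"

# Replace the crates.io copy of `some_lib` with a local checkout that has the
# visibility change (or the pending PR) applied.
[patch.crates-io]
some_lib = { path = "../some_lib" }

# Alternatively, point at a branch of a fork (hypothetical URL and branch):
# some_lib = { git = "https://example.com/your-name/some_lib", branch = "expose-foobar" }
```

The patched copy still has to carry a version that satisfies the `[dependencies]` requirement, otherwise Cargo does not use the patch.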
It is important to recognise which of these is your case. It is also possible that you and the library owner do not agree: they have the right to decide what their library guarantees. You have no right to use a library outside of those guarantees, but you do have the right to fork it.
I mean, debugging is the fundamental use case here, and one in which no amount of "privacy guarantees" argument passes muster. It's also one in which efficiency matters. And there is no semver issue with terminal (bin) crates having this superpower, if the author is okay with spurious breakage. Typically when debugging I will only be using ~1, sometimes 2 concrete versions of a dependency.
But the issue might just be that [patch] dependencies are fairly heavy to work with. I don't think it would be an issue if the task were automated, along with the task of "mark everything in this crate pub". I don't know.
Making [patch] dependencies easier to maintain could indeed be a solution.
In Node, Yarn has a nice feature called yarn patch. It extracts a copy of the dependency that you can edit freely. When you're done editing it, it computes a diff and attaches it to your project. When someone else clones your project and installs the dependencies, the patches are automatically applied. (It's obviously local and only affects the project with the patches.) Instead of forking a whole repo, this lets you maintain only a tiny diff. And since it's handled by the package manager, the intention is clear and it does not mess with encapsulation.
An example where it could have been useful is with the time crate breakage. If stuck on the old version for some reason, you could instruct cargo to apply a patch at install time instead of replacing the whole crate.
I still don't think I'm wrong, but clearly no one's on my side to do my work for me, so I need a better argument. I'm going to think it through for a long while and gather a more structurally-sound case than just "it's kinda obvious, we've all dealt with this" before bringing it up again.
My actual issue is that I'm trying to use the official bindings for Capstone, a C library, but they don't include a pretty critical function from the original library. Luckily, they include FFI mappings... that I can't use because everything I need to use them (the pointer to the C struct and a few utility functions) is hidden in private members, meaning I have to either fully duplicate a ton of existing code in a wrapper class just to re-store data it already has, or entirely fork the project and set that whole thing up. Or I could literally use a single private field access.
Considering the structure of the existing bindings and the fact that the C function I need to use goes counter to a ton of Rust conventions (it's a C-style iterator that keeps mutating a single struct, which is not Rusty), I can see the reason they left it out of the Rust bindings, so a PR wouldn't happen. But the C library is also being constantly updated, so I need to stay on the official current bindings to stay up-to-date with bug fixes on the C side.
I'd rather focus on keeping a single "private field access" current than try to maintain an entirely-separate fork, because one has instant compiler errors in a program I wrote and one requires an entirely separate repo with downstream errors in a program someone else wrote. It would also keep my workspace from having to reconcile an entirely different repo stuck inside it for literally a single change.
Oh and reduplicating external code in your own repo is ten times as brittle as both of those options and also stores the same value multiple times just to satisfy an arbitrary condition, and is the worst thing possible for maintainability on all fronts.
Sidenote, I wish Markdown supported underlining so it didn't look like I was shouting there, lol.
Note that the thing that is needed is not (solely) a better argument for the feature to exist, but a proposal for how the feature should work so that it is sound.
Here's an idea for a totally different way for it to work: suppose the [patch] section of Cargo.toml gained a way to express "Apply the following patch file to version 1.2.3 of some_library". That would be sound, because
[patch] and [profile] get to arbitrarily modify the build anyway, so this adds no UB hazard that didn't exist already, it just makes existing things simpler, and
it's specifying an exact version to patch, not whichever version a dependency is resolved to, so updates can't cause it to go wrong
I think that might suit your original purpose, if the program that needs the patch is a program you're building. But it's a very different thing which is applicable and inapplicable to different situations, and has different advantages and disadvantages.
Honestly, I don't really get how you all can't see that the current "fork the repo" convention is one of the least maintainable ways you could possibly use. If your PR is rejected, then rather than having to maintain single internal changes in a single place (a function in your program, the place where they're used), you now have to reconcile all internal changes in an entirely separate program.
Updating will make both equally outdated, but one is far easier to maintain than the other. A changed private reference can be resolved by just changing it to wherever the data is stored now, but a conflicting fork requires resolving merge conflicts.
My experience in open source is that most projects fall into one of the following cases:
The library is no longer actively maintained. If it's just done, it could be fine. Forking lets you keep the lib up-to-date yourself without depending on a slow upstream. Pulling from upstream should be rare.
The library is actively maintained. Raising usability issues (or sending PRs) about over-restrictive visibility should get a reply and fix if needed. I'm not very familiar with Capstone/which lib you're actually using but capstone-rust/capstone-rs seems to be in this category. There were a couple issues about missing visibility and they either got fixed or received a usage example.
Overall, actively maintained projects tend to react to the issues you raise; that's why a few people asked above whether you contacted the lib maintainers before posting here. Your case would be stronger if you had examples where maintainers refused to increase visibility without good reasons.
Still, there are situations where your PR is rejected (very rare) or ignored (more common). In this case, the yarn patch approach that I mentioned / what @kpreid described in the context of Cargo usually offers a good middle-ground. It warrants some consideration at least.
One of the main use cases for package diffs by Yarn are Yarn's own patches for TypeScript. Some installation modes of Yarn change the layout of packages and break package resolution in the TypeScript type checker. TypeScript has a large active code base, but exposing package resolution hooks is a complex feature so it got stuck in limbo. Through patching, Yarn can fix the module resolution without having to maintain a full fork of TS.
The most relevant reply to this argument was the following:
The reason is integration with standard Rust workflows. cargo follows semver and tries to keep you up-to-date as much as possible. It will often pull more recent versions of libraries without notifying you explicitly. With a fork or patch, the version that you have does not change silently, it always requires active intervention.
You gave the example of a library providing bindings for a C lib. It's not that hard to imagine a situation where a patch update could change the semantics of a program and be silently missed when updating and introduce UB when dealing with low-level stuff. For example, a C wrapper could have an internal private helper function that returns a pair of void pointers. With a visibility bypass, you could hook into this function to retrieve these pointers. Then a few weeks later the lib does some refactoring and smaller bug fixes and releases a new patch version. During the refactoring they decided to swap the order of the pointer pair. It was a private helper function, so they feel confident that it won't break anyone. Next time you trigger a cargo update, you will pull this patched version. Your visibility bypass will still be active and will still see a pair of void pointers, but they no longer match what you expected when you wrote the bypass!
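The pointer-swap scenario can be sketched as follows. Everything here is hypothetical: the two patch releases are modeled as modules `v1_0_0` and `v1_0_1`, `Handle` and `raw_parts` are made-up names, and `raw_parts` is marked `pub` only to stand in for what a visibility bypass would let downstream code reach.

```rust
use std::ffi::c_void;

mod v1_0_0 {
    use std::ffi::c_void;

    pub struct Handle {
        ctx: u8,
        buf: u8,
    }

    impl Handle {
        pub fn new() -> Self {
            Handle { ctx: 0xC0, buf: 0xBF }
        }

        /// Internal helper: returns (context, buffer).
        pub fn raw_parts(&self) -> (*const c_void, *const c_void) {
            (
                &self.ctx as *const u8 as *const c_void,
                &self.buf as *const u8 as *const c_void,
            )
        }
    }
}

mod v1_0_1 {
    use std::ffi::c_void;

    pub struct Handle {
        ctx: u8,
        buf: u8,
    }

    impl Handle {
        pub fn new() -> Self {
            Handle { ctx: 0xC0, buf: 0xBF }
        }

        /// Internal helper after the refactoring: now returns (buffer, context).
        pub fn raw_parts(&self) -> (*const c_void, *const c_void) {
            (
                &self.buf as *const u8 as *const c_void,
                &self.ctx as *const u8 as *const c_void,
            )
        }
    }
}

/// Downstream code written against 1.0.0: it assumes `.0` is the context.
///
/// # Safety
/// Both pointers must point at live `u8` values.
unsafe fn read_context(parts: (*const c_void, *const c_void)) -> u8 {
    unsafe { *(parts.0 as *const u8) }
}
```

The same `read_context` compiles against both versions because the signature of `raw_parts` did not change. Against `v1_0_0` it yields the context byte (`0xC0`); against `v1_0_1` it silently yields the buffer byte (`0xBF`). In this toy both reads are in bounds, but with real C pointers of different types the same mix-up is how UB creeps in.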
This kind of silent semantic change is why your proposal gets so much pushback, in my opinion. It needs an answer to semver. I recommend that you first either acknowledge that this scenario is possible and discuss solutions, or explain why this scenario is not possible with your proposal.
The "apply diffs to specific versions" solution solves this scenario by not applying the diff when the version changes. This would revert you to the default upstream encapsulation. It also has the benefit of being more general and applying to more situations than visibility alone.