Allow working around e.g. the orphan rules for trait implementations


#1

Here is the actual scenario that led me to think about this:

  • I’m rewriting some software piece by piece in Rust.
  • That software uses a .ini-like configuration.
  • There are two crates with the “ini” keyword on crates.io: rust-ini and tini.
  • The former can’t read the configuration because it considers # to start comments, but the configuration contains # in values (there’s an open issue on rust-ini about that, but it’s not relevant to this discussion).
  • The latter can read the configuration, but it has a very limited API, and doesn’t have any way to iterate over the whole configuration (which rust-ini has).

So here I am, with two possible crates to fulfill my needs, but neither can fulfill them completely. Now I’m faced with multiple choices:

  • Fix rust-ini
  • Improve tini

In either case, there are further sub-choices:

  • Fork the crate, and point to my fork in Cargo.toml.
  • Import the crate in my own repo.
  • Submit changes upstream, wait for them to be merged in a new release.

The last option is also something that should be done in the other two cases, but those two allow for progress on my own project without having to wait on a third party.

All of the above are complications that maybe could be helped by allowing some relaxation of e.g. the orphan rules for trait implementations.

In my specific scenario, being allowed to add an implementation of Iterator on the tini::Ini struct would (mostly) make life easier (the other half of the problem being that access to private fields in the struct is required).

I know I’ve been frustrated more than once by the limitations of trait implementation for foreign types. I do understand why those limitations are there, but I posit that while they make sense in the case of libstd and other libs that come with the compiler, they seem arbitrarily restrictive for other dependencies where your Cargo.toml has an explicit dependency on a specific version: yes, a future version of that crate may change things such that your trait implementation doesn’t work, or implement that trait itself, but a) a future version of that crate may change any other thing in a way that would break your code and b) you would only use the future version if you asked for it.

Now, I’m not suggesting that the limitations should be lifted by default, or for non-libstd, but that maybe having some syntax allowing it would, IMHO, help cases like the current scenario I’m in. I think some syntax that would say something like “pretend the following block is defined in that other module in that other crate, and I’m willing to pay the consequences” would go a long way.


#2

I am missing a solution in your question: local aliasing.

As far as I understand your problem, introducing a crate-local alias for the given types and implementing the traits on those, would work for you?

Sorry for the brevity and I would love to come up with an example, but I have to run.
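
For reference, here is a minimal sketch of the crate-local wrapper idea. It assumes a stand-in `upstream` module in place of the real tini crate: the `Ini` type shown here, its `from_pairs` and `get` methods, and the `IniEntries` wrapper are all hypothetical and are not tini's actual API. Note that a plain `type` alias does not create a new type, so the orphan rules would still apply to it; a local struct wrapping the foreign type does count as local.

```rust
// Stand-in for a foreign crate. Hypothetical: this is NOT tini's real API.
mod upstream {
    use std::collections::HashMap;

    pub struct Ini {
        // Private, as it would be in a real dependency.
        values: HashMap<String, String>,
    }

    impl Ini {
        pub fn from_pairs(pairs: &[(&str, &str)]) -> Self {
            Ini {
                values: pairs
                    .iter()
                    .map(|(k, v)| (k.to_string(), v.to_string()))
                    .collect(),
            }
        }

        // The only read access the "foreign" type offers.
        pub fn get(&self, key: &str) -> Option<&str> {
            self.values.get(key).map(|s| s.as_str())
        }
    }
}

// A crate-local type wrapping a reference to the foreign one. Because the
// wrapper is local, implementing Iterator for it satisfies the orphan rules.
pub struct IniEntries<'a> {
    ini: &'a upstream::Ini,
    keys: std::slice::Iter<'a, &'a str>,
}

// Hypothetical helper: the caller has to supply the keys, since the wrapper
// cannot see the private map inside `Ini`.
pub fn entries<'a>(ini: &'a upstream::Ini, keys: &'a [&'a str]) -> IniEntries<'a> {
    IniEntries { ini, keys: keys.iter() }
}

impl<'a> Iterator for IniEntries<'a> {
    type Item = (&'a str, &'a str);

    fn next(&mut self) -> Option<Self::Item> {
        let ini: &'a upstream::Ini = self.ini;
        loop {
            // Stop once the caller-supplied keys are exhausted.
            let key: &'a str = *self.keys.next()?;
            // Skip keys that are absent from the configuration.
            if let Some(value) = ini.get(key) {
                return Some((key, value));
            }
        }
    }
}
```

The wrapper can only go through the public API, so it can iterate over keys the caller already knows about but not over the whole configuration.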


#3

Typically the workflow here is to:

  • Evaluate a crate to start with
  • Fork the crate, add the feature
  • Start using your fork in your other project, test it, make sure it works
  • In parallel submit a PR upstream and continue working locally

The idea is that you’re never blocked on someone merging something upstream, but you’re always contributing work back upstream as well for everyone else. Would that work in a case like this?


#4

That wouldn’t help with the second half of the problem, which is access to the non public fields. Well, it’s still technically possible to have a local version of the struct and use transmute, but that’s not very desirable.
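
To make the transmute idea concrete, here is a rough sketch under strong assumptions. The `upstream` module is a hypothetical stand-in (not tini's real definition), and both structs are marked `#[repr(C)]` so that their layouts are guaranteed to match. Without `#[repr(C)]` on the upstream type, Rust makes no layout guarantee and the same transmute would not be sound.

```rust
use std::mem;

// Hypothetical stand-in for the foreign crate.
mod upstream {
    // Assumption: the upstream type is #[repr(C)]. A real crate usually is
    // not, in which case the trick below has no layout guarantee at all.
    #[repr(C)]
    pub struct Ini {
        sections: Vec<(String, String)>, // private field
    }

    impl Ini {
        pub fn new() -> Self {
            Ini { sections: Vec::new() }
        }

        pub fn with(mut self, key: &str, value: &str) -> Self {
            self.sections.push((key.to_string(), value.to_string()));
            self
        }
    }
}

// Local mirror of the foreign struct: same fields, same order, same repr.
#[repr(C)]
struct IniMirror {
    sections: Vec<(String, String)>,
}

// Reach the private field by reinterpreting the foreign value as the mirror.
fn into_entries(ini: upstream::Ini) -> Vec<(String, String)> {
    // SAFETY: both types are #[repr(C)] with one identically typed field, so
    // size, alignment and field offset agree. This stops holding as soon as
    // the upstream definition changes in any way.
    let mirror: IniMirror = unsafe { mem::transmute::<upstream::Ini, IniMirror>(ini) };
    mirror.sections
}
```

Any upstream release that adds, removes, or reorders a field silently invalidates the safety comment, which is a large part of why this is not very desirable.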

That’s one of the choices I’ve written down, but I’m arguing this is a burden, and the development experience would be nicer if we didn’t have to jump through those hoops. However, I can see how my proposal doesn’t encourage submitting changes upstream, since they wouldn’t be in a form that is straightforward to submit as a PR.


#5

Are you sure you understand those consequences? Here are some guarantees that Rust provides right now that this would completely violate:

  1. Unsafe code can assume that private fields are private, and that nothing outside of this module can mutate those fields in a way that would violate the invariants they uphold. This encapsulation is essential to upholding Rust’s basic memory safety guarantees.
  2. Every possible impl can be contained in only one crate; this means that two valid Rust crates can never be incoherent when combined together. This prevents irrevocable ecosystem splits.
  3. If the impl were someday added to the crate it actually belongs in, that would be a breaking change for you. A lot of work has gone into making sure that adding impls is not a breaking change.

So you’re giving up the guarantees of memory safety, compatibility with other Rust code, and semantic versioning. Any change to the crate you depend on could be a breaking change for you, ranging from compile errors to silent memory errors, and we should definitely make it unsafe to add your crate as a dependency because of the ecosystem split issue.

I definitely get that these rules are frustrating (especially the orphan rules), and people are trying to find a way to make the system more flexible without giving up those important guarantees. But this feature is so risky I don’t see how it could be consistent with Rust’s philosophy.


#6

I won’t pretend to know how the Rust compiler actually works, but that seems to me like an arbitrary limitation of the implementation. There doesn’t have to be a difference in memory safety guarantees because of where some impl is defined. Specifically, my proposal doesn’t have to be technically different from the alternative and cumbersome scenario where I fork the library, change it, push to GitHub, and change my Cargo.toml to use module = { git = "url" }, which doesn’t break the memory safety guarantees, and also causes an ecosystem split.

That would be a breaking change for me… if I update Cargo.toml. As could be any other change in the new version of that dependency.


#7

Let me try to clarify.

The Vec type has three fields: a length, a capacity, and a pointer to the buffer containing the actual data. The module defining Vec contains a lot of unsafe code which is only safe if certain invariants about the relationship between the length, capacity, and buffer of the vector are all maintained. Vec’s authors provide a safe API because they know the only way these fields can change is through code in their module that they can audit. But mutating these fields is, in itself, totally safe - Vec just doesn’t expose an API to mutate them because that would cause the unsafe code elsewhere in the module to have memory issues.

If you could reach into that module, you could trivially violate the safety of Vec by mutating any of those fields to violate the invariants they uphold. Your change could also violate safety after the fact when you update your dependency, because an update could introduce a new invariant you didn’t audit for.

And yes, this isn’t any different from writing some code in the vec module which violates memory safety, except that you are performing the action at quite a distance from the vec module.
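
As a small illustration of the same principle, here is a sketch using a hypothetical `FixedStack` type (a toy example, not how Vec is actually implemented). Its unsafe block is only sound because nothing outside the module can change `len`.

```rust
mod fixed_stack {
    const CAPACITY: usize = 4;

    /// A tiny fixed-capacity stack.
    /// Invariant relied on by the unsafe code below: `len <= CAPACITY`.
    pub struct FixedStack {
        buf: [u32; CAPACITY],
        len: usize, // private: only this module can change it
    }

    impl FixedStack {
        pub fn new() -> Self {
            FixedStack { buf: [0; CAPACITY], len: 0 }
        }

        /// Returns false when the stack is full; never lets `len` exceed CAPACITY.
        pub fn push(&mut self, value: u32) -> bool {
            if self.len == CAPACITY {
                return false;
            }
            self.buf[self.len] = value;
            self.len += 1;
            true
        }

        pub fn last(&self) -> Option<u32> {
            if self.len == 0 {
                return None;
            }
            // SAFETY: `push` is the only code that modifies `len`, and it
            // keeps `len <= CAPACITY`, so `len - 1` is in bounds here.
            Some(unsafe { *self.buf.get_unchecked(self.len - 1) })
        }
    }
}

// Outside the module, writing `stack.len = 100;` is a compile error because
// the field is private. If that write were allowed, it would itself be safe
// code, yet the next call to `last` would read out of bounds.
```

The audit that justifies the SAFETY comment covers only the code inside `fixed_stack`; code injected from another crate would not be part of it.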

No, even if you run cargo update without changing your Cargo.toml, or build your code on a new system without an existing Cargo.lock, your code could break unless you have specified an = dependency. Cargo treats your dependencies as properly following semantic versioning, and will accept e.g. 1.1.1 as a valid match for a dependency on 1.1.0 (you can override this behavior by depending on =1.1.0).

One of the provisions Rust makes is that crate authors can add new impls to their types without it being a breaking change. But it would be a breaking change for you.


So basically, you could take on this burden if you are willing to opt out of semantic versioning and have every update be potentially breaking, opt out of participation in the crates.io ecosystem because your crate could contain conflicting impls with another crate, and audit your code & the module you’re opening with extreme care because any of it could violate memory safety. Does that really seem like less of a cost than forking the upstream dependency?

If it does, there is already a solution - the mem::transmute hack. Since Rust strongly & emphatically discourages doing anything that would violate these rules, I think making this any easier to do would be inconsistent with the language’s goals.