Pre-RFC: Superseding public/private dependencies

Summary

Introduce a public/private distinction to crate dependencies.

Note: this supersedes RFC 1977 Enough has changed in the time since that RFC was approved that we felt we needed to go back and get wider input on this, rather than handling decisions through the tracking issue.

  • RFC 1977 was written before Editions, cfg dependencies, and package renaming which can all affect it
  • The resolver changes were a large part of RFC 1977 but there are concerns with it and we feel it'd be best to decouple it, offering a faster path to stabilization

Note: The 2024 Edition is referenced in this RFC but that is a placeholder for whatever edition next comes up after stabilization.

Motivation

The crates ecosystem has greatly expanded since Rust 1.0. With that, a few patterns for dependencies have evolved that challenge the existing dependency declaration system in Cargo and Rust. The most common problem is that a crate A depends on another crate B but some of the types from crate B are exposed through the API in crate A.

Related problems not with this scenario not handled by this RFC:

  • Poor error messages when a user directly depends on A and B but with a version requirement on B that is semver incompatible with As version requirement on B.
  • Allow mutually exclusiev features or overly-constrained version requirements by not requiring private dependencies to be unified.
  • Help check for missing feature declarations by duplicating dependencies, rather than unifiying features

Guide-level explanation

As a trivial, artificial example:

[package]
name = "diagnostic"
version = "1.0.0"

[dependencies]
serde = { version = "1", features = ["derive"] }
serde_json = "1"
#[derive(Clone, Debug, PartialEq, Eq, serde::Deserialize, serde::Serialize)]
pub struct Diagnostic {
    code: String,
    message: String,
    file: std::path::PathBuf,
    span: std::ops::Range<usize>,
}

impl std::str::FromStr for Diagnostic {
    type Err = serde_json::Error;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        serde_json::from_str(s)
    }
}

The dependencies serde and serde_json are both public dependencies, meaning their types are referenced in the public API. This has the implication that a semver incompatible upgrade of these dependencies is a breaking change for this package.

With this RFC, in pre-2024 editions, this will warn saying that serde and serde_json are private dependencies in a public API. In 2024+ editions, this will be an error.

To resolve the warning in a semver compatible way, they would need to declare both dependencies as public:

[package]
name = "diagnostic"
version = "1.0.0"

[dependencies]
serde = { version = "1", features = ["derive"], pub = true }
serde_json = { version = "1", pub = true }

For edition migrations, cargo fix will look for the warning code and mark those dependencies as pub.

However, for this example, it was an oversight in exposing serde_json in the public API. Note that removing it from the public API is a semver incompatible change.

[package]
name = "diagnostic"
version = "1.0.0"

[dependencies]
serde = { version = "1", features = ["derive"], pub = true }
serde_json = "1"
#[derive(Clone, Debug, PartialEq, Eq, serde::Deserialize, serde::Serialize)]
pub struct Diagnostic {
    code: String,
    message: String,
    file: std::path::PathBuf,
    span: std::ops::Range<usize>,
}

impl std::str::FromStr for Diagnostic {
    type Err = Error

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        serde_json::from_str(s).map_err(Error)
    }
}

pub struct Error(serde_json::Error);

If you then had a public dependency on diagnostic, then serde would automatically be considered a public dependency of yours.

At times, some public dependencies are effectively private. Take this code from older versions of clap

#[doc(hidden)]
#[cfg(feature = "derive")]
pub mod __derive_refs {
    #[doc(hidden)]
    pub use once_cell;
}

Since the proc-macro can only guarantee that the namespace clap is accessible, clap must re-export any functionality that is needed at runtime by the generated code. As a last-ditch way of dealing with this, a user may allow the error:

#[doc(hidden)]
#[allow(external_private_dependency)]
#[cfg(feature = "derive")]
pub mod __derive_refs {
    #[doc(hidden)]
    pub use once_cell;
}

A similar case is pub-in-private:

mod private {
    #[allow(external_private_dependency)]
    pub struct Foo { pub x: some_dependency::SomeType }
}

Though this might be worked around by reducing the visibility to pub(crate).

I say "last ditch" because in most other cases, a user would be better served by wrapping the API which would be helped with features like impl Trait in type aliases if we had it.

Reference-level explanation

rustc

The main change to the compiler will be to accept a new modifier on the --extern flag that Cargo supplies which marks it as a private dependency. The modifier will be called priv (e.g. --extern priv:serde). The compiler then emits a lint if it encounters private dependencies exposed as pub.

While unstable, this lint will be warn by default. If the presentatuion of the lint is what holds us back from stabilization, one route to speed up the process is to change the level to allow. Once we feel comfortable with the presentation, we could then move it back towards warn. In the 2024 edition, we would change this lint to deny.

In some situations, it can be necessary to allow private dependencies to become part of the public API. In that case one can permit this with #[allow(external_private_dependency)]. This is particularly useful when paired with #[doc(hidden)] and other already existing hacks. This most likely will also be necessary for the more complex relationship of libcore and libstd in Rust itself.

cargo

A new dependency field, pub = <bool> will be added that defaults to false. When building a lib, Cargo will use the priv modifier with --extern for all private dependencies. What is private is what is left after recursively walking public dependencies (pub = true). We'll ignore this for other build target kinds (e.g. bin) as that would create extra noise.

Old cargo versions will emit a warning when this key is encountered but otherwise continue, even if the feature is present but unstable. While it is unstable, cargo publish will strip the field.

Cargo will not force a rust-version bump when using this feature as someone building with an old version of cargo depending on packages that set pub = true will not start to fail when upgrading to new versions of cargo.

cargo add will gain --pub <bool> flags to control this field. When adding a dependency today, the version requirement is reused from other dependency tables within your manifest. With this RFC, that will be extended to also checking your dependencies for any pub dependencies, and reusing their version requirement. This would be most easily done by having the field in the Index but cargo add could also read the .crate files as a fallback.

crates.io

Crates.io should show public dependencies more prominently than private ones.

Drawbacks

The warning message might not be the clearest in how to resolve as its emitted by rustc but is resolved by changing information in the build system, generally, but not always, cargo. As a last resort, we could put a hack in cargo to intercept the lint and emit a new version of it that explains things in terms of cargo.

There are risks with the cargo fix approach as it requires us to take a non-machine applicable lint, parsing out the information we need to identify the corresponding Cargo.toml, and translate it into a change for Cargo.toml.

In the case where you depend on foo = "300", there isn't a way to clarify that what is public is actually from foo-core = "1" without explicitly depending on it.

This doesn't cover the case where a dependency is public only if a feature is enabled.

The warning is emitted even when a pub item isn't accessible.

You can't definitively lint when a pub = true is unused since it may depend on which platform or features.

Rationale and alternatives

Misc

  • Cargo.toml: instead of pub (named after the Rust keyword, we could name the field public (like RFC 1977) or name the field visibility = "<public|private>"
    • The parallel with Rust seemed worth pursuing
    • pub could be seen as ambiguous with publish
    • While visibility would offer greater flexibility, it is unclear if we need that flexibility and if the friction of any feature leveraging it would be worth it
  • Cargo.toml: Instead of pub = false being the default and changing the warning level on an edition boundary, we could instead start with pub = true and change the default on an edition boundary.
    • This would require cargo fix marking all dependencies as pub = true, while using the warning means we can limit it to only those dependencies that need it.
  • Cargo.toml: Instead of pub = false being the default, we could have a "unchecked" / "unset" state
    • This would require cargo fix marking all dependencies as pub = true, while using the warning means we can limit it to only those dependencies that need it.
  • Cargo.toml: In the long term, we decided on the default being pub = false as that is the common case and gives us more information than supporting a pub = "unchecked" and having that be the long term solution.
  • cargo add: instead of --pub <bool> it could be --pub / --no-pub like --optional or --public / --private
  • cargo add: when adding a dependency, we could automatically add all of its pub dependencies.
    • This was passed up as being too noisy, especially when dealing with facade crates, those that fully re-export their pub = true dependency

Minimal version resolution

RFC 1977 included the idea of verifying version requirements are high enough. This is a problem whether the dependency is private or not. This should be handled independent of this RFC.

Dependency visibility and the resolver

This is deferred to Future possibilities

  • This has been the main hang-up for stabilization over the last 6 years since the RFC was approved
    • For more on the complexity involved, see the thread starting at this comment
  • More thought is needed as we found that making a dependency pub = true can be a breaking change if the caller also depends on it but with a different semver incompatible version
  • More thought is needed on what happens if you have multiple versions of a package that are public (via renaming like tokio_03 and tokio_1)

Related problems potentially blocked on this

  • It is hoped that the resolver change would help with cargo#9029
  • If we allow duplication of private semver compatible dependencies, it would help with cargo#10053

Prior art

Within the cargo ecosystem:

Unresolved questions

  • Will the warning be too disruptive to the ecosystem to enable by default?
    • Being automatically fixed with cargo fix (with cargo's reminder that you can run it) helps
    • Not requiring an MSRV bump helps
    • We could instead start it as allow but in rust-2024-compatibility group, still turning into an error in the 2024 edition, and have the edition migration reduce the blast radius.

Future possibilities

Help keep versions in-sync

When upgrading one dependency, you might need to upgrade another because you use it to interact with the first, like clap and clap_complete. The existing error messages are not great, along the lines of "expected clap::Command, found clap::Command". Ideally, you would be presented instead with a message saying "clap_complete 3.4 is not compatiblw with clap 4.0, try upgrading to clap_complete 4.0". Even better if we could help users do this upgrade automatically.

As solving this, via the resolver, has been the main sticking point for [RFC 1997], this was deferred out to take smaller, more incremental steps, that open the door for more experimentation in the future to understand how best to solve these problems.

Some possible routes:

Dependency visibility and the resolver

RFC 1977 originally proposed handling this within the resolver

Cargo will specifically reject graphs that contain two different versions of the same crate being publicly depended upon and reachable from each other. This will prevent the strange errors possible today at version resolution time rather than at compile time.

How this will work:

  • First, a resolution graph has a bunch of nodes. These nodes are "package ids" which are a triple of (name, source, version). Basically this means that different versions of the same crate are different nodes, and different sources of the same name (e.g. git and crates.io) are also different nodes.
  • There are directed edges between nodes. A directed edge represents a dependency. For example if A depends on B then there's a directed edge from A to B.
  • With public/private dependencies, we can now say that every edge is either tagged with public or private.
  • This means that we can have a collection of subgraphs purely connected by public dependency edges. The directionality of the public dependency edges within the subgraph doesn't matter. Each of these subgraphs represents an "ecosystem" of crates publicly depending on each other. These subgraphs are "pools of public types" where if you have access to the subgraph, you have access to all types within that pool of types.
  • We can place a constraint that each of these "publicly connected subgraphs" are required to have exactly one version of all crates internally. For example, each subgraph can only have one version of Hyper.
  • Finally, we can consider all pairs of edges coming out of one node in the resolution graph. If the two edges point to two distinct publicly connected subgraphs from above and those subgraphs contain two different versions of the same crate, we consider that an error. This basically means that if you privately depend on Hyper 0.3 and Hyper 0.4, that's an error.

If we want to go this route, some hurdles to overcome include:

  • Difficulties in working with cargo's resolver as this has been the main hang-up for stabilization over the last 6 years since the RFC 1977 was approved
    • For more on the complexity involved, see the thread starting at this comment
  • More thought is needed as we found that making a dependency pub = true can be a breaking change if the caller also depends on it but with a different semver incompatible version
  • More thought is needed on what happens if you have multiple versions of a package that are public (via renaming like tokio_03 and tokio_1)

Caller-declared relations

As an alternative, when declaring dependencies, a user could explicitly delegate the version requirement to another package

One possible approach for this:

[package]
name = "some-cli"

[dependencies]
clap = { version.from ["clap_complete"] }
clap_complete = "3.4"

When resolving the dependencies for some-cli, the resolver will not explicitly choose a version for clap but will continue resolving the graph. Upon completion, it will look to see what version of clap_complete was resolved and act as if that was what was specified inside of the in-memory clap dependency.

The packakge using version.from must be a public dependency of the from package. In this case, clap must be a public dependency of clap_complete. If the different packages in version.from do not agree on what the package version should resolve to (clap 3.4 vs clap 4.0), then it is an error.

Compared to the resolver doing this implicitly

  • It is unclear if this would be any more difficult to implement in the resolver
  • Changing a dependency from pub = false to pub = true is backwards compatible because it has no affect on existing callers.
  • It is unclear how this would handle multiple versions of a package that are public

The downside is it feels like the declaration is backwards. If you have one core crate (e.g. clap) and many crates branching off (e.g. clap_complete, clap_mangen), it seems like those helper crates should have their version picked from clap.

Missing feature declaration check

It is easy for packages to accidentally rely on a dependency enabling a feature for them. We could add a mode that limits feature unification to reachable dependencies, forcing duplication and longer builds for the sake of checking if any features need specifying.

However, this will still likely miss a lot of cases, making the pay off questionable. This also has the risk of being abused as a workaround so people can use mutually exclusive features. If packages start relying on it, it could coerce callers into abusing this mechanism, having a cascading effect in the ecosystem in the wrong direction.

6 Likes

When a new field is added cargo from before the field is added will allow it, however the cargo versions for which it exists but is still unstable will emit a hard error even though versions before and after are just fine with it.

That behavior is something we can control. For example as part of the work on lints RFC, I made it so the field would not error if in use by a version of cargo where it is currently unstable. We would then strip the field on publish.

The main reason we do that is if the behavior being added would is required for the regular functioning of the code. Personally, I don't think that is the case here, like with lints.

I'd be curious whether adding a [pub-dependencies] similar to [build-dependencies] and [dev-dependencies] would be workable as an alternative.

Something like that came up in the original RFC though I didn't dig into why they didn't go that direction.

Adding a new table comes with a migration code

  • This now requires an MSRV bump while before it didn't
  • Third-party tooling can easily ignore the pub field but has to update to handle a new table type.

It might be worth mentioning some of what rustdoc will do based on this switch:

Even if a blanket implementation from a private dependency is applicable to the currently documented crate, traits from private dependencies are not listed in the Trait Implementations list, and do not show up in search results.

Could you expand on that a bit more? I feel like I don't know enough about rustdoc for how its rendering traits across packages.

A new section would have a much harder problem with the "pub-only-if-feature-is-used" case.

I wanted to highlight one of the points of motivation for this RFC written above and reiterate it explicitly:

I've at least personally run into this a number of times managing a workspace where crates have different versions. Currently it's somewhat cumbersome to maintain the versions between crates and publishes and it requires intimate knowledge of each crate that I forget time-to-time. I'd love for automated tooling to handle the version bumps but many of the existing solutions, as far as I know, aren't able to handle situations such as when you do a major bump of one crate it must force a major bump of a dependent due to the public dependency relationship.

Now to be clear I don't expect a solution to materialize overnight to my problem, but I mostly wanted to say that I think from a workspace management perspective having the idea of public dependencies will be helpful to future tooling. Additionally the lint here will also be quite useful I think for catching cases where something is accidentally exposed. For example a little while ago I tried to remove a public dependency and make it private but it turns out that I forgot a few functions so that didn't work.

Orthogonally one of the major risks with this RFC I think is how much of a change it is for basically all Rust users. This is a surmountable problem of course but I think will present a roadblock to rolling this out. One possible suggestion to help mitigate this might be to have something like:

[package]
# ...
warnings-about-public-dependencies = true

This field (please bikeshed the name) would default to false for historical compatibility. The idea of this being opt-in to start out with would not only decouple this change from an edition but additionally provide a longer runway for feedback and such. In the 2027 edition perhaps this field could be turned on by default when there's more experience with it, but I think it's valuable to get this to eager crate maintainers before trying to solve all the problems anyone might encounter when it's turned on by default.

One reason I mention this is that I found the current state of the rustc lint not super helpful where there were some private dependencies erroneously flagged as public and having more runway to figure that out I think would be useful.

2 Likes

As a heads up, I've done an update of the original post with some clarifications and expansions.

I think there might be a misunderstanding. The motivation you highlighted is not about keeping versions in-sync but when you think a dependency is private when it isn't and the consequences of that.

As for the problem you are talking about, of keeping versions in sync, that was very intentionally dropped from the RFC to take a smaller, more incremental approach. I looked back through all discussion tangentially related to RFC 1977 and I saw that you had pushed a lot on this angle and it is sad to see this part deferred out. However, I feel that what is here is enough to be made stable quickly, has enough value on its own, and will make it easier to experiment on different solutions with the base feature being stable and packages explicitly stating their public dependencies.

Could you expand on this? I'm not seeing how this is a disruptive change

  • For myself, the majority of my dependencies are private though I recognize what we each work on can color our perception of what is common
  • Users already need to deal with new warnings being added

For me, the biggest risk is the quality of the warning as rustc is producing it but has no knowledge of the users build tool which is needed for resolving it.

Later you did mention

To be clear, it sounds like you are less concerned about the general state of the warning but only in cases where you shouldn't do pub = true.

The cases I'm aware of for this

  • A part of the API is marked as pub but not accessible from the root of the namespace
  • #[doc(hidden)] items, typically for macros

Is that what you are referring to? What is it about these kinds of cases that are even worse for the warning?

I could see us taking the approach of having taking this route:"

  • unstable: make it warn
  • initial stable release: make it allow
  • later stable release: make it warn
  • next edition: make it error

Note that the "make it error is the only part of this tied to an edition and I use "2024 edition" as a placeholder. We should progress on this as we feel its ready. I would love to see what we can do to improve the lint early on so that the gap until warn by default is small.

If we went this route, we wouldn't need a new field. RFC 3389 will be stable in 1.74, making this

[lints.rustc]
external_private_dependency = "warn"

If there are multiple lints, we could have a lint group for them.

Something more aggressive:

Could each crate actually consume all its private dependencies, so private dependency *.rlibs get merged into its "owner“ crate *.rlib? I think this can largely reduce actual work at linking time, and help with incremental compilation time.

(Not necessarily in the first version, but getting it forward compatible is important)

Today:

crate A(bin) --pub-dep-> crate B(lib) --priv-dep--> crate C(lib)

rustc generates:

  • A-*.o
  • B.rlib
  • C.rlib

for system linker to link together.

Vision: rustc generates:

  • A-*.o
  • B.rlib which includes essential portion of C.rlib

for system linker to link together.

1 Like

For commonly internally used libraries, like itertools, futures`, etc, and their dependents, this could result in a lot of duplicate copies of library code in a deep dependency graph, taking up disk space even if the linker doesn't mind. Private doesn't mean it's used by one dependent.

1 Like

Private doesn't mean it's used by one dependent.

I feel this is a lot like the #[inline] attribute on the functions. You're saying that if blindly treat #[inline] as #[inline(always)], there might be a lot of duplicate copies of the instructions, which is true. However it's also true that there're many cases that inlining into the caller site will remove the actual need of codegen-ing the original callee piece of code, and might actually improve compilation performance. So there's need for a good heuristic inlining strategy. But the whole inlining mechanism is important anyway.

pub could be easily confused with publish.

3 Likes

Another thing that comes to mind is that features on private dependencies don't need to be unified with other uses of the same crate elsewhere in the tree (though unification is probably wanted to reduce compile time and binary sizes in most cases). Some mode to do this could help with "you're assuming features implicitly enabled elsewhere in the dependency graph" detection.

Another complication is that some crates can communicate at link time using crates like inventory where you want all consumers of some inventory-providing crate to agree on a single version so that all consumers see the same array at runtime.

4 Likes

Noted so we can keep that in mind when making the final decision

Added this as a future possibility. I would prefer us to have real solutions for mutually-exclusive features before designing such a check mode because I can see it being abused to the point of having a negative affect on the ecosystem.

Good point, especially when we get to designing a real solution for this ("distributed slice"). I've left it out for now because I'm intentionally leaving out de-unification and don't want to get too side tracked with the considerations of that.

Ah sorry I was approaching this from a bit of an odd angle so I don't think I quite got my point across. To be clear I'm all in favor of this updated version of the RFC, even as it's been slimmed down. My main point is that I want to be able to get warnings about dependencies which are public when I thought they were private. That transitively makes multi-crate workspaces easier to manage for me because it's more clear what's public and what's not, and I'll be empowered (or anyone else) to build tooling around this (in theory) to manage version numbers and bumps more automatically - e.g. a major bump on one crate forces anything with a public dep on it to also have a major bump.

Also to clarify, I am definitely not seeking to expand the scope of this feature, I think it's great as-is! Slimming it down to remove the resolution bits for now and mostly enabling the lint to be used I think is a great first candidate.

My hunch, with no data to back it up, is that the majority of crates have at least one public dependency. This means that if the lint were turned on today then the majority of crates would, immediately, start getting warnings about "this dependency isn't marked public = true in Cargo.toml". To me this large quantity of warnings for the majority of users is something I consider a disruptive change. Definitely not disruptive in the "you broke everyone's build" sense but disruptive in the "you're causing lots of PRs to be created" sense since everyone will likely by default want to resolve these warnings.

You mention that users already deal with new warnings being added. I agree this is true, but my guess is that the scale of the amount of new warnings which would come about from turning this feature on by default puts this change in a different category. Most new lints in rustc are relatively niche things I think, or at least my experience of 5 or so years is that it's relatively rare for the codebases I work on to get all that many new lints from rustc updates. Certainly not change-every-crate-in-the-workspace size lints.

One category of disruption, in my opinion, is everyone needs to add public = true to dependencies which are actually public. I'm sure the lint message could be tailored a bit more to suggest this (I tend to not read rustc diagnostics that closely myself, so maybe it already handles this!), but my point is that there's a fundamental amount of work of adding public = true everywhere, and this is a major source of the "disruptive change" in my opinion.

My comment you're referring to here is a bit orthogonal where there are other causes for churn such as false positives. The example I ran into today was:

mod private {
    pub struct Foo { pub x: some_dependency::SomeType }
}

where rustc thought that some_dependency should be a public dependency. I understand where it's coming from, but this I think needs to be taken into account one of two ways:

  1. One option is to improve the warning to realize that Foo isn't publicly accessible so therefore some_dependency has no need to be public = true. This I'm sure is not easy to do in rustc.
  2. Otherwise this is more churn on top of adding public = true because now any crate using this pattern needs to update everything to pub(crate) or similar to successfully quash the warning.

Ah a great idea! In my opinion my preferred method of rollout for this feature then would be to add the lint to rustc and make it allow by default. Crates could then opt-in with an annotation to set it to warn, and then eventually in the future, if it seems productive, the lint is turned to warn or error by default at an edition boundary (or something like that)


( also sorry if others have addressed some of my points, I haven't had time to catch up on the rest of this thread so I'm just responding to this one post )

I think I'll start this with the more aggressive schedule to see where we land in the conversation, and adjust from there. In my local draft, I'm more explicitly calling this out as an unresolved question.

Went ahead and posted the RFC

1 Like

An idea regarding dependency resolution came to my mind:

It may be advantageous to allow public dependency versions to be determined by their parent, and making the version field of public dependencies more of a lint. Each crate would get a warning if the version supplied to one of its dependencies doesn't match the version indicated by the dependency: that way version conflicts could be handled locally.

Here's how I imagined this:

[package]
name = "serde_user"
version = "1.0.0"

[dependencies]
serde = { version = "1", pub = true } # this is just a lint
[package]
name = "serde_user_user"
version = "1.0.0"

[dependencies]
serde = { version = "1.0.189" } # this determines the actual version -- warns if broader than what is required by serde_user
serde_user = { version = "1", dependencies = { serde = "serde" } } # make serde-user use serde defined as serde in this crate's dependecies

This gives crate authors greater flexibility for managing and synchronizing dependencies of dependencies.

This would also allow multiple versions / copies of the same crate to coexist in the same subtree:

[package]
name = "complex_package"
version = "1.0.0"

[dependencies]
public_serde = { package = "serde", version = "1", pub = true }
serde_08 = { package = "serde", version = "0.8" }

public_serde_user = { version = "1", dependencies = { serde = "public_serde" } }
old_serde_user = { version = "1", dependencies = { serde = "serde_08" } }

Does this make sense? Does it have issues I didn't think of? (I'm sure it has...) Would it be too much effort to implement this in the current RFC?