Pre-RFC: `pub(api)` visibility, and static enforcement of crate-internal API boundaries

This is an idea I wrote up while working with a self-written large crate with a number of internal sub APIs in modules. Hopefully it makes sense.

Any feedback is welcome, but I mainly want to see if other people have felt this pain point, if the approach seems reasonable to people for the Rust philosophy, and if I'm somehow treading over previous attempts to do something similar. I'm happy to address the semantics further, but I wanted to get this high-level feedback before spending major effort refining the write up.

Also, this is my first attempt at anything in the realm of Rust RFCs, so be a little kind :slight_smile: .

(Posting as a link to github, as there is a limit of 5 links for new users)

Rendered

EDIT: Updated to sync to head of branch.

Our options are now either to use pub(in super::super), or to use pub(in crate::resources), but these add a lot of verbiage to our source file:

// In src/resources/io/file_api/mod.rs

/// A set of resources that can be read individually.
pub(in crate::resources) struct ResourceSet {
  // ...
}

impl ResourceSet {
  /// Open a file path to load resources from it.
  pub(in crate::resources) fn open(path: &Path) -> Result<Self> { 
    // ...
  }

  /// Load a resource with the given ID.
  pub(in crate::resources) fn load_resource(res_id: &str) -> Result<Resource> {
    // ...
  }
}

There’s another option: pub(in crate::resources) struct ResourceSet (or in super::super), and use pub for associated items. This is what I do; I think it’s cleaner for associated items’ visibility to be controlled by the main item’s visibility[1]. I know there’s an issue open about adjusting the unreachable_pub lint (or something like that) to treat associated items differently.


  1. Well, more precisely, “the visibility of associated items intended to have the same visibility as the main item”, I don’t mean to say that none of my associated items are private. ↩︎

pub(in path) had to have in added because pub(not_a_keyword) was ambiguous (in struct definitions? I don't remember the exact syntax quirk). It probably blocks pub(api) too. You might have something with a keyboard like variations of pub(in mod) or pub(super api).

The pub visibility is only supposed to be used for items (types, traits, functions, etc.) that are part of the public API of the crate , not the module.

That's not accurate. pub is for the module. It is unfortunately ambiguous (not locally explicit) whether pub means it's part of the crate's public-public API or not.

And I'm disappointed that your proposal isn't the pub to clarify that reachability for pub items. I'd expect pub(api) to be that "absolutely definitely externally pub from outside of the crate", rather than a syntax sugar for internal non-public pub(in path) with a new invention for avoiding repeating the path.

Couldn't you replace !#[api] with use crate::something as api and use pub(in api)?

I thought that, generally, all pub items are suggested to be exported from the containing crate. The text of the unreachable_pub rustc lint docs say:

This lint is “allow” by default because it will trigger for a large amount of existing Rust code. Eventually it is desired for this to become warn-by-default.

I don't see it saying that pub methods are an exception here (at least when it's in warn mode).

Regardless, I can work through that example again, either replacing it or at least mentioning it as another example.

It shouldn't block pub(api) because, as mentioned, it would become a weak keyword. The set of possible first tokens in a pub(E) context I believe are self, super, and in otherwise. If that is the case, then introducing api as another should not be ambiguous.

While pub can be used within a module, the rustc lints documentation suggests that the only reason this isn't a default warning is that it would break too much code right now, and may change. See my reply to @robofinch about unreachable_pub.

I'm perfectly happy with replacing #![api] with another way of definining the boundary. I provide some alternatives I came up with off the top of my head in the RFC. While it may be possible to import a module as the name api, note that this is unrelated to my suggestion, as the pub(api) visibility modifier is not using api as a name in the current module scope. There are several potential benefits to having an alternate visibility, such as ensuring that non pub(api) items don't escape a module boundary, and to allow for the visibility of an API boundary module and its exported items to be modified with a single visibility at the module definition site.

For reference, here’s the issue I’m thinking of: Suggestion: Split `unreachable_pub` lint into multiple lints · Issue #112615 · rust-lang/rust · GitHub

I pretty much agree with @jplatte w.r.t. why I use the lint.

(The lint no longer triggers on public fields of private types, unless I’m sorely mistaken, but it still triggers on public methods of private types.)

3 Likes

I think the ambiguity would be nested tuples: struct TupleStruct(pub (u32, u32), u32). Without in, you can't easily disambiguate between a type path and a module path.

This is not too far removed from the ambiguity that made it hard to use crate as shorthand for pub(crate): struct TupleStruct(crate ::type).

4 Likes

I'm sure this has been discussed before. It would be cool to add an even more general and powerful feature instead.

vis api = vis!(in crate::resources);
vis anotherapi = vis!(in crate::resources);

pub(api) fn foo() {}
pub(anotherapi) fn bar() {}

This would completely solve the problem of

If we later decide to move the file_api module to a different location, we would have to update all of the visibility modifiers

And, I feel this gives developers a simpler model of visibility than needing to understand mod and #[api]. Adding another attribute #[api] doesn't feel like the most elegant solution here.

Granted, we are adding another keyword vis (bikeshedding allowed). And, we will need a syntax for vis!(). But, if we call my pre-pre-pre RFC idea here 'Vis Assignments', then it seems that Vis Assignments solve the same problems presented here but better. Happy to discuss. Vis Assignments would actually be pretty neat.

This could be an interesting idea on its own, but the goal of this RFC is a little different. Your problem your approach is solving is one part that I'm addressing, but I find the boundary enforcement at least as important. My argument is that it simplifies reasoning about internal dependencies, where you know that the only symbols that can be exported from a boundary module are those marked with pub(api), and the possible lint tooling can use the intended semantics to help out. The goal here was accomplishing that with the minimal syntactic and semantic changes to the language, which things like adding a vis top-level keyword would probably be too heavyweight for my purposes.

For your idea in particular, I'm curious if pub(<name>) is the best way of doing that, as in theory you should be able to use any visibility with that (not just pub). If you are already going to introduce the vis keyword for this purpose, maybe just using vis(<name>) would be better? Also, for the vis <name> = ... syntax, couldn't you just follow that with the visibility modifier itself?

Another question: what would the scoping be for the visibility name you define? If you define it in a submodule, would it only be defined in the modules below it? Would it only be valid at the crate root? Can visibilities be shadowed? Could visibilities have visibilities? :slightly_smiling_face:

Your last paragraph brings up many cool questions, for which in my mind there are clear derivable answers.

But I'm still confused. What problem does 'Vis Assignments' not solve again? The "boundary enforcement" aspect is simply a question of visibility and documentation, no? I claim that 'Vis Assignments' enforces all visibility requirements. If you mark as 'pub(api)', you will not be able to export beyond the visibility of the 'vis(in crate::resources)', and normal visibility rules apply as if you textually replaced pub(api) with pub(in crate::resources) as this is just an alias. About visibility of visibilities, I believe an initial implementation could disregard this by only allowing pub(self) vis api = ... which is identical to vis api = ..., and as such visibilities cannot form the public API of a crate.

My general philosophy is that, while documentation is great and necessary, it tends to be better if you can get the compiler to enforce a clear set of semantics that are used by multiple people frequently, as long as the semantics themselves are as simple as is reasonable. I have tried to do multiple things to enforce these kinds of boundaries in my own code, but I always end up having trouble glancing through the resulting code (with excessively long item visibilites), leading to me often adding the wrong visibilities, or at least slowing me down on my own code within a single crate. When I try to work with multiple (private) crates, I both have to worry about the publishing workflow, and often find that I'm supporting a bunch of dead code, as there is no lint or analysis that would check that all of the pub exports in a "private" crate are used elsewhere in the workspace (and that would tend to be a hard lint to write, IIUC).

This RFC is what I came up with after some thinking. Modules are (in some cases) used for the same purpose of defining an API boundary in multiple projects, and I would imagine being able to statically enforce this encapsulation would at least reduce the maintenance burden for a bunch of projects.

To answer your question directly: The problem that vis assignments would not solve is escaping of items outside of the module boundary. The following would not cause any compile errors and/or warnings with your proposal:

// in src/parent_mod/my_mod.rs
// We want only the things in the api visibility range to be available in the rest of the crate. Other
// items should not be exported from this module
vis api = vis!(crate);

// This is great: It will have the expected visibility.
vis(api) fn exported_fn() { /* ... */ }

// Also fine. Does not escape my_mod.rs
fn local_private_fn() { /* ... */ }

// Problematic: These items escape, even though it's not part of the module boundary
pub fn why_am_i_public() { /* ... */ }
pub(crate) fn better_but_no_cigar() { /* ... */ }
pub(in crate::parent_mod) fn use_me_other_mod() { /* ... */ }

Correct me if I'm wrong, but all of the above should theoretically compile with vis assignments. The problem is now there are some kinds of migrations and refactorings that are more difficult.

First, if better_but_no_cigar() is not part of our expected API (perhaps forgotten as part of a previous migration, where it was supposed to be made module private), and then we decide to delete the code, or modify the signature, we can find that there is other code elsewhere in the crate that uses it. This means that you have to read the code that's using it, and migrate that before making the change. This is annoying if you're the sole developer on a project, and could even be actively painful in a team setting, where the the code using it is not owned by you.

Second, if we want to move the module to another path, the use_me_other_mod() may no longer be accurate. We will update the vis api = ... declaration, but we will forget the other. If this is a large piece of code, this may not see that the visibility needs to be updated, and may be left as it is, leaving a trap for future maintenance.

Third, if we want to extract this to a crate, the better_but_no_cigar() declaration will compile naturally, but it isn't actually to the crate that previously depended on it, so it would create a build failure in the new crate (with the above migration issues). The use_me_other_mod() decl would just fail to build, as there is no crate::parent_mod path in this crate.

Fourth (and possibly least), the why_am_i_public() decl will be (potentially) exported from the crate. This will often be found via lint, but if by mistake someone ends up pub useing it somewhere, then it will become part of your public API without being intended.

The critical check is that those problematic visibilities all escape from the current module, which from my understanding of the compiler internals, is reasonably easy to do at the moment (since all visibilities are reduced to either pub, or pub(in <path>)). With my pub(api) proposal:

// in src/parent_mod/my_mod.rs
// We want only the things in the api visibility range to be available in the rest of the crate. Other
// items should not be exported from this module

// Mark this module as an API boundary, used by `pub(api)`
#![api_boundary]

// This is great: It will have the expected visibility.
pub(api) fn exported_fn() { /* ... */ }

// Also fine. Does not escape my_mod.rs
fn local_private_fn() { /* ... */ }

// ERRORS: These items would escape, so we cause errors

// `pub` item escapes from `crate::parent_mod::my_mod`
pub fn why_am_i_public() { /* ... */ }

// Path `crate` is not equal to or under `crate::parent_mod::my_mod`
pub(crate) fn better_but_no_cigar() { /* ... */ }

// Path `crate::parent_mod` is not equal to or under `crate::parent_mod::my_mod`
pub(in crate::parent_mod) fn use_me_other_mod() { /* ... */ }

// Resolves to path `crate::parent_mod`, which is not equal to or under
// `crate::parent_mod::my_mod`
pub(super) fn use_me_parent() { /* ... */ }

So while you can certainly make some complex visibility patterns more usable with vis assignments, these refactoring and migration hazards are still possible.

I think I understand your proposition better now. Admittedly I haven't been able to write large solo rust codebases recently so I've been a bit slow on the uptake. The "power" your RFC adds (in the general case, thinking beyond just APIs and its intended use cases) is opt-in #[api_boundary] compile-time exclusion of symbol exports, with a custom syntax pub(api)to whitelist exports beyond the boundary. Maybe its just me, but the benefits of this can be gained simply by being attentive and understanding existing visibility rules. Perhaps better developer tooling should be provided, to allow for IDEs/linters to help when refactoring with upholding a project's various invariants. Also, given mod api, you can always add another private module layer (e.g. imp or _api) and export the "api surface" all in the api module top level, guaranteeing only symbols named there are exported as the API surface. This pattern can already uphold the refactoring issues you've presented (to my understanding?), it just adds more lines of code.

Also, I've realised my 'Vis Assignments' idea is functionally identical to kornel's excellent suggestion:

Couldn't you replace !#[api] with use crate::something as api and use pub(in api)?

As in converting my original example,

// vis api = vis!(in crate::resources);
use crate::resources as api;
// vis anotherapi = vis!(in crate::resources);
use crate::resources as anotherapi;

// pub(api) fn foo() {}
pub(in api) fn foo() {}

// pub(anotherapi) fn bar() {}
pub(in anotherapi) fn bar() {}

The "bridge" between this patter ('Vis Assigments') and pub(api) symbol export exclusion is the addition of a private module boundary, with which you can gain the power of #[api_boundary] at the cost of increased code.

pub(in crate::anythingyouwant) mod api {
  pub use _api::{Something};
  mod _api {
    use crate::anythingyouwant as api;
    pub(in api) struct Something { };
  }
}

// OK
use api::Something;

// in crate::different
// COMPILE ERROR
use crate::anythingyouwant::api::Something;

I suppose that's fair. I do try to do that in most of my crates, where the top-level API crate is just a bunch of pub usees on private subcrates. I think I'd still prefer if there was a more explicit way of saying what the visibility was that was kept in sync with the API module visibility, but still, fair.

I'm slowly recalling my collection of motivations. Another one is that I often want to have docs for my own internal code that only show intentional API. From the Pre-RFC:

rustdoc can provide improved internal documentation by using the API boundaries as starting points. Right now, the option --document_private_items documents all items in a crate, not only those that cross an API boundary. By defining the internal API boundaries, there is a clear subset of items that should be exposed within internal documentation.

It's true that this could also be a minor update to rustdoc, but I would want to make it clear that it should be used in a mode that lets you specify which modules are supposed to have docs generated for them, and that there is a way to only show documentation for items that would go through that module boundary (and not private items, or items that have visibility below that module).

Personally when I look at the internal documentation, I really want to look at everything without exception. Often I am already looking at the source code and I want to conveniently search through it