[lang-team-minutes] the module system and inverting the meaning of public

These is a summary of our lang team discussion from yesterday. Our lang team meetings are now organized into two categories: triage meetings and focused discussion on a particular topic (on alternating weeks). This was a “focused discussion” week, and the topic was the module system, and in specific a “wild and crazy” idea that @withoutboats had for extending it. Ultimately we did not reach any firm conclusions, but a number of concerns about the idea were raised and my sense was that there was significant doubt if this was the right general direction. (If you’re interested, the raw “minutes” can be found here.

The order of the presentation here is my own. @withoutboats presented things in a different order. But since I’m writing the notes, I’m organizing it the way that makes the most sense to my brain. :wink: The important thing is that the proposal consisted of two complementary parts:

  • inverting the meaning of privacy to the >= interpretation (details below);
  • supporting implicit modules that don’t have to be declared.

Motivation

It’s worth spending some time on the motivation. I think the bottom line is that the module system is a frequent source of confusion, and that seems sad. Opinions abound on just what particular aspect is the most confusing, but things that I personally frequently hear:

  • the warnings about private-type-in-public-API are annoying and confusing
  • the need to declare modules (mod m;) rather than just importing from them is different from other languages

At the end of the meeting, I was left with the definite feeling that we needed to drill further in here to be more precise about just what problems we have and hope to solve. =)

Status quo

Today, when you declare a struct, you also give it some associated privacy. Using the pub(restricted) rules, structs can either be declared as “world accessible” (pub) or “private to some module in your crate”. The default is the to private to the current module, but one can write pub(super) to be “private to the parent module”, pub(crate) to be “private to the root of the crate”, or pub(in foo::bar) to be "private to the module foo::bar". Being “private to a module” means, roughly, that nobody outside of that module can name the type or possess an instance of that type, except in generic code. The “private type in public API” rules are intended to enforce that guarantee (but they have issues of their own that we are still working through). We named this interpretation the <= interpretation, because when you declare a struct as being public to a module M, you are saying that it is accessible only to submodules of M (i.e., modules “less than” M).

The proposal

The proposal keeps the pub(restricted) notation, but turns the meaning on its head. Under this idea, when you write pub(in foo) struct Bar, you are declaring that the struct Bar is accessible from at least the module foo, but indeed it may be accessible by anything outside of foo as well, if there is a pub use that exports it farther. The way I think about it is that when you declare a struct as being public to a module M, you are saying that it is (or should be, see below) accessible in this location from at least the module M. It may also be accessible from elsewhere, but only if there is a pub use, and hence from a different location. For this reason, we called this the >= interpretation of privacy.

So, for example, I might have a crate like this:

pub mod ty {
    pub struct Ty { ... } // world accessible
}

This is a good declaration, because when I declare Ty, I am saying that "the world should be able to name it as ty::Ty".

On the other hand, if I wrote this:

mod ty {
    pub struct Ty { ... } 
}

This is a less-good declaration, because although I said that ty::Ty should be “world nameable”, in fact the world cannot name it, because ty is private. This code is of course legal today, but presumably under withoutboats proposal there would be a lint warning you about this situation. Something like “you declared pub struct Ty, but it is contained within a non-pub module”. (You may have noticed that this is a very common setup today; we called it the “facade pattern”. Indeed it is, and there would be a better way to handle it, I’ll get to it shortly.)

Now, the important part is that something declared as private to a certain point can be named from elsewhere, but only as the result of a pub use:

mod ty {
    pub(super) struct Ty { ... } // visible to super as ty::Ty
}
pub use ty::Ty; // now visible to the world as `::Ty`

Here I declared Ty as pub(super), meaning that it is intended to be visible to my parent module at this location. But the parent module chooses to re-export it further, making it visible to the world, but under a different name (::Ty).

Facade pattern

The second example I gave above may have looked familiar to you:

mod ty {
    pub struct Ty { ... } 
}

Having a private module with a public type is actually a prety common pattern in Rust today. Typically it goes along with a pub use that re-exports the Ty name further:

mod ty {
    pub struct Ty { ... } 
}
pub use ty::Ty; // re-export to the world as `Ty`

We were calling this pattern the “facade” pattern. It’s a pretty common way to allow a given module or crate to have a lot of internal structure that is not exposted to the outside world. Under the <= interpetation – i.e., the current rules – the snippet above is perfectly reasonable. You declared Ty as being world-visible, so readers should be aware that it might be pub used to the world (though it doesn’t have to be).

But under the newer proposal, the code above is not the preferred way to write it. It would be better to use pub(super) on Ty, since the path ty::Ty is not “world accessible”.

A change in focus

The older rules were very oriented around the needs of unsafe code authors. Basically the <= interpretation makes it very easy to know what code is “potentially exposed” and how far. The <= interpretation, in contrast, tells you only how fact the current path is exposed: you have to ripgrep around to find out if there are any pub uses that expose this type further out.

The newer rules in contrast are oriented around a different need. They tell you have far the current path is exposed, and to deduce the proper way to refer to a type from any given location. i.e., if I see a pub struct Ty, I know that I should name it relative to the current module from anywhere. But if I see pub(super) struct Ty, I know that I can (should?) use the name Ty from the super module, but if I want to name it from elsehwere I have to find a pub use.

But there was another aspect to the proposal that I’ve neglected to mention at all so far: implicit modules.

Implicit modules

Another big part of the proposal was the idea of making mod ty declarations optional. The idea was roughly this: if you have a ty.rs file (or ty/mod.rs), you would get an implicit module ty declared for you. But then the question becomes, what should the privacy of this module be? The idea was to scan its contents and take the publicity of the most public member. So if foo contains a pub struct Ty, then you would have pub mod ty. This is consistent with the idea that Ty should be world nameable at the location ty::Ty. If you have a pub(super) struct Ty, you would be mod ty (because the privacy of the module is relative to the super module of its contents). (If you had a module with only private contents, presumably it would also be private.)

This means that if you basically just removed all the mod ty declarations from your source and you label all of your structs etc with the appropriate “public exposure levels” that you want (i.e., you tag it with the module that ought to be able to name it in its current location) everything “just works”.

Backwards compatibility note: There are some practical hurdles to overcome here: most notably the main.rs and lib.rs patterns in cargo today, but also stuff like projects with dead files, or projects that use #[path] attributes, etc. We mostly agreed to table these to try and decide if we even would want to do this thing in the first place!

Facade pattern: desirable or not?

The most obvious impact of this change is that the existing way of doing the facade pattern would probably be deprecated. Rather than having a private module with public contents, you would prefer to declare the contents pub(super) and then re-export. We spent a while debating about this.

One thing that I contended is that I find the existing facade pattern can make it hard for me to figure out my own code. Often I have a module with mixed contents like this:

mod internal {
    // Part of the public API for the crate
    pub struct PublicType { .. }
    
    // Only used elsewhere within the crate, not publicly visible
    pub struct CrateType { .. }
    
    // Private to this `internal` module
    struct PrivateType { ... }
}

Now note that from the declarations alone I cannot tell which parts are “crate-local” and which are not. I have to either read the comments, or look at the super-module, which should have pub use declarations:

mod internal; // declares the module whose contents are above
pub use self::internal::PublicType; // re-export this to the world

In particular, I also have to be very careful not to write something like pub use self::internal::*, because that would expose too much!

Now, with the <= interpretation of pub(restricted), I can make declarations at the source site that distinguish between these cases:

// Under `<=` interpretation:
mod internal {
    // Part of the public API for the crate
    pub struct PublicType { .. }
    
    // Only used elsewhere within the crate, not publicly visible
    pub(super) struct CrateType { .. }
    
    // Private to this `internal` module
    struct PrivateType { ... }
}

Now I don’t have to look at the parent crate to know just how “public” each type is (although it may happen that the parent crate forgot to re-export PublicType, that’s also of interest – I think we should target that with a lint). I also don’t have to worry about the parent crate accidentally exporting too much, since if it tries to do pub use self::internal::*, it will get an error, because it re-exported internal::CrateType farther than it was allowed to.

In contrast, under the >= interpretation, pub(restricted) is not helpful; everything needs in internal needs to be declared as `pub(super):

// Under `>=` interpretation:
mod internal {
    // Part of the public API for the crate, 
    // but that is from a different path, so we only declare `pub(super)` here:
    pub(super) struct PublicType { .. }
    
    // Only used elsewhere within the crate, not publicly visible
    pub(super) struct CrateType { .. }
    
    // Private to this `internal` module
    struct PrivateType { ... }
}

Other things we talked about

We also discussed some other ideas:

  • Maybe instead of inferring the publicity of an implicit module from its contents, we could use a declaration at the top like pub mod; or pub; or #![pub] that declares the publicity of the module in its surrounding context. This at least would mean you don’t have to have a mod foo
  • The current system of declaring mod foo is nice and explicit, but also easy to forget, which is annoying
  • nmatsakis really likes making it very easy to know the full set of code that will be compiled:
    • mod foo is both helpful and a hindrance here
    • it’s explicit, but also easy to forget
    • particularly with inherent impls nad things that don’t need to be imported to be used
4 Likes

I find the idea of <= privacy combined with a lint that warns you if supposedly “public” things are not fully exposed – or at least equivalently exposed – a pretty nice compromise. It does mean that when you see pub struct Ty you don’t know off-hand the “public path”, but you do know that one exists (or else a lint warning would have occurred).

5 Likes

I just wanted to add a few more notes here. I think that if crate-privacy suffices, one could write pub(crate) to recover the distinction I was aiming for. The result then is that you're either writing pub(super) (which is kind of code for "I will re-export this to the world eventually, but not from here") or pub(crate) (internally people can name it here).

There has been some ongoing debate on whether it is useful to have "privacy horizons" within a crate that are more fine-grained that "this module", "this crate", and "world". I find it useful to create submodules with structure that share state but that state is not exposed more broadly, but others disagree, and feel that crate-level is "good enough" for such cases. This may be a function of the size of your crate, but I suspect it's also a matter of your personal style. I certainly agree that crate-local is good enough in the sense it enables one to easily ripgrep around to find uses, and so maybe in practice it suffices, but it feels like then that there is actual internal structure (this field really ought to only be used for things in the ty module, not anywhere in the crate) that is getting lost and has to be moved into a comment.

I also find that pattern useful, to ensure that I'm designing things with the modularity that I have in mind.

I use this pattern a lot, and not necessarily with the intention of re-exporting it out of the crate (or the super mod). Many times, I use it to be able to isolate responsibilities to a certain module, which is especially important when unsafe is needed. I still don't necessarily want to export the Ty to the world, but I wanted to protect "Future Me" from reaching in and doing something bad because he forgot how a certain invariant was enforced. Even without unsafe, separation of ideas is still a good thing in general.

A lint would be nudging me to use new syntax. By doing so, I'd be effectively raising the minimum rustc version that can compile this crate. As a library author, I try to only do that for 1) security or bug fixes, 2) the added functionality is just that important. Some alternative syntax to express the same thing I can today wouldn't fit that criteria.

3 Likes

After the meeting, from my perspective there are three abstract options:

  1. <=: today's system, you can tell the maximum an item is exposed by looking at it, but you can't tell if it is actually that exposed.
  2. >=: you can tell that an item is exposed at least this much, but you can't gaurantee it is that exposed.
  3. ==: you can tell exactly how exposed an item is by looking at its definition.

The third would mean making the changes I proposed without changing how pub(restricted) works. The implication of that would be that if you facade, your item is also always visible at its true path to anyone who can see its facade path. This doesn't mean you can't facade, just that you can't hide the actual definition. (You can make it undocumented, though, with a doc(hidden) attribute).

I incline most toward the third option. We seem to have a "choose two" situation:

  • I know its not more exposed than it says.
  • I know its at least as exposed as it says.
  • I can expose it at a different point without exposing it here.

I think 1 and 2 is the best choice, in the abstract.

This sounds like you ought to use pub(super) once its stable (setting aside your forward compatibility concerns from the subsequent paragraph which are important but more generally applicable.)

1 Like

That inversion looks very wrong to me. It's not the job of the module itself to know about how the rest of the code is organized.

A module should be something that exposes an API, and pub(super) essentially means that that we decide what the API of the parent module is.

4 Likes

I have a C/C++ background and I had absolutely zero difficulty in understanding the modules system when I discovered Rust.

What's the criteria to decide whether something being confusing is Rust's fault or the users' fault for not trying enough?

3 Likes

I agree with this, except that I think that the possibility of a lint may change the calculus here. It seems to me that we can't actually achieve ==, because there is a lot of existing code that is written with the current <= semantics in mind. But we can add a lint that encourages you to adopt the == approach by warning if your types are not as exposed "as they could be", and we can make it easy to have the == approach (via the implicit modules, perhaps).

I had also considered a hybrid: a lint that checks for "consistency" in how far things are exposed. In particular, I rarely (but not never) have the problem that I completely forget to pub use items from module X that are meant to be public. But I do have the problem of later adding new public bits of the API and forgetting to export that. So, a lint that says "you exposed item public item X all the way to the world, but it returns a Y, and Y is not public to the world" might be helpful, while at the same time not affecting existing code (as long as that code doesn't have the sorts of inconsistencies the lint is setting out to detect).

(This lint is more-or-less what the original private-in-public RFC proposed; at the time, we felt the errors were very confusing. I've since come to think that with some more effort, we could craft a better error, for example by giving example paths that show how an item is exposed.)

We're miscommunicating; I don't understand what your response means.

pub(super) doesn't mean "this is exposed by the parent module," it means "this is exposed to the parent module (and not beyond that)." It allows the module to make a more nuanced statement of what its public API is - "this is my public API to my parent;" "this is my public API to the crate;" "this is my public API to the world." If you don't want to expose this outside of the X, you should annotate it as such at the item level so you don't forget.

In any event that's largely unrelated to this discussion, pub(restricted) is not a part of this proposal: Tracking issue for `pub(restricted)` privacy (RFC #1422) · Issue #32409 · rust-lang/rust · GitHub

That's what I think is really wrong.

The module that puts pub(super) on an item decides that its parent doesn't expose the item. In other words the module decides for its parent.

I wasn't aware of pub(super) unfortunately.

2 Likes

It sounds like you’re in favor of allowing the parent to re-export anything visible to it, which is a discussion ongoing on that tracking issue.

I don't really see that as a useful question. I view it differently. Basically, can we keep the strengths while being less confusing? If so, it seems like a net win.

I don't want to make short-sighted decisions that are convenient at first but annoying later. I want to make decisions that help both users.

I feel like in this discussion we've been focused on that goal, though: what are the downsides of today's module system? In my view, there are some, even as a very experienced user (e.g., the problem I mentioned of not being able to sort out at a glance how far my items are exposed is something I have trouble with). There are also downsides of which I am aware but I don't personally have as much trouble with (e.g., being surprised to have to declare mod -- although I also forget fairly regularly to do so).

But the system has strengths too. For example, the facade pattern adds complexity, but it gives you the ability to have a very nuanced expression of privacy that I think is very cool and very useful. In general, being able to have the implementation be organized differently from the public interface seems like it can improve code readability.

So I'd like to know what we can do to help keep those strengths while addressing some of the downsides. I don't think that the proposal as originally floated quite did that (it lost some of the strengths, imo) -- but perhaps some variant does.

3 Likes

I’m going a bit off-topic here, but I’m feeling more and more distant from the Rust community in general. (by the way, even more off-topic, I suspect some people are more or less boycotting my person, but I don’t want to start a shitstorm here ; I’m just saying this because it contributes to that feeling)

I personnally came to Rust because I was tired of languages doing magic things behind your back. I came looking for a language where everything’s explicit, in order to avoid as many mistakes as possible and to cover all possible corner cases.

Right now I’m not using the ? operator because it’s too magic (I very often use match result, because clearly seeing all the possible paths of a function has avoided me many bugs in the past), I won’t use short field initialization because it’s too magic, I don’t use serde or rustc_serialize because they’re too magic, and if this modules rework lands I know I’m not going to use it because it’s too magic. All these changes have been here in the name of “this aspect is confusing for beginners”.

There’s always this argument that if you don’t like a feature you can choose to not use it, but if this rework ends up landing I’m going to be wondering if I’m still writing the same Rust language as everyone else.

5 Likes

With respect, I actually think that none of these changes have been done because "this is confusing to beginners", or at least not solely for this reason. For example, one of the the strongest reasons to use ? was that the try! macro had to be used in prefix position, making call chains like foo()?.bar() much more convenient. Short field initialization was done because it eliminates a common source of boilerplate, not because it is easier for beginners (I at least find it very convenient). Writing serialization code by hand is tedious and error-prone, so using a tool like serde to generate it for you seems rather more reliable to me. I guess what you meant is that these changes were done for ergonomic reasons, and I agree with that, but I don't think they specifically targeted beginners, nor do I think they are actively detrimental for experienced users.

9 Likes

I agree entirely, and was my reason for personally disapproving of the pub(scope) syntax.

True, though even with a tracking issue, that doesn't mean the feature will finally be promoted. Through use, learning, and more nuanced walkthroughs for or against, a feature can be removed instead of stabilized.

Of course, but since the feature’s in FCP you should probably express your concerns on the tracking issue

I like the facade pattern (it feels the same as patterns in CommonJS modules), so to me privacy is not a property of the module itself, but rather where it is exposed. I’d like implicit modules to be private by default, and have ability to make them public anywhere via pub use ty.

use foo;
pub use bar;

should be equivalent to:

mod foo;
pub mod bar;

And:

use submodule::foo;
pub use submodule::bar;

should be equivalent to (oh, that’s why modules are confusing):

mod submodule {
   mod foo;
   pub mod bar; // does it even need to be pub? I don't know
}
use submodule::foo;
pub use submodule::bar;
2 Likes

I don’t like implicit module visibility to be based on most public symbol in the module, because I won’t be using pub(restricted). I’ll continue using plain pub (it’s shorter and works great with facade), but I don’t want my modules to implicitly become an external API of my crate.

I only care about public API of my crate. In the crate everything is under my control. If I ever want to more fine-grained privacy within the crate, I’ll split the crate into two crates instead. So in the end I only need 3 levels:

  • items private to a struct or module (implementation details),
  • crate-public (everything I use),
  • and externally visible public API that I’m committing to.
2 Likes

Well if you want anecdotes, I came from Python and found the module system confusing. In Python, modules are first class objects that are implicitly defined by the filesystem. Rust modules mostly work the same way except that there are seemingly extraneous mod declarations required everywhere, and you can declare new modules within a single file if you feel like it. Also, there's a bunch of magical weirdness when it comes to macro definitions and visibility.

No language I am aware of does things quite the same way as Rust, though now that you point it out, I can see how it kind of sort of resembles C++ namespaces. However, it still doesn't work quite the same way, as C++ code is never implicitly placed within a namespace and namespaces can be defined across an arbitrary number of files.

If you look at other popular languages like Java and Go, packages are implicitly defined by the filesystem, but there is a redundant declaration at the top of each file telling which package it is in.

I think what really makes Rust confusing is that it has both modules implicitly defined by the filesystem and explicitly defined modules within another file. IMO, the later isn't useful enough to justify the cost, but that ship sailed long ago.

2 Likes