This is a summary of our lang team discussion from yesterday. Our lang team meetings are now organized into two categories, held on alternating weeks: triage meetings and focused discussions on a particular topic. This was a “focused discussion” week, and the topic was the module system, in particular a “wild and crazy” idea that @withoutboats had for extending it. Ultimately we did not reach any firm conclusions, but a number of concerns about the idea were raised, and my sense was that there was significant doubt about whether this was the right general direction. (If you’re interested, the raw “minutes” can be found here.)
The order of the presentation here is my own. @withoutboats presented things in a different order. But since I’m writing the notes, I’m organizing it the way that makes the most sense to my brain. The important thing is that the proposal consisted of two complementary parts:
- inverting the meaning of privacy to the `>=` interpretation (details below);
- supporting implicit modules that don’t have to be declared.
Motivation
It’s worth spending some time on the motivation. I think the bottom line is that the module system is a frequent source of confusion, and that seems sad. Opinions abound on just which aspect is the most confusing, but the complaints I personally hear most often are:
- the warnings about private-type-in-public-API are annoying and confusing
- the need to declare modules (`mod m;`) rather than just importing from them is different from other languages
At the end of the meeting, I was left with the definite feeling that we needed to drill further in here to be more precise about just what problems we have and hope to solve. =)
Status quo
Today, when you declare a struct, you also give it some associated privacy. Using the `pub(restricted)` rules, structs can either be declared as “world accessible” (`pub`) or “private to some module in your crate”. The default is to be private to the current module, but one can write `pub(super)` to be “private to the parent module”, `pub(crate)` to be “private to the root of the crate”, or `pub(in foo::bar)` to be “private to the module `foo::bar`”. Being “private to a module” means, roughly, that nobody outside of that module can name the type or possess an instance of that type, except in generic code. The “private type in public API” rules are intended to enforce that guarantee (but they have issues of their own that we are still working through). We named this interpretation the `<=` interpretation, because when you declare a struct as being public to a module M, you are saying that it is accessible only to M and its submodules (i.e., modules “less than or equal to” M).
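As a concrete reference point, the levels described above can be written out as follows. This is only a sketch against the `pub(restricted)` syntax as it exists in current Rust; the module names `outer` and `inner`, the struct names, and the `names_in_scope` helpers are all made up for illustration.

```rust
mod outer {
    pub mod inner {
        /// World accessible (provided every enclosing module is `pub` too).
        pub struct World;
        /// Private to the crate root: usable anywhere in this crate.
        pub(crate) struct CrateWide;
        /// Private to the parent module, `outer`.
        pub(super) struct ParentOnly;
        /// Private to a named ancestor module; here that is also `outer`.
        pub(in crate::outer) struct InOuter;
        /// Default: private to `inner` itself.
        struct Local;

        /// Hypothetical helper: all five names resolve inside `inner`.
        pub fn names_in_scope() -> usize {
            let _ = (World, CrateWide, ParentOnly, InOuter, Local);
            5
        }
    }

    /// Hypothetical helper: `Local` is out of reach from `outer`,
    /// the other four names are fine.
    pub fn names_in_scope() -> usize {
        let _ = (inner::World, inner::CrateWide, inner::ParentOnly, inner::InOuter);
        4
    }
}

/// Hypothetical helper: from the crate root only `World` and `CrateWide`
/// can be named.
pub fn names_in_scope_at_root() -> usize {
    let _ = (outer::inner::World, outer::inner::CrateWide);
    2
}
```

The returned counts are just bookkeeping for the example; the interesting part is which paths the compiler accepts in each function. Adding `inner::Local` in `outer`, or `outer::inner::ParentOnly` at the root, turns into a privacy error.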
The proposal
The proposal keeps the `pub(restricted)` notation, but turns the meaning on its head. Under this idea, when you write `pub(in foo) struct Bar`, you are declaring that the struct `Bar` is accessible from at least the module `foo`, but indeed it may be accessible by anything outside of `foo` as well, if there is a `pub use` that exports it farther. The way I think about it is that when you declare a struct as being public to a module M, you are saying that it is (or should be, see below) accessible in this location from at least the module M. It may also be accessible from elsewhere, but only if there is a `pub use`, and hence from a different location. For this reason, we called this the `>=` interpretation of privacy.
So, for example, I might have a crate like this:
pub mod ty {
    pub struct Ty { ... } // world accessible
}
This is a good declaration, because when I declare `Ty`, I am saying that “the world should be able to name it as `ty::Ty`”.
On the other hand, if I wrote this:
mod ty {
    pub struct Ty { ... }
}
This is a less-good declaration, because although I said that `ty::Ty` should be “world nameable”, in fact the world cannot name it, because `ty` is private. This code is of course legal today, but presumably under @withoutboats’s proposal there would be a lint warning you about this situation, something like “you declared `pub struct Ty`, but it is contained within a non-`pub` module”. (You may have noticed that this is a very common setup today; we called it the “facade pattern”. Indeed it is, and there would be a better way to handle it; I’ll get to it shortly.)
Now, the important part is that something declared as private to a certain point can be named from elsewhere, but only as the result of a `pub use`:
mod ty {
    pub(super) struct Ty { ... } // visible to super as ty::Ty
}
pub use ty::Ty; // now visible to the world as `::Ty`
Here I declared `Ty` as `pub(super)`, meaning that it is intended to be visible to my parent module at this location. But the parent module chooses to re-export it further, making it visible to the world, but under a different name (`::Ty`).
Facade pattern
The second example I gave above may have looked familiar to you:
mod ty {
    pub struct Ty { ... }
}
Having a private module with a public type is actually a pretty common pattern in Rust today. Typically it goes along with a `pub use` that re-exports the `Ty` name further:
mod ty {
    pub struct Ty { ... }
}
pub use ty::Ty; // re-export to the world as `Ty`
We were calling this pattern the “facade” pattern. It’s a pretty common way to allow a given module or crate to have a lot of internal structure that is not exposed to the outside world. Under the `<=` interpretation (i.e., the current rules) the snippet above is perfectly reasonable. You declared `Ty` as being world-visible, so readers should be aware that it might be re-exported to the world with a `pub use` (though it doesn’t have to be).
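To make the status-quo facade concrete, here is a small self-contained sketch that compiles under today’s rules. The `name` field, the constructor `new`, and the helper `describe` are invented for illustration; only the `mod ty` plus `pub use` shape comes from the snippet above.

```rust
// Private module: the internal path `ty::Ty` cannot be named outside the crate.
mod ty {
    #[derive(Debug, PartialEq)]
    pub struct Ty {
        // Hypothetical field, just so there is something to inspect.
        pub name: String,
    }

    impl Ty {
        pub fn new(name: &str) -> Ty {
            Ty { name: name.to_string() }
        }
    }
}

// Re-export to the world as `Ty`; this is the only path outsiders can use.
pub use self::ty::Ty;

/// Hypothetical helper that names the type through the facade path.
pub fn describe(t: &Ty) -> String {
    format!("Ty({})", t.name)
}
```

Inside the crate both `Ty` and `ty::Ty` refer to the same type; downstream crates only ever see the re-exported name.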
But under the newer proposal, the code above is not the preferred way to write it. It would be better to use `pub(super)` on `Ty`, since the path `ty::Ty` is not “world accessible”.
A change in focus
The older rules were very oriented around the needs of unsafe code authors. Basically the `<=` interpretation makes it very easy to know what code is “potentially exposed” and how far. The `>=` interpretation, in contrast, tells you only how far the current path is exposed: you have to ripgrep around to find out if there are any `pub use` declarations that expose this type further out.
The newer rules, in contrast, are oriented around a different need. They tell you how far the current path is exposed, and let you deduce the proper way to refer to a type from any given location. That is, if I see a `pub struct Ty`, I know that I should name it relative to the current module from anywhere. But if I see `pub(super) struct Ty`, I know that I can (should?) use the name `Ty` from the super module, but if I want to name it from elsewhere I have to find a `pub use`.
But there was another aspect to the proposal that I’ve neglected to mention at all so far: implicit modules.
Implicit modules
Another big part of the proposal was the idea of making `mod ty` declarations optional. The idea was roughly this: if you have a `ty.rs` file (or `ty/mod.rs`), you would get an implicit module `ty` declared for you. But then the question becomes, what should the privacy of this module be? The idea was to scan its contents and take the publicity of the most public member. So if `ty` contains a `pub struct Ty`, then you would have `pub mod ty`. This is consistent with the idea that `Ty` should be world nameable at the location `ty::Ty`. If you have a `pub(super) struct Ty`, you would get a plain `mod ty` (because the privacy of the module is relative to the super module of its contents). (If you had a module with only private contents, presumably it would also be private.)
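The inference rule above can be sketched as a tiny model. This is a toy encoding, not anything that was specified in the meeting: the `Reach` type and `infer_module_reach` are hypothetical names, and only private, `pub(super)`-style, and `pub` levels are modeled (handling `pub(crate)` or `pub(in path)` would also need the module’s depth in the tree).

```rust
/// How far up the module tree a path is meant to be nameable.
/// `Up(0)` is private, `Up(1)` is `pub(super)`, and so on; `World` is plain `pub`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Reach {
    Up(u32),
    World,
}

/// Infer the visibility of an implicit module, expressed relative to its
/// parent, from the declared visibilities of its members.
pub fn infer_module_reach(members: &[Reach]) -> Reach {
    // Take the most public member; an empty module counts as private.
    let widest = members.iter().copied().max().unwrap_or(Reach::Up(0));
    match widest {
        Reach::World => Reach::World,
        // The module sits one level above its contents, so a member that
        // reaches `n` levels up leaves the module reaching `n - 1` levels up.
        Reach::Up(n) => Reach::Up(n.saturating_sub(1)),
    }
}
```

Under this encoding a module containing a `pub` item comes out as `World` (that is, `pub mod ty`), while a module whose most public item is `pub(super)` comes out as `Up(0)`, a plain `mod ty` in its parent.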
This means that if you just removed all the `mod ty` declarations from your source and labeled all of your structs etc. with the appropriate “public exposure levels” (i.e., you tag each item with the module that ought to be able to name it in its current location), everything “just works”.
Backwards compatibility note: There are some practical hurdles to overcome here, most notably the `main.rs` and `lib.rs` patterns in cargo today, but also stuff like projects with dead files, or projects that use `#[path]` attributes, etc. We mostly agreed to table these until we decide whether we would even want to do this thing in the first place!
Facade pattern: desirable or not?
The most obvious impact of this change is that the existing way of doing the facade pattern would probably be deprecated. Rather than having a private module with public contents, you would prefer to declare the contents `pub(super)` and then re-export. We spent a while debating about this.
One thing that I contended is that I find the existing facade pattern can make it hard for me to figure out my own code. Often I have a module with mixed contents like this:
mod internal {
    // Part of the public API for the crate
    pub struct PublicType { ... }
    // Only used elsewhere within the crate, not publicly visible
    pub struct CrateType { ... }
    // Private to this `internal` module
    struct PrivateType { ... }
}
Now note that from the declarations alone I cannot tell which parts are “crate-local” and which are not. I have to either read the comments, or look at the super-module, which should have `pub use` declarations:
mod internal; // declares the module whose contents are above
pub use self::internal::PublicType; // re-export this to the world
In particular, I also have to be very careful not to write something like `pub use self::internal::*`, because that would expose too much!
Now, with the `<=` interpretation of `pub(restricted)`, I can make declarations at the source site that distinguish between these cases:
// Under `<=` interpretation:
mod internal {
    // Part of the public API for the crate
    pub struct PublicType { ... }
    // Only used elsewhere within the crate, not publicly visible
    pub(super) struct CrateType { ... }
    // Private to this `internal` module
    struct PrivateType { ... }
}
Now I don’t have to look at the parent module to know just how “public” each type is (although it may happen that the parent module forgot to re-export `PublicType`; that’s also of interest, and I think we should target that with a lint). I also don’t have to worry about the parent module accidentally exporting too much, since if it tries to do `pub use self::internal::*`, it will get an error, because it re-exported `internal::CrateType` farther than it was allowed to.
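For what it’s worth, here is a sketch of that setup that compiles with `pub(restricted)` as it exists in current Rust, whose rules follow the `<=` reading. The `id` fields and the helpers `make_crate_type` and `crate_type_id` are hypothetical. One caveat, as far as I can tell: a glob re-export does not actually produce an error in current Rust. It re-exports each item no farther than its declared visibility allows, whereas naming `CrateType` in an explicit `pub use` is rejected.

```rust
mod internal {
    /// Part of the public API for the crate.
    pub struct PublicType {
        pub id: u32,
    }

    /// Only used elsewhere within the crate, not publicly visible.
    pub(super) struct CrateType {
        pub id: u32,
    }

    /// Private to this `internal` module.
    struct PrivateType {
        id: u32,
    }

    /// Hypothetical helper: builds a `CrateType` via the module-private type.
    pub(super) fn make_crate_type(id: u32) -> CrateType {
        let private = PrivateType { id };
        CrateType { id: private.id }
    }
}

// The glob cannot widen anything: `PublicType` goes out to the world,
// while `CrateType` and `make_crate_type` stay crate-local.
pub use self::internal::*;

/// Hypothetical crate-internal caller of the restricted items.
fn crate_type_id(id: u32) -> u32 {
    let value: internal::CrateType = internal::make_crate_type(id);
    value.id
}
```

Either way, the declaration site is what bounds the exposure, which is the property the paragraph above is after.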
In contrast, under the `>=` interpretation, `pub(restricted)` is not helpful; everything in `internal` that is used outside it needs to be declared as `pub(super)`:
// Under `>=` interpretation:
mod internal {
    // Part of the public API for the crate,
    // but that is from a different path, so we only declare `pub(super)` here:
    pub(super) struct PublicType { ... }
    // Only used elsewhere within the crate, not publicly visible
    pub(super) struct CrateType { ... }
    // Private to this `internal` module
    struct PrivateType { ... }
}
Other things we talked about
We also discussed some other ideas:
- Maybe instead of inferring the publicity of an implicit module from its contents, we could use a declaration at the top like `pub mod;`, `pub`, or `#![pub]` that declares the publicity of the module in its surrounding context. This at least would mean you don’t have to have a `mod foo`.
- The current system of declaring `mod foo` is nice and explicit, but also easy to forget, which is annoying.
- nmatsakis really likes making it very easy to know the full set of code that will be compiled:
  - `mod foo` is both helpful and a hindrance here: it’s explicit, but also easy to forget
  - particularly with inherent impls and things that don’t need to be imported to be used