[lang-team-minutes] the module system and inverting the meaning of public

ahmedcharles · February 21, 2017, 6:05am

Overall, I'm not sure why comparing languages is relevant to the discussion, when all of them require an explicit indication when a module is created. And more importantly, all of them are different.

I don't think it's an obvious ergonomics and beginner friendliness win.

I agree that Rust is different. But that's not interesting.

Though, it seems like we're talking past each other.

My primary point is: the current implementation of the spec lacks quality-of-implementation features that could improve the situation without requiring any change to the language. Why is the initial suggestion a change to the language? In most mature languages (which Rust wants to become soon), changes to the language are rare and only made when all other avenues are exhausted. I'm not going to suggest that this proposal should never be added to the language, but I fail to understand why changing the language is the first thing proposed. Note: this seems to be common here; almost every thread suggests a language feature or change to fix some papercut, while other alternatives are not considered or discussed. Perhaps that's because people view this as a language design discussion board rather than a place to discuss the evolution of a language and its ecosystem.

Anyways, does it require an RFC to add a warning to rustc? If not, would anyone like to mentor me in adding warnings to rustc that would help teach users about the differences between use and mod when they get it wrong? If users still complain about the distinction after we've run out of appropriate warnings to add and documentation to adjust, perhaps a change is warranted, but I'd much rather avoid adding more implicit things to the language that people get to skip learning, just so we can check a usability box on a 2017 roadmap post.
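For readers who have not run into it yet, here is a minimal sketch of the use/mod distinction being discussed. The `shapes` module and `square_area` function are illustrative names, not from the thread, and an inline module stands in for a separate file.

```rust
// `mod` declares a module. Here it is inline; in a real project this would
// usually be `mod shapes;` in the parent file plus a separate `shapes.rs`.
// Without a `mod` declaration the module simply does not exist.
mod shapes {
    pub struct Square {
        pub side: u32,
    }

    impl Square {
        pub fn area(&self) -> u32 {
            self.side * self.side
        }
    }
}

// `use` never creates a module: it only brings an existing path into
// scope under a shorter name.
use shapes::Square;

pub fn square_area(side: u32) -> u32 {
    Square { side }.area()
}
```

Removing the `mod shapes` block makes the `use` line fail to resolve, which is the kind of mistake the proposed warnings would explain.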

ahmedcharles · February 21, 2017, 7:22am

I created https://github.com/rust-lang/book/issues/460 which is an issue against the new version of the book to address what I think is confusing wording which may lead people to misunderstand the distinction between use and mod.

steveklabnik · February 21, 2017, 8:01pm

Yes, in general, new lints require an RFC. This is for a number of reasons, but one of them is that they're likely to break existing code.

Clippy, however, adds new lints all the time, and does not need an RFC to do so.

withoutboats · February 21, 2017, 8:05pm

I haven’t followed the discussion super closely but I think @ahmedcharles is actually talking about adding more notes to existing error messages, which doesn’t require an RFC.

ahmedcharles · February 21, 2017, 11:23pm

Well, compared to changing the semantics of the language, lints seem like a less heavyweight approach, and adding lints to clippy wouldn’t solve the problem that this proposal attempts to solve, so an RFC it is?

ahmedcharles · February 21, 2017, 11:25pm

I think there would be at least one new warning, which would look for files which do not have mod statements and suggest adding them.

The rest can probably be done by improving existing warnings/errors.
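The new warning described above (files with no corresponding mod statement) could work roughly like the check below. This is a hypothetical sketch, not an existing rustc or clippy lint and not rustc's lint API; it assumes the names declared with `mod name;` in a parent file have already been collected, and it only looks at files, not subdirectories.

```rust
use std::collections::HashSet;

/// Hypothetical check: given the module names declared with `mod name;`
/// in a parent file and the file names found next to it, return the
/// `.rs` files that no `mod` declaration refers to.
pub fn undeclared_modules<'a>(declared: &[&str], files_in_dir: &[&'a str]) -> Vec<&'a str> {
    let declared: HashSet<&str> = declared.iter().copied().collect();
    files_in_dir
        .iter()
        .copied()
        // Crate roots and `mod.rs` are never the target of a `mod name;` line.
        .filter(|f| !matches!(*f, "lib.rs" | "main.rs" | "mod.rs"))
        .filter(|f| match f.strip_suffix(".rs") {
            Some(stem) => !declared.contains(stem),
            None => false, // not a Rust source file
        })
        .collect()
}
```

A real warning would attach a suggestion such as "add `mod inline;`" to each returned file.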

nikomatsakis · February 22, 2017, 1:17am

Just to be clear, under the proposal I was discussing, there would be no "breaking change" in the strict sense. That is, all existing code would retain precisely its current semantics. However, some patterns that were once common would have been sort of... deprecated (at least without adding more precise pub(restricted) declarations). I don't think, in any case, that we are going to adopt that proposal. But I want to clarify that nobody wants to (or was proposing to) change the semantics of existing code.

That said, I think it's an interesting question as to whether even a deprecation of this kind would fly. It seems related to e.g. the dyn Trait question -- but it would almost certainly affect more code!

durka · February 22, 2017, 4:05pm

It seems like you're saying "the old defaults were good for unsafe code writing, but bad for usability [debatable]". OK, that's worth fixing, but Rust can't give up its features that let you provably isolate unsafe code from safe code. How are the needs of unsafe code authors preserved with the proposed models?

nikomatsakis · February 22, 2017, 6:12pm

It does seem like that, but I'm not really saying that. I think unsafe code authors just put these needs into starker relief. In my opinion, the <= model is well-suited to any system that wants to support "closed" reasoning; i.e., it optimizes for making it easy to know just how much code can see a given item. To be honest, I'm not entirely sure what >= optimizes for; it seems like it optimizes for knowing the proper path to an item from a given point.
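To make the "<=" reading concrete: under today's pub(restricted) rules, an item's declared visibility acts as an upper bound that no re-export can widen, which is what enables the "closed" reasoning. A minimal sketch; the module and function names are illustrative.

```rust
mod inner {
    // Declared `pub(crate)`: an upper bound. No re-export anywhere can make
    // `helper` visible outside the crate, so auditing its users never
    // requires looking beyond this crate.
    pub(crate) fn helper() -> u32 {
        7
    }

    // Declared `pub`, but inside a private module: how far it is actually
    // visible depends on re-exports elsewhere.
    pub fn api() -> u32 {
        helper() * 6
    }
}

// Allowed: re-exporting no wider than the declared visibility.
pub(crate) use inner::helper;
// `pub use inner::helper;` would be rejected by the compiler, since it
// would widen `helper` beyond its declared `pub(crate)`.
pub use inner::api;
```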

Alex_Burka · February 22, 2017, 6:24pm

But how can you recover “closed” reasoning with a >= model, when you need it in order to trust some unsafe code?

nikomatsakis · February 22, 2017, 8:25pm

ripgrep – you know the set of places you have to search to find a pub use, if any exists.
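The search suggested above amounts to scanning every file in the one crate for re-exports. Below is a hypothetical helper that mimics a `rg 'pub use'` search by hand. It is purely textual like grep, so it can report false positives, it only matches lines that begin with `pub use`, and it misses re-exports generated by macros.

```rust
/// Hypothetical helper: given (path, source) pairs for every file in one
/// crate, return the paths containing a line that starts with `pub use`
/// and mentions `item`.
pub fn find_reexports<'a>(files: &[(&'a str, &str)], item: &str) -> Vec<&'a str> {
    files
        .iter()
        .filter(|(_, src)| {
            src.lines().any(|line| {
                let line = line.trim_start();
                line.starts_with("pub use") && line.contains(item)
            })
        })
        .map(|(path, _)| *path)
        .collect()
}
```

Because re-exports cannot cross into another crate's privacy, the set of files to search is bounded by the crate.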

withoutboats · February 22, 2017, 9:45pm

It's important to remember this is always within a single crate.

This is also a rather narrow concern. You can’t re-export fields. It would have to be a type which, if exposed, could be used to break some invariants. That type would need to have methods which are “safe” but break invariants that unsafe code relies on. Given that situation, my own feeling is that you are already engaging in bad practice if you haven’t marked those methods unsafe. (This is similar to the arguments in favor of unsafe fields.)
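The practice argued for above can be sketched with a hypothetical `Buf` type whose unsafe code relies on the invariant `len <= data.len()`. The method that could break the invariant is itself `unsafe`, so exposing the type through an unexpected `pub use` would not let safe code break it.

```rust
/// Hypothetical buffer whose unsafe code relies on `len <= data.len()`.
pub struct Buf {
    data: Vec<u8>,
    len: usize,
}

impl Buf {
    pub fn new(data: Vec<u8>) -> Self {
        let len = data.len();
        Buf { data, len }
    }

    /// Safe accessor: sound only because the invariant holds.
    pub fn last(&self) -> Option<u8> {
        if self.len == 0 {
            None
        } else {
            // SAFETY: `len <= data.len()` and `len > 0`, so `len - 1` is in bounds.
            Some(unsafe { *self.data.get_unchecked(self.len - 1) })
        }
    }

    /// Can break the invariant, so it is marked `unsafe`: even if `Buf`
    /// became reachable through an unexpected re-export, misusing this
    /// method would still require an `unsafe` block.
    ///
    /// # Safety
    /// `len` must not exceed the length of the underlying buffer.
    pub unsafe fn set_len(&mut self, len: usize) {
        self.len = len;
    }
}
```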

I think this is more valid as a general ‘encapsulation’ concern than a concern about unsafe code in particular. The change would mean you have to be suspicious when you see a pub use during code review.

But as I already posted, we can have exact exposure if we give up the notion of hiding the true path of an item. We can also consider providing a less fine-grained form of module privacy: you would just attach #![internal] to your modules that are “implementation details” that you don’t want showing up in the public path of your API. This would be equivalent to marking your module doc(hidden) and pub(crate).
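For comparison, this is roughly how the stated equivalence can be spelled with today's attributes. `#![internal]` itself is only proposed and does not compile, so it appears only in a comment; the `imp` and `Parser` names are illustrative.

```rust
// Today's approximation of what a proposed `#![internal]` module would mean:
// crate-visible only and hidden from documentation.
#[doc(hidden)]
pub(crate) mod imp {
    pub struct Parser {
        pub input: String,
    }

    impl Parser {
        pub fn new(input: &str) -> Parser {
            Parser { input: input.to_string() }
        }
    }
}

// The public path of the API is chosen by this re-export rather than by
// where the module sits: outside the crate, `Parser` is named at the root.
pub use imp::Parser;
```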

Setting aside the backwards compatibility concerns (which are real, but again, are outside the scope of the discussion we’re trying to have), that seems very close to the ideal system to me.

aturon · February 24, 2017, 9:53pm

So @withoutboats and I had a pretty interesting chat today about the module system and privacy, trying to dig down to first principles. I’d like to summarize that here.

A rational reconstruction of the status quo

First, the module system currently couples together three concerns:

- Namespacing
- Privacy scopes
- Interaction with the file system

While in principle you could separate these concerns, there are a lot of advantages to tying them together. (And of course, there’s some flexibility here given that inline mod declarations are a thing).

One consequence of this coupling, though, is that when there’s a mismatch between these dimensions, you have to fight the module system a bit. For example, it’s pretty common to have a submodule in a file which defines a single public type, along with a bunch of private code – where the module itself isn’t exported, and instead the public type is reexported at a different path. In that case, there’s a mismatch between what you want for privacy/file system – i.e., a separate file defining its own scope of privacy – and what you want for naming, which is to export the name at some other path.

For these kinds of mismatches we tend to use “facades” and other patterns. Generally these all involve making a submodule private, while some of its contents are public and re-exported. This is basically a “design pattern”, a way of using the tools of our module system to achieve a certain goal.

One consequence of this design pattern is that pub has two distinct meanings:

- The item is world-visible.
- The item is defined “in the wrong place” and re-exported elsewhere, but you need to trace the re-exports to discover its visibility.
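Both meanings of pub can appear side by side in one crate, as in this sketch of the facade pattern (the `shapes`, `square`, `Circle` and `Square` names are illustrative):

```rust
// Meaning 1: `pub` on an item in a public module. Its true path,
// `shapes::Circle`, is also its public path.
pub mod shapes {
    pub struct Circle {
        pub r: f64,
    }
}

// Meaning 2 (facade): `pub` on an item in a private module. `Square` is not
// reachable as `square::Square` from outside; its real visibility is
// whatever the re-export below grants, so a reader must trace `pub use`.
mod square {
    pub struct Square {
        pub side: f64,
    }

    // Private helper: stays local to this module (and its file).
    fn double(x: f64) -> f64 {
        x * 2.0
    }

    impl Square {
        pub fn perimeter(&self) -> f64 {
            double(self.side) * 2.0
        }
    }
}

pub use square::Square;
```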

The pub(restricted) model makes the most sense with the first meaning of pub, since the (restricted) part is supposed to decrease the level of publicity. But because of re-exports, it doesn’t work out perfectly.

What this means for module system improvements

There are a fair number of ideas in flight for how to simplify and streamline the module system. I think that what @withoutboats has been working toward is basically doubling down on the three-way coupling mentioned above.

Here’s a strawman proposal bringing these pieces together (which came out of discussion with @withoutboats):

- Introduce implicit modules, as per the original post. In particular, discover modules via the filesystem organization, implicitly introducing the appropriate mod declarations. (A file ty.rs leads to an implicit mod ty; declaration, modulo privacy, discussed next.)
- Default the visibility of implicit modules to the maximal visibility of the items they transitively contain. Again, this is as per the original post.
- Keep the semantics of pub(restricted) as they are today, in particular the <= interpretation of privacy. At this point there’s widespread agreement that <= privacy is very important to provide, and that pub(restricted) gives a good model for specifying it.
- Introduce #![internal] as a module-level attribute that can be applied to implicit modules. It has the effect of setting the visibility of that module to pub(crate), regardless of the items it contains.
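The visibility rule in the second and fourth points can be modelled in a few lines. This is a hypothetical sketch, not compiler code: it uses a simplified three-level `Vis` enum (omitting pub(super) and pub(in path)) and assumes the caller has already flattened the items the module transitively contains into one list.

```rust
/// Simplified visibility levels, ordered from narrowest to widest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Vis {
    Private,
    Crate,
    Public,
}

/// Hypothetical model of the strawman rule: an implicit module is as
/// visible as the most visible item it transitively contains, unless it
/// is marked `#![internal]`, which pins it at `pub(crate)`.
pub fn implicit_module_vis(item_vis: &[Vis], internal: bool) -> Vis {
    if internal {
        return Vis::Crate;
    }
    item_vis.iter().copied().max().unwrap_or(Vis::Private)
}
```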

What are the implications of this design?

- You get = privacy by default. In particular, if you don’t use #![internal], the stated visibility of an item is its precise visibility.
- You always get <= privacy, as was the intent with pub(restricted) and our overall privacy system.
- You can express the facade pattern by using #![internal]. However, the fact that you write this attribute helps mitigate some of the downsides of the facade pattern: a reader of the code gets a visible, local heads-up that the actual visibility of the items is going to be determined by a super module that re-exports them (though bounded by their local restriction). In other words, we’ve made the design pattern a bit more first-class, and given you a way to write down your intent. But the attribute doesn’t introduce anything fundamentally new; it’s just a way of specifying pub(crate) for the module.
- Increases the coupling between files and namespaces, by discovering modules directly from the filesystem. (It almost always works that way in practice today, but you have to explicitly set it up. We can make it so much simpler and smoother.)
- Increases the coupling between privacy and the rest of the system, by automatically setting the visibility of implicit modules based on their items.
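One way to see why these points add up to "= privacy by default": ignoring re-exports, an item is reachable at its true path only as far as its own declared visibility and every enclosing module allow, i.e. the narrowest of them. The sketch below is hypothetical and uses the same simplified three-level model as before. Since the strawman makes every non-#![internal] module at least as visible as its items, the result is then just the declared visibility; an #![internal] ancestor caps it at pub(crate).

```rust
/// Simplified visibility levels, ordered from narrowest to widest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Vis {
    Private,
    Crate,
    Public,
}

/// Hypothetical model, ignoring re-exports: effective visibility at the
/// true path is the narrowest of the item's declared visibility and the
/// visibilities of its enclosing modules.
pub fn effective_vis(declared: Vis, enclosing: &[Vis]) -> Vis {
    enclosing.iter().copied().fold(declared, Vis::min)
}
```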

Where this all heads, in my mind, is an approach where the module system just sort of disappears into the background. You don’t really think about modules at all; you just think about paths, files, and items contained in them. Things like the facade pattern, while still expressible, are a clearly delineated and more “advanced” feature, so you can start with a simpler mental model (where you get = privacy everywhere) and later learn how to tweak it with #![internal]. I think there’s a real chance of going from “Rust’s module system is complex and hard to learn” to “What module system? I just write code where it belongs and get the namespacing and visibility I would expect”.

To the extent that things are more “implicit” here, I would claim that we are just using already-explicit information that would otherwise be redundant. In the vast majority of cases, the module hierarchy exactly mirrors the filesystem hierarchy; why force you to repeat that structure? It can’t be for the sake of explicitness – it’s already explicit, just represented in a different way. Likewise with privacy: the appropriate privacy of a module is usually implied by the privacy of its items. And when that’s not right, we actually give you greater explicitness by allowing you to express your intent via #![internal].

This also doesn’t change any of the scoping rules; you still have to use an item to gain access to it. So all bindings for a module are discoverable within the file defining it.
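A small illustration of this point, with inline modules standing in for separate files (the `ty` and `infer` module names are illustrative):

```rust
mod ty {
    pub struct Ty {
        pub name: &'static str,
    }
}

mod infer {
    // This `use` is the only reason `Ty` is in scope here: every name the
    // module depends on is visible in its own source, whether the module
    // itself was declared explicitly or discovered implicitly.
    use super::ty::Ty;

    pub fn fresh() -> Ty {
        Ty { name: "?T" }
    }
}
```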

nikomatsakis · February 24, 2017, 10:38pm

So I’m excited about this plan. We’ll have to talk about the back-compat issues, but I’ll ignore them for a bit. I wanted to re-state something that we discussed on IRC that I didn’t fully appreciate (though you said it). That if you step back a minute from the system we have today, and imagine instead the system that we would have with this scheme, it basically looks like this:

- You write code in files. You never write mod declarations (assuming you don’t use inline modules).
- Within a crate, you can always name items from elsewhere in the crate at their “true path” (that is, the path implied by the location of the file that they are in).
  - Note, however, that you might get a privacy error if the item is not “sufficiently public”.
- From outside a crate, you can usually name things at the “true path”, unless they reside (transitively) within an #![internal] module.
  - In that case, there should be a pub use that re-exports them at some other path.
  - It seems like this would be a good use for a lint: make sure that there exists some path to every pub item from outside the crate (and perhaps a unique path?).
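The “true path” in this list is purely a function of where a file sits. Below is a hypothetical helper computing it under a few stated assumptions: `/`-separated paths relative to the crate's source directory, a top-level `lib.rs`/`main.rs` as the crate root, and `dir/mod.rs` naming `dir` itself.

```rust
/// Hypothetical helper: the "true path" implied by a file's location,
/// e.g. `ty/subst.rs` -> `crate::ty::subst`. Returns `None` for files
/// that are not Rust source.
pub fn true_path(rel_file: &str) -> Option<String> {
    let stem = rel_file.strip_suffix(".rs")?;
    let mut segments: Vec<&str> = stem.split('/').collect();
    // `dir/mod.rs` names `dir`; a top-level `lib.rs`/`main.rs` is the root.
    let drop_last = matches!(segments.as_slice(), [.., "mod"] | ["lib"] | ["main"]);
    if drop_last {
        segments.pop();
    }
    let mut path = String::from("crate");
    for seg in segments {
        path.push_str("::");
        path.push_str(seg);
    }
    Some(path)
}
```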

What I wrote above isn’t quite right, maybe, in that you might not be able to name something from internal to the crate from anywhere else in the crate, because maybe the publicity of the module is “more narrow” than that (if it only contains e.g. pub(super) items). We’d have to work those rules out in detail. But it seems like we could even make it a “binary switch” (pub(crate) or pub, depending on whether it contains (transitively) a pub item that is not within a #![internal] module) and it would be good enough. After all, you could then always name the module but not its contents (if they are declared with a more narrow level of privacy).
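The “binary switch” version can be modelled as a recursive check over a module tree. The `Module` struct below is a hypothetical stand-in for whatever the compiler would actually track. A module comes out as pub if it is not itself #![internal] and it, or some non-internal descendant reached through non-internal modules, directly contains a pub item; otherwise it is pub(crate).

```rust
/// Hypothetical tree model of a module, for illustrating the rule.
pub struct Module {
    pub internal: bool,     // marked `#![internal]`
    pub has_pub_item: bool, // directly contains a `pub` item
    pub children: Vec<Module>,
}

/// True if the module would be `pub`, false if `pub(crate)`: it
/// (transitively) contains a `pub` item that is not inside an
/// `#![internal]` module.
pub fn is_pub(m: &Module) -> bool {
    !m.internal && (m.has_pub_item || m.children.iter().any(is_pub))
}
```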

That is certainly an appealing vision to me!

aturon · February 24, 2017, 10:50pm

I hadn't noticed that, but it's an excellent point. Again, it just feels simple and streamlined, moving in the direction of purely local reasoning about both visibility and namespacing.

nagisa · February 24, 2017, 11:43pm

I’m arguing for keeping the ability/necessity to explicitly name the crate structure within the source; there are multiple reasons, mostly very particular to how I do my devstuffs.

Firstly, I use “UNIX as IDE”. I’m aware that is quite an uncommon use case, but that’s what I’m used to. My development routine involves more than just vim - cargo - run cycle. I tend to leave around a lot of files that do not necessarily involve my current checkout. For example this is what git status of my primary rust-lang/rust checkout looks like (last time I cleaned up was a few weeks ago):

HEAD detached at real/master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   src/librustc/mir/tcx.rs
	modified:   src/librustc_mir/transform/qualify_consts.rs
	modified:   src/librustc_mir/transform/type_check.rs
	modified:   src/librustc_trans/mir/rvalue.rs
	modified:   src/llvm (new commits)

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	src/bootstrap/config.toml
	src/libcore/target/
	src/librustc_mir/transform/inline.rs
	src/librustc_mir/transform/intrinsics.rs
	src/rustc/std_shim/target/
	src/target/
	test
	test.rs
	test.s
	test2
	test2.rs
	worktrees/

There’s some in-progress playground code that just persists between checkouts (e.g. transform/inline and transform/intrinsics) that I never got back to. These files would not even compile with the current checkout. Since this is a pretty clean state, you don’t see the many .rs files that most commonly appear because I checked out a differently named file from another WIP branch, an older commit or a different project altogether (mostly to diff against or to have open in a vim buffer for reference).

These files are never something I would want rustc to even consider for compilation, and to me it would be obnoxious if rustc changed to an implicit scheme. If it did, I would most likely spend some time figuring out why my code does not compile, or, even worse, it would compile and I would accidentally depend on the file without intending to check in the temporary junk. And then some extra time, every time, renaming offending files to some other filename like transform/intrinsics.rs.this-is-junk-rustc-please-no-touch-import-compile.

Then there’s also partial outputs of commands such as what would result from something | vim - that I later write to disk so I wouldn’t have to craft the something command again. These rarely end up having the extension .rs but when the result of something is a valid(-ish) rust file, there’s no reason to save it with some other extension. I also don’t really pay attention to my $PWD, so it kinda ends up within src sometimes.

Quite a different concern (not something related to my current use-cases) is the tendency of various environments to restrict inspection of the filesystem. There’s a difference between recursively going through the whole tree looking for files ending in .rs and open("some/particular/file.rs"). From the standpoint of some sandbox implementation anyway.

Then there are filesystems which have arbitrary limitations not applicable to a module system: no directory support or limited depth (how you gonna scan that for a module tree, of all things? S3 is an example), short filenames (ye olde FATs), case-insensitive filenames (macOS), etc.

If I tried I could probably think of many more reasons to encode module structure within the source and not derive implicitly from the filesystem.

As far as privacy goes, I’m content with current system. I’ll admit I had problems with it at some point, mostly surrounding std::{os,sys} weirdness, but never when authoring or contributing to libraries. With that in mind, I do not find a change here necessary, but I wouldn’t oppose any change to that part of the module system either.

withoutboats · February 25, 2017, 12:20am

This is also my workflow. But I never leave junk files in my src directory, partly because I never cd into a directory that contains code files; I always operate one level above it. I actually make a clear habit of drawing this distinction between the directory which contains my crate and my 'workspace' above it, because I run tree on that directory regularly to remind myself of the module structure (since these are already closely coupled).

It's true, of course, that different module systems will provide better support to different workflows. If you create junk files as a part of your workflow, you will need to adopt the same discipline I have about where you create them. This is a trade-off, but in exchange you get the various advantages of this system that @aturon has already laid out. But I don't want it to be framed as a negative consequence for "UNIX users," because as a UNIX user my experience would be strictly improved.

Not addressing any particular concern, in general the negative response these ideas have generated seems to me to be "the response of a puddle to a proposal we change the shape of its hole." That is, the current system has been naturalized for people who use it regularly, and there is an inertial force in favor of the status quo.

Obviously no change is without downsides, but none of the downsides enumerated have seemed to me to even nearly equal the upsides of a different system (which are mainly to do with localizing information and simplifying the mental model). I really do think that many users have just normalized the costs of the current system and are writing them off when they should not be.

withoutboats · February 25, 2017, 12:31am

aturon:

First, the module system currently couples together three concerns:

Namespacing

Privacy scopes

Interaction with the file system

While in principle you could separate these concerns, there are a lot of advantages to tying them together. (And of course, there's some flexibility here given that inline mod declarations are a thing).

One consequence of this coupling, though, is that when there's a mismatch between these dimensions, you have to fight the module system a bit. For example, it's pretty common to have a submodule in a file which defines a single public type, along with a bunch of private code -- where the module itself isn't exported, and instead the public type is reexported at a different path. In that case, there's a mismatch between what you want for privacy/file system -- i.e., a separate file defining its own scope of privacy -- and what you want for naming, which is to export the name at some other path.

One thing I think isn't perfectly captured by this summary is this: we discussed this coupling in terms of Rust's existing system; that is, right now we have already coupled namespacing, privacy, and file structure. In a totally decoupled system, you could imagine all of these scenarios:

- A file which is at its own path with its own set of items in scope (today the only option)
- A file which has its own set of items in scope, but whose items are directly included into another
- A file which is just verbatim included into another file

And of course other combinatorics of these three freely moving things. My impression of C++'s system is that these things are relatively decoupled, and I find it extremely confusing.

I would describe the partial coupling of our system today as a leaky abstraction - it partly joins these concepts into one, but not entirely, so that you can manipulate them independently, but in a sort of hacky and imperfect way (facading). This partial coupling is a large part of what leads to confusion about the exact meaning of our keywords - mod, use, even pub.

So I see this proposal as the culmination of the existing, partially implemented system. I see #![internal] as the solution to the major downside of completing that coupling.

petrochenkov · February 25, 2017, 12:48am

I'm still not convinced the described changes are necessary. The cost of the current system is two extra words per file (mod name;), which is negligible. Okay, maybe pub as well: three words.

At the same time drawbacks described by @nagisa are real - you can't temporarily "comment out" a module, you can't keep unfinished stuff in the tree. All the directory traversing complexity leaks into the language - we need to somehow guess and ignore separate projects in nested directories and VCS directories/dot directories/etc. It may be okay for Cargo, for which guessing things is half of the job, but I don't want this in the language.
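For concreteness, the "comment out a module" workflow that explicit declarations allow looks roughly like this. The module names echo the files from @nagisa's git status, but the bodies are placeholders, and inline modules stand in for files on disk.

```rust
mod transform {
    pub mod type_check {
        pub fn run() -> bool {
            true
        }
    }

    // With explicit declarations, an unfinished `inline.rs` can sit in the
    // directory and be kept out of the build by commenting out one line:
    // pub mod inline;
}
```

Under a scheme that discovers modules from the filesystem, the only equivalent would be moving or renaming the file.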

I have an impression that the module system became too boring and familiar so people are eager to start changing things just because.

petrochenkov · February 25, 2017, 1:18am

One more concrete example for directory traversing. Imagine a crate with a build script build.rs and code in src directory. If build.rs included src implicitly as a module, that would be a pretty large build script! Supposedly, only some modules (like mod.rs or lib.rs) would include other modules implicitly, but I haven’t seen the rules written out.
