Revisiting Rust's modules

I like the gist of this! I think the proposal meets its stated goals of reducing declarations, path confusion, and the number of obstacles beginners have to learn their way past.

However, I’d like to +1 @Lokathor’s idea of an explicit export list per module, thinking that it’s the module’s responsibility to specify what’s to be visible to the outside – not so much individual items’ (as they need not know their context).

Perhaps some kind of hybrid is possible, where individual .rs files don’t create new submodules?

I think that the diagnosis of the problems of the current module system at the beginning is spot-on. This characterizes pretty well the majority of stumbling blocks that people encounter when using the current module system.

I think that the proposed solution has a lot of problems of its own, however.

First is the principle of least surprise. In general, I feel like the expectation would be that if modules are based on filesystem hierarchy, then a module would correspond to a file. Maybe this is just my bias as I’ve been using Python and the current Rust module system most recently, but having module structure based on directories seems surprising to me.

I also feel like this is optimizing for the more complex cases, at the expense of simpler cases. In particular, for a small project that I want to split into a couple of small modules, I would need to create a directory for each one, leading to an unnecessarily nested directory hierarchy:

src/
    main.rs
    utils/
        mod.rs
    net/
        mod.rs
    term/
        mod.rs

The examples given in the blog post are all fairly mature, large, comprehensive libraries that try to be widely applicable, provide cross-platform abstractions, and the like. But a lot of application code has simpler needs for the module system, and having to create a directory just to create a module seems pretty heavyweight. In general, I try to eschew deeply nested directory hierarchies, since I find them harder to navigate and easier to get lost in (figuring out which mod.rs you are looking at in an editor, etc.), and this adds one more level of nesting just to define distinct modules.

And sure, you could use mod utils { } within a file in the top level to work around this, but that adds an extra level of rightward drift, which is also something that would be best to avoid.
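To make the rightward-drift point concrete, here is a minimal sketch of the inline-module workaround in a single file. The module contents (clamp_percent, default_port, address) are made up for illustration; only the mod utils { } shape comes from the discussion above.

```rust
// Everything lives in one file (say, src/main.rs); no utils/ or net/ directory.
mod utils {
    /// Clamp a percentage into the 0..=100 range.
    pub fn clamp_percent(value: i32) -> i32 {
        value.max(0).min(100)
    }
}

mod net {
    // One level of indentation for the module, another for each function
    // body: this is the extra rightward drift.
    pub fn default_port() -> u16 {
        8080
    }

    pub fn address(host: &str) -> String {
        format!("{}:{}", host, default_port())
    }
}
```

Callers at the top level would write utils::clamp_percent(150) or net::address("localhost"), exactly as if the modules lived in their own files.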

Finally, this adds one additional level of privacy, file level privacy. I can appreciate why it’s added; I was going to object that this reduces privacy by default, until I read the proposal over again and noticed that part that I’d missed. But that does add one more concept to consider, making the privacy rules a bit more complex.


Now this is a proposal I feel I can get behind! Bikeshedding-wise, I’d prefer _-modules to be used through a path that’s the same as their directory names, where a base/_foo.rs module would be used as pub use self::_foo::xyz; - that is more consistent with how names beginning with an underscore work in other places in Rust (and Python).

My main concerns are that:

  • “spillover” is a bit of a departure from how modules work, both in previous Rust and in other languages. I don’t feel it’s particularly hard to understand, but it’s both something new people have to get used to, and it might present new and unforeseen issues. e.g. there’s already the “I haven’t set up RLS and I’m doing C-s to look up a name” problem - which already exists somewhat in Rust (cough impl InferCtxt cough) but would be magnified.
  • Relying on listdir - junk files in git repositories would get read and compiled in. I’m not sure how much that would be a problem in practice, but I see how it could get annoying in some cases. We might be able to keep our sanity by only looking at files that match [_A-Za-z0-9]+[.]rs and directories that match [_A-Za-z0-9]+
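As a rough sketch of that filter, assuming the two patterns are applied to the bare file or directory name, something like the following would do without pulling in a regex crate. The function names are hypothetical and not part of any proposal.

```rust
/// True if the string is non-empty and every character is an ASCII letter,
/// digit, or underscore: the pattern [_A-Za-z0-9]+.
fn is_plain_identifier(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c == '_' || c.is_ascii_alphanumeric())
}

/// Matches [_A-Za-z0-9]+[.]rs, i.e. a candidate module file.
fn is_module_file(name: &str) -> bool {
    match name.strip_suffix(".rs") {
        Some(stem) => is_plain_identifier(stem),
        None => false,
    }
}

/// Matches [_A-Za-z0-9]+, i.e. a candidate module directory.
fn is_module_dir(name: &str) -> bool {
    is_plain_identifier(name)
}
```

With this, editor leftovers such as net.rs~ or .#net.rs are skipped, as is .git. The pattern as written still accepts names starting with a digit, which are not valid Rust identifiers, so a real implementation would probably want a stricter check.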

Also, I’d like to try @xfix’s error messages and see whether they improve learnability.


This is a really good point. The Rust compiler is generally so helpful, but this is a big blind spot. Any new module system is going to take months or more to fully hash out and ship, but it seems like better error messages and hints could start going out immediately.


Please let’s not resurrect export lists. They were a huge pain back in the day.


I'm against almost every idea brought up in the proposal. As for the issues listed, I believe some of them don't apply, while others should be fixed in different ways. I do agree with one issue though, which is that code in lib.rs and main.rs can use absolute paths without rooting them with a preceding ::.

Implicit mod and crate

The blog post proposes implicit mod and crate, calling the explicit notation boilerplate. I think adding the ability to omit these two statements is bad for several reasons:

  • The Cargo.toml format is in no way trivial or easy. There are conditional dependencies, dev-dependencies, build-dependencies, and so on. How should a beginner coming to a new codebase know which crates they can use in build.rs or lib.rs or src/bin/binary.rs if not through the extern crate declarations? Having an extern crate gcc; in build.rs is a net increase in clarity. Removing it would make Rust less learnable.
  • If you see some code do foo::bar::baz() or even use foo;, how will you know what foo is? Is it a module? You'd have to check the file system hierarchy. Is it an external crate? You'd have to check Cargo.toml. Is it an inline module? You'd have to scan the file. With current Rust, everything is declared in lib.rs/main.rs either via mod or extern crate, so you only have to look at this single file and the file your statement is in. Of course if you have something like use super::baz::bar; or similar, you'll know you have to look there. I call this feature the self-contained-ness of .rs files in current Rust, and I believe it should be a goal to preserve it. The only exception to this self-contained-ness is macros, which will hopefully be fixed soon. Therefore, the proposal means a decrease in clarity.
  • You need to open Cargo.toml. When looking at some crate, I usually don't open Cargo.toml and open lib.rs instead because it contains far more useful info. I open Cargo.toml only to get the actual version of a dependency. In fact, it'd be far more preferable to have Cargo.toml implicit, meaning that version info and stuff like that is declared next to the extern crate definition. @ubsan has been proposing this and I support it!
  • You need to obtain a list of files. My editing workflow right now is to have a list of open files with content I'm interested in. When I've worked on the compiler in the past, I've had to open tens of files from various crates with definitions and code. Now the rustc project overall contains hundreds if not thousands of files. I don't know how your editor displays things to you, but in Kate, which I use, I have the choice between two modes: one is "collapse directories, but display everything a directory contains if it's uncollapsed" and the second is "only show opened files". I doubt other editors offer any other or better mode. Let's assume I want to add a built-in macro to Rust, so I'd be interested in the content of two files, src/libsyntax/ext/source_util.rs and src/libsyntax_ext/lib.rs (I'll probably also have to look at more files, but these are the two I must edit). See the screenshots at the bottom of this post for an overview of the modes. You'll notice that the "show everything in the file hierarchy" mode fails spectacularly for the rustc use case. rustc is simply too big! The same problem manifests for smaller projects as well. The big difference with mod declarations is that, while you need to scroll through those as well, all files that are not interesting to you are simply not displayed, so you can focus your attention on the stuff you are actually working on. So ergonomically, it'd certainly hurt me to not have the mod notation for all modules.
  • Don't forget that mod and extern crate declarations make up a single-digit percentage of the code in most crates, and maybe even less than a percent. The general overhead of typing them should be very low.

Summarizing, implicit mod and crate will make the situation worse for clarity, learnability, and ergonomics, while its overhead is low.

Re: Proposal: directories determine modules

  • The proposal seems to add boilerplate, not remove it. You'll have to type more if you have a util module with a bunch of declarations you want to be pub(crate): you must repeat pub(crate) on every single declaration, because you can't just say pub(crate) mod foo; and have the declarations inside be pub.
  • leading _ is ugly as hell.
  • It introduces an inconsistency with normal identifier names. Right now pub mod foo and pub struct Bar are highly similar. In the future you will be able to declare a module _foo but calling a struct _Bar would mean nothing. This makes Rust less consistent. Even if you supported struct _Bar to mean pub(crate) struct Bar it would be less learnable as there is one additional way of calling stuff, and highly inconvenient as you can't just grep for struct Bar any more if you wanted to find the definition of Bar.
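To illustrate the boilerplate point from the first bullet, here is a small sketch of the two styles in current Rust. The module and function names are invented for the example; util_flat only mimics what per-item annotations would look like and is not the proposal's actual syntax.

```rust
// Current Rust: the module's own visibility caps what its `pub` items can reach.
pub(crate) mod util {
    // `pub` here cannot escape the crate, because `util` itself is pub(crate).
    pub fn double(x: u32) -> u32 {
        x * 2
    }

    pub fn triple(x: u32) -> u32 {
        x * 3
    }
}

// Per-item style: the restriction has to be repeated on every declaration.
pub mod util_flat {
    pub(crate) fn double(x: u32) -> u32 {
        x * 2
    }

    pub(crate) fn triple(x: u32) -> u32 {
        x * 3
    }
}
```

Both modules expose the same functions to the rest of the crate and nothing to the outside; the difference is only in how often pub(crate) has to be written.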

The proposal seems to want to remove two properties of current Rust that the blog post sees as problems:

  • Widespread use of the facade pattern
  • pub having different meanings in different places

First, let me admit that the widespread use of the facade pattern is not the most beautiful part of Rust, but I'm not sure it's so bad that overturning the entire module system is necessary. If reducing its use is really such a big goal, I'd prefer the proposal by @ahmedcharles: inline mod. I'm not sure how it is going to handle namespacing, e.g. whether a use foo inside one of the inline mods affects the parent mod or not. I'd personally prefer that it doesn't affect the parent module.
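For readers who haven't met the term, the facade pattern in current Rust looks roughly like this: private modules hold the definitions, and the parent re-exports the items under the paths users should see. The names shapes, measure, Rectangle, and area are made up for the sketch.

```rust
// Private implementation modules: their paths are not part of the public API.
mod shapes {
    pub struct Rectangle {
        pub width: f64,
        pub height: f64,
    }
}

mod measure {
    use super::shapes::Rectangle;

    pub fn area(r: &Rectangle) -> f64 {
        r.width * r.height
    }
}

// The facade: users see `Rectangle` and `area` directly at this level and
// never learn that `shapes` and `measure` exist.
pub use self::measure::area;
pub use self::shapes::Rectangle;
```

Here pub on Rectangle and area does not by itself make them public; what is actually exported is decided by the pub use lines, which is the "pub has different meanings in different places" complaint.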

Second, with inline mod we'll have fewer uses of reexports, so pub will mean pub(world) more often. Making it mean pub(world) in every case would be wrong IMO, as this makes creating module-scoped modules highly inconvenient, and the same goes for modules with mostly pub(crate) items, because you'd have to type the pub(crate) every time. Instead, let me make an alternative proposal:

Explicit pub(world) and pub attribute for modules

Initial note: I'm disregarding any backwards compat here as well, and build on the assumption that this will be included into a new epoch.

The first part of the proposal is to add an explicit pub(world) (please bikeshed the name if you find a better one) which means that the item is visible to the world.

The second part is that we error if any pub(X) item doesn't end up at visibility level X, e.g. when it's inside a private module and not pub re-exported.

The third (and main) part, making the meaning of pub a module local property:

The main issue with "pub can mean different things" is finding out what it actually means in the end. If finding out is made easy, changing the meaning of pub shouldn't be that bad, should it? Therefore, I propose a different mechanism to the current one: that you can put an attribute #![pub = pub(...)] to a module where you can choose the ... to mean anything from crate to world to super or even in path if you want. All the uses of pub in the module would mean pub(...) like declared in the attribute.
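For reference, the values the attribute would choose among already exist as per-item visibility modifiers in current Rust; pub(world) and #![pub = ...] are the spellings proposed here and are not real syntax. A small sketch with made-up module and function names:

```rust
pub mod outer {
    pub mod middle {
        pub mod inner {
            // Plain `pub`: what the post calls pub(world), provided the
            // module path itself is reachable from outside.
            pub fn for_everyone() -> &'static str {
                "world"
            }

            // Visible anywhere inside this crate.
            pub(crate) fn for_crate() -> &'static str {
                "crate"
            }

            // Visible inside `outer` and everything nested in it.
            pub(in crate::outer) fn for_outer() -> &'static str {
                "in path"
            }

            // Visible only in the parent module, `middle`.
            pub(super) fn for_middle() -> &'static str {
                "super"
            }
        }

        pub fn from_middle() -> Vec<&'static str> {
            vec![inner::for_middle(), inner::for_outer()]
        }
    }

    pub fn from_outer() -> &'static str {
        // `middle::inner::for_middle()` would not compile here:
        // it is restricted to `middle`.
        middle::inner::for_outer()
    }
}
```

Under the attribute idea, a module could pick one of these levels once and then write plain pub on every item.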

Features:

  • It should default to world, so in most places pub something would mean pub(world) something.
  • Attributes of any super module won't affect any sub module.
  • The places where deeper hierarchies are wished are still very nicely supported as you can just put one attribute per file.
  • Any pub(world) item which doesn't end up in the public API either through reexports or through the module hierarchy should give an error.

Advantages:

  • It would fix the "finding out the meaning of pub" issue
  • It would still enable people to write modules full of content that is pub(crate) without having to type pub(crate) all the time.

Disadvantages:

  • Modules that are half pub(in path_a) and half pub(in path_b) or similar situations won't profit very much, but I think they are rather sparse.
  • It's still less convenient than the current system (you need to type #![pub = pub(...)] in every module; that's better than typing it for every file but still an effort), but I guess that, combined with the inline mod proposal, it would end up being used less, and I also guess the opinion of the lang team is that convenience in this aspect doesn't matter

What do you think?

Screenshots

[Screenshot: "collapse directories, but display everything a directory contains if it's uncollapsed" mode]

[Screenshot: “only show opened files” mode]


I do worry about the sibling scoping question. I know I, for one, often track down bindings by searching purely within the current file. With this proposal, I’d have to change that workflow to grepping within the current directory, or using tags more consistently, etc. Yet, I suspect that in the end, these other workflows are an improvement — e.g., tags allow a more direct jump to definition regardless of where that definition lives, whereas my current workflow often requires following a chain of imports.

And also just a note, many Rust developers will be using an IDE (like IntelliJ Rust, VSCode, etc) that provides code navigation out of the box, and thus wouldn't need to change this particular workflow based on this proposal.


The IDE argument works both ways. You can also say that an IDE can just automatically add any mod foo; and extern crate; declarations.


That Rust’s modules are limited to files that have to be explicitly referenced (i.e. the directory doesn’t collectively define the module as in this proposal) is tedious when writing but so, so pleasant when reading. I know exactly where to look to find where something is defined.

We can see the alternative in Go for example and I think it’s a strictly inferior experience having to rely on grep to determine what file defines what.


While the inline mod proposal solves the problems of re-exports, it adds yet another feature to the module system, and removes nothing, so it’s not an improvement on the learnability front.


I like this proposal. I dislike the automatic includes of sibling modules, but, how about something like use self:: being required to actually use a name from a sibling file?

given:

src/
  mything/
    a.rs
    b.rs

with:

// file a.rs
pub struct A;
struct Aprime;
// file b.rs
use self::A; // required to be able to use A in this scope
use self::Aprime; // error
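For comparison, here is a sketch of how the same layout behaves in current Rust, written with inline modules so it fits in one file. Today a.rs and b.rs would be separate modules, so the import goes through super::a rather than self; the make function is added only to give b something to do.

```rust
mod mything {
    pub mod a {
        pub struct A;

        // Private to `a`: siblings cannot name it.
        #[allow(dead_code)]
        struct Aprime;
    }

    pub mod b {
        // Explicit import of the sibling's public item.
        use super::a::A;
        // use super::a::Aprime; // would fail to compile: `Aprime` is private

        pub fn make() -> A {
            A
        }
    }
}
```

The suggestion above keeps this explicit import step, but drops the extra a:: path segment because the files would share one module.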

Just to leave some opinions:

  • It is generally useful to explore the space, possibilities and have a discussion around it
  • The “Rust way”, IMO, is to generally prefer reliability over convenience; saving some keystrokes here and there is a minor gain
  • I don’t think a public API that does not correspond to the directory structure is a bad thing. When investigating someone’s code, I don’t browse files, I just jump around in the code with Vim using the racer/rusty-tags plugins, and I expect the directory structure to have a different meaning than the API organization (which I view in a web browser anyway); as an author I consider it a valuable freedom to lay out my code in a directory structure that fits my needs, while exposing an API that makes sense for the user
  • I have had bad experiences with languages that implicitly pull in all the files in a directory to be part of a module (leftover files, stale files left after autogeneration because the build system is imperfect, etc.); I prefer to list them explicitly and be confident that what I get is what I intended

How is this not already the case?


But why do you need them to be modules, if you can split them into separate files & privacy scopes without making them modules?

"saving some keystrokes here and there is a minor gain"

It's not just keystrokes; it's the number of mental steps in figuring out how to set it up: the learning curve, and how similar it is to the other languages you already have to keep in your head (C++ and whatever else).

"I have had bad experiences with languages that implicitly pull in all the files in a directory to be part of a module (leftover files, stale files left after autogeneration because the build system is imperfect, etc.)"

IMO, given the number of other tools that can use a directory structure out of the box (recursive grep, git telling you what's unversioned, etc.), I'd guess it's worth putting up with or fixing those hazards (i.e. cleaning up the stale files, etc.).

My main means of navigating Rust is recursive grep (it's great that it is so grep-able), so I want the directory structure to be clean.


Throwing in my support for explicit mod declarations- their explicitness is my absolute favorite thing about the module system and I strongly oppose getting rid of them. I also strongly disagree with the article’s assessment that “the file system arrangement itself is a perfectly “explicit” way of providing information.” Explicit declarations provide both detail (particularly intent; also an easy solution to the sibling file question) and flexibility (single file modules vs directory modules vs mod { .. }; no accidental files) that the file system does not and cannot provide- especially with the inline mod proposal.

I like @ahmedcharles’s inline mod proposal as an alternative- in a way it is actually more explicit than the current system, by elevating the facade pattern into a language feature, and yet it is still more succinct. It clearly differentiates the case of organizational files from publicly visible submodules, without enforcing a directory structure. I do not agree that it hinders learnability, which I see as entirely a problem of discovery- facades are not a solution to the problems beginners have, so they don’t need to deal with it right away.

The new user scenario is a perfect example of why the problem with the current system is discovery. The solution, IMO, is absolutely not guessing what people expect, in the form of assuming that they’ll intuit the directory-based system. The solution is simply to tell them what they did wrong, via better error messages! This stands out to me as a better solution because it also applies to every other problem listed in the post:

  • “Too many declaration forms”: We can’t remove any, so the file system proposal itself deprecates some- another error message. We could alternatively work on improving what we do now, as @xfix suggests.

  • “Path confusion”: The file system proposal doesn’t address this at all. I suggest we could start warning on path usage in the top-level file that would break in submodules. For example, writing extern crate regex; and then regex::Regex could warn that regex is not used. (This could even become a hard error with implicit extern crate, which I’m somewhat less opposed to.)

  • “Filesystem organization”: This plays into the same scenarios as the first bullet point. When someone just wants to add a new file to their project, and they try useing it or directly referencing it, just suggest mod the_file; in the error message.

  • “Privacy”: With inline mod, we could even start warning on the facade pattern to suggest it instead. This could be adjusted depending on how precisely the code matches the behavior of inline mod.

  • “Who can see this item?”: Taking a different tack here, since it’s about discovery while reading instead of writing- improve rustdoc! Add information about where things are defined and exported from. Maybe just a collapsed-by-default “internal path” would be enough, maybe something more involved. This is certainly better than making things more implicit.

  • "pub use abuse": The file system proposal doesn’t solve this problem, either, IMO. Instead it just gives you even more files you have to check. inline mod and rustdoc improvements can both help the discovery problem here, in cases when you’re not in an IDE.

  • “Repetition”: inline mod and implicit extern crate help here, without sacrificing readability.

In the end, even if mod foo; gets deprecated and modules are implicitly declared by directories, I will continue to use the current system and go out of my way to avoid the new one. It brings too many of the downsides of C++'s linking model, and its touted upsides are dubious.


“separate files & privacy scopes” are modules.

This is very different from my experience. Because of re-exports, I'm as likely to find a chain of hops between files if I follow a use statement as I am to find the actual definition. That is, if it wasn't glob imported at some point in the chain. And if what I'm looking for is a method (most commonly), it might not even be in the file where the type was defined, because there's no requirement that impl blocks be co-located with type definitions. There's no way to follow use statements to find the impl block.

In other words, I already use ripgrep to search for everything. I don't find this to be a bad experience, but I imagine other people might prefer the IDE "jump to definition" solution instead. Either way, you can't reliably find definitions by reading the import system today.
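A small sketch of the situation described above, using inline modules to stand in for separate files. All names (types, behaviour, prelude, Counter, bump) are invented for illustration.

```rust
mod types {
    pub struct Counter {
        pub count: u32,
    }
}

mod behaviour {
    use super::types::Counter;

    // The impl block lives in a different module (in practice, a different
    // file) from the struct definition; no `use` statement points here.
    impl Counter {
        pub fn bump(&mut self) -> u32 {
            self.count += 1;
            self.count
        }
    }
}

mod prelude {
    // First hop of the re-export chain.
    pub use super::types::Counter;
}

// Second hop: callers see `Counter` at the top level.
pub use self::prelude::Counter;
```

Someone reading a call to bump and following the imports of Counter would pass through prelude and land in types, without ever being led to behaviour, where the method is actually defined.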


The only reason you think Aaron’s proposal removes things is because it ignores backwards compatibility. I appreciate why he did that but one can’t judge how easy it will be to learn his proposal in isolation.

The primary features of the proposal are:

  1. Implicitness in more places (which some number of people will always view as being a bad thing, myself included)
  2. Collapsing modules which are declared as files.
  3. Some new privacy primitives.

Like many other people, I don’t see any reasonable transition to this proposal that results in Rust being easier to learn.

But anyways, my goal isn’t to make Rust easier to learn, because I largely think that’s futile in a situation where you can’t break compatibility. My goal was to present a proposal that does a lot of what the original proposal does, but without losing the part of the module system I like, the explicitness of it.


No proposal is ever going to fix the problem that programmers write code that’s hard to read. Aaron’s proposal won’t make it easier to find the code. (In fact, it makes it harder than today: if there are multiple files in a directory, you can’t tell which file something is in based on its module, whereas today you can, as long as you avoid pub use.)

If @turnage successfully uses the module system without reexports and therefore, it allows him to easily find code in his code base, that seems like reasonable feedback and this proposal would remove his ability to do that.
