Revisiting Rust's modules

there could still be well documented means of prioritising, e.g. ‘it looks through the siblings, then goes back a level, and looks again, etc’; the common case i was finding was locating a sibling. (…/foo.rs, …/bar.rs … ‘super::foo::f()’ … I tend to always do ‘use super::’ and ‘pub use all the sub mods’ in each directory in the mod.rs’).
maybe another way to simplify that would be a combined way of referencing the mods in the first place… combination of ‘use mod’ (and of course pub use mod). mod.rs “pub use mod foo; pub use mod bar” … in conjunction with 'use super::
’ you’re good to go with one pathname prefix to locate siblings.

I suppose we could check how many clashes there might be by default in existing projects.

anyway I probably prefer the other suggestion about ‘declaring something one level up’, which would be more precise, but might be seen as equally unusual (more ‘special cases’ to think about)

The problem isn’t that there are no rules; the problem is that now instead of exactly one thing, I have to juggle that entire algorithm for selecting a thing in my head. The number of possible matches going from “one” to “more than one” is the problem, not how it picks from among those matches.

2 Likes

Deprecate mod foo; declarations, instead determining module directly from directory structure.

What? You know git (probably all VCSs) track files, not directories, right? Git won’t delete an unknown file in a tracked subdirectory (for good reasons), but you definitely don’t want files outside version control affecting your build.

CMake had support for automatically detecting source and header files years ago. It was somewhat useful for headers (which don’t affect builds; this was only for creation of IDE project files), but never worked well for sources.

4 Likes

My biggest problem with modules, when I started to learn rust, was that modules can be imported in any order, except modules that exports macros, which should be imported before modules that use these macros.

I spent a whole evening trying to figure it out.

1 Like

The vast majority of the issues with the current system seem to boil down to: “Doctor, it hurts when I do this.” “Well, don’t do that.” Obviously, that’s the cynical view.

One of the things I like about the current module system is that you have to be redundant. Redundancy is important in programming language design because if you remove all redundancy, you get machine code. Every possible input is valid, thereby reducing size, but you have no way of checking if what is being executed is what was intended.

One of the things that really is confusing in Rust is name lookup. ::std:: vs std::, in the crate root vs in a submodule. The privacy system is also confusing. This proposal doesn’t seem to fix name lookup at all. And the privacy aspects of the system like pub(crate) and pub(self) seem to work independently of the other aspects of the proposal. Granted, I’m not very familiar with the recent pub(restricted) stuff, so it’s possible I’m missing something here.

One possible solution that would be explicit (which I like) and solve a lot of boiler plate (which I don’t like, as well) is inline namespaces (C++). If you wrote inline mod foo; it could behave as though foo didn’t exist and import the contents of foo into it’s containing module. This would result in the futures module looking like:

inline mod and_then;
inline mod flatten;
inline mod flatten_stream;
inline mod fuse;
inline mod into_stream;
inline mod join;
inline mod map;
inline mod map_err;
inline mod from_err;
inline mod or_else;
inline mod select;
inline mod select2;
inline mod then;
inline mod either;

This results in almost the exact same behavior you describe. If you want the code in another file, add a single line of code with the name of the file.

  • Private items would be private to the file, just like your proposal.
  • pub(self) would make it accessible to the containing module.
  • pub(crate) would make it accessible to the entire crate.
  • pub would make it accessible to the parent of the containing module.

And note, pub inline mod foo; doesn’t make sense because foo isn’t introduced as a name into any scope. (This would be unlike C++'s inline namespaces.)

And I don’t think this has any backwards compatibility concerns.

Edit:

Oh, and the PAL stuff can be:

#[cfg(...)]
inline mod unix;

#[cfg(...)]
inline mod windows;

And it doesn’t matter whether it’s a single file or a folder structure.

The more I think about it, the more I realize that your proposal is effectively a subset of this proposal. In your proposal, files are implicitly inline mod foo; while directories are pub mod foo; or mod foo; depending on whether they start with _. Your proposal doesn’t have a way to have non-inline files or inline directories. It also has backwards compatibility issues.

(Note, I could be missing something, after all, it is 4am and I only started thinking about this an hour ago.)

20 Likes

I think myself that the biggest issue with Rust’s module system is language not helpful enough if you get it wrong unlike other parts of a language, which is very likely if you are learning Rust.

When making a new project in Rust, a following tree is created.

|-- Cargo.tml
`-- src
    `-- lib.rs

src is encouraging in saying to an user - hey, put more stuff here - this is a good design here as it makes it clear that you can put more files - too bad that rest of the compiler doesn’t work together with it. So they can say decide to make a file called another_file.rs, and even put some content here.

fn method() -> i32 {
    42
}

Everything compiles successfully, no warnings, now how to use that from lib.rs?

pub fn external_number() {
    another_file::method();
}

Now to compile this.

error[E0433]: failed to resolve. Use of undeclared type or module `another_file`
 --> src/lib.rs:2:5
  |
2 |     another_file::method();
  |     ^^^^^^^^^^^^^^^^^^^^ Use of undeclared type or module `another_file`

error: aborting due to previous error

error: Could not compile `example-crate`.

To learn more, run the command again with --verbose.

--verbose info is useless for this issue by the way. No hints about what needs to be done. Uh, mentioning crate name will fix it? Surely, right?

pub fn external_number() {
    example_crate::another_file::method();
}
error[E0433]: failed to resolve. Use of undeclared type or module `example_crate`
 --> src/lib.rs:2:5
  |
2 |     example_crate::another_file::method();
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Use of undeclared type or module `example_crate`

So perhaps it should be used, I mean, that’s how it works in Perl and other programming languages. You need to import something before using it. There is use keyword, it’s surely for that, I mean, it’s very similar to something like use std::collections::HashMap so it has to be it. Also documentation for E0433 suggests using use keyword.

use another_file;

pub fn external_number() {
    another_file::method();
}
error[E0432]: unresolved import `another_file`
 --> src/lib.rs:1:5
  |
1 | use another_file;
  |     ^^^^^^^^^^^^ no `another_file` in the root

Perhaps name of crate needs to be mentioned?

use example_crate::another_file;

pub fn external_number() {
    another_file::method();
}
error[E0432]: unresolved import `example_crate::another_file`
 --> src/lib.rs:1:5
  |
1 | use example_crate::another_file;
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^ Maybe a missing `extern crate example_crate;`?

Alright compiler, let’s put extern crate if that’s the issue.

extern crate example_crate;

use example_crate::another_file;

pub fn external_number() {
    another_file::method();
}
error[E0463]: can't find crate for `example_crate`
 --> src/lib.rs:1:1
  |
1 | extern crate example_crate;
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ can't find crate

But that crate is like… right here, what’s wrong with you compiler.

Okay, I’m done here, four unsucessful attempts to do something, and one misleading hint from a compiler. Like this surely can be done better, right?


My suggestion is simply to change error message so that when trying to do any of those unsuccessful attempts when there is a directory or a file with requested module name, it should instead hint to use mod keyword. Something like this possibly.

error[E0433]: failed to resolve. Use of undeclared type or module `another_file`
 --> src/lib.rs:2:5
  |
2 |     another_file::method();
  |     ^^^^^^^^^^^^^^^^^^^^ Use of undeclared type or module `another_file`
  = help: modules need to be declared first
  = note: use `mod another_file` to do so

(the error message obviously should be better, I’m not good at English myself)

25 Likes

I like the gist of this! I think the proposal meets the stated goals of reducing declarations, path confusion, as well as obstacles needed to be learned for beginners.

However, I’d like to +1 @Lokathor’s idea of an explicit export list per module, thinking that it’s the module’s responsibility to specify what’s to be visible to the outside – not so much individual items’ (as they need not know their context).

Perhaps some kind of hybrid is possible, where individual .rs files don’t create new submodules?

I think that the diagnosis of the problems of the current module system at the beginning is spot-on. This characterizes pretty well the majority of stumbling blocks that people encounter when using the current module system.

I think that the proposed solution has a lot of problems of its own, however.

First is the principle of least surprise. In general, I feel like the expectation would be that if modules are based on filesystem hierarchy, then a module would correspond to a file. Maybe this is just my bias as I’ve been using Python and the current Rust module system most recently, but having module structure based on directories seems surprising to me.

I also feel like this is optimizing for the more complex cases, at the expense of simpler cases. In particular, for a small project that I want to split into a couple of small modules, I would need to create a directory for each one, leading to an unnecessarily nested directory hierarchy:

src/
    main.rs
    utils/
        mod.rs
    net/
        mod.rs
    term/
        mod.rs

The examples given in the blog post all fairly mature, large, comprehensive libraries that try to be widely applicable, provide cross-platform abstractions, and the like. But a lot of application code has simpler needs for the module system, and having to create a directory just to create a module seems pretty heavyweight. In general, I try to eschew deeply nested directory hierarchies, since I find them harder to navigate, easier to get lost in (figuring out which mod.rs you are looking at in an editor), etc, and this is adding one more level of nesting just to define distinct modules.

And sure, you could use mod utils { } within a file in the top level to work around this, but that adds an extra level of rightward drift, which is also something that would be best to avoid.

Finally, this adds one additional level of privacy, file level privacy. I can appreciate why it’s added; I was going to object that this reduces privacy by default, until I read the proposal over again and noticed that part that I’d missed. But that does add one more concept to consider, making the privacy rules a bit more complex.

6 Likes

Now this is a proposal I feel I can get behind! Bikeshedding-wise, I’ll prefer _-modules to be used through a path that’s the same as their directory names, where a base/_foo.rs module would be used as pub use self::_foo::xyz; - that is more consistent with how names beginning with an underscore work in other places in Rust (and Python).

My main concerns are that:

  • “spillover” is a bit of a departure from how modules work, both in previous Rust and in other languages. I don’t feel it’s particularly hard to understand, but it’s both something new people have to get used to, and it might present new and unforeseen issues. e.g. there’s already the “I haven’t set up RLS and I’m doing C-s to look up a name” problem - which already exists somewhat in Rust (cough impl InferCtxt cough) but would be magnified.
  • Relying on listdir - junk files in git repositories would get read and compiled in. I’m not sure how much that would be a problem in practice, but I see how it could get annoying in some cases. We might be able to keep our sanity by only looking at files that match [_A-Za-z0-9]+[.]rs and directories that match [_A-Za-z0-9]+

Also, I’ll like to try @xfix’s error messages and see whether they improve learnability.

6 Likes

This is a really good point, the Rust compiler is generally so helpful but this is a big blind spot. Any new module system is going to take months or more to fully hash out and ship, but seems like better error messages and hints could start going out immediately.

3 Likes

Please let’s not resurrect export lists. They were a huge pain back in the day.

8 Likes

I'm against almost every idea brought up in the proposal. As for the issues listed, I believe some of them don't apply, while others should be fixed in different ways. I agree with one issue though, which is that code in lib.rs and main.rs can use absolute paths without rooting them with a preceeding ::.

Implicit mod and crate

The blog post proposes implicit mod and crate, calling the explicit notation boilerplate. I think adding the ability to omit these two statements is bad for several reasons:

  • The Cargo.toml format is no way trivial and easy. There are conditional dependencies, dev-dependencies, build-dependencies, and so on. How should a beginner coming to a new codebase know which crates they can use in build.rs or lib.rs or src/bin/binary.rs if not through the extern crate declarations? Having an extern crate gcc; in build.rs is a net increase in clarity. Rust will be less learnable through this.
  • If you see some code do foo::bar::baz() or even use foo;, where will you know from what foo is? Is it a module? You'd have to check the file system hierarchy. Is it an external crate? You'd have to check Cargo.toml. Is it an inline module? You'd have to scan the file. With current Rust, everything is declared in lib.rs/main.rs either via mod or extern crate so you only have to look at this single file, and the file your statement is in. Of course if you have sth like use super::baz::bar; or similar, you'll know you have to look there. I call this feature self-contained-ness of .rs files in current Rust, and I believe it should be a goal to preserve it. The only exception to this self-contained-ness is macros, which will hopefully be fixed soon. Therefore, the proposal means a clarity decrease.
  • You need to open Cargo.toml. When looking at some crate, I usually don't open Cargo.toml and open lib.rs instead because it contains far more useful info. I open Cargo.toml only to get the actual version of a dependency. In fact, it'd be far more prefferable to have Cargo.toml implicit, meaning that version info and stuff like that is declared next to the extern crate definition. @ubsan has been proposing this and I support it!
  • You need to obtain a list of files. My editing workflow right now is to have a list of open files with content I'm interested in. When I've worked on the compiler in the past, I've had to open tens of files from various crates with definitions and code. Now the rustc project overall contains hundreds if not thousands of files. I don't know which way your editor is displaying stuff to you, but for me who uses Kate, I have the choice between two modes, one is "collapse directories, but display everything a directory contains if its uncollapsed" and the second is "only show opened files". I doubt other editors offer any other or better mode. Let's assume I want to add a built in macro to Rust, so I'd be interested in the content of two files, src/libsyntax/ext/source_util.rs and src/libsyntax_ext/lib.rs (I'll probably also have to look at more files but this is the two I must edit). See the screenshots at the bottom of this post for an overview of the modes. You'll notice that the "show everything in the file hierarchy" spectacularly fails for the rustc usecase. rustc is simply too big! The same problem manifests for smaller projects as well. The big difference to mod declarations is that while you need to do scrolling for such as well, all files that are not interesting to you are simply not displayed, so you can focus your attention on the actual stuff your current focus is on. So ergonomically, it'd certainly hurt me to not have the mod notation for all modules.
  • Don't forget that the percentage mod and extern crate is for most crates in the single digit range and maybe even less than a percent. The general overhead of typing this should be very low.

Summarizing, implicit mod and crate will make the situation worse for clarity, learnability, and ergonomics, while its overhead is low.

Re: Proposal: directories determine modules

  • The proposal seems to add boilerplate, not remove it. Yes, you'll have to type more if you have an util module with a bunch of declarations you want to be pub(crate) because you must repeat the pub(crate) for each single declaration, you can't just say pub(crate) mod foo; and have the declarations inside be pub.
  • leading _ is ugly as hell.
  • It introduces an inconsistency with normal identifier names. Right now pub mod foo and pub struct Bar are highly similar. In the future you will be able to declare a module _foo but calling a struct _Bar would mean nothing. This makes Rust less consistent. Even if you supported struct _Bar to mean pub(crate) struct Bar it would be less learnable as there is one additional way of calling stuff, and highly inconvenient as you can't just grep for struct Bar any more if you wanted to find the definition of Bar.

The seems to want to remove two properties of current Rust that the blog sees as problem:

  • Widespread use of the facade pattern
  • pub having different meanings in different places

First, let me admit that the widespread use of the facade pattern is not the most beautiful part of Rust, but I'm not sure its so bad that turning over the entire module system is neccessary. If lowering its use is really such a big goal, I'd prefer to have the proposal by @ahmedcharles : inline mod. Not sure how it is going to handle namespacing, e.g. does an use foo inside one of the inline mods affect the parent mod, or not... I'd personally prefer that it doesn't affect the parent module.

Second, with inline mod we'll have less uses of reexports, so pub will mean pub(world) more often. Making it mean pub(world) in every case would be wrong IMO as this makes creating module-scoped modules highly inconvenient, and even modules with mostly pub(crate) items, because every time you'd have to type the pub(crate). Instead, let me make an alternative proposal:

Explicit pub(world) and pub attribute for modules

Initial note: I'm disregarding any backwards compat here as well, and build on the assumption that this will be included into a new epoch.

The first part of the proposal is to add an explicit pub(world) (please bikeshed the name if you find a better one) which means that the item is visible to the world.

The second part is that we error if any pub(X) doesn't end up at publicity level X, when its inside a private module and not pub re-exported.

The third (and main) part, making the meaning of pub a module local property:

The main issue with "pub can mean different things" is finding out what it actually means in the end. If finding out is made easy, changing the meaning of pub shouldn't be that bad, should it? Therefore, I propose a different mechanism to the current one: that you can put an attribute #![pub = pub(...)] to a module where you can choose the ... to mean anything from crate to world to super or even in path if you want. All the uses of pub in the module would mean pub(...) like declared in the attribute.

Features:

  • It should default to world, so in most places pub something would mean pub(world) something.
  • Attributes of any super module won't affect any sub module.
  • The places where deeper hierarchies are wished are still very nicely supported as you can just put one attribute per file.
  • Any pub(world) item which doesn't end up in the public API either through reexports or through the module hierarchy should give an error.

Advantages:

  • It would fix the "finding out the meaning of pub" issue
  • It would still enable people to write modules full of content that is pub(crate) without having to type pub(crate) all the time.

Disadvantages:

  • Modules that are half pub(in path_a) and half pub(in path_b) or similar situations won't profit very much, but I think they are rather sparse.
  • Its still less convenient than the current system (you need to type #![pub = pub(...)] in every module; thats better than typing it for every file but still an effort), but I guess that combined with the `inline mod proposal, it would end up being used less, and also I guess the opinion of the lang team is that convenience in this aspect doesn't matter

What do you think?

Screenshots

"collapse directories, but display everything a directory contains if its uncollapsed" mode:

“only show opened files” mode:

8 Likes

I do worry about the sibling scoping question. I know I, for one, often track down bindings by searching purely within the current file. With this proposal, I’d have to change that workflow to greping within the current directory, or using tags more consistently, etc. Yet, I suspect that in the end, these other workflows are an improvement — e.g., tags allow a more direct jump to definition regardless of where that definition lives, whereas my current workflow often requires following a chain of imports.

And also just a note, many Rust developers will be using an IDE (like IntelliJ Rust, VSCode, etc) that provides code navigation out of the box, and thus wouldn’t need to change this particular workflow based on this proposal.

4 Likes

The IDE argument works both ways. You can also say that and IDE can just automatically add any mod foo; and extern crate; declarations.

4 Likes

That rust’s modules are limited to files that have to be explicitly referenced (i.e. the directory doesn’t collectively define the module as in this proposal) is tedious when writing but so so pleasant when reading. I know exactly where to look to find where something is defined.

We can see the alternative in Go for example and I think it’s a strictly inferior experience having to rely on grep to determine what file defines what.

11 Likes

While the inline mod proposal solves the problems of re-exports, it adds yet another feature to the module system, and removes nothing, so it’s not an improvement on the learnability front.

1 Like

I like this proposal. I dislike the automatic includes of sibling modules, but, how about something like use self:: being required to actually use a name from a sibling file?

given:

src/
  mything/
    a.rs
    b.rs

with:

// file a.rs
pub struct A;
struct Aprime;
// file b
use self::A; // required to be able to use A in this scope
use self::Aprime; // error

Just to leave some opinions:

  • It is generally useful to explore the space, possibilities and have a discussion around it
  • “Rust way”, IMO, is to prefer generally reliability over convenience; saving some keystrokes here and there is a minor gain
  • I don’t think public API not corresponding to a directory structure is a bad thing. When investigating someone’s code, I don’t browse files, I just jump in the code with Vim using racer/rusty-tags plugins, and I expect the directory structure organization to have a different meaning than API origanization (which I view in a Web Browser anyway); as an author I consider it a valuable freedom to express my code with a directory structure that fits my needs, while exposing API that makes sense for the user
  • I have a bad experiences with languages that implicitily pull in all the files in a dir to be a part of a module (leftover files, stale files left after autogeneration because building system is imperfect etc.) ; I prefer explicitly list them and have a confidence is that what I get is what I intended
3 Likes

How is this not already the case?

1 Like

But why do you need them to be modules, if you can split them into separate files & privacy scopes without making them modules?