Revisiting Rust's modules

Hello

Reading the proposal, I’m not convinced. I don’t really have anything important against the system, but I don’t really see it very helpful either.

I actually find the explicitness of extern crate and mod useful. Often, I place multiple binaries into a single crate (experiments, examples, helpers, etc). I don’t import all the crates into each of them. Similarly, how would the usual binary + lib in the same crate work? Eg. currently I have:

- src/
  - main.rs
  - lib.rs

I’m not saying this couldn’t be solved. But I’m afraid the proposal would only exchange one set of problems for a different set.

Also, isn’t leading _ something meaningful on Windows? Weren’t such files hidden there, like ones starting with a dot on unix? (maybe not, I haven’t used that system for at least a decade)

Also, this is a bit of overloading of leading _. In the code visibility is denoted by pub (or no pub), which is nice and explicit, at least in the local file. And _ means „throw away“. Assigning a completely different meaning to it with files seems kludge.

But my worries may come from the fact these things look too new and I’m already used to something.

4 Likes

I don’t know, modules system is fine for me. The only time I had a problem with it was when I skimmed the docs too fast. Looking at docs again was sufficient.

That being said, my pain point is not knowing when to write ::std::... and when std::... suffices. I often write ::std::... everywhere to avoid having to recompile again.

Also, I think having 1-of-N cargo features would be super-useful.

I really like the idea of using directories (and not files) as the way to describe the namespace hierarchy. Putting module level doc in a README.md file in each directory is also very appealing because it would make the doc easy to consume/edit with github and gitlab (I know that it’s not a great argument but it looks like a large part of the rust community is using either github or gitlab, so having the module doc rendered there is a super nice bonus). Edit: and I agree with the statement that the current module system can be confusing to newcomers. It was my biggest stumbling block when I started using rust (interestingly not the borrow checker).

3 Likes

I like all of this. The only thing I find a bit weird is to use a leading underscore for pub(crate) modules. Can anybody think of any other alternatives for achieving the same thing?

4 Likes

Re sibling visibility, one could imagine a tool that would generate an index of all identifiers with pub(self) and greater visibility, for each module directory. Maybe even with Git{Hub,lab} integration, like markdown and handy links. cargo-modindex?

1 Like

reading through, I agree the flip between global /relative scopes is one of several issues, and coming form C++ the module system was a big surprise (e.g. not being a graph like haskell). It did get clearer over time and it's much easier to use now that the glob imports work ok

pub use abuse. More generally, re-exports are ubiquitous in idiomatic Rust code.

one thing I note is that modules are half doing the job of classes in C++ , e.g. in C++ we can have nested classes, and we use them to share type-params. (and of course there are frustrations with things that classes can't do that namespaces can, and vica versa). but there are ways in which 'a class behaving like a module' is actually quite useful. ("it's broken sometimes/it half works" -> doesn't mean the whole underlying concept was bad).

Observation: We often have a module that centres around one item inside it (e.g. hash_map::HashMap, vec::Vec, option::Option, and similar in anything I've written myself). I think this is the 'class-like' case resurfacing.

Is this why people need to 'abuse pub use', to flatten it out, when it's slightly too deep ?

I had two vague ideas to simplify things:-

  • what about making it easier to make a 'key item' in the module visible at the previous level .. ("this module is Vec, Vec is what you need to see") .. maybe weird, but what if you could say 'struct self { ..}' to recover the idea of 'a module being a class'/ 'a class being a module'). the world sees struct current::location::foo {} when you have current/location/foo.rs: struct self{..} , instead of struct current::location::foo::Foo{}. I guess this could apply to structs/enums/traits (std::option::Option? , modules centering on a specific trait?). I suppose you should only be able to have one 'self item' per mod.

    • or even struct super::Vec{} , enum super::Option{} .. to inject symbols one level up.
  • conversely, what if the modules could be made to 'work harder' / i.e. they were more useful beyond merely namespaces; e.g. in sharing type-parameters or even common function parameters (like a 'this', almost recovering OOP-y behaviour). I'm not familiar with what ML does (and reading up it seems there's other suggestions that overlap with what I'm about to say), I gather it might have things along these lines. https://users.rust-lang.org/t/module-wide-shared-function-parameters-tls-dynamic-scope-emulation/11996 https://users.rust-lang.org/t/module-wide-type-parameters-thoughts/11792

finally a simple (probably more controversial) idea on reducing the amount of 'uses' you might have to do, and possibly working around the global/local split:
https://www.reddit.com/r/rust/comments/6olo9o/thought_experiment_on_name_spacing/

rust has deep heirarchies by default which you then go and flatten (e.g. the redundancy between vec and Vec, the 'vec::' is extranous). What if you made the search slightly fuzzy , and warned (or even gave an error) in the case of ambiguities; the idea being that with many::levels::before::an::item, specifying both the item name and at least one of (but not all) of it's path components might be enough. (e.g. when you see foo::bar() .. search the whole project for any mod foo containing a bar(), if not found look for foo::::bar, etc. might the pair reduce the probability of ambiguity far enough? there's obvious repeated names like 'new()' which I don't like for other reasons - I can workaround that, but most of the time aren't function names doing 'domain-specific' (e.g. named constructors relating to specific use cases). ) in the case of 'vec' what if saying std::Vec found it. obviously if the stblib gained multiple versions of 'vec' that would break. this 'fuzzy search' could have some sort of distance metric, e.g. it would prioritise super::foo::bar() over super::super::foo::bar(), std::Vec over std::::Vec, std::::::Vec etc.

under this fuzzy search idea, the difference between a local/global path goes away

Fuzziness in one of the things I find least desirable in a programming language. Overall, I understand things by modeling them as exactly as possible, and then transcribing the model from my brain to the medium. Fuzziness, global scope, spooky action at a distance, and other such things all sit squarely in my "I cannot model these effectively" zone, and drastically reduce my comfort and productivity.

3 Likes

there could still be well documented means of prioritising, e.g. ‘it looks through the siblings, then goes back a level, and looks again, etc’; the common case i was finding was locating a sibling. (…/foo.rs, …/bar.rs … ‘super::foo::f()’ … I tend to always do ‘use super::’ and ‘pub use all the sub mods’ in each directory in the mod.rs’).
maybe another way to simplify that would be a combined way of referencing the mods in the first place… combination of ‘use mod’ (and of course pub use mod). mod.rs “pub use mod foo; pub use mod bar” … in conjunction with 'use super::
’ you’re good to go with one pathname prefix to locate siblings.

I suppose we could check how many clashes there might be by default in existing projects.

anyway I probably prefer the other suggestion about ‘declaring something one level up’, which would be more precise, but might be seen as equally unusual (more ‘special cases’ to think about)

The problem isn’t that there are no rules; the problem is that now instead of exactly one thing, I have to juggle that entire algorithm for selecting a thing in my head. The number of possible matches going from “one” to “more than one” is the problem, not how it picks from among those matches.

2 Likes

Deprecate mod foo; declarations, instead determining module directly from directory structure.

What? You know git (probably all VCSs) track files, not directories, right? Git won't delete an unknown file in a tracked subdirectory (for good reasons), but you definitely don't want files outside version control affecting your build.

CMake had support for automatically detecting source and header files years ago. It was somewhat useful for headers (which don't affect builds; this was only for creation of IDE project files), but never worked well for sources.

4 Likes

My biggest problem with modules, when I started to learn rust, was that modules can be imported in any order, except modules that exports macros, which should be imported before modules that use these macros.

I spent a whole evening trying to figure it out.

1 Like

The vast majority of the issues with the current system seem to boil down to: “Doctor, it hurts when I do this.” “Well, don’t do that.” Obviously, that’s the cynical view.

One of the things I like about the current module system is that you have to be redundant. Redundancy is important in programming language design because if you remove all redundancy, you get machine code. Every possible input is valid, thereby reducing size, but you have no way of checking if what is being executed is what was intended.

One of the things that really is confusing in Rust is name lookup. ::std:: vs std::, in the crate root vs in a submodule. The privacy system is also confusing. This proposal doesn’t seem to fix name lookup at all. And the privacy aspects of the system like pub(crate) and pub(self) seem to work independently of the other aspects of the proposal. Granted, I’m not very familiar with the recent pub(restricted) stuff, so it’s possible I’m missing something here.

One possible solution that would be explicit (which I like) and solve a lot of boiler plate (which I don’t like, as well) is inline namespaces (C++). If you wrote inline mod foo; it could behave as though foo didn’t exist and import the contents of foo into it’s containing module. This would result in the futures module looking like:

inline mod and_then;
inline mod flatten;
inline mod flatten_stream;
inline mod fuse;
inline mod into_stream;
inline mod join;
inline mod map;
inline mod map_err;
inline mod from_err;
inline mod or_else;
inline mod select;
inline mod select2;
inline mod then;
inline mod either;

This results in almost the exact same behavior you describe. If you want the code in another file, add a single line of code with the name of the file.

  • Private items would be private to the file, just like your proposal.
  • pub(self) would make it accessible to the containing module.
  • pub(crate) would make it accessible to the entire crate.
  • pub would make it accessible to the parent of the containing module.

And note, pub inline mod foo; doesn’t make sense because foo isn’t introduced as a name into any scope. (This would be unlike C++'s inline namespaces.)

And I don’t think this has any backwards compatibility concerns.

Edit:

Oh, and the PAL stuff can be:

#[cfg(...)]
inline mod unix;

#[cfg(...)]
inline mod windows;

And it doesn’t matter whether it’s a single file or a folder structure.

The more I think about it, the more I realize that your proposal is effectively a subset of this proposal. In your proposal, files are implicitly inline mod foo; while directories are pub mod foo; or mod foo; depending on whether they start with _. Your proposal doesn’t have a way to have non-inline files or inline directories. It also has backwards compatibility issues.

(Note, I could be missing something, after all, it is 4am and I only started thinking about this an hour ago.)

20 Likes

I think myself that the biggest issue with Rust’s module system is language not helpful enough if you get it wrong unlike other parts of a language, which is very likely if you are learning Rust.

When making a new project in Rust, a following tree is created.

|-- Cargo.tml
`-- src
    `-- lib.rs

src is encouraging in saying to an user - hey, put more stuff here - this is a good design here as it makes it clear that you can put more files - too bad that rest of the compiler doesn’t work together with it. So they can say decide to make a file called another_file.rs, and even put some content here.

fn method() -> i32 {
    42
}

Everything compiles successfully, no warnings, now how to use that from lib.rs?

pub fn external_number() {
    another_file::method();
}

Now to compile this.

error[E0433]: failed to resolve. Use of undeclared type or module `another_file`
 --> src/lib.rs:2:5
  |
2 |     another_file::method();
  |     ^^^^^^^^^^^^^^^^^^^^ Use of undeclared type or module `another_file`

error: aborting due to previous error

error: Could not compile `example-crate`.

To learn more, run the command again with --verbose.

--verbose info is useless for this issue by the way. No hints about what needs to be done. Uh, mentioning crate name will fix it? Surely, right?

pub fn external_number() {
    example_crate::another_file::method();
}
error[E0433]: failed to resolve. Use of undeclared type or module `example_crate`
 --> src/lib.rs:2:5
  |
2 |     example_crate::another_file::method();
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Use of undeclared type or module `example_crate`

So perhaps it should be used, I mean, that’s how it works in Perl and other programming languages. You need to import something before using it. There is use keyword, it’s surely for that, I mean, it’s very similar to something like use std::collections::HashMap so it has to be it. Also documentation for E0433 suggests using use keyword.

use another_file;

pub fn external_number() {
    another_file::method();
}
error[E0432]: unresolved import `another_file`
 --> src/lib.rs:1:5
  |
1 | use another_file;
  |     ^^^^^^^^^^^^ no `another_file` in the root

Perhaps name of crate needs to be mentioned?

use example_crate::another_file;

pub fn external_number() {
    another_file::method();
}
error[E0432]: unresolved import `example_crate::another_file`
 --> src/lib.rs:1:5
  |
1 | use example_crate::another_file;
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^ Maybe a missing `extern crate example_crate;`?

Alright compiler, let’s put extern crate if that’s the issue.

extern crate example_crate;

use example_crate::another_file;

pub fn external_number() {
    another_file::method();
}
error[E0463]: can't find crate for `example_crate`
 --> src/lib.rs:1:1
  |
1 | extern crate example_crate;
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ can't find crate

But that crate is like… right here, what’s wrong with you compiler.

Okay, I’m done here, four unsucessful attempts to do something, and one misleading hint from a compiler. Like this surely can be done better, right?


My suggestion is simply to change error message so that when trying to do any of those unsuccessful attempts when there is a directory or a file with requested module name, it should instead hint to use mod keyword. Something like this possibly.

error[E0433]: failed to resolve. Use of undeclared type or module `another_file`
 --> src/lib.rs:2:5
  |
2 |     another_file::method();
  |     ^^^^^^^^^^^^^^^^^^^^ Use of undeclared type or module `another_file`
  = help: modules need to be declared first
  = note: use `mod another_file` to do so

(the error message obviously should be better, I’m not good at English myself)

25 Likes

I like the gist of this! I think the proposal meets the stated goals of reducing declarations, path confusion, as well as obstacles needed to be learned for beginners.

However, I’d like to +1 @Lokathor’s idea of an explicit export list per module, thinking that it’s the module’s responsibility to specify what’s to be visible to the outside – not so much individual items’ (as they need not know their context).

Perhaps some kind of hybrid is possible, where individual .rs files don’t create new submodules?

I think that the diagnosis of the problems of the current module system at the beginning is spot-on. This characterizes pretty well the majority of stumbling blocks that people encounter when using the current module system.

I think that the proposed solution has a lot of problems of its own, however.

First is the principle of least surprise. In general, I feel like the expectation would be that if modules are based on filesystem hierarchy, then a module would correspond to a file. Maybe this is just my bias as I’ve been using Python and the current Rust module system most recently, but having module structure based on directories seems surprising to me.

I also feel like this is optimizing for the more complex cases, at the expense of simpler cases. In particular, for a small project that I want to split into a couple of small modules, I would need to create a directory for each one, leading to an unnecessarily nested directory hierarchy:

src/
    main.rs
    utils/
        mod.rs
    net/
        mod.rs
    term/
        mod.rs

The examples given in the blog post all fairly mature, large, comprehensive libraries that try to be widely applicable, provide cross-platform abstractions, and the like. But a lot of application code has simpler needs for the module system, and having to create a directory just to create a module seems pretty heavyweight. In general, I try to eschew deeply nested directory hierarchies, since I find them harder to navigate, easier to get lost in (figuring out which mod.rs you are looking at in an editor), etc, and this is adding one more level of nesting just to define distinct modules.

And sure, you could use mod utils { } within a file in the top level to work around this, but that adds an extra level of rightward drift, which is also something that would be best to avoid.

Finally, this adds one additional level of privacy, file level privacy. I can appreciate why it’s added; I was going to object that this reduces privacy by default, until I read the proposal over again and noticed that part that I’d missed. But that does add one more concept to consider, making the privacy rules a bit more complex.

6 Likes

Now this is a proposal I feel I can get behind! Bikeshedding-wise, I’ll prefer _-modules to be used through a path that’s the same as their directory names, where a base/_foo.rs module would be used as pub use self::_foo::xyz; - that is more consistent with how names beginning with an underscore work in other places in Rust (and Python).

My main concerns are that:

  • “spillover” is a bit of a departure from how modules work, both in previous Rust and in other languages. I don’t feel it’s particularly hard to understand, but it’s both something new people have to get used to, and it might present new and unforeseen issues. e.g. there’s already the “I haven’t set up RLS and I’m doing C-s to look up a name” problem - which already exists somewhat in Rust (cough impl InferCtxt cough) but would be magnified.
  • Relying on listdir - junk files in git repositories would get read and compiled in. I’m not sure how much that would be a problem in practice, but I see how it could get annoying in some cases. We might be able to keep our sanity by only looking at files that match [_A-Za-z0-9]+[.]rs and directories that match [_A-Za-z0-9]+

Also, I’ll like to try @xfix’s error messages and see whether they improve learnability.

6 Likes

This is a really good point, the Rust compiler is generally so helpful but this is a big blind spot. Any new module system is going to take months or more to fully hash out and ship, but seems like better error messages and hints could start going out immediately.

3 Likes

Please let’s not resurrect export lists. They were a huge pain back in the day.

8 Likes

I'm against almost every idea brought up in the proposal. As for the issues listed, I believe some of them don't apply, while others should be fixed in different ways. I agree with one issue though, which is that code in lib.rs and main.rs can use absolute paths without rooting them with a preceeding ::.

Implicit mod and crate

The blog post proposes implicit mod and crate, calling the explicit notation boilerplate. I think adding the ability to omit these two statements is bad for several reasons:

  • The Cargo.toml format is no way trivial and easy. There are conditional dependencies, dev-dependencies, build-dependencies, and so on. How should a beginner coming to a new codebase know which crates they can use in build.rs or lib.rs or src/bin/binary.rs if not through the extern crate declarations? Having an extern crate gcc; in build.rs is a net increase in clarity. Rust will be less learnable through this.
  • If you see some code do foo::bar::baz() or even use foo;, where will you know from what foo is? Is it a module? You'd have to check the file system hierarchy. Is it an external crate? You'd have to check Cargo.toml. Is it an inline module? You'd have to scan the file. With current Rust, everything is declared in lib.rs/main.rs either via mod or extern crate so you only have to look at this single file, and the file your statement is in. Of course if you have sth like use super::baz::bar; or similar, you'll know you have to look there. I call this feature self-contained-ness of .rs files in current Rust, and I believe it should be a goal to preserve it. The only exception to this self-contained-ness is macros, which will hopefully be fixed soon. Therefore, the proposal means a clarity decrease.
  • You need to open Cargo.toml. When looking at some crate, I usually don't open Cargo.toml and open lib.rs instead because it contains far more useful info. I open Cargo.toml only to get the actual version of a dependency. In fact, it'd be far more prefferable to have Cargo.toml implicit, meaning that version info and stuff like that is declared next to the extern crate definition. @ubsan has been proposing this and I support it!
  • You need to obtain a list of files. My editing workflow right now is to have a list of open files with content I'm interested in. When I've worked on the compiler in the past, I've had to open tens of files from various crates with definitions and code. Now the rustc project overall contains hundreds if not thousands of files. I don't know which way your editor is displaying stuff to you, but for me who uses Kate, I have the choice between two modes, one is "collapse directories, but display everything a directory contains if its uncollapsed" and the second is "only show opened files". I doubt other editors offer any other or better mode. Let's assume I want to add a built in macro to Rust, so I'd be interested in the content of two files, src/libsyntax/ext/source_util.rs and src/libsyntax_ext/lib.rs (I'll probably also have to look at more files but this is the two I must edit). See the screenshots at the bottom of this post for an overview of the modes. You'll notice that the "show everything in the file hierarchy" spectacularly fails for the rustc usecase. rustc is simply too big! The same problem manifests for smaller projects as well. The big difference to mod declarations is that while you need to do scrolling for such as well, all files that are not interesting to you are simply not displayed, so you can focus your attention on the actual stuff your current focus is on. So ergonomically, it'd certainly hurt me to not have the mod notation for all modules.
  • Don't forget that the percentage mod and extern crate is for most crates in the single digit range and maybe even less than a percent. The general overhead of typing this should be very low.

Summarizing, implicit mod and crate will make the situation worse for clarity, learnability, and ergonomics, while its overhead is low.

Re: Proposal: directories determine modules

  • The proposal seems to add boilerplate, not remove it. Yes, you'll have to type more if you have an util module with a bunch of declarations you want to be pub(crate) because you must repeat the pub(crate) for each single declaration, you can't just say pub(crate) mod foo; and have the declarations inside be pub.
  • leading _ is ugly as hell.
  • It introduces an inconsistency with normal identifier names. Right now pub mod foo and pub struct Bar are highly similar. In the future you will be able to declare a module _foo but calling a struct _Bar would mean nothing. This makes Rust less consistent. Even if you supported struct _Bar to mean pub(crate) struct Bar it would be less learnable as there is one additional way of calling stuff, and highly inconvenient as you can't just grep for struct Bar any more if you wanted to find the definition of Bar.

The seems to want to remove two properties of current Rust that the blog sees as problem:

  • Widespread use of the facade pattern
  • pub having different meanings in different places

First, let me admit that the widespread use of the facade pattern is not the most beautiful part of Rust, but I'm not sure its so bad that turning over the entire module system is neccessary. If lowering its use is really such a big goal, I'd prefer to have the proposal by @ahmedcharles : inline mod. Not sure how it is going to handle namespacing, e.g. does an use foo inside one of the inline mods affect the parent mod, or not... I'd personally prefer that it doesn't affect the parent module.

Second, with inline mod we'll have less uses of reexports, so pub will mean pub(world) more often. Making it mean pub(world) in every case would be wrong IMO as this makes creating module-scoped modules highly inconvenient, and even modules with mostly pub(crate) items, because every time you'd have to type the pub(crate). Instead, let me make an alternative proposal:

Explicit pub(world) and pub attribute for modules

Initial note: I'm disregarding any backwards compat here as well, and build on the assumption that this will be included into a new epoch.

The first part of the proposal is to add an explicit pub(world) (please bikeshed the name if you find a better one) which means that the item is visible to the world.

The second part is that we error if any pub(X) doesn't end up at publicity level X, when its inside a private module and not pub re-exported.

The third (and main) part, making the meaning of pub a module local property:

The main issue with "pub can mean different things" is finding out what it actually means in the end. If finding out is made easy, changing the meaning of pub shouldn't be that bad, should it? Therefore, I propose a different mechanism to the current one: that you can put an attribute #![pub = pub(...)] to a module where you can choose the ... to mean anything from crate to world to super or even in path if you want. All the uses of pub in the module would mean pub(...) like declared in the attribute.

Features:

  • It should default to world, so in most places pub something would mean pub(world) something.
  • Attributes of any super module won't affect any sub module.
  • The places where deeper hierarchies are wished are still very nicely supported as you can just put one attribute per file.
  • Any pub(world) item which doesn't end up in the public API either through reexports or through the module hierarchy should give an error.

Advantages:

  • It would fix the "finding out the meaning of pub" issue
  • It would still enable people to write modules full of content that is pub(crate) without having to type pub(crate) all the time.

Disadvantages:

  • Modules that are half pub(in path_a) and half pub(in path_b) or similar situations won't profit very much, but I think they are rather sparse.
  • Its still less convenient than the current system (you need to type #![pub = pub(...)] in every module; thats better than typing it for every file but still an effort), but I guess that combined with the `inline mod proposal, it would end up being used less, and also I guess the opinion of the lang team is that convenience in this aspect doesn't matter

What do you think?

Screenshots

"collapse directories, but display everything a directory contains if its uncollapsed" mode:

“only show opened files” mode:

8 Likes

I do worry about the sibling scoping question. I know I, for one, often track down bindings by searching purely within the current file. With this proposal, I’d have to change that workflow to greping within the current directory, or using tags more consistently, etc. Yet, I suspect that in the end, these other workflows are an improvement — e.g., tags allow a more direct jump to definition regardless of where that definition lives, whereas my current workflow often requires following a chain of imports.

And also just a note, many Rust developers will be using an IDE (like IntelliJ Rust, VSCode, etc) that provides code navigation out of the box, and thus wouldn't need to change this particular workflow based on this proposal.

4 Likes