Add a 'use mod' semantic

CAD97 · May 21, 2024, 8:59pm

That ambiguity between module local or crate item can be avoided by writing use self::modname::* or use crate::modname::* instead to disambiguate. But yes, all item name resolution needs to remain fully unambiguous after all macro expansion is complete.
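A minimal sketch of that disambiguation, using inline modules so it fits in one file. The names `util`, `outer`, and `origin` are made up for illustration and do not come from the thread.

```rust
// Crate-level module, reachable from anywhere as `crate::util`.
mod util {
    pub fn origin() -> &'static str {
        "crate::util"
    }
}

mod outer {
    // A child module with the same name, reachable as `self::util` inside `outer`.
    pub mod util {
        pub fn origin() -> &'static str {
            "crate::outer::util"
        }
    }

    pub fn local() -> &'static str {
        // `self::` pins the glob import to the module-local `util`.
        use self::util::*;
        origin()
    }

    pub fn global() -> &'static str {
        // `crate::` pins the glob import to the crate-level `util`.
        use crate::util::*;
        origin()
    }
}

fn main() {
    println!("{}", outer::local());
    println!("{}", outer::global());
}
```

Each function names its anchor explicitly, so neither glob import depends on which `util` happens to be in scope.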

There could (and would need to) be similar rules for implicit module mounting — e.g. consider all unresolved use to be module mounting after all macros that can be resolved without implicit mod have been expanded — but while it should be fully feasible to do so, the question is whether it's worth making name resolution even more complicated.

It still could be. Default binding modes (also known as match ergonomics) are another feature that makes the simple, common case simpler at the expense of making something else more complicated: in that case, type inference for the bindings.
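For readers who haven't met the term, here is a small sketch of what default binding modes do. The function names are hypothetical; the two functions are meant to behave identically.

```rust
// With default binding modes, matching a reference against a non-reference
// pattern makes `s` bind as `&String` without any extra annotation.
fn len_ergonomic(opt: &Option<String>) -> usize {
    match opt {
        Some(s) => s.len(),
        None => 0,
    }
}

// The same match spelled out by hand: dereference the scrutinee and bind
// by reference with `ref`.
fn len_explicit(opt: &Option<String>) -> usize {
    match *opt {
        Some(ref s) => s.len(),
        None => 0,
    }
}

fn main() {
    let value = Some(String::from("hello"));
    println!("{} {}", len_ergonomic(&value), len_explicit(&value));
}
```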

But there are two still common cases where implicit module mounting can be problematic:

Module visibility. Still requiring pub use modname for any module paths that aren't okay with being mod local means a majority of modules will just replace pub mod with pub use and no other differences. Learning how to manage multiple source files once seems better than learning once for file splitting and again later for module organization^[1].
Modules that don't export any items to be used. If requiring a module always meant a use it'd be more obviously desirable to make use the way to load the module, but impls don't need to be used and can be in any module. So a module that doesn't export any items to outside the module still needs to be mounted despite a use of the module being "unused" except for the side effect of mod loading.
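A sketch of the second case, assuming a made-up `Point` type and a module named `display_impls`. The inline module stands in for what would normally be a separate file; it exports nothing, yet the crate depends on it being mounted.

```rust
pub struct Point {
    pub x: i32,
    pub y: i32,
}

// This module has no public items, so nothing would ever `use` it.
// It still has to be part of the crate for the impl to exist.
mod display_impls {
    use super::Point;
    use std::fmt;

    impl fmt::Display for Point {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "({}, {})", self.x, self.y)
        }
    }
}

fn main() {
    let p = Point { x: 1, y: 2 };
    // Works only because `display_impls` is mounted, although it is never named.
    println!("{p}");
}
```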

Then there's some bonus questions: does use super::modname work to mount a sibling module, or does implicit mod only work for self::modname paths (after resolution)? What about qualified paths not part of a use tree? If only specifically a use self:: works, I suspect "missing use" issues to come up in similar volume to "missing mod" today.

The error I've actually seen more often — writing mod m to access a sibling module instead of use super::m — isn't even addressed by implicit mod. (And might even make it worse. Thinking mod m looks for ./m.rs or ./m/mod.rs is easy, but this is only true in mod.rs or the crate root file; the search path is actually based on module path^[2] instead.)
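A sketch of that sibling-module mistake and its fix, with inline modules `a` and `b` standing in for `a.rs` and `b.rs` (both names are illustrative).

```rust
mod a {
    pub fn hello() -> &'static str {
        "hello from a"
    }
}

mod b {
    // Writing `mod a;` here would NOT reach the sibling above. It would declare
    // a new child module `b::a` and send rustc looking for a file for it.
    // Reaching the sibling is an import, not a declaration:
    use super::a;

    pub fn greet() -> String {
        format!("b says: {}", a::hello())
    }
}

fn main() {
    println!("{}", b::greet());
}
```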

[1] I can imagine this leading to someone writing pub mod foo { pub use bar::*; } due to decoupling the concept of "module" from "file" and learning use file::* as the way to split main.rs into multiple files.
[2] Rooted from the closest parent module with an explicit #[path] specified. Pre-2018, mod was only allowed in crate root, mod.rs, or #[path] modules, which made this divergence harder to discover.

zackw · May 23, 2024, 10:01pm

If I could make just one change in this area, it would take the language farther in the direction of explicitness: remove all the rules for guessing which file contains the content of mod m. Instead, you would always have to write

mod m = "m.rs";

for an out-of-line module. (In case it's not clear, the new syntax here is equivalent to #[path = "m.rs"].)

I mention this mainly to underscore the lack of consensus. I actively want those mod lines, I think Rust would be worse without them, and I think it would be better with the above change. @yigal100 In order to change my mind you will need to bring actual evidence and not just assertions.

josh · May 23, 2024, 11:02pm

Many people tell us the need for mod m; is confusing and annoying. This includes both experienced Rust users and people who teach/train new Rust users. That is sufficient evidence to identify that there's a potential problem here. There are tradeoffs involved in any potential solution to this problem, but there's not a lack of evidence of a problem.

We should be cautious, we should not make changes lightly, and we should take everyone's use cases into account. And it's possible that, after taking all the tradeoffs into account, we'll end up concluding that we shouldn't make a change here at all, because the downsides outweigh the upsides. But complete dismissals of the problem or the potential upsides don't seem particularly productive here.

zackw · May 23, 2024, 11:38pm

Please understand that I do understand that mod m; can be confusing and annoying, but I think it would be more confusing and annoying to have rustc infer the module hierarchy from the filesystem, and I think mandatory specification of the relative filesystem path for every out-of-line module would be less confusing and annoying.

parasyte · May 24, 2024, 12:08am

I personally find mod name; neither confusing nor annoying. But I am all for any changes that reduce cognitive load or make the language generally more accessible. One minor annoyance like naming your modules isn't a big deal in isolation. But compounded with other minor annoyances, they add up quickly. I think this is just one improvement in a long line that can be made, and it would favor most Rust users.

The removal of extern crate was a much bigger change than it might seem on the surface. Almost no one uses it habitually anymore. We don't need to think about whether an external crate needs to be defined outside of the package manifest. Things Just Work. I would appreciate the same benefits for mod. Even if I'm not annoyed with it right now, there is a future where I will look back and think, "gee, I'm glad I don't have to write mod name; anymore, this is great!"

josh · May 24, 2024, 12:58am

Can you elaborate on why you think these two things are the case? Do you think that's the case in general, or that it's the case for use cases that need to do something out of the ordinary?

2e71828 · May 24, 2024, 4:36am

Personally, I think that mixing explicit inline/#[path] modules with filesystem-inferred ones will likely be the largest source of confusion. Each system is fine in isolation, but the interactions between them can get complicated.

Even today, there are situations that I have to think twice about (which I usually avoid instead). For example, with this lib.rs, do I need to write src/from_file.rs or src/inline/from_file.rs?

mod inline {
    mod from_file;
}
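For what it's worth, my reading of the Rust reference is that, without a #[path] attribute, each enclosing inline module adds a directory component, so the file here would be src/inline/from_file.rs. The sketch below encodes that reading as a helper; `candidate_files` and its parameters are hypothetical, it ignores #[path] entirely, and the caller has to say whether the declaring file is a crate root or mod.rs.

```rust
use std::path::{Path, PathBuf};

/// Sketch of the default lookup for `mod name;` declared in `file`, nested
/// inside the inline modules listed in `inline` (outermost first).
/// Assumption: `is_mod_rs` is true for crate root files and `mod.rs` files.
fn candidate_files(file: &Path, is_mod_rs: bool, inline: &[&str], name: &str) -> [PathBuf; 2] {
    let mut dir = file.parent().map(Path::to_path_buf).unwrap_or_default();
    if !is_mod_rs {
        // A plain `foo.rs` keeps its children in the sibling directory `foo/`.
        if let Some(stem) = file.file_stem() {
            dir.push(stem);
        }
    }
    for module in inline {
        // Every enclosing inline module contributes one directory.
        dir.push(module);
    }
    [dir.join(format!("{name}.rs")), dir.join(name).join("mod.rs")]
}

fn main() {
    let found = candidate_files(Path::new("src/lib.rs"), true, &["inline"], "from_file");
    println!("{} or {}", found[0].display(), found[1].display());
}
```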

At least today, I’m pretty much always certain of the module path I need to refer to within Rust code, which is something I need to know/figure out much more often than the implementation filename. I worry that adding implicit mod statements from the filesystem structure will introduce similar ambiguities/corner cases into the module tree.

Nemo157 · May 24, 2024, 11:05am

I agree with this (assuming inline modules only refers to inline modules containing outlined modules, purely inline modules will be fine since they're isolated to the file), but also I think that filesystem-inferred modules will work fine for all of my crates, so would really like to be able to switch to that to simplify them.

Even when you need to #[cfg] a module out the majority of the time switching to #![cfg] will be fine. The only times you really need an external #[cfg] mod foo; is for new syntax, which is a very rare situation.

It seems to me that this reduces ambiguities: all module paths become a trivial transform from the file structure.

2e71828 · May 24, 2024, 11:31am

This is only true if we drop support for explicit mod statements and require them to be inferred from the filesystem structure instead. Due to backwards compatibility concerns, however, I don't foresee that happening.

While filesystem-derived automounting might be a better system in isolation, the argument that things will become less confusing by adding yet another option to an already (apparently) confusing system doesn't ring true to me.

Nemo157 · May 24, 2024, 12:00pm

It will still become simpler in the projects that exclusively use it (which could be the majority of projects).

simonbuchan · May 24, 2024, 12:02pm

Honestly, the only two things I would really like to see here are:

Let me put multiple mod declarations in one line, eg. pub mod foo, bar, baz;
rustfmt should put any use of a mod next to it.

Otherwise I'm pretty meh on this one way or the other.

Vorpal · May 24, 2024, 12:39pm

What if there is a name conflict between an inline module and a file system module?

What about case insensitive file systems? Could there be confusing error cases when switching between case sensitive and insensitive file systems? (Most of those situations probably create confusing issues already, but let's try to avoid adding new confusing cases.)

Nemo157 · May 24, 2024, 12:43pm

Error, same as mod foo; mod foo {} today.

Use the case the file system gives, whatever that happens to be, since the file system is the one defining the module names it doesn't matter if it's insensitive or not.

EDIT: To be clear, the scheme I'm thinking of is:

Starting from the crate root file (lib.rs/main.rs) walk the file tree from the containing folder, generating a module skeleton for all encountered mod.rs/*.rs files in the expected structure.

No kind of "look for a file when an unknown use is encountered" or anything dynamic, a simple static module layout based on the filesystem alone, with no ability to override it. Probably if you need something more complicated you disable the automatic module detection and specify them all manually as you do now.
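To make the "trivial transform" concrete, here is one way the mapping from file path to module path could look. This is only a sketch of the scheme described above, not anything rustc does: `module_path_for` and its parameters are hypothetical, paths are taken relative to the directory holding the crate root file, and conflicts such as having both foo.rs and foo/mod.rs are not handled.

```rust
use std::path::{Component, Path};

/// Hypothetical: the module path a static filesystem walk would assign to
/// `rel`, or `None` if the file would not become a module at all.
fn module_path_for(rel: &Path, root_file: &str) -> Option<String> {
    if rel.extension()? != "rs" {
        return None; // only *.rs files become modules
    }
    let mut segments = vec!["crate".to_owned()];
    for component in rel.parent()?.components() {
        match component {
            Component::Normal(dir) => segments.push(dir.to_str()?.to_owned()),
            _ => return None, // reject `..`, absolute paths, and the like
        }
    }
    let is_root = segments.len() == 1 && rel.file_name()? == root_file;
    let stem = rel.file_stem()?.to_str()?;
    // The crate root and `mod.rs` files name the directory's own module.
    if !is_root && stem != "mod" {
        segments.push(stem.to_owned());
    }
    Some(segments.join("::"))
}

fn main() {
    for file in ["lib.rs", "foo.rs", "foo/mod.rs", "foo/bar.rs", "notes.txt"] {
        println!("{file} -> {:?}", module_path_for(Path::new(file), "lib.rs"));
    }
}
```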

Vorpal · May 24, 2024, 1:05pm

I see three main use cases where this doesn't work (there may be more):

Extra generated modules from build tree

My personal experience has been that the only case where I needed something more complicated was to include generated Rust source files from OUT_DIR, for example protobuf bindings, bindgen, etc.

In those cases I would like the standard module tree with some extra things tacked on. It seems a bit severe to ban me from using automatic modules just because I'm using protobuf.

Platform specific modules / conditional alternative modules

I can see the argument that if you want to override with conditional module paths based on OS / architecture then you need to commit fully to an explicit module tree. Because then you are excluding files from the source tree (and adjusting their name where they are mounted in the module tree). Trying to mix in this case will quickly become complicated.

Conditional modules (either included or not)

This is a simpler case, where a module is either included or missing based on some cfg. No funny "one or the other" or tweaking of paths. It is typically used for enabling modules based on features, and should in theory be solvable with inner cfg attributes in the file, typically at the beginning of the file. A possible downside is that rustc now still needs to open and parse the file.

Nemo157 · May 24, 2024, 2:35pm

True, this seems like it’d be ok to support. The key is just that it can’t override anything from the generated module tree, just add new stuff to it, so any in-tree #[path]s probably won’t work.

There’s two main ways to do this today (with a few variations of each but keeping the main module system components):

// in lib.rs, mount a different file onto `crate::foo`
#[cfg_attr(unix, path="foo_unix.rs")]
#[cfg_attr(windows, path="foo_windows.rs")]
mod foo;

// in foo.rs, have a different submodule and re-export its contents
#[cfg(unix)] mod unix;
#[cfg(unix)] pub use self::unix::*;
#[cfg(windows)] mod windows;
#[cfg(windows)] pub use self::windows::*;

The latter would still work, by removing the mod declarations and moving their #[cfg] into the sub-modules themselves.
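A rough single-file sketch of that last variant. Inline modules stand in for foo/unix.rs and foo/windows.rs, with the inner #![cfg] attribute playing the role of the attribute at the top of each file. The `fallback` module and the `platform` function are my additions so the example builds on any target.

```rust
mod foo {
    mod unix {
        #![cfg(unix)] // would be the first line of foo/unix.rs
        pub fn platform() -> &'static str {
            "unix"
        }
    }

    mod windows {
        #![cfg(windows)] // would be the first line of foo/windows.rs
        pub fn platform() -> &'static str {
            "windows"
        }
    }

    // Not part of the original pattern: keeps the sketch portable.
    mod fallback {
        #![cfg(not(any(unix, windows)))]
        pub fn platform() -> &'static str {
            "other"
        }
    }

    // The re-exports keep their outer #[cfg], exactly as in foo.rs today.
    #[cfg(unix)]
    pub use self::unix::*;
    #[cfg(windows)]
    pub use self::windows::*;
    #[cfg(not(any(unix, windows)))]
    pub use self::fallback::*;
}

fn main() {
    println!("{}", foo::platform());
}
```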

farnz · May 24, 2024, 3:23pm

FWIW, at work we handle these cases with multiple crates already; for each protobuf .proto file we use, there's a something-proto crate that contains the generated code and a tiny amount (if needed) of "glue" to interface it to Rust, and then a higher-level crate that imports something-proto and uses it. We do similar with -sys crates for things that use bindgen.

As a result, while this would really hurt your workflow for such things, it'd barely be noticeable to us. Makes evaluating it a little trickier, since we'd need some sense of which workflows are more common.

zackw · May 24, 2024, 3:44pm

I think it's the case in general, although I'd like to restate it with some more nuance.

No "generated" module trees, nay, never

I think it would be a catastrophically bad idea for rustc to walk the filesystem, find all the .rs files in the current crate's source tree, and load them all in as modules. If Rust started doing this I would probably quit using the language altogether, that's how much I don't like it.

I think this primarily because it is very common for me to have .rs files in my crates' source trees that should not be loaded into that crate as modules, ever. They might be isolated test cases for bugs in my dependencies, or they might be out-of-line example code for the documentation, or they might be a pile of notes to self that I want syntax highlighted as Rust. Sometimes (such as the out-of-line example code) they will be checked in. Frequently they do not compile in isolation. Frequently they are self-contained programs, and will therefore break compilation of the crate if they get included.

Technically all this means is that I need some way to tell rustc to load only some of the files within a directory tree as modules of a particular crate. Given the choice, though, I strongly prefer an inclusion mechanism, like what we have now, to a hypothetical exclusion mechanism (load all the files but these) even though it's more long-winded, because it's easier to reason about and it means I can create a new one of those isolated test cases whenever I want and not have to do any ceremony to keep it out of the larger build.

It's worth pointing out that I can't think of any language that actually does walk an entire directory tree and treat each file within as a module. Interpreted languages often seem to do this, but that's an illusion caused by their dynamism. If you have a Python package, for example, you can put as many .py files as you want inside and the interpreter will only read those that are actually mentioned in import statements. In situations where you want Python to load all the files in a directory as modules (such as for test enumeration) you have to scan the directory yourself and feed each file to the module loader yourself.

I agree that the current behavior of `mod foo;` is confusing

We have a great example in this very thread:

I don't know the answer to this myself without looking it up! (And having looked it up, it sounds like the answer changes if mod from_file has a #[path] attribute, which can't help but confuse people even more. Or else the Rust reference itself is badly worded...)

And you surely remember how long and how much debate it took to nail down the new behavior when we started allowing

foo.rs
foo/bar.rs

as well as

foo/mod.rs
foo/bar.rs

Explicit `mod foo;` cannot go away completely

Nobody in this thread seems to be proposing that #[cfg_attr(unix, path="foo_unix.rs")] mod foo; should go away, but what that means is that mod foo cannot completely go away. That in turn means I cannot get behind any syntactic sugar along the lines of "if there is no foo in scope, then use foo; means the same thing as mod foo; and use foo::bar; implies mod foo;". That is how Python works and it is, I think, fairly close to what the people asking for "generated" module trees want to be able to write.

But it would add complexity rather than removing it; on top of the question of whether mod foo; refers to foo.rs or foo/mod.rs (and what if you have both?) and the question of how an intermediate inline module changes lookup, you now have to wonder whether there's an explicit mod foo, presumably with a path attribute, anywhere in the current scope.

Requiring paths for all out-of-line modules would reduce complexity

Suppose you always have to specify the path for each of your submodules, and these paths are always relative to a directory with the same name as the current module's file (i.e. inline modules definitely don't affect anything) (in "mod-rs" files, as that term is used in https://doc.rust-lang.org/reference/items/modules.html#the-path-attribute, we use the current module's directory instead -- this is necessary for backward compatibility and also means the behavior at the crate root is more natural).

This would be more typing than what we have now, and would take the language in the opposite direction from what the people who started this thread said they wanted. But it would be fewer rules for what rustc will do than what we have now, and that would be worth the extra typing. It would also give us a path toward limiting the special "mod-rs" behavior to the crate root itself, for even more predictability.
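As an illustration of how small the rule set becomes, the whole lookup proposed here could be sketched in a few lines. This models the proposal only, not current rustc behavior; `resolve_explicit` and its parameters are hypothetical names, and the caller states whether the current file is a "mod-rs" file.

```rust
use std::path::{Path, PathBuf};

/// Sketch of the proposed rule: the right-hand side of `mod m = "...";` is
/// resolved against one base directory, and inline modules play no part.
fn resolve_explicit(current_file: &Path, is_mod_rs: bool, rhs: &str) -> PathBuf {
    let parent = current_file.parent().unwrap_or(Path::new(""));
    let base = if is_mod_rs {
        // Crate root or mod.rs: the current module's own directory.
        parent.to_path_buf()
    } else {
        // Any other file `foo.rs`: the sibling directory `foo/`.
        parent.join(current_file.file_stem().unwrap_or_default())
    };
    base.join(rhs)
}

fn main() {
    println!("{}", resolve_explicit(Path::new("src/lib.rs"), true, "m.rs").display());
    println!("{}", resolve_explicit(Path::new("src/foo.rs"), false, "bar.rs").display());
}
```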

Vorpal · May 24, 2024, 3:58pm

I would argue that is not best practice. Examples should go in an example directory, docs in a doc directory, etc. Notes in your note management system (e.g. Obsidian, Logseq, org mode, ...)

Many now-popular build systems for C/C++ do this, such as Meson and Bazel. CMake can be made to do it, though it is a relatively new feature, not yet widely adopted, especially since it had performance issues early on (from my understanding, the kinks have been worked out in the last few versions).

While I don't feel strongly that we should have implied module trees, I'm not against it either. However the opposite of that, which you propose, is fairly useless. It adds no new information, it makes the common case more verbose without reducing complexity.

Yes you could argue that it makes the corner case of a file module in an inline module simpler. But I don't think that is a real problem, I have never seen it in a real code base. I believe it belongs in weird-exprs.rs. Technically legal, perhaps mildly amusing the first time you encounter it, but if I ever saw it in a real PR outside of a parser torture test I would reject it.

zackw · May 24, 2024, 4:29pm

They are in an example directory or a doc directory! But that directory is nested inside the crate root (where else am I gonna put it? Particularly if rustdoc needs to be able to find it) and thus would get scanned by rustc.

Notes in your note management system

Just as you shouldn't assume I'm using an IDE, you shouldn't assume I am using a note management system. Also, these notes are specific to a crate and I intend to share them with my collaborators. Therefore, regardless of what is being used to edit the notes, they belong right next to the code and get version controlled along with it.

(In general, 'your workflow is bogus' is not a constructive response.)

Many now popular build systems for C/C++ do this, such as Meson and Bazel

Meson can do that -- using file globs, IIRC -- but you aren't forced to, you can go on listing each .c file explicitly. I have never done anything with Bazel.

I don't have a problem with an opt-in mechanism for declaring all the .rs files in a directory to be submodules, hypothetically something like

mod self::*;

However, people will want to write more specific file globs and maybe we should have a plan for that before we add any such thing.

the opposite of that, which you propose, is fairly useless. It adds no new information

This is a pretty fundamental disagreement we have here, then, on a conceptual level. To me, writing

mod foo = "foo.rs";

for every module would be worthwhile, despite the redundancy in the common case, because it would make the mapping from modules to source files be completely explicit, and completely under the programmer's control. Putting module X in a file with a different name, for whatever reason, ceases to be a special thing requiring #[...] notation. Teaching the language becomes simpler; you no longer have to learn rules for where to look for the code of mod foo, because there are no rules, it's just wherever the right hand side of the assignment says.

if I ever saw [a file module in an inline module] in a real PR outside of a parser torture test I would reject it

I could get behind a change to disallow file modules within inline modules altogether; but that's a separate thing from what's under discussion here.

Nemo157 · May 24, 2024, 6:21pm

If by "crate root" you mean the directory the Cargo.toml is in, note that that is the manifest/package root. A single Cargo package can contain multiple crates; in my mind, the root used for implied modules is the directory containing the crate's root file, commonly src but also others such as src/bin/foo for multi-binary-crate packages. (This does mean some patterns, such as a package with both src/lib.rs and src/main.rs or src/bin/foo/main.rs, will not function very well. I'm sure there'd be nice ways to support these patterns that could be designed with a bit more thought; all my crates are single library/binary crates, so I haven't had to consider it.)
