The module scheme (`./module.rs` file + `./module` folder instead of just `./module/mod.rs`) introduced by the 2018 edition may be a little bit (more) confusing

Greetings,

I recently went back to work on a few projects written in Rust and decided to update them to the newer editions (they were pre/half-pre 2018 based).

During the work, I noticed you folks have turned away from the mod.rs file and are now encouraging the ./module + ./module.rs approach instead. The 10.5. File hierarchy section of the Rust By Example book doesn't even mention mod.rs anymore, and 7.5. Separating Modules into Different Files calls it the "older style".

As I understand it, the main reason for the switch was to avoid confusion between editor tabs? Well... if true, why should that be addressed by changing a programming language? Isn't it a flaw of tabbed text editors that their developers should address?

There are countless files named .DS_Store, Desktop.ini or Thumbs.db created automagically by operating systems on everyone's computer, and you don't see that bothering Apple or Microsoft.

So yea, I don't like the idea.

When I started learning Rust, I pictured mod.rs as a special manifest that exposes/lists the structure of a module, just like how main.rs or lib.rs introduces the structure of a program (those are the files you should read first when you get the code).

I also picture Rust's module file hierarchy system as "directory based". In my mind, everything belonging to a module should be contained in one directory, including mod.rs and whatever else you have in that module. That's also how most people use normal folders: put everything related neatly inside.

My first problem with ./module + ./module.rs is that ./module.rs is outside the ./module folder, so now you have two places to look.

Another problem (actually related to the first) is that even though ./module.rs is outside of ./module, it can still affect how things work inside ./module.

For example, say you have a module structured as following:

.
├── module
│   ├── a.rs
│   └── b.rs
└── module.rs

in file a.rs, you define pub fn a(), in file b.rs you have

use super::a::a;

pub fn call() {
    a();
}

and in module.rs you have

mod a;
mod b;
pub use b::call;

Everything is fine right now, but imagine that somewhere down the line you make a mistake and remove the mod a; line from ./module.rs, a file located outside of ./module. Then, guess what: the editor will tell you there is a mistake in ./module/b.rs, which is located inside ./module. For normal people this is very confusing: why should an edit made outside of a folder cause an error inside the folder? That's just not how scope normally works.
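To make the dependency visible in one place, here is a rough single-file sketch of the same layout using inline modules instead of separate files. The return value of a() is my own addition so the result can be observed; the original a() is just described as pub fn a().

```rust
// Inline-module stand-in for module.rs, module/a.rs and module/b.rs.
mod module {
    // This item plays the role of `mod a;` in module.rs.
    mod a {
        // Returning a string is an assumption made for illustration.
        pub fn a() -> &'static str {
            "a"
        }
    }

    // This item plays the role of `mod b;` in module.rs.
    mod b {
        // Resolves only while the parent module still declares `a`.
        use super::a::a;

        pub fn call() -> &'static str {
            a()
        }
    }

    pub use b::call;
}

fn main() {
    println!("{}", module::call());
}
```

Deleting the inner `mod a { ... }` item makes `use super::a::a;` fail to resolve, and the error is reported at the `use` line inside `b`, which mirrors the situation described above.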

So to make people understand the whole thing, you'll have to put in additional effort to educate them, i.e. more cliff to climb.

Speaking of confusion, here's my third problem: back in the day, everybody knew that mod.rs is a special file, just like main.rs and lib.rs. It's a relatively easy mental model to remember.

Now every .rs file you see could be a mod.rs in disguise. But a normal .rs file and mod.rs don't work in exactly the same way, so some day someone will ask "why does my mod something statement compile in one file, but stop working as soon as I move it to the next file?", so... more cliff to climb?

(also, dummy, that mod something in one.rs imports something completely different if you put it into two.rs, assuming ./one/something.rs and ./two/something.rs both exist)
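A small sketch of that last point, again with inline modules standing in for files. The function name origin() and the strings it returns are hypothetical; they only label which file each module body would have come from.

```rust
// `one` stands in for one.rs, `two` for two.rs.
mod one {
    // Stands in for `mod something;` written in one.rs,
    // which would load ./one/something.rs.
    pub mod something {
        pub fn origin() -> &'static str {
            "one/something.rs"
        }
    }
}

mod two {
    // The same declaration written in two.rs would load ./two/something.rs.
    pub mod something {
        pub fn origin() -> &'static str {
            "two/something.rs"
        }
    }
}

fn main() {
    println!("{}", one::something::origin());
    println!("{}", two::something::origin());
}
```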


OK, after all of that, what I'm trying to say is: there is no question that the ./module + ./module.rs approach is here to stay (I hoped not, but here we are). But, can we also support mod.rs at the same priority? Not "just an oldstyle", not "discouraged", but "another way/style"?

Reading around the Internet, there are quite a few stubborn mod.rs supporters around Link 1, Link 2. I think I'll keep using mod.rs too if that's OK.

And that's all :slight_smile:


I think it would be good to revert/soften up those descriptions that describe it as the old/deprecated style. While I tried the new style, I've mostly reverted to using mod.rs everywhere, since for me it feels conceptually cleaner to keep all of a module's files together in a directory. In the library ecosystem, I don't think the "new" style is much more common than the "old" style at this point.

I guess it doesn't help that VS Code's file browser separates directories from files (all directories first, then all files), so that feels like a convention that mod.rs works better with.

(I've heard from other folks I work with that they prefer the "new" style, for one part, because they search their files by name and so having a bunch of mod.rs files is less convenient than having those files have the more specific name of their containing module.)


For those not aware, there are clippy lints that forbid either style: self_named_module_files and mod_module_files. I have #[warn(clippy::self_named_module_files)] set in all of my projects.

To me, the benefit of foo.rs is I can add a directory of additional modules without having to move the original foo.rs. However, I tend to prefer my main.rs, lib.rs, and mod.rs files to be as minimal as possible so I'd want to move things around anyways.

The big reason I don't use foo.rs and instead mod.rs is when I see foo.rs I don't tend to think to check if there is also a foo/ and get confused when it looks incomplete or when I see mods in the file.

Technically, foo.rs is also inconsistent. For main.rs, lib.rs, and mod.rs, a mod bar; means "look in this directory", while for foo.rs a mod bar; means "look in a child directory".
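A rough sketch of that rule, as I understand it. The helper submodule_dir is hypothetical, and it simplifies by recognising the special files purely by name (it ignores #[path], inline modules, and crate roots with other names).

```rust
use std::path::{Path, PathBuf};

/// Directory in which `mod bar;` written in `file` would look for
/// `bar.rs` or `bar/mod.rs` (simplified, name-based rule).
fn submodule_dir(file: &Path) -> PathBuf {
    let parent = file.parent().unwrap_or_else(|| Path::new(""));
    match file.file_name().and_then(|name| name.to_str()) {
        // "Look in this directory".
        Some("main.rs") | Some("lib.rs") | Some("mod.rs") => parent.to_path_buf(),
        // "Look in a child directory" named after the file.
        _ => match file.file_stem() {
            Some(stem) => parent.join(stem),
            None => parent.to_path_buf(),
        },
    }
}

fn main() {
    for file in ["src/lib.rs", "src/foo/mod.rs", "src/foo.rs"] {
        println!("{} -> {}", file, submodule_dir(Path::new(file)).display());
    }
}
```

Note that src/foo/mod.rs and src/foo.rs end up pointing at the same directory, which is why the two styles are interchangeable for the submodules themselves.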

I wonder if an "auto mod" feature would make foo.rs even more confusing.


I wish IDEs started to finally somehow visually bundle foo.rs and foo/* together; the optimal solution would be to display a module hierarchy rather than a filesystem hierarchy in source directories. IDEA/RustRover is almost there; they support "file nesting" meant for tidying up transpiled stuff, eg. to keep foo.css and foo.css.map under foo.sass, but it can't nest a directory under a file, just other files :confused:


In VSCode you can set the "Sort Order" setting to "mixed" in order to interleave files and folders in the explorer.


One of the issues with module/mod.rs is that it leads to some confusion when trying to navigate and modify the module hierarchy.

With the module.rs approach, you can reliably say that the way to find a::b::c is to replace the :: with / and append .rs: a/b/c.rs. And you can always add a submodule by making a new directory, without having to git mv a/b/c.rs a/b/c/mod.rs.
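As a minimal sketch of that mapping: the function name module_file is made up for illustration, the result is relative to wherever the crate's source root lives, and it assumes no mod.rs files, no inline modules and no #[path] attributes.

```rust
/// Maps a module path such as `a::b::c` to `a/b/c.rs`,
/// assuming the mod.rs-free layout.
fn module_file(module_path: &str) -> String {
    format!("{}.rs", module_path.replace("::", "/"))
}

fn main() {
    println!("{}", module_file("a::b::c"));
}
```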

In the absence of mod.rs, it's easier to know that whenever you descend a level in the module hierarchy, you descend a directory. You don't have these sideways moves from mod.rs to submodulename.rs, or unexpected downwards moves to submodulename/mod.rs.

There's an exception to that: main.rs and lib.rs still act in many ways like mod.rs, and that can be a source of confusion when projects grow past a single file and begin to modularize. It also causes some confusion in tests.

This isn't an issue when using things like src/bin/abc.rs to define an abc binary, because submodules of that are src/bin/abc/xyz.rs.

Personally, I'm tempted to take that model further: if we moved bin and lib.rs up a level, we could have:

Cargo.toml
bin/name.rs
bin/name/submod.rs
lib.rs
lib/a.rs
lib/a/x.rs
lib/b.rs
lib/b/y.rs

Personally, I'm tempted to take that model further: if we moved bin and lib.rs up a level, we could have:

If I were to see a top-level bin/ in a project, I would assume it's a collection of scripts.

I've generally found it more difficult to spot source in Go projects as they put .go files in project roots. At least this is limited to lib.rs.

A benefit to this scheme is it feels like it'd make it easier to encourage people to put bins in bin/, avoiding the confusion over having a main.rs and a lib.rs in the same directory and the user using mod without much awareness of what is happening.


Yeah, that was a major motivation. For crates containing both a library and one or more binaries, it'd keep things more clearly separate, and avoid having people use mod lib; in their main.rs (which is almost always a mistake).

The mod.rs style also works better, ish, for using modules with tests or examples or bins, if you like Cargo's automatic discovery. But also taken to its extreme, you'd never have a library source file not named lib.rs or mod.rs, and that seems like silly boilerplate along the same lines as the Java package-names-in-filesystem. So I'm glad the option is there for single-file modules; maybe it's just when submodules start getting involved that I should be more consistent about moving the parent to foo/mod.rs rather than leaving it as foo.rs.


Some more benefits to mod.rs

  • You can grep a single directory for a mod, rather than specifying a file and a directory
  • Tab completing file names is easier

Not having to rename files when deciding to add submodules is a big plus for me, especially when you split an existing module into several submodules. Why? The rename tracking of git is fairly limited, and it will lose track if there are also big changes at the same time.

The fact that IDE tabs tend to be handled badly is also annoying (but not a deal breaker, I survived __init__.py in Python). Of course in Python those tend to often be empty (or maybe just a doc string) and just a marker to make it a module. There is no equivalent to the mod submod; of Rust.


I think making main.rs less special by stuffing it into /src/bin/ would be pretty great for combating mod lib;, as well as other cases where two root crates both have multiple modules in the same root directory.


This has probably been discussed in the past (if so, please point me in the right direction), but I don't think either is the right option.

As others said: module_name.rs is often separate from the directory (the "mixed" setting in VSCode only slightly improves that) and mod.rs can sometimes be hard to find (as it's somewhere in the middle of all the files in a (folder) module).

In addition to that it can be a bit annoying if you have lots of things in one module but want to separate them into multiple files (requiring many/wildcard re-exports).

Personally I'd love it if all files were in one directory (named after the module) and:

  • mod.rs works as it does today (backwards compatibility)
  • my_module/submodule_name.rs works as it does today (backwards compatibility and used everywhere - the normal situation)
  • my_module.rs works as it does today (backwards compatibility)
  • my_module/_whatever.rs all files starting with _ are treated as part of my_module (instead of a _whatever submodule).
    • The namespace (pub and non-pub functions) is shared (except for imports?)
    • They would behave (almost) as if they were concatenated and named mod.rs (except for imports which would be merged?)
.
├── module
│   ├── _mod.rs // Option 3
│   ├── _module.rs // Option 4 (module name shown in the tab)
│   ├── _whatever_you_want.rs // Option 5 (will be merged with other `_*` files)
│   ├── a.rs
│   ├── mod.rs // Option 2
│   └── z.rs
└── module.rs // Option 1

All files marked with // Option would be allowed to contain mod directive(s), and multiple can exist (as long as there are no naming conflicts and no duplicate mod directives).

Advantages:

  • Directories can be used both like they are now and like most other languages do it (everything in one folder is in the same module, except for subfolders, which make submodules).
  • Everything related to that module is in the same place (unless using the currently recommended way of a separate file or putting mod my_module in lib.rs).
  • The _whatever.rs files are always on the top (as long as files are sorted alphabetically)
  • mod.rs can just be renamed to _mod.rs (or something else starting with _) and will show up at the top of the file list.
  • Those that like finding it by name can just search for the file starting with _ (or name the file something meaningful, e.g. my_module/_my_module.rs).

Disadvantages:

  • We already have 2 ways to do things; this would make it 3 or more
  • Finding stuff if the module itself is split into multiple files gets more difficult (but not impossible) - Most of the time you still have only one file.
  • Adding this may require an Edition boundary, as mod _whatever; is currently allowed and would now have to be a folder: _whatever/mod.rs or _whatever/_something.rs. I'm not sure how many crates would be impacted by that, as it's used mostly for things like mod _internal or mod _private.

Personally I am 100% appreciative of no longer having dozens of files in my workspaces with exactly the same name. [1]

Some other improvements previously discussed can be found in a few old threads. Particularly this one: Automatic modules (no more mod.rs!)


  1. I am also not a fan of index.html, and I find __init__.py especially egregious.


It kind of makes sense if you think about path processing, but the problem with it is that Rust encapsulation boundaries don't correspond to directories, so you get two different hierarchies.

A Rust encapsulation boundary consists of a module and all its submodules, but a directory only contains all the submodules without the top-level module.
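A small sketch of what that boundary means in practice, with inline modules so it fits in one file. The names parent, child, secret and reveal are made up for illustration.

```rust
mod parent {
    // Private to `parent`: usable in `parent` and in every module nested in it.
    fn secret() -> u32 {
        42
    }

    pub mod child {
        pub fn reveal() -> u32 {
            // A submodule may use its ancestor's private items.
            super::secret()
        }
    }
}

fn main() {
    // Calling `parent::secret()` here would be rejected as private.
    println!("{}", parent::child::reveal());
}
```

With the parent.rs + parent/ layout, `secret` would live in parent.rs while `reveal` would live in parent/child.rs, so the privacy boundary spans a file outside the directory plus everything inside it.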


I find that I use both styles, although I use mod.rs more often. Specifically, I use mod.rs for "structural" files that primarily consist of docs, mod and use, without defining any items themselves, and file.rs for files that directly contain a notable amount of code and might have a private helper module if needed.

I only rarely actually use the latter organization, but I sometimes do, and this rough guideline gives me a clear point at which to refactor from using file.rs to file/mod.rs.

Oh, it's more fun than this. What files a mod item looks for isn't actually relative to the current source file at all; it's relative to the current module path[1]. This means that if you write mod a { mod b { mod c; }} it will actually mount the source from ./a/b/c.rs.

I actually do (weakly) agree with the sentiment in retrospect that mod foo; always being mod foo { include_one_of!("./foo.rs", "./foo/mod.rs"); } was a reasonably simple model… except that even then it wasn't quite true. It was more true than it is today, but the quirks around module mounting (and especially with #[path] mod and include!) still existed, they were just more hidden.

But on the other hand I've also made one particular usage of the new module rules that would've been much more annoying: I have a macro in a bindings crate used like

lib_class! {
    /// Docs
    class System = LIB_SYSTEM;
    mod lifetime, setup; 
}

lib_class! {
    /// Docs
    class Group = LIB_GROUP;
    mod lifetime, control;
}

which expands to roughly

pub use self::system::System;
pub mod system {
    use super::*;
    opaque_type! {
        /// Doc
        pub struct System;
    }
    // a bunch of impl for System
    pub mod lifetime; 
    pub mod setup;
}

pub use self::group::Group;
pub mod group {
    use super::*;
    opaque_type! {
        /// Doc
        pub struct Group;
    }
    // a bunch of impl for Group
    pub mod lifetime; 
    pub mod control;
}

and for this crate it's much cleaner to see all this in the single file than it is to split it into separate ones.


  1. With the added bonus of how #[path] mod and include! work. In reductive short, each loaded module has a filesystem path it logically corresponds to. lib.rs is $DIR/. mod foo is ./foo, relative to the parent module path. #[path] mod sets the filesystem path explicitly. And I'm never fully certain about include!, but I think it sets the path for the loaded spans to the included file, not the source where include! was written? But also the module mount path has transparent macro hygiene. So everything just works until it doesn't, pretty much.


I agree that we should reword the docs here to not give a clear preference for either style. Clearly there are good arguments for both.

FWIW in VSCodium, the following helps a lot with mod.rs files (it fixes the "tab title" issue mentioned in the OP):

    "workbench.editor.customLabels.patterns": {
        "**/mod.rs": "${dirname}/mod.rs"
    },

That describes my style pretty well, too.

I only recently discovered this and was quite surprised, too.


TIL. Thanks!


There is not much in the Rust documentation about structuring your code, if memory serves. It doesn't help that the IDEs don't implicitly support this "module directory descriptor" technique.

I expect that, had this explanation of directory-based module encapsulation (where mod.rs serves as the directory descriptor, much like lib.rs and main.rs) been documented, more priority would have been put on IDEs accommodating it somehow.

IIRC there's actually another complication due to the #[path = "..."] attribute. This has some fun quirks like

#[path = "thread_files"]
mod thread {
    // Load the `local_data` module from `thread_files/tls.rs` relative to
    // this source file's directory.
    #[path = "tls.rs"]
    mod local_data;
}

and it turns out I was not the only one who was surprised, e.g.

That is, the #[path] attribute means that the actual module hierarchy need not mirror the source directory structure.