Data point about the new module system learnability and musings about language stability

It occurs to me that I had a similar problem learning about C++ namespaces. Do we have data about people coming from modern C++ to Rust?

@orthoxerox, I'm not sure that our discussion is on topic anymore, so I'm replying to you under a <details> tag

[quote="orthoxerox, post:20, topic:9770"] Let’s say you have structs `Foo` and `Bar` that refer to each other, because they form a FooBar graph. Both implement a whole bunch of traits. For me the natural impulse is to create `foo.rs` and `bar.rs` , but refer to them as a single module. Maybe split out `debug.rs` or a similar file for trait impls that have no important business logic. [/quote]

If you need to create both foo.rs and bar.rs, each containing one of your structs, and then refer to them as a single module foobar, you can do so:

//! foobar.rs
mod foo;
mod bar;
pub(crate) use foo::Foo;
pub(crate) use bar::Bar;

//! foobar/foo.rs
pub(crate) struct Foo { /* ... */ }

impl Foo {
    pub(crate) fn some_ctor() -> Self { /* ... */ }
}

//! foobar/bar.rs
pub(crate) struct Bar { /* ... */ }

impl Bar {
    pub(crate) fn some_ctor() -> Self { /* ... */ }
} 

Et voilà. Since the foo and bar modules are private, they are not visible to the outside world or to the rest of the crate. Although foobar::foo::Foo has crate visibility, only foobar can reach it through that path; the rest of the crate cannot, because it doesn't have access to foobar::foo. Since foobar re-exports Foo with pub(crate), the rest of the crate sees it as foobar::Foo.
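
If you want to try the pattern without creating three files, here is a single-file sketch of the same layout, assuming inline modules stand in for foobar.rs, foobar/foo.rs and foobar/bar.rs. The `bars`, `id` and `new` names are made up for illustration; `Foo` refers to `Bar` through the parent module's re-export (`super::Bar`).

```rust
// Single-file stand-in for foobar.rs + foobar/foo.rs + foobar/bar.rs.
mod foobar {
    // Private submodules: only `foobar` (and its descendants) can name them.
    mod foo {
        // `Foo` reaches `Bar` through the parent's re-export below.
        pub(crate) struct Foo {
            pub(crate) bars: Vec<super::Bar>,
        }

        impl Foo {
            pub(crate) fn new() -> Self {
                Foo { bars: Vec::new() }
            }
        }
    }

    mod bar {
        pub(crate) struct Bar {
            pub(crate) id: u32,
        }

        impl Bar {
            pub(crate) fn new(id: u32) -> Self {
                Bar { id }
            }
        }
    }

    // Re-export both structs: the rest of the crate sees `foobar::Foo`
    // and `foobar::Bar`, never the private `foobar::foo::Foo` path.
    pub(crate) use bar::Bar;
    pub(crate) use foo::Foo;
}

fn main() {
    let mut graph = foobar::Foo::new();
    graph.bars.push(foobar::Bar::new(7));
    // Writing `foobar::foo::Foo` here would not compile: `foo` is private.
    println!("{} bar(s), first id = {}", graph.bars.len(), graph.bars[0].id);
}
```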

Now, while this pattern can work sometimes, I personally often find that having a file per struct is not what I want. I have two needs when organizing files:

  1. Maintaining a sane file size
  2. Keeping local invariants together

Often, these needs go together: when the file grows too big, the invariants tend to become "not so local anymore", and I need to refactor and split files.

So I find myself splitting modules by privacy concerns rather than by structs. A typical struct that shares a module with a struct Foo is a struct Builder that builds Foos. Since a builder is an explicit, reified, stateful constructor, it makes sense (and is easiest) for it to have access to the internals of the type it builds. Conversely, trait implementations that can be seen as "API sugar", "extension traits", or orthogonal concerns (e.g. serialization gated behind a configuration flag) can be moved to other files.
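
A minimal sketch of that split, assuming inline modules stand in for separate files. `Builder` sits in the same module as `Foo`, so it may fill in `Foo`'s private fields directly, while the hypothetical `Describe` extension trait lives in another module and only uses `Foo`'s public accessors.

```rust
mod foo {
    pub struct Foo {
        // Private fields: only code inside `mod foo` can touch them.
        name: String,
        size: u32,
    }

    impl Foo {
        pub fn name(&self) -> &str {
            &self.name
        }
        pub fn size(&self) -> u32 {
            self.size
        }
    }

    // Same module as `Foo`, so the builder may construct it field by field.
    pub struct Builder {
        name: String,
        size: u32,
    }

    impl Builder {
        pub fn new(name: &str) -> Self {
            Builder { name: name.to_string(), size: 1 }
        }
        pub fn size(mut self, size: u32) -> Self {
            self.size = size;
            self
        }
        pub fn build(self) -> Foo {
            Foo { name: self.name, size: self.size }
        }
    }
}

// "API sugar" in a separate module (could be a separate file): it only
// needs Foo's public interface, not its internals.
mod foo_ext {
    use crate::foo::Foo;

    pub trait Describe {
        fn describe(&self) -> String;
    }

    impl Describe for Foo {
        fn describe(&self) -> String {
            format!("{} ({})", self.name(), self.size())
        }
    }
}

use foo_ext::Describe;

fn main() {
    let widget = foo::Builder::new("widget").size(3).build();
    println!("{}", widget.describe());
}
```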

I think that quote is right on the money. Yes, the module system is unexpectedly difficult (albeit powerful). Regarding "documentation love", where do you think such documentation should go? I think the book already has a pretty thorough (if a bit long -- modules are complicated) explanation.

To add a little more anecdata (from my own experience learning modules, entirely self-taught): I also got tripped up the first time I tried to move something to another file. In Java/Ruby/Python (the three languages I have the most experience with), you’d create a class (or just some methods) in a sibling file to your main, import/require the sibling file, and just use those methods/that class. You can do this in any file, including non-main files.

I got some help with this on the main rust users forum: https://users.rust-lang.org/t/import-from-sibling-files-in-src/22361

In Rust (if my understanding is correct), you must namespace literally everything, and you can’t really have sibling files that refer to each other in arbitrary ways (see my fix commit). From my perspective (and I’m still definitely a newbie), this seems like a step backwards in usability: you have to maintain a hierarchy that doesn’t really exist in dynamic languages.

So to clarify, the stumbling block is:

src
\ main.rs
\ a.rs
\ b.rs

// main.rs
mod a;

// a.rs
mod b;

doesn’t work? (src/b.rs needs to be src/a/b.rs or src/a/b/mod.rs.)
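
For comparison, a sketch of the layout that does work with src/a.rs and src/b.rs as siblings: declare both at the crate root and reach one from the other by path. It assumes inline modules stand in for the two files (with real files, main.rs would contain just `mod a;` and `mod b;`); `greet` and `name` are made-up functions.

```rust
// Equivalent of main.rs containing `mod a;` and `mod b;`.
mod a {
    // Stands in for src/a.rs: the sibling is reached through an absolute
    // path (`super::b::name()` would work too), not through `mod b;`.
    pub fn greet() -> String {
        format!("a calls {}", crate::b::name())
    }
}

mod b {
    // Stands in for src/b.rs.
    pub fn name() -> &'static str {
        "b"
    }
}

fn main() {
    println!("{}", a::greet());
}
```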

Actually, I could see that being made to work. My original intuition was that mod a; looks for ./a.rs or ./a/mod.rs, which basically holds for 2015-edition modules if you don’t use #[path = "_"] mod _;. That's because mod foo; is an error in non-mod.rs files, which are exactly the files that would break that simple rule.

Now in edition 2018 you can put mod _; in any file, but it still checks just src/path/to/_.rs and src/path/to/_/mod.rs. I think adding ./_.rs to that search list, even if only to emit a better error, would be a great idea.

If we made it work, we could first teach modules as "mod foo; adds a module for ./foo.rs" (which works in every file, not just main/lib/mod), and then introduce ./self/foo.rs as a way to organize files.

But it does have to deal with including a file twice (though that does already happen with #[path = "_"]).

(Side note: I’ve just hit on why I was weakly against allowing non-mod.rs mods. If you only allow mod.rs to start a folder, you have a consistent rule for adding modules: mod foo looks for ./foo.rs or ./foo/mod.rs. This is the same in every file. With non-mod.rs mods, depending on whether you’re in lib/main/mod or not, it looks for {./foo.rs,./foo/mod.rs} or {./self/foo.rs,./self/foo/mod.rs} respectively.)
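
The two search rules from the side note can be sketched as a small function. This is a simplification under stated assumptions: 2018-edition behaviour, `#[path]` ignored, and the crate root recognised by the file names main.rs/lib.rs (rustc actually keys this off whether the file is the crate root or a mod.rs file). `mod_candidates` is a hypothetical name, not a rustc API.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical sketch: where `mod name;` written in `declaring_file` is
/// looked for. Assumes the crate root is named main.rs or lib.rs.
fn mod_candidates(declaring_file: &Path, name: &str) -> [PathBuf; 2] {
    let parent = declaring_file.parent().unwrap_or(Path::new(""));
    let stem = declaring_file
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or("");
    // lib/main/mod files own their directory: {./foo.rs, ./foo/mod.rs}...
    let base = if matches!(stem, "main" | "lib" | "mod") {
        parent.to_path_buf()
    } else {
        // ...any other file x.rs owns ./x/: {./self/foo.rs, ./self/foo/mod.rs}.
        parent.join(stem)
    };
    [base.join(format!("{name}.rs")), base.join(name).join("mod.rs")]
}

fn main() {
    for file in ["src/main.rs", "src/a.rs", "src/a/mod.rs"] {
        let [flat, dir] = mod_candidates(Path::new(file), "b");
        println!("{file}: {} or {}", flat.display(), dir.display());
    }
}
```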

I don't quite remember what went through my head when I was reading the crates and modules chapter in the crab book, but it was something like "I guess I can come back to that later, when I actually need to define distinct modules in my program, my current one is an executable anyway, so I don't really care about its API surface". The subchapter also starts with inline modules, which makes the examples look contrived.

Giving the chapter or a subchapter a more obviously practical name like "Compiling multiple source files" or "Taking your program beyond a single source file" would probably catch more eyes. Such a chapter could then start with an explanation that a Rust program is not an amalgamation of equally important files, but a strict hierarchy of modules that must be explicitly traversable from main.rs or lib.rs via mod declarations.

That annoyed me as well when I was learning modules. When teaching modules it's super tempting to use this syntax, because it's easier. But it's not helpful :frowning:

edit: I've filed a bug about this:

Well, rustc can scan for .rs files located in [subdirectories of] the directory containing lib.rs, i.e. usually src. That leaves three main sources of false positives:

  • .rs files which are part of a different crate:
    • If they're part of a different Cargo crate, they shouldn't be in subdirectories of src, but that excludes rustc crates which are lib/bin variants of the same Cargo crate. I don't think there's an easy way to deal with those, but perhaps Cargo could just pass a flag to disable the check altogether when building crates containing multiple lib/bin variants...
  • .rs files which are conditionally compiled and currently disabled:
    • rustc can theoretically check for mod declarations which have been disabled by #[cfg]. Such modules could have sub-modules which rustc wouldn't be aware of, but those could be heuristically identified by path.
    • However, include! is more freeform, and you could have weird things like mod declarations or include! calls produced by macro expansions... so it wouldn't be perfect. But those may be rare enough to expect projects using them to explicitly disable the warning.
  • .rs files which are intentionally unused:
    • Probably okay to just expect the user to explicitly disable the warning if they want to leave .rs files around.

...But there's a much easier 80% solution: only check for missing files when encountering an unresolved import. That is, if the user writes a path foo::bar and there's no foo, scan for foo.rs and, if found, suggest a mod foo; in the appropriate place.
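
A minimal sketch of that 80% heuristic, assuming the caller already knows which directory `mod foo;` would search from the current file. `suggest_mod` is a hypothetical helper, not part of rustc, and the filesystem check is injected as a closure so the logic can run without touching the disk (real code would pass `|p: &Path| p.exists()`).

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: when a path like `foo::bar` fails to resolve
/// because there is no `foo`, look for the file that would define `foo`
/// and, if one exists, suggest the missing `mod` declaration.
fn suggest_mod(dir: &Path, missing: &str, exists: impl Fn(&Path) -> bool) -> Option<String> {
    // The two places `mod missing;` would be looked for from `dir`.
    let candidates: [PathBuf; 2] = [
        dir.join(format!("{missing}.rs")),
        dir.join(missing).join("mod.rs"),
    ];
    candidates
        .iter()
        .find(|p| exists(p.as_path()))
        .map(|p| format!("found `{}`; consider adding `mod {missing};`", p.display()))
}

fn main() {
    // Pretend that only src/foo.rs exists on disk.
    let fake_fs = |p: &Path| p == Path::new("src/foo.rs");
    match suggest_mod(Path::new("src"), "foo", fake_fs) {
        Some(hint) => println!("{hint}"),
        None => println!("no suggestion"),
    }
}
```

Only scanning on an unresolved name keeps the cost and the false positives low: unused or cfg-disabled files never trigger anything unless the user actually tries to use them.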

This isn't actually true of Java. Your package declarations do not have to match the directory hierarchy at all.

If you’re building with any normal system, they do. They only don’t when you’re using javac directly to compile one .java file at a time. If you compile a project directory, the paths have to match. (Though I think most now have a “root path” option? It’s been a long while since I’ve built Java projects.)

Practically it’s a requirement in all mainstream build systems even if it’s not for javac.

That is ingenious! Could you open an issue about this?

Related to the stats: the sample size is really too small to draw any firm conclusions, and the tests were not independent: they were in the same course! In cases like this I recommend valuing qualitative evidence, like any shared narrative from the students, over stats.

In the latest revision of Chapter 7 (still only on nightly), we've done exactly that: we now have a subchapter named "Separating modules into different files", as a result of the feedback in this issue, which is similar to this thread.

The latest revision of Chapter 7 does this as well: when defining modules, we start out by saying they're for organization and for privacy. Your data points validate a lot of the changes we made in the latest revision!

We'll be thinking about the feedback in this thread; thank you all! We did, however, explain modules and files the way we did deliberately, and not because it's "only convenient for the teacher". Given the way the Rust module system works, I really feel you should think about the modules first and then about the files (by defining modules inline first and then extracting them to files named after the modules). It seems like folks have the most trouble when they put code in files first and then try to shoehorn their files into the module system.
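
A sketch of that modules-first workflow, with made-up module and function names: step 1 is the compilable inline version; step 2 is described in comments because it needs extra files on disk.

```rust
// Step 1: while designing, write the module tree inline in main.rs.
mod shapes {
    pub mod circle {
        pub fn area(radius: f64) -> f64 {
            std::f64::consts::PI * radius * radius
        }
    }
}

// Step 2 (not runnable here, it needs extra files): cut the *body* of
// `mod shapes { ... }` into src/shapes.rs and leave only `mod shapes;` here.
// Later, cut the body of `pub mod circle { ... }` from src/shapes.rs into
// src/shapes/circle.rs, leaving `pub mod circle;` behind.
// Paths such as `shapes::circle::area` do not change during the extraction.
use shapes::circle;

fn main() {
    println!("area = {:.2}", circle::area(1.0));
}
```

The point of the ordering is that the module tree, and therefore every path, is settled before any file exists; the files then just follow the module names.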

I think this gets to the heart of the problem: people naturally view files as a separating point, because they provide separation at the filesystem level, and also separation within their editor.

And since other languages rely on the natural intuition of file separation, of course people will carry that intuition over to Rust.

But that intuition of file=module would still exist even if Rust were their first programming language, because that intuition is based upon the coding environment (the filesystem and editor).

In my own personal Rust projects, most of the time I use the file=module approach, it's only in a few select cases where I actually use inline modules.

That's not because I don't understand inline modules, it's just because it's easier and more natural to use files for separation (remember, editors are specifically designed to organize based upon multiple files and folders!).

I think we should do some polls for experienced Rust programmers to see how often they use inline modules (and why they use inline modules). That will help guide how much the book should focus on inline modules vs how much it should focus on files.

I use inline modules for macros. It is very useful that macros can generate namespaces purely in code, without touching the filesystem in any way.
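
A small sketch of what that looks like, using a hypothetical `unit_module!` macro that stamps out one inline module per unit, each with the same items, without creating any files.

```rust
// Hypothetical macro: generates a module named `$name` with a conversion
// factor and a helper function, purely in code.
macro_rules! unit_module {
    ($name:ident, $factor:expr) => {
        pub mod $name {
            pub const FACTOR: f64 = $factor;

            pub fn to_meters(x: f64) -> f64 {
                x * FACTOR
            }
        }
    };
}

// Two namespaces, `feet` and `inches`, with no feet.rs or inches.rs.
unit_module!(feet, 0.3048);
unit_module!(inches, 0.0254);

fn main() {
    println!("{}", feet::to_meters(10.0));
    println!("{}", inches::to_meters(12.0));
}
```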

It’s probably not about how often inline modules vs separate files are used, but about the mental model.

I remember how I learned Rust modules having prior C++ experience.

  • mod m { ... } is namespace m { ... }, but not as transparent (the basic concept).
  • mod m; is namespace m { #include "m.rs" } (a sugar for out-lining module bodies into separate files).

It worked pretty well.

Oh, no doubt inline modules are useful. But the question is about how we teach modules to beginners, so that naturally excludes macros (which should be taught later, once the student has reached the intermediate level).

Basically, inline modules need to be taught; there's no denying that. The question is whether files should be taught first (and inline modules later), or the other way around. In other words, how much emphasis should be put on them.

To me, files are a stronger and more fundamental unit than modules, and they definitely were when I was learning modules (and failing to get them, and being extremely frustrated that they failed when I used multiple files, while the book only showed the inline case, which was too easy and had no practical use for me).

As I've explained in the issue, the current chapter doesn't explicitly state the crucial difference between how modules are split in Rust and most other languages:

If you take an inline module and split it into files (the - lines are the ones that move to the new file), in Rust it's:

+// Outside module
+pub mod instrument {
-    // Inside module
-    pub fn clarinet() {
-        
-    }
+}

but in C++/PHP/Go, if mod was equivalent to namespace or package, it'd be:

+// Outside module
-pub mod instrument {
-    // Inside module
-    pub fn clarinet() {
-        
-    }
-}

mod m; also maps more or less to JS const m = require("m.rs"), but that construct doesn't have an inline analog.

To me this was problematic, because mod is like namespace in the inline examples, but the inline examples don't apply to the multi-file case. When files are involved, the analogy suddenly breaks: mod no longer maps 1:1 to namespace, but becomes a different syntactic construct, with the namespace declared outside the file rather than inside it.

FWIW, my lightning-fast introduction to modules went something like this:

  • In Rust, functions, structs, traits, enums are items accessible through a path (e.g. std::mem::replace, String::new(), String)

  • Each item has a visibility that is controlled by the keyword pub (world-visible) or the qualified keyword pub(crate) (crate-visible), which can appear in front of the item (by default an item is said to be private). To use an item through a path, the item must be visible in the current context. (cue example)

  • A module is also an item, that contains other items (functions, structs, traits, enums, other modules).

  • The module name appears in the path of the inner items.

  • To use an item contained in a module, both the module and the item must be visible in the current context. So a module lets you restrict the visibility of its inner items to the module’s own visibility.

  • To instantiate modules, you can either define them inline, or declare them explicitly, and then define them implicitly in a file with an expected name:

    • Explicit definition:
    // in src/lib.rs
    mod foo { 
        fn bar() {} // path: crate::foo::bar
    }
    
    • Explicit declaration, implicit definition as a file (“implicit” in that the filename is the module name and doesn’t need to be repeated in the file itself)
    // in src/lib.rs
    mod foo; // path: crate::foo
    
    // in src/foo.rs
    fn bar() {} // path: crate::foo::bar
    
    • Explicit declaration, implicit definition as a directory
    // in src/lib.rs
    mod foo; // path: crate::foo
    
    // in src/foo/mod.rs or src/foo.rs
    fn bar() {} // path: crate::foo::bar
    mod sub; // path: crate::foo::sub
    
    // in src/foo/sub.rs
    fn baz() {} // path: crate::foo::sub::baz
    
  • The explicit declaration is needed because the compiler starts from the root source file and adds modules (defined inline or as files) as it discovers them through mod declarations. Not adding files automatically has use cases (such as conditional compilation).

  • Items defined in a parent module are visible to that module and to its children. By default, items defined in child modules are not visible to their parent modules. Use the pub keyword to give an item a different visibility (still restricted by its module’s visibility).

  • Things to keep in mind about modules:

    • A module cannot be defined in multiple files, but it can have submodules (visibility rules allow for a rough equivalent of modules split over several files)
    • A module can only restrict the visibility of its inner items, not extend it. For an item to be world visible, its entire path must be pub, and the item must be pub.
    • A pub item in a module will only be visible to those who have access to the module.
    • Structs are not the privacy boundary in Rust; modules are. This is especially useful for e.g. builder patterns and unit testing
    • Modules aren’t compilation units, crates are (well, something something incremental compilation)
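
A short sketch of the "modules, not structs, are the privacy boundary" point from the cheatsheet, with made-up names: a second struct in the same module and a child module (standing in for a `#[cfg(test)] mod tests`) can both read a private field, while code outside the module cannot.

```rust
mod account {
    pub struct Account {
        // Private: visible in `account` and its descendants only.
        balance: i64,
    }

    impl Account {
        pub fn open() -> Self {
            Account { balance: 0 }
        }
    }

    // A different struct in the same module can read the private field.
    pub struct Auditor;

    impl Auditor {
        pub fn inspect(&self, a: &Account) -> i64 {
            a.balance
        }
    }

    // A child module (like a unit-test module) also sees the parent's
    // private items.
    pub mod checks {
        pub fn is_empty(a: &super::Account) -> bool {
            a.balance == 0
        }
    }
}

fn main() {
    let acct = account::Account::open();
    // `acct.balance` here would not compile: main is outside `account`.
    println!(
        "{} {}",
        account::Auditor.inspect(&acct),
        account::checks::is_empty(&acct)
    );
}
```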

Granted, this is more “cheatsheet-level” material, but I find the distinction between declaration and definition useful, and it works by analogy to how items can be declared and defined in different places in C++. Nowhere do I draw the parallel to C++ namespaces, because I think they are an ill fit for Rust modules: namespaces don’t control visibility and can be defined piece by piece (across multiple files or even several times in the same file). They are more a kind of name-mangling sugar, IMO.

I just joined this community and would love to join this tutorial class. I don’t know how far along you are or what my chances of joining are.