Revisiting Rust's modules

tomaka · July 28, 2017, 5:34pm

What I'm pointing out is that this is all anecdotal evidence. When I say that I didn't have trouble with the module system, that is also anecdotal evidence. The real question is: what proportion of people have trouble with the current system, and what proportion don't?

Kixunil · July 28, 2017, 5:43pm

This is a very important use case to me and I’d hate to be unable to continue using it. In some sense it’d break backwards compatibility to remove it (having stray files is allowed already).

carols10cents · July 28, 2017, 6:36pm

Since this is a very preliminary proposal and not implemented yet, no.

Empirical evidence is hard to get in CS; for programming language design in general, the Quorum language is the best effort I know of right now and I don't think they've studied module systems.

If we were to design a UX study to compare the current system and proposed system, what would it look like? What sorts of material would we give people, and what sorts of tasks would we be asking them to complete?

There's a lot in this thread of people with particular use cases that are complex and different from the common cases. If the common cases "proved" (as much as it's possible to prove anything here) to be more learnable, would people accept making the complex cases more difficult? Or would only a solution that makes all cases easier be acceptable?

On a different note, I've seen people learning Rust who genuinely have trouble understanding the module system because the module system is confusing, and I've also seen people who are used to organizing their code a particular way and chafe against the common idiomatic Rust way of organizing modules. The latter group will probably resist whatever we do here unless we make whatever they're used to the default (and since that varies from person to person, isn't possible). I'm still digging into this proposal, but it seems like this is going even more towards "this is the convention, if you follow this, everything will work, but if you don't, it's going to be a struggle". So this group might get MORE vocal, even if we make the experience better for new users who are happy to follow the conventions (once we make them more understandable). I don't know what to do with this information exactly, except to caution that we aren't able to get complete, empirical feedback here and complaints don't necessarily mean we've made things worse.

matthieum · July 28, 2017, 7:27pm

I like a lot in your proposal and much prefer it to @aturon’s.

In general, I would rather have a simple system even if it requires some boilerplate, and does not allow fine-grained encapsulation.

I’m going to build on it, by considering those related concepts in turn:

code organization, also known as “How do I access this item?”
privacy, also known as “Who has access to this item?”

Note: I will not address the extern crate issue; I quite like the current situation and I am not sure that it needs changing.

I strongly believe that code organization should, by default, be tied to the filesystem much like Python modules are:

Python is a widely used language, and therefore a lot of people are going to be familiar with it,
This de-facto eliminates a lot of boiler-plate,
Commenting out is easily carried out by renaming the file (changing the extension, or using a leading underscore like unused items today).

So, for code organization, I would simply:

make the module hierarchy reflect the directory/file hierarchy,
only include a directory if it does not start with an underscore and contains a mod.rs file (or lib.rs, bin.rs),
only include a file if it does not start with an underscore and ends in .rs,
error out if any “to be included” file has a name which is not a valid identifier in Rust.

This makes code organization intuitive.
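The inclusion rules listed above could be sketched as a small classifier. This is only an illustration of the rules as stated in this post, not anything the compiler does: the names Inclusion, classify_file and classify_dir are hypothetical, the identifier check is simplified to ASCII, and keywords and the special root files (mod.rs and friends) are not treated specially when classifying files.

```rust
use std::path::Path;

/// Outcome of applying the proposed inclusion rules to one directory entry.
/// (Hypothetical type, introduced only for this sketch.)
#[derive(Debug, PartialEq)]
enum Inclusion {
    /// The entry becomes a module with this name.
    Module(String),
    /// The entry is ignored (leading underscore, wrong extension, no mod.rs).
    Skipped,
    /// The entry should be included, but its name is not a usable module name.
    Error(String),
}

/// Simplified identifier check (assumption: ASCII only, keywords not rejected).
fn is_simple_identifier(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Turn a candidate module name into an inclusion decision.
fn check_name(name: &str) -> Inclusion {
    if is_simple_identifier(name) {
        Inclusion::Module(name.to_string())
    } else {
        Inclusion::Error(format!("`{}` is not a valid identifier", name))
    }
}

/// Rule for files: no leading underscore, and the name must end in `.rs`.
fn classify_file(file_name: &str) -> Inclusion {
    if file_name.starts_with('_') {
        return Inclusion::Skipped;
    }
    let path = Path::new(file_name);
    if path.extension().and_then(|e| e.to_str()) != Some("rs") {
        return Inclusion::Skipped;
    }
    match path.file_stem().and_then(|s| s.to_str()) {
        Some(stem) => check_name(stem),
        None => Inclusion::Skipped,
    }
}

/// Rule for directories: no leading underscore, and the directory must
/// contain `mod.rs` (the post also mentions `lib.rs` and `bin.rs`).
fn classify_dir(dir_name: &str, children: &[&str]) -> Inclusion {
    let has_root = children
        .iter()
        .any(|c| matches!(*c, "mod.rs" | "lib.rs" | "bin.rs"));
    if dir_name.starts_with('_') || !has_root {
        return Inclusion::Skipped;
    }
    check_name(dir_name)
}
```

Under these rules, bar.rs becomes module bar, _bar.rs and notes.txt are skipped, and my-mod.rs is reported as an error because my-mod is not an identifier.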

This does not immediately address the Facade Pattern, and I do not think it is necessary to address it here.

Actually, I would even favor deprecating the mod keyword altogether.

The only other use of mod I have in my tests is #[cfg(test)] mod test and I feel like unit testing would benefit from being more keenly integrated in the language. The fact that I like small files and large test suites also means I favor putting tests in a separate file, and therefore I could easily envision:

foo/
    test/
        bar.rs
        baz.rs
    mod.rs
    bar.rs
    baz.rs
lib.rs

Then:

foo/test/bar.rs is considered a child module of bar.rs, and therefore gets full access to its private items (access still requires super: use self::super::*; at the top is easy enough),
If really necessary #![cfg(test)] would decorate foo/test/bar.rs, though to be honest I’d be keen on inferring it (aka, if the test directory does not contain a mod.rs, then the files inside are test files for its .rs siblings) and have the compiler report test/xxx.rs if there is no xxx.rs file to test.
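For comparison, here is roughly how the same relationship looks in Rust today, with the test module written inline rather than living in foo/test/bar.rs. It is a sketch with made-up function names (double, quadruple); the point is only that a child module reaches its parent's private items through super.

```rust
mod bar {
    // Private to `bar` and its child modules.
    fn double(x: i32) -> i32 {
        x * 2
    }

    // Public entry point that relies on the private helper.
    pub fn quadruple(x: i32) -> i32 {
        double(double(x))
    }

    // Only compiled under `cargo test`. Under the layout proposed above,
    // this body would live in foo/test/bar.rs instead.
    #[cfg(test)]
    mod test {
        // The child module still has to name its parent explicitly.
        use super::*;

        #[test]
        fn double_doubles() {
            // `double` is private, yet visible here because `test`
            // is a child of `bar`.
            assert_eq!(double(21), 42);
        }
    }
}
```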

Note: I much prefer this organization to Java’s, because it keeps the tests close to the code, instead of having to navigate to another subtree entirely; at the same time, it’s tidy: all tests are neatly tucked into a separate directory. Oh, and having two files means visualizing both in parallel works even in editors that do not allow opening the same file twice without contortions…

I am of the opinion that privacy should be the default, and public should be opt-in, for the simple reason that it forces a conscious choice of making something public, thus avoiding accidental leakage of internal details. It also does not hurt that the compiler messages can immediately spot the issue without ambiguity when trying to access a (too) private item so it’s an easy compilation error to solve.

Regarding the different levels of privacy, I am afraid that too much is too much: the vast array of choice (pub, pub(crate), pub(restricted), pub(mod), pub(self), …) is simply bewildering.

Encapsulation is certainly a desirable property, however too many options may just be that, too many. At some point there are diminishing returns eating into the language’s complexity budget. As such, I’d favor coarser granularity because it’s simpler.

I would simply use 3 levels of privacy:

the default is private, restricted to the current module and its children,
pub means that the item is public, immediately; that is, if foo/bar/baz.rs contains a pub struct Hello; item, then $crate::foo::bar::baz::Hello is accessible outside the crate (and each step of the path is accessible individually),
pub(crate) is similar to pub though restricting the scope to the crate itself.

This does mean that there is no strong encapsulation within a crate; however at the same time such encapsulation would be an edit away from not existing, so I am not sure how valuable it is. I could, maybe, be convinced that pub(super) would be a useful 4th level. Maybe.
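A minimal sketch of the three levels, written in today's Rust with inline modules. One caveat: in current Rust a pub item is only reachable from outside the crate if every module on its path is also pub, which is why foo, bar and baz are all marked pub here; under the rule suggested above that would not be needed. The function names are invented for the example.

```rust
pub mod foo {
    pub mod bar {
        pub mod baz {
            // Default: private, visible in `baz` and its children only.
            fn secret() -> u32 {
                42
            }

            // Visible anywhere in this crate, but not to other crates.
            pub(crate) fn internal() -> u32 {
                secret() + 1
            }

            // Public: reachable from other crates as foo::bar::baz::hello.
            pub fn hello() -> u32 {
                secret()
            }

            pub mod child {
                // A child module may use its parent's private items.
                pub fn peek() -> u32 {
                    super::secret()
                }
            }
        }
    }
}
```

Calling foo::bar::baz::secret() from outside baz is rejected by the compiler with a privacy error, which is the unambiguous diagnostic mentioned earlier.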

Note: I’d really have a dedicated keyword rather than pub(crate) which feels like a second-class citizen; maybe local or protected?

With that in mind, the facade example would require in future/mod.rs:

pub use self::{and_then::AndThen, flatten::Flatten, flatten_stream::FlattenStream, ...};

As well as each individual item being declared either pub or pub(crate) to be visible from future.rs (cannot re-export what you cannot access).

It’s a tiny bit of boilerplate, for sure, but:

it’s literally a couple keystrokes,
it can be made to work with glob patterns easily enough to get an “inline” module (pub use sub_module::*;),
it provides nice navigation benefits: if you have future::AndThen you go to future/mod.rs where you have a nice redirection panel pointing to either future/and_then.rs or future/and_then/mod.rs.
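The facade re-export described above can be sketched with inline modules standing in for future/and_then.rs and future/flatten.rs. Only two of the items are shown, and the describe methods are invented so that there is something to call; the rest mirrors the pub use line from this post.

```rust
pub mod future {
    // Private submodules: callers never name these paths directly.
    mod and_then {
        // Must be `pub` (or `pub(crate)`) so the parent can re-export it.
        pub struct AndThen;

        impl AndThen {
            pub fn describe(&self) -> &'static str {
                "and_then"
            }
        }
    }

    mod flatten {
        pub struct Flatten;

        impl Flatten {
            pub fn describe(&self) -> &'static str {
                "flatten"
            }
        }
    }

    // The facade: one line that lifts the items into `future` itself.
    pub use self::{and_then::AndThen, flatten::Flatten};
}
```

Users then write future::AndThen and never see the and_then module, while someone reading future's source finds the redirection in one place.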

Note: I would favor relative paths over absolute paths, but this seems completely orthogonal.

Platform specific implementations require two annotations, if one wishes to paper over the differences:

in the module itself #![cfg(unix)],
in the containing module: #[cfg(unix)] pub use self::unix::*;

However I see the two annotations as playing distinct roles:

the latter, decorating pub use, decides whether to re-export the symbols or not,
the former, decorating unix.rs, allows using unix-only functions inside the module.

This does not seem completely unreasonable to me. Is it so widely common that it requires more attention?
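As a concrete sketch of the two annotations, using inline modules: the outer #[cfg(unix)] on mod unix plays the role that #![cfg(unix)] would play at the top of unix.rs, and the second #[cfg(unix)] guards the re-export. The platform module, the fallback module and the raw_len function are assumptions made for the example.

```rust
pub mod platform {
    // First annotation: the module only exists on Unix, so it is free
    // to use Unix-only APIs such as OsStrExt.
    #[cfg(unix)]
    mod unix {
        use std::ffi::OsStr;
        use std::os::unix::ffi::OsStrExt;

        /// Length of the string in raw bytes.
        pub fn raw_len(s: &OsStr) -> usize {
            s.as_bytes().len()
        }
    }

    // Portable stand-in for every other target.
    #[cfg(not(unix))]
    mod fallback {
        use std::ffi::OsStr;

        /// Length of the lossy UTF-8 rendering of the string.
        pub fn raw_len(s: &OsStr) -> usize {
            s.to_string_lossy().len()
        }
    }

    // Second annotation: decide which set of symbols gets re-exported.
    #[cfg(unix)]
    pub use self::unix::*;
    #[cfg(not(unix))]
    pub use self::fallback::*;
}
```

Callers use platform::raw_len on every target and never mention unix or fallback.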

tomaka · July 28, 2017, 7:38pm

Do you have an example?

The only thing that comes to my mind is these sorts of weird reexports, which are clearly unidiomatic. But I've only ever seen this in the stdlib (and I guess it's being kept because it would be too annoying to fix) and never in any third-party library.

U007D · July 28, 2017, 7:54pm

Wonderful discussion, everyone. Thank you to the Rust Language team for sharing your thoughts and for requesting ours.

In terms of the proposal, I’ve found Rust’s “convention over configuration” policy to be pretty compelling, as long as the convention is well-documented and easily discoverable. The proposal put forward by @aturon and @withoutboats, at a high level, seems to be another good example of this–I like it.

As a newb to Rust and as one who has struggled (and still does) with modules, there are a couple of small things that would go a long way to clarifying the module system for me.

Outside of Rust, module is often used to mean “generic piece of code”, similarly to the way node refers to a generic data element in a structure. This means that mod does not serve as a good mnemonic (at least for me) for what is happening when I use it–in fact, after “getting it”, I came back to writing in Rust after a short two-week vacation, and could not recall whether I needed to mod or use a module to “import” it. Aliasing it to import would certainly help me to understand/remember what the mod keyword does.
I recently ran into a case where my lib.rs sits alongside my main.rs in src and I’ve created an args.rs module, an error.rs module and a consts.rs module all referenced by lib.rs. I decided I wanted to hide all this ‘noise’ in the root directory and tried to move lib.rs to lib/mod.rs. I’d hoped to put args.rs, error.rs and consts.rs into lib, alongside mod.rs. But this didn’t work. Apparently as a crate name, lib.rs is hard-coded. I feel strongly that orthogonality of rules is very important, so we don’t have to say that a system behaves in such-and-such way except for when…

So what does integrating the proposal into the issues I’ve run into look like?

mod's lack of mnemonicness goes away as a problem. There’s no import (keyword with an explicit mnemonic), but that’s probably OK–no new keyword, no redundant keyword, easier to teach, etc. From everything I’ve had to do so far, my code would have less header boilerplate, so that’s a win.
The proposal does not address the hardcoded crate expectations. I would like to see the special ‘hardcoded’ src/lib.rs (and presumably src/main.rs for binaries) loosened to lib.rs or lib/mod.rs (+ equivalent for binary crates) – the same rules that apply everywhere else.

With the above, at the highest level, the proposal seems to handle all the use cases I’ve run into in my short time with Rust, while reducing the code needed to build modular systems. And that’s pretty attractive from where I stand.

Thanks again, @aturon, -Brad

aturon · July 28, 2017, 7:58pm

Just a quick aside -- I really appreciate you saying this, and would've even if you'd come out against the proposal. It can be pretty demoralizing reading numerous comments that begin with "I disagree with everything you wrote"

rpjohnst · July 28, 2017, 8:07pm

I like this, I haven't personally run into a need for pub(super) or pub(in foo). But I'm not opposed to what's currently specified, since the pub() notation makes it pretty clear that it's just a variation on pub visibility. I haven't followed the pub(restricted) RFC enough to know what use cases people had in mind.

(For reference C# uses internal as its equivalent to pub(crate))

le-jzr · July 29, 2017, 12:05pm

@aturon About “Path confusion”: To me the greatest confusion is not the inconsistency between use and everything else, but the inconsistency between different cases of use. use std::*; is relative to the root, while use self::*; and use super::*; are relative to the current module. This one was confusing me big time, and sometimes still is. Though I guess that’s just a symptom of it being the opposite of the traditional arrangement.

I love the rest of the post. It would be amazing if modules worked like that.

liigo · July 31, 2017, 2:37am

std is imported/bound to the root of the current crate

ihrwein · July 31, 2017, 10:44am

@aturon: I really like the core of the proposal and the intention behind it - make Rust easier to learn and use. I’ve just one nitpick: pub(mod) seems to be more reasonable than pub(self) (I don’t like to use self outside of methods). This proposal combined with the “deduced extern crate” RFC could eliminate a ton of noisy use declarations.

est31 · July 31, 2017, 12:39pm

I disagree. See the inline mod proposal which is IMO the most favorable solution, and works without having to use these ugly _ characters.

DanielFath · July 31, 2017, 12:50pm

I'm no UX expert. But here is how I understand the problem. We first need to get answers to some questions:

1. What are the tasks that are most confusing for users?
2. What type of users have these problems (novice/intermediate/experienced)?
3. What do they most likely complain about?
4. What is their mental model of the module system?

Answering #4 will tell us how to make a new model. Then using those questions we could set up some kind of tests. Since it seems our audience is newcomers, we could possibly try doing some A/B tests when teaching, looking for which group has more trouble learning.

However this only gets us half-way and we should see if experienced users are impacted.

My personal thought on aturon's idea is that it seems good, and is very Pythonic, but I'm not sure if it will sit well with people not familiar with Python.

le-jzr · July 31, 2017, 3:32pm

I'm not familiar with Python, but I find the system very natural. Much more so than the current one, which forces you to deal with modules and reexports whenever you want to just split code into multiple files. Organizing code using multiple files is ubiquitous, so this is a pretty major papercut.

Incidentally, the proposed system is similar to what Go is doing as well.

DanielFath · July 31, 2017, 4:06pm

So you are familiar with Go, I presume? Since Go takes a lot of inspiration from Python, that just proves my point. Rust is meant to be used by people with a C/C++ background. Are they just as satisfied? Who knows?

Your example is what is popularly called anecdata. First, there is a strong bias against reporting that you aren't good at this. Second, there is a bias toward people coming from a Go/Python background: do all Rust newbies come from a similar background?

This is why I kinda want a UX test, however unfeasible it sounds.

le-jzr · July 31, 2017, 4:25pm

You assume that background in Go runs contrary to background in C/C++. No, I was using Go as a replacement for C. Or trying to. In the end I switched to Rust precisely because Go doesn’t suit my projects.

Compared to C and C++, Rust’s relationship between files and modules is like a fist to the face.

jugglerchris · July 31, 2017, 4:27pm

I haven’t fully formed an opinion about this proposal yet.

However, I wonder if there are any better characters than _ for marking a private module; in particular one which is not a valid identifier character, to avoid some confusion.

A . would work if it didn’t hide files in Unix etc. Perhaps + or -? Admittedly anything else would still look odd, and it’s hard to find a character that won’t have odd effects with shells when unquoted. My only other thought is a digit, say 0foo for a private foo module.

DanielFath · July 31, 2017, 5:03pm

Well, yes. Based on the Golang User Survey 2016, you are more likely to come from a Python or JavaScript background than from a C background. I obviously went for the odds.

Ok, I believe you, but your experience is still a single data point. We need more, and we need a diverse set of users to see the impact.

le-jzr · July 31, 2017, 5:28pm

Indubitably. I think presenting the ideas for discussion here achieves that, at least partially. Lacking a clear way to empirically test the impact in advance (though you are certainly welcome to try), I don't think we actually have a better option than gathering anecdotes from as many people as possible. I'm just contributing to that effort.

Ixrec · July 31, 2017, 8:20pm

Most of what I have to say has already been said by others in this thread, but since my primary background is in C++ (and JavaScript) this seems worth responding to.

Regarding the general principle of module systems being related to the file system in some way, I am strongly in favor of relating them. In C++ they are unrelated, and that leads to a significant amount of boilerplate in every single C++ source file. It adds up to so much that you start tuning it out like a form of banner blindness and what was once "explicit" quickly becomes background noise. Rust is already far better than that just because it uses the filesystem at all.

Regarding this specific proposal, I'm in two minds on it. On the one hand, all of aturon's arguments at the beginning made perfect sense to me and it does seem like an absolute slam dunk if reducing boilerplate is the primary goal. But if improving teachability is the primary goal, I'm a lot more skeptical because the current proposal does not simplify the module system, it does not solve any of the baffling rules around paths in use statements, and it introduces a distinction between module-private and file-private that did not previously exist. In contrast, improving our error messages or getting rid of extern crate do seem like slam dunks for teachability. My current gut feeling is that this proposal would be an improvement, but far from a complete solution, and probably not a big enough improvement on its own to justify the many inevitable problems that arise when a language has two different systems for doing exactly the same thing.
