Revisiting Rust's modules

Just to be clear, the goal would be to deprecate mod declarations etc, and phase them out in a later checkpoint (aka epoch). Similarly to the work with extern crate, we don't want to end up with two parallel systems.

I assume this is what you mean by the proposal not simplifying the module system?

I agree that it introduces a new privacy layer (module vs file), but it's not so clear to me that that's counter-intuitive or difficult to learn.

I'm curious whether you have any thoughts on how to solve the problems with use statements. The decisions there were made long before my time, but my understanding is that experience showed absolute paths were much more common than relative paths for use, and hence were chosen as the default.

A final note -- I think there are multiple goals here, including the ones you gave, and also improving one's productivity when reading code. (This RFC takes a "convention over configuration" approach that streamlines one's understanding of code that falls into the common case).

All that said, I think we should strive for some level of confidence that we're improving learnability in a module system revamp. It's very hard to discuss that rigorously, and people often toss in demands for hard empirical evidence that are just not realistic for us to gather.

It's also worth noting that "learnability" is not a single value, but better thought of in terms of learning curve. In particular, if we can let you get substantially farther in using Rust before needing to understand more subtle pieces of the module system (like file vs module privacy), that can still be a win. That's one reason I like this particular proposal: to get started, you just need to know that "each directory is a module; the files in a directory together give you the module's contents".

(Sorry I don't have time to write a more coherent comment just now, but I'd appreciate any thoughts on the above!)

5 Likes

I don't have time for a proper response right now either but I can definitely throw out some of the more half-baked thoughts I left out of that first post.

Not quite, when I said simplify I was talking about understanding the module system at the "rigorous" stage: The directories-are-modules idea by itself adds things to the system and takes nothing away, so there's simply more to understand. But because it does dramatically reduce boilerplate it absolutely makes things simpler in the "pre-rigorous" stage.

If this idea is combined with removing extern crate and mod declarations...that does seem like it'd be simpler for pre-rigorous and rigorous overall.

The distinction between module and file is pretty self-evident, what's less obvious to me is the "mechanics" of it: How do I tell when something is pub(file) versus pub(mod), when does moving some code from one file to another "break" it or not "break" it, etc. This is probably the sort of detail that would get hammered out in an actual RFC and not too important right now.
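For comparison, here is a rough sketch of how the visibility qualifiers that exist today behave. Note that pub(file) and pub(mod) above are hypothetical spellings from this discussion, not Rust syntax; the sketch only uses private, pub(super), pub(crate) and pub. Inline modules stand in for separate files, and every name (shapes, detail, visibility_demo, ...) is made up for illustration.

```rust
// Inline modules stand in for separate files; all names are illustrative.
mod shapes {
    pub mod detail {
        // Private: visible only inside `detail` (and any modules nested in it).
        fn scale() -> u32 {
            2
        }

        // Visible to the parent module `shapes`, but no further.
        pub(super) fn doubled(x: u32) -> u32 {
            x * scale()
        }

        // Visible anywhere in this crate, but not to other crates.
        pub(crate) fn tripled(x: u32) -> u32 {
            x * 3
        }
    }

    // Public wrapper around the `pub(super)` helper.
    pub fn area_of_double_square(side: u32) -> u32 {
        let s = detail::doubled(side);
        s * s
    }
}

// Code at the crate root can reach the `pub` and `pub(crate)` items.
// Calling `shapes::detail::doubled(3)` here would be rejected as private.
pub fn visibility_demo() -> (u32, u32) {
    (shapes::area_of_double_square(3), shapes::detail::tripled(3))
}
```

Moving doubled out of detail and into shapes would not break area_of_double_square, but moving it to the crate root would require widening its visibility; that is roughly the "does moving code break it" question in today's terms.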

Part of the problem is I never fully grokked the rules myself, so it's hard to think up possible solutions. But to me the biggest glaring issue was always that the rules change when you move from the crate root to anywhere else in the crate. This basically means that the moment you go from having all your code in one file to trying to have multiple files, all your imports "stop working" for no apparent reason. That case in particular I suspect can be fixed in a backwards compatible way. In fact, I suspect it's like the match ergonomics problem where there's a way to make the thing novices try first "just work" so they don't even have to go through that extra error-edit-recompile cycle. I should really go learn the module system properly so I can figure out what that would be.

I believe the boilerplate reduction in this proposal is a slam dunk for the "parachute reader" described near the top of this thread, precisely because that lack of syntactic noise (and "fake" search results!) helps readers home in on the "real code" much faster. In other words, I didn't mention that use case simply because I'm not worried about it.

I completely agree that this is the ideal outcome of this and the other module system proposals. The "pre-rigorous" stage gets much, much easier because there's so much less to deal with early on. Ideally, the module system would "stay out of their way" after that and they can remain "pre-rigorous on modules" for a very long time (maybe forever?) while they worry about far more interesting problems like lifetimes and traits.

My concern is simply that it's not obvious to me that is how it actually will go. Right now it's just as easy to imagine getting hung up on some new corner case like "why can't I call this foo.rs method from lib.rs? (oh, it defaulted to pub(file) for some reason?)" and being forced to get to the rigorous stage not because you're trying to best organize your public API structure but instead because (like today) you have to figure out how to make the compiler happy.

But again, that's a concern that will largely be alleviated by hammering out all the details in a proper RFC. That's why I chose the term "skeptical": I have no strong reason to believe this won't be a huge net improvement for learnability, I'm just not yet convinced that it will be either. I look forward to being convinced that it will!

1 Like

One reason I don’t believe the directory-based proposal will make things any easier to learn, even with removing mod declarations, is that it doesn’t appear to match many other languages’ module systems, which is where beginners’ unmet expectations often come from. The closest language is probably Go, in which names from adjacent files are implicitly available; however, that’s the one thing the proposal is least sure on!

The one aspect of our current system that does match other languages is the direct correspondence between a module and a file. Python, JavaScript, Lua, and D all behave this way. Ruby and PHP are similar in how they handle multiple files, though they handle namespacing separately. Even Java and C# tend toward (or require) one class per file, which is their visibility-grouping mechanism.

I think Rust’s major departure is mod declarations in particular- not in how it uses one file per module. Most of these other languages, in contrast, seem to determine which files to include via imports or the build system. Worse, some of them have module M; or package org.etc; declarations that are syntactically identical but semantically do the reverse of Rust’s mod declarations.

I think if our goal is to make Rust easier to learn, the best way to go about it is to keep our strongest similarity to these other languages. If we keep the module/file 1:1 mapping, we can get filesystem correspondence to reduce boilerplate, remain backwards compatible, and remain explicit about what matters.

5 Likes

I'll give another datapoint: coming from C and C++, I found Rust's module system very surprising. Of course what C++ does is hardly perfect ('something better than headers' is a big reason to look for a C++ replacement), but what I later saw in Haskell was more like "what I'd have expected a module system to be".

I can see that Rust's system has potential though... I can imagine that with some tweaks to simplify the learning curve it could be awesome.

Alternatively, if it gained more features (e.g. module-wide type-params? I miss the ability of C++ to share type-params with nested classes), the extra complexity would pay off.

I think this would be my number one request actually; it would greatly ease generalizing a system in its entirety. (Imagine if a module-wide type could be mapped directly to a module type-parameter.)

Lower-hanging fruit: another suggestion I have for simplifying is a way to automatically promote a key item into the parent's namespace, e.g. std::collections::vec::Vec, std::collections::hash_map::HashMap, std::option::Option. Those declarations would be flagged in such a way that you get std::collections::Vec, std::Option etc. automatically.

If the names match the files there will be no possibility of a clash. use super::* would then let modules get the main types of any siblings directly, i.e. mymod/foo.rs, mymod/bar.rs, mymod/baz.rs all see struct Foo, struct Bar, struct Baz with minimum fuss.

You would still have the 'obvious place to look', struct Vec is in vec.rs, struct Foo is in foo.rs , etc.

Might this be 'less of an upheaval' than the proposed switch from files to directories, whilst achieving the same end result in common cases (facades with one main item per file..)?

Maybe this could be done if the names match, or you could have a special way of defining a 'main type' in the module (struct self{}), or you could name it struct super::Vec {}.. not sure what the best way would be.
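To make the comparison concrete, here is a rough sketch of how this looks with what exists today: the promotion is written out by hand with pub use rather than happening automatically. The struct super::Vec {} / struct self {} spellings above are hypothetical and are not used here. Inline modules stand in for mymod/foo.rs and mymod/bar.rs, and the fields and the facade_demo function are invented for illustration.

```rust
// Inline modules stand in for mymod/foo.rs and mymod/bar.rs.
pub mod mymod {
    pub mod foo {
        #[derive(Debug, PartialEq)]
        pub struct Foo {
            pub id: u32,
        }
    }

    pub mod bar {
        // Siblings reach each other's main types via the parent's re-exports.
        use super::*;

        #[derive(Debug, PartialEq)]
        pub struct Bar {
            pub foo: Foo,
        }

        impl Bar {
            pub fn new(id: u32) -> Bar {
                Bar { foo: Foo { id } }
            }
        }
    }

    // Manual promotion: each submodule's main type is also `mymod::Foo` / `mymod::Bar`.
    pub use self::bar::Bar;
    pub use self::foo::Foo;
}

pub fn facade_demo() -> u32 {
    // The short path and the long path name the same type.
    let short: mymod::Foo = mymod::foo::Foo { id: 7 };
    mymod::Bar::new(short.id).foo.id
}
```

The suggestion above would amount to generating the two pub use lines automatically whenever the item name matches the file name.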

1 Like

First: I think tackling and improving the ergonomics of the module system is a valuable goal, and there’s a lot I like about this proposal philosophically. The identification of specific patterns that are issues for large libraries is :100: as far as I’m concerned, too.

However, I do have a couple reservations, and one of them is primarily about the experience of learning. In particular, I have a really serious concern along lines that @dikaiosune raised up-thread, based on my day-to-day experience outside Rust, in the day job. Work right now is a mix of JavaScript and C#, which have quite different module systems both from each other and from Rust. (This is a little long, but please bear with me, because it is going somewhere, and the context is illuminating.)

In JavaScript, files are modules,[1] which can export items or not at will, roughly analogous to Rust’s current system. In C#, namespaces, which are the thing most closely equivalent to Rust’s modules, are defined by declaring namespace TheNamespace.WhichMay.Be.Nested and they’re open to extension: you can reopen them pretty much anywhere and add members to them. In JavaScript, as in Rust today, you have to import members from other modules explicitly and globs are allowed but culturally frowned upon.[2] In C#, you open a namespace and automatically have access to all of its public members, which is much closer to (though certainly not identical with) where this would land us—especially in one important area.

We have a pretty rapidly growing codebase, with new services popping up, but just our main monolith has about 8,000 C# files in it, which have ~7,800 namespace declarations, of which ~7,100 are unique.[3] I tend not to have the latest version of that monolith checked out in my normal work environment, because I mostly work on the JS on the front end. I do, however, semi-regularly review pull-requests that are part of that codebase.

And C#'s module system is awful for reviewing pull requests without pulling all the changes locally and doing a compare locally, and for that matter without being in my Windows VM with Visual Studio open—because it’s usually impossible (and that’s not an exaggeration) to be able to know where a given item came from without having Visual Studio open. The fact that I cannot see where an item came from—ever!—means it’s essentially impossible to learn how the pieces of the codebase fit together without having Visual Studio up and running.[4]

I’m a reasonably sharp guy. But as a result of this design decision for C#, I have found it incredibly difficult to get up to speed on this codebase, largely because, well… I don’t actually like having VS up and running unless I need it. And it seems ridiculous to say I need VS up and running just to take on the meaning of a 10-line change. It’d be a lot easier if I could just open a file and have some idea whether a given type is coming from our own codebase, from a third party library we use, or from ASP.NET MVC,[5] or from the C# standard library. Put another way: my experience of C# would be much more pleasant and have a much lower learning curve if it were possible to meaningfully navigate a codebase using VS Code or Vim in macOS.

By contrast, in modern JavaScript, and in current-day-Rust, if I want to figure out where something comes from… well, it’s normally pretty obvious. Because in both languages we tend to use glob imports relatively sparingly, I can usually search the top of the file for ::the_name if it’s a bare item or ::the_namespace if it’s the_namespace::the_name.
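As a small illustration of that text-level discoverability (a sketch with made-up function names, not code from any real codebase): both names below that are not defined in the file can be traced by searching the use lines at the top.

```rust
// A bare item: searching for `::BTreeMap` finds its origin.
use std::collections::BTreeMap;
// A namespace: searching for `::cmp` explains every `cmp::...` below.
use std::cmp;

// A glob such as `use std::collections::*;` would also compile, but a reader
// could then no longer tell from the text alone where `BTreeMap` came from.

/// Counts how often each whitespace-separated word occurs in `text`.
pub fn word_counts(text: &str) -> BTreeMap<&str, usize> {
    let mut counts = BTreeMap::new();
    for word in text.split_whitespace() {
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}

/// Returns the highest count of any single word, or 0 for empty input.
pub fn most_common_count(text: &str) -> usize {
    word_counts(text)
        .values()
        .fold(0, |best, &n| cmp::max(best, n))
}
```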

What this highlights, to me, is that there’s a really important distinction to be made between learning the module system and learning a codebase via the module system.

We need to lower the learning curve of the module system itself. I’m 100% agreed about that. I’m also entirely agreed that it would be great to minimize some of the boilerplate and especially some of the duplicated boilerplate (use and mod and extern crate, oh my!). But whatever we do in that regard, we should be very careful not to increase the difficulty of just reading a codebase, because that’s an incredibly important part of what learning a language-and-its-libraries entails.

Unfortunately, the proposal here seems to me to very much increase this difficulty, precisely because it would push Rust’s module system much more toward implicitness and as a result dramatically lower the discoverability and navigability of the codebase at a plain-old-text level. That increase in cognitive load is something I think we should avoid while finding a way to improve both the learnability and the ergonomics of the module system itself.


Footnotes

  1. modulo an important detail about whether the file includes export.

  2. When I do use them I still actively namespace them:

    import * as SomeModule from 'some-module'
    
  3. These are rough estimates: rg -t cs '^namespace .*$' | uniq | wc -l.

  4. It’s awesome when tools can help you by saving you time, but when you design a module system around the assumption that people will always have those tools available—and it’s quite clear that C# was designed in precisely that way—it means it can fall down horribly when those tools aren’t present.

  5. Why Microsoft thinks everything has to be capitalized…

7 Likes

I’ve been thinking a lot about the module system since this post and, in particular, since participating in the discussion about the extern crate inference.

While I like a number of things about the proposal of “directories determine modules”, I also wonder if it can be simplified more. In particular, I’ll disagree with the assertion that it’s “dirt simple”.

To me, one clear part of the proposal is that mod foo; declarations have to go. They’re a clear example of a paper cut that requires duplicate effort every time you make changes, which is a clear opportunity for friction in the development process and understanding. To me, the simplest fix here is just to say that any module in the same directory can use from it, without further ado. If we have that, what does the underscore-prefixed directory thing still buy us, and is that really worth it?

The big conflict appears to be that we would like the module hierarchy to emerge from the filesystem layout, while at the same time making item visibility easy to understand locally (preferably just by reading?). I also feel there’s some tension between the desire to have modules in a strict DAG versus allowing more creativity in how people structure their crates. I feel both of these ideas were not clearly named in the blog post (or maybe I just didn’t get it). In general, I feel like there may also need to be more discussion on the conceptual level about what today’s module system is versus ideas that are hiding in there or that cause complexity.

3 Likes

IMO this proposal solves exactly the wrong pain points:

I have absolutely no problem with mod as it stands. It’s different than other languages, but it’s logical and once learned not too surprising. Actually I rather like the explicitness of it.

OTOH the “path confusion” problem confuses me almost every time I encounter it. And the proposal does not solve it at all.

I’d also prefer a more local approach to the extern crate problem. I really don’t like that those have to be (or usually are) at the crate root. I also don’t like how extern crate injects a module (or something similar) into the crate root. I’d prefer something like:

use ::my_module::MyType from my_crate;

2 Likes

Regarding path confusion, we can simply decide on a different behavior (rel. vs abs.) whenever Rust 2.0 happens. We can just make all paths relative if that’s easier for everyone.

Please no... In Rust 2012, all paths were relative. I think I am speaking for everyone when I say anyone who used Rust in 2012 remembers absolute use statements being a large improvement. This was changed in pcwalton's pull request #4174 to rust-lang/rust on GitHub ("librustc: Make `use` statements crate-relative by default"), although I can't locate the discussion. It probably happened on the mailing list.

1 Like

Thanks, @chriskrycho, for your very thoughtful post!

I wonder if you could elaborate a bit on this point, especially since part of what I like about this proposal is some of the ways it improves code discoverability. But it is a mixed bag:

Discoverability improvements:

  • The full module structure is discoverable from the directory tree. This is already true today in a technical sense, but many single-file modules are not actually "significant" as modules; they're used purely to re-export one level up and never as a meaningful path.
  • The public vs crate-internal module structure is also discoverable solely from the directory tree. In particular, this makes it easy to tell when an item is likely to be re-exported elsewhere: when it is pub but defined within a crate-internal module.

Discoverability reductions:

  • The fact that public items from sibling files are in scope definitely hurts discoverability, which is why it's a worrying part of the proposal. (Note that it's not enough to simply require useing these definitions--what you'd want is to know which file they come from, but the proposal tries to treat files as largely insignificant). On the flipside, though, aside from these items everything else needs to be used in a way that makes the location clear. So this means that there's a pretty limited scope in which to search: sibling files.

In any case, I feel that a proposal that improves the early learning curve but hurts scalability is not great, and it's entirely possible that the one in the OP has this flavor (though it's hard to say without experience with it!). My hope is that we can keep iterating until we find something that does better on these tradeoffs.

I wasn’t able to find this discussion either :-/

That said, I did encounter a couple of interesting data points:

If others have more links to prior discussions/critiques of the module system, I’d love to see them!

I feel that having a mod.rs file that’s in any way significant is the least intuitive part of things for me as a beginner.

If you want a structure like:

  • foo
  • foo::bar
  • foo::baz

Then your files on disk should just be

  • foo.rs
  • foo/bar.rs
  • foo/baz.rs

If that gets fixed, a lot of the other stuff like modules that have one main type aren’t a big deal. Honestly, the collections module could just have a pub use statement at the top to re-export the main types of all its sub-modules if we need it.

4 Likes

I just ran into some other problems where I found the module system really stupidly unintuitive. For me this has a lot to do with use paths being absolute rather than relative, which makes it counter-intuitive and weird when you move some code around.

I had written some code like this, in the crate root:

use std::borrow::Cow;

enum State {
    Foo,
    Bar,
}

// The returned Cow borrows from no argument, so it needs an explicit lifetime.
fn fun(state: State) -> Cow<'static, str> {
    use State::*;
    match state {
        Foo => Cow::Owned(String::new()),
        Bar => Cow::Borrowed(""),
    }
}

#[cfg(test)]
mod tests {
    use State;
    use fun;
    use Cow;
    #[test]
    fn simple() {
        assert_eq!(fun(State::Foo), Cow::Borrowed(""));
    }
}

I then moved that code into a separate module, which gave me errors like this:

   --> src/newmod.rs:124:9
    |
    |         Bar => Cow::Borrowed(""),
    |         ^^^ not found in this scope
    |
help: possible candidate is found in another module, you can import it into scope
    | fn fun(state: State) -> Cow<str> use newmodule::State::Bar;

In other words, items that are right next to each other (and apparently can reference each other via use in the expected way) suddenly can’t see each other because they’ve moved to another module.

After fixing that (I had to really look at the hint to understand what had gone wrong), there was a similar problem:

error[E0432]: unresolved import `fun`
   --> src/fun.rs:137:9
    |
    |     use fun;
    |         ^^^^^^ no `fun` in the root. Did you mean to use `newmodule`?

Now, this one had thrown me for a loop before, because intuitively I tried use super::fun; which didn’t work, before I landed on use fun;, even without truly understanding that this was because of absolute use paths.

By the way, I think the way that use statements in a nested module are warned about as unused even if they are being used right there in the same module is totally weird, too. I guess this is because #[test] is actually some kind of conditional compilation, but that’s totally not obvious from the annotation itself. Took me a while to figure out I needed to set #[cfg(test)] on the tests module.

So for me, this is the number one thing that keeps being unintuitive about today’s module system. The way the crate root has access to things directly actually makes this worse, because as long as you’re writing stuff in the crate root, you don’t notice how you have to use absolute paths. This is why I’ve been thinking the whole concept of the crate root is problematic: it is special in some senses and makes your code behave differently than other modules, and because it is the first module most users will encounter, it makes learning even more tricky. I’m wondering if we can make the crate root less special in some ways that don’t break backwards compatibility too badly.
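For reference, one way to write the moved code so that it does not depend on where the module sits is to anchor the imports with self:: and super::, which are resolved relative to the current module. This is only a sketch: the inline module stands in for the separate file, the name newmodule is borrowed from the error output above, and the pub markers and the 'static lifetime are additions needed to make it compile on its own.

```rust
// An inline module stands in for the separate file from the example above.
pub mod newmodule {
    use std::borrow::Cow;

    pub enum State {
        Foo,
        Bar,
    }

    pub fn fun(state: State) -> Cow<'static, str> {
        // `self::` anchors the path at the current module, so this line
        // keeps working wherever the enclosing module ends up.
        use self::State::*;
        match state {
            Foo => Cow::Owned(String::new()),
            Bar => Cow::Borrowed(""),
        }
    }

    #[cfg(test)]
    mod tests {
        // `super::` names the parent module (`newmodule` here).
        use super::{fun, State};
        use std::borrow::Cow;

        #[test]
        fn simple() {
            assert_eq!(fun(State::Foo), Cow::Borrowed(""));
        }
    }
}
```

The same block can be pasted at the crate root (without the mod newmodule wrapper) and the two use lines still resolve, which is the "moving code around should not break imports" property being asked for.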

3 Likes

To counter all the "the module system is so confusing" talk, you might also be interested in this internal Dropbox study where the module system was rated as simple by a majority:

(image taken from this video, around 37 minutes in).

4 Likes

Counter-counter point :slight_smile:

4 Likes

More seriously, the fact that a nontrivial number of people do find it confusing, and that even for experienced users there are numerous redundancies and code reasoning issues, suggests that we ought to at least explore this territory. Consider the number of people on this thread who have at least agreed that the problems stated are real, even if they disagree with the strawman.

Rather than arguing about whether we should do anything, I’d love to keep working together to iterate on ideas for improving the module system while keeping its benefits intact (and trying to dig into what exactly the benefits are). This blogpost was one strawman proposal (and I know you’ve seen some of the radically different variants before); I’m working on a follow-up post right now taking a rather different approach and set of priorities.

7 Likes

I do in fact agree that there is room for improvement on some aspects. But I disagree that such a radical change is needed or is the only way to fix this issue. My favoured approach for the facading pattern is the inline mod proposal. Also the weird exception that the crate root is special should be fixed. Both these issues are independent from dropping the explicit mod syntax. I disagree that mod foo; is useless or boilerplate. I've brought many arguments about the shortcomings of approaches without them; I won't repeat them here for brevity's sake.

And I disagree that not knowing whether an item is reexported or not is such a bad issue that the new system must make this impossible.

I also see the ability to separate the public API from the file system layout as a feature, not a liability. Maybe this is because I came from C/C++ where the public api and file system are completely separate and I have seen the value of this.

Many of the proposals I have seen were very radical and dropped useful features. And even if you said it means no worsening: I clearly prefer no change over some radical new invention. Rust has been stable for barely two years, and has yet to prove that it is not a moving target that swaps out core concepts quickly.

3 Likes

These discussions grow long and complex, and it is hard to follow everything. Ultimately, I trust the core team to make the best decisions, so I’m not worried, and I hope the discussion will help in finding best ideas.

Now, about my personal opinion. Rust’s explicitness and clarity are among the biggest reasons I love Rust. Rust is almost always “context-free”: self.field makes sure I am never confused about where a given variable comes from, the absence of exceptions and the ? operator make sure that any nonlinear change in the code flow is explicit, Result makes errors clear, etc. I sometimes spend more time “typing Rust code”, but I rarely spend time “trying to understand Rust code”.
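A tiny sketch of what that explicitness looks like in practice (the function add_strs is made up for illustration): every point where control can leave the function early is marked in the text, and the signature says which error can come out.

```rust
use std::num::ParseIntError;

/// Parses two integers from text and adds them.
pub fn add_strs(a: &str, b: &str) -> Result<i32, ParseIntError> {
    // Each `?` is a visible early-return point: on a parse error the
    // function returns that error immediately instead of continuing.
    let x: i32 = a.trim().parse()?;
    let y: i32 = b.trim().parse()?;
    Ok(x + y)
}
```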

While the current module system might have some pain points, I’d like to say that it is already quite good, and all in all “unconfusing”. After you understand a couple of simple things, it’s obvious and easy to deal with, even if sometimes inconvenient (like the already mentioned symbol fixes required when moving code around, or redundancies here and there). The last missing piece for me was the pub(crate) feature, but after that, I can express what I want, and I always know what is going on. Since every technical decision is an art of compromise, I’d like to ask to make explicitness and context-free reading-clarity high-priority items when compromising.

Also, IMO, it matters very little how confusing or “different” the module system is for newcomers. What matters is how clear it is after they get it. In business, the “stickiness” factor is more important than a raw number of new users. Example: code editors and IDEs come and go, but people stick for decades with Vim or Emacs, because once you get through an initial learning phase, it becomes second nature and it just makes a lot of sense and just feels good. Many Rust features (like lifetimes, ownership, Result, Sync+Send etc.) seem odd or even painful at the beginning, but after you get used to them, using any language without them feels… wrong. Modules should be no different. So please, let’s not design them for people that don’t use Rust yet, and focus on people that will be educated Rust users already (in the future). :slight_smile:

3 Likes

I draw the opposite conclusion from that slide. Most features have a majority rating them as simple; only 5 of 13 are rated “simple” by fewer people than modules. Our argument has never been that the module system is the most complicated part of our language, but that it should not be a particularly complicated part of our language, because modules do not provide a huge amount of functionality. I expect lifetime annotations to have a certain amount of inherent complexity; I don’t expect that from namespacing and privacy.

I do not think modules should be rated as more complex than iterators, C FFI, or basic traits, but in that survey they are. This validates the core belief that too many complexity points have been spent on modules, and we should make the system easier to use.

The number of simple ratings is not a zero sum game; in a perfect world, everyone would think every part of the language was simple and easy! This survey supports that modules are one of the areas most in need of improvement.

(Other incidental thoughts about that slide: not surprising cargo doesn’t do great, AFAIK Dropbox doesn’t actually use cargo. Interested in how badly trait objects did; at some point it would be worth seeing how that system could be made simpler and more intuitive.)


I also think Aaron’s gist should be taken seriously. That’s a really great collection of the statements that I knew I had read, but not taken the time to collect.

5 Likes

A year ago there was a group of Rust users that argued relentlessly that ? was not explicit & that we absolutely should not accept the RFC or stabilize it. But you give it as a good example of explicitness! Here's a great post by pcwalton remembering features that were once similarly argued against as too implicit, but today are taken for granted by everyone. While I've tried to be empathetic and open-minded to the concerns people have about implicitness, I see this pattern reoccurring in our conversations and I consider it when I evaluate people's comments.

7 Likes

There is a problem with that IMO: you want the content of the module foo to be inside the directory really.

My suggestion is to have a way of promoting items from foo::bar, foo::baz into foo (e.g. writing items like struct super::Foo{}, fn super::Baz(), or using 'self' as an identifier name). This would match the use cases we see so often: std::collections::vec::Vec, std::option::Option... there is often repetition of a name. If it's restricted to a name matching the file, it would eliminate clashes.

Combined with 'not needing a mod.rs to bring things in', this might be enough. (you'd basically just need to make a file per submodule, and/or file per sibling-visible item)