[lang-team-minutes] the module system and inverting the meaning of public


#48

It seems like you’re saying “the old defaults were good for unsafe code writing, but bad for usability [debatable]”. OK, that’s worth fixing, but Rust can’t give up its features that let you provably isolate unsafe code from safe code. How are the needs of unsafe code authors preserved with the proposed models?


#49

It does seem like that, but I’m not really. I think the needs of unsafe code authors just put these needs into starker relief. In my opinion, the <= model is well-suited to any system that wants to support “closed” reasoning; i.e., it optimizes for making it easier to know just how much code can see a given item. To be honest, I’m not entirely sure what >= optimizes for – it seems like it optimizes for knowing what is the proper path to an item from a given point.


#50

But how can you recover “closed” reasoning with a >= model, when you need it in order to trust some unsafe code?


#51

ripgrep – you know the set of places you have to search to find a pub use, if any exists.


#52

It’s important to remember this is always within a single crate.

This is also a rather narrow concern. You can’t re-export fields. It would have to be a type which, if exposed, could be used to break some invariants. That type would need to have methods which are “safe” but break invariants that unsafe code relies on. Given that situation, my own feeling is that you are already engaging in bad practice if you haven’t marked those methods unsafe. (This is similar to the arguments in favor of unsafe fields.)
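To make that situation concrete, here is a minimal sketch. The type `SmallBuf` and its methods are hypothetical names invented for illustration; nothing like them appears in the discussion. The point is only that a method able to break an invariant relied on by `unsafe` code is itself marked `unsafe`, so a stray re-export of the type cannot hand safe callers a way to break it.

```rust
/// A fixed-capacity buffer. The `unsafe` block in `as_slice` relies on the
/// invariant `len <= data.len()`.
pub struct SmallBuf {
    data: [u8; 4],
    len: usize,
}

impl SmallBuf {
    pub fn new() -> Self {
        SmallBuf { data: [0; 4], len: 0 }
    }

    /// Safe: checks the bound, so the invariant is upheld.
    pub fn push(&mut self, byte: u8) -> bool {
        if self.len == self.data.len() {
            return false;
        }
        self.data[self.len] = byte;
        self.len += 1;
        true
    }

    /// Can break the invariant, so it is marked `unsafe` even though it is
    /// only meant for use inside the crate.
    ///
    /// # Safety
    /// `new_len` must not exceed the buffer capacity (4).
    pub(crate) unsafe fn set_len(&mut self, new_len: usize) {
        self.len = new_len;
    }

    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: every safe method keeps `len <= data.len()`.
        unsafe { self.data.get_unchecked(..self.len) }
    }
}
```

If `set_len` were a safe `pub` method instead, any path that exposes `SmallBuf` would let safe code trigger undefined behaviour in `as_slice`, regardless of which privacy model is in force.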

I think this is more valid as a general ‘encapsulation’ concern than a concern about unsafe code in particular. The change would mean you have to be suspicious when you see a pub use during code review.

But as I already posted, we can have exact exposure if we give up the notion of hiding the true path of an item. We can also consider providing a less fine grained form of module privacy - you would just attach #![internal] to your modules that are “implementation details” that you don’t want showing up in the public path of your API. This would be equivalent to marking your module doc(hidden) and pub(crate).

Setting aside the backwards compatibility concerns (which are real, but again, are outside the scope of the discussion we’re trying to have), that seems very close to the ideal system to me.


#53

So @withoutboats and I had a pretty interesting chat today about the module system and privacy, trying to dig down to first principles. I’d like to summarize that here.

A rational reconstruction of the status quo

First, the module system currently couples together three concerns:

  • Namespacing
  • Privacy scopes
  • Interaction with the file system

While in principle you could separate these concerns, there are a lot of advantages to tying them together. (And of course, there’s some flexibility here given that inline mod declarations are a thing).

One consequence of this coupling, though, is that when there’s a mismatch between these dimensions, you have to fight the module system a bit. For example, it’s pretty common to have a submodule in a file which defines a single public type, along with a bunch of private code – where the module itself isn’t exported, and instead the public type is reexported at a different path. In that case, there’s a mismatch between what you want for privacy/file system – i.e., a separate file defining its own scope of privacy – and what you want for naming, which is to export the name at some other path.

For these kinds of mismatches we tend to use “facades” and other patterns. Generally these all involve making a submodule private, while some of its contents are public and re-exported. This is basically a “design pattern”, a way of using the tools of our module system to achieve a certain goal.
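For readers who have not met the facade pattern, a minimal sketch in today’s Rust follows. The names `shapes`, `circle` and `Circle` are made up for illustration, and inline modules stand in for what would normally be separate files.

```rust
mod shapes {
    // In a real crate this would usually be its own file (circle.rs).
    // The submodule is private, so `shapes::circle` is not a usable path
    // from outside `shapes`.
    mod circle {
        pub struct Circle {
            pub radius: f64,
        }

        impl Circle {
            pub fn area(&self) -> f64 {
                std::f64::consts::PI * square(self.radius)
            }
        }

        // Private helper: part of the file's own privacy scope.
        fn square(x: f64) -> f64 {
            x * x
        }
    }

    // The facade: the one public type is re-exported at the parent's path.
    pub use self::circle::Circle;
}
```

Users write `shapes::Circle`; the `pub` on `Circle` alone does not tell you that, because you have to find the `pub use` to learn where (and whether) the type is exposed.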

One consequence of this design pattern is that pub has two distinct meanings:

  • The item is world-visible
  • The item is defined “in the wrong place” and re-exported elsewhere, but you need to trace the re-exports to discover its visibility.

The pub(restricted) model makes the most sense with the first meaning of pub, since the (restricted) part is supposed to decrease the level of publicity. But because of re-exports, it doesn’t work out perfectly.
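As a small illustration of the “restricted decreases publicity” reading, here is a sketch using the `pub(super)` and `pub(crate)` forms. The module and function names are hypothetical.

```rust
mod outer {
    pub mod inner {
        // Visible only inside `outer`, the parent of `inner`.
        pub(super) fn for_parent() -> u32 {
            1
        }

        // Visible anywhere in this crate, never outside it.
        pub(crate) fn for_crate() -> u32 {
            2
        }
    }

    // `outer` can see both functions.
    pub fn sum() -> u32 {
        inner::for_parent() + inner::for_crate()
    }

    // A re-export may match or narrow the stated visibility.
    pub(crate) use self::inner::for_crate as renamed;

    // It may not widen it: `pub use self::inner::for_crate;` is rejected by
    // the compiler, which is the <= guarantee in action.
}
```

The stated visibility is an upper bound that no `use` can exceed, which is what makes “closed” reasoning possible for restricted items.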

What this means for module system improvements

There are a fair number of ideas in flight for how to simplify and streamline the module system. I think that what @withoutboats has been working toward is basically doubling down on the three-way coupling mentioned above.

Here’s a strawman proposal bringing these pieces together (which came out of discussion with @withoutboats):

  • Introduce implicit modules, as per the original post. In particular, discover modules via the filesystem organization, implicitly introducing appropriate mod declarations. (A file ty.rs leads to an implicit mod ty; declaration, modulo privacy, discussed next.)

  • Default the visibility of implicit modules to be the maximal visibility of the items they transitively contain. Again, this is as per the original post.

  • Keep the semantics of pub(restricted) as they are today, in particular the <= interpretation of privacy. At this point there’s widespread agreement that <= privacy is very important to provide, and that pub(restricted) gives a good model for specifying it.

  • Introduce #![internal] as a module-level attribute that can be applied to implicit modules. It has the effect of setting the visibility of that module as pub(crate), regardless of the items it contains.
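To give a feel for what “discover modules via the filesystem” could mean, here is a rough model of the mapping from file paths to module paths. This is only a sketch under assumptions: the function `implied_module_path` is hypothetical, and the rules it encodes (root `lib.rs`/`main.rs` and any `mod.rs` name the enclosing module, every other `.rs` file names a new one) are borrowed from how explicit `mod` declarations resolve today, not from any written-out rules of the proposal.

```rust
use std::path::Path;

/// Maps a source file path, relative to the crate's `src` directory, to the
/// module path it would implicitly declare. Returns `None` for files that do
/// not end in `.rs`.
fn implied_module_path(rel: &Path) -> Option<String> {
    if rel.extension()?.to_str()? != "rs" {
        return None;
    }
    let parent = rel.parent().unwrap_or(Path::new(""));
    let stem = rel.file_stem()?.to_str()?;

    // Each directory on the way down contributes one path segment.
    let mut segments = vec!["crate".to_string()];
    for component in parent.iter() {
        segments.push(component.to_str()?.to_string());
    }

    // Assumption: the root `lib.rs`/`main.rs` and any `mod.rs` stand for the
    // enclosing module rather than introducing a new one.
    let is_crate_root = parent.as_os_str().is_empty() && (stem == "lib" || stem == "main");
    if stem != "mod" && !is_crate_root {
        segments.push(stem.to_string());
    }
    Some(segments.join("::"))
}
```

Under this model `ty.rs` and `ty/mod.rs` both map to `crate::ty`, and an untracked `transform/inline.rs` maps to `crate::transform::inline` whether or not anyone intended it to be part of the crate.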

What are the implications of this design?

  • You get = privacy by default. In particular, if you don’t use #![internal], the stated visibility of an item is its precise visibility.

  • You always get <= privacy, as was the intent with pub(restricted) and our overall privacy system.

  • You can express the facade pattern, by using #![internal]. However, the fact that you write this attribute helps mitigate some of the downsides of the facade pattern: a reader of the code gets a visible, local heads-up that the actual visibility of the items is going to be determined by a super module that re-exports them (though bounded by their local restriction). In other words, we’ve made the design pattern a bit more first-class, and given you a way to write down your intent. But the attribute doesn’t introduce anything fundamentally new; it’s just a way of specifying pub(crate) for the module.

  • Increases the coupling between files and namespaces, by discovering modules directly from the filesystem. (It almost always works that way in practice today, but you have to explicitly set it up. We can make it so much simpler and smoother.)

  • Increases the coupling between privacy and the rest of the system, by automatically setting the visibility of implicit modules based on their items.
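Since `#![internal]` does not exist, the closest thing that compiles today is a module marked `pub(crate)` and `doc(hidden)`, which is the equivalence suggested earlier in the thread. The sketch below uses that spelling; `api`, `imp`, `Token` and `next_id` are hypothetical names.

```rust
pub mod api {
    // Today's closest spelling of the proposed `#![internal]`: the module is
    // crate-visible and hidden from rustdoc.
    #[doc(hidden)]
    pub(crate) mod imp {
        // Stated visibility is `pub`, but the enclosing module caps it at
        // the crate unless a parent re-exports it.
        pub struct Token(pub u32);

        // No re-export exists for this one, so it stays crate-internal.
        pub fn next_id(t: &Token) -> u32 {
            t.0 + 1
        }
    }

    // The re-export decides the path outside users see: `api::Token`.
    pub use self::imp::Token;
}
```

Inside the crate both `api::Token` and the true path `api::imp::Token` name the same type; outside the crate only the re-exported path is usable.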

Where this all heads, in my mind, is an approach where the module system just sort of disappears into the background. You don’t really think about modules at all; you just think about paths, files, and items contained in them. Things like the facade pattern, while still expressible, are a clearly delineated and more “advanced” feature, so you can start with a simpler mental model (where you get = privacy everywhere) and later learn how to tweak it with #![internal]. I think there’s a real chance of going from “Rust’s module system is complex and hard to learn” to “What module system? I just write code where it belongs and get the namespacing and visibility I would expect”.

To the extent that things are more “implicit” here, I would claim that we are just using already-explicit information that would otherwise be redundant. In the vast majority of cases, the module hierarchy exactly mirrors the filesystem hierarchy; why force you to repeat that structure? It can’t be for the sake of explicitness – it’s already explicit, just represented in a different way. Likewise with privacy: the appropriate privacy of a module is usually implied by the privacy of its items. And when that’s not right, we actually give you greater explicitness by allowing you to express your intent via #![internal].

This also doesn’t change any of the scoping rules; you still have to use an item to gain access to it. So all bindings for a module are discoverable within the file defining it.


#54

So I’m excited about this plan. We’ll have to talk about the back-compat issues, but I’ll ignore them for a bit. I wanted to re-state something that we discussed on IRC that I didn’t fully appreciate (though you said it). That if you step back a minute from the system we have today, and imagine instead the system that we would have with this scheme, it basically looks like this:

  • You write code in files. You never write mod declarations (assuming you don’t use inline modules).
  • Within a crate, you can always name items from elsewhere in the crate at their “true path” (that is, the path implied by the location of the file that they are in).
    • Note however that you might get a privacy error if the item is not “sufficiently public”.
  • From outside a crate, you can usually name things at the “true path”, unless they reside (transitively) within an #![internal] module
    • In that case, there should be a pub use that re-exports them at some other path
    • It seems like this would be a good use for a lint: make sure that there exists some path to every pub item from outside the crate (and perhaps a unique path?)

What I wrote above isn’t quite right, maybe, in that you might not be able to name something from internal to the crate from anywhere else in the crate, because maybe the publicity of the module is “more narrow” than that (if it only contains e.g. pub(super) items). We’d have to work those rules out in detail. But it seems like we could even make it a “binary switch” (pub(crate) or pub, depending on whether it contains (transitively) a pub item that is not within a #![internal] module) and it would be good enough. After all, you could then always name the module but not its contents (if they are declared with a more narrow level of privacy).
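The “binary switch” rule can be modelled in a few lines. This is only a sketch of the rule as stated above, with hypothetical types (`Module`, `Vis`) standing in for whatever the compiler would actually track; the detailed rules would still have to be worked out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Vis {
    Crate, // pub(crate)
    Pub,   // pub
}

struct Module {
    internal: bool,     // models the hypothetical `#![internal]` attribute
    has_pub_item: bool, // does the module directly contain a `pub` item?
    children: Vec<Module>,
}

/// True if the module contains, transitively, a `pub` item that is not
/// inside an `#![internal]` module.
fn exposes_pub(m: &Module) -> bool {
    !m.internal && (m.has_pub_item || m.children.iter().any(exposes_pub))
}

/// The implied visibility of an implicit module under the binary switch.
fn implied_visibility(m: &Module) -> Vis {
    if exposes_pub(m) {
        Vis::Pub
    } else {
        Vis::Crate
    }
}
```

A module whose only `pub` items sit under an internal child comes out as `pub(crate)`, so the module is nameable within the crate while its contents keep whatever narrower visibility they declare.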

That is certainly an appealing vision to me!


#55

I hadn’t noticed that, but it’s an excellent point. Again, it just feels simple and streamlined, moving in the direction of purely local reasoning about both visibility and namespacing.


#56

I’m arguing for keeping the ability/necessity to explicitly name the crate structure within source; there are multiple reasons, all mostly very particular to how I do my devstuffs.

Firstly, I use “UNIX as IDE”. I’m aware that is quite an uncommon use case, but that’s what I’m used to. My development routine involves more than just vim - cargo - run cycle. I tend to leave around a lot of files that do not necessarily involve my current checkout. For example this is what git status of my primary rust-lang/rust checkout looks like (last time I cleaned up was a few weeks ago):

HEAD detached at real/master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   src/librustc/mir/tcx.rs
	modified:   src/librustc_mir/transform/qualify_consts.rs
	modified:   src/librustc_mir/transform/type_check.rs
	modified:   src/librustc_trans/mir/rvalue.rs
	modified:   src/llvm (new commits)

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	src/bootstrap/config.toml
	src/libcore/target/
	src/librustc_mir/transform/inline.rs
	src/librustc_mir/transform/intrinsics.rs
	src/rustc/std_shim/target/
	src/target/
	test
	test.rs
	test.s
	test2
	test2.rs
	worktrees/

There’s some in-progress playground code that just persists between checkouts (e.g. transform/inline and transform/intrinsics) that I never got back to. These files would not even compile with the current checkout. Since this is a pretty clean state, you don’t see a number of .rs files that most commonly appear because I checked out a differently named file from another WIP branch, an older commit or a different project altogether (mostly to diff against or have open in a vim buffer for reference).

These files are never something I would like rustc to even consider for compilation, and to me it would be obnoxious if rustc changed to an implicit scheme. If it did, I would most likely spend some time figuring out why my code does not compile or, even worse, compiles while I accidentally depend on temporary junk I never intended to check in. And then some extra time every time to rename offending files to some other filename (like transform/intrinsics.rs.this-is-junk-rustc-please-no-touch-import-compile).

Then there’s also partial outputs of commands such as what would result from something | vim - that I later write to disk so I wouldn’t have to craft the something command again. These rarely end up having the extension .rs but when the result of something is a valid(-ish) rust file, there’s no reason to save it with some other extension. I also don’t really pay attention to my $PWD, so it kinda ends up within src sometimes.


Quite a different concern (not something related to my current use-cases) is the tendency of various environments to restrict inspection of the filesystem. There’s a difference between recursively going through the whole tree looking for files ending in .rs and open("some/particular/file.rs"). From the standpoint of some sandbox implementation anyway.

Then there are filesystems which have arbitrary limitations not applicable to a module system: no directory support or limited depth (how you gonna scan this for a module tree, of all things? S3 is an example), short filenames (ye olde FATs), case-insensitive filenames (macOS), etc.

If I tried I could probably think of many more reasons to encode module structure within the source and not derive implicitly from the filesystem.


As far as privacy goes, I’m content with the current system. I’ll admit I had problems with it at some point, mostly surrounding std::{os,sys} weirdness, but never when authoring or contributing to libraries. With that in mind, I do not find a change here necessary, but I wouldn’t oppose any change to that part of the module system either.


#57

This is also my workflow. But I never leave junk files in my src directory, partly because I never cd into a directory that contains code files, I always operate one level above it. I actually make a clear habit of drawing this distinction between the directory which contains my crate and my ‘workspace’ above it, because I tree that directory regularly to remind myself of the module structure (because these are already closely coupled).

It’s true of course that different module systems will provide better support to different workflows. If you create junk files as a part of your workflow, you will need to adopt the same discipline I have about where you create them. This is a trade-off, but in exchange you get the various advantages of this system that @aturon has already laid out. But I don’t want it to be framed as a negative consequence for “UNIX users,” because as a UNIX user my experience would be strictly improved.


Not addressing any particular concern, in general the negative response these ideas have generated seems to me to be “the response of a puddle to a proposal we change the shape of its hole.” That is, the current system has been naturalized for people who use it regularly, and there is an inertial force in favor of the status quo.

Obviously no change is without downsides, but none of the downsides enumerated have seemed to me to even nearly equal the upsides of a different system (which are mainly to do with localizing information and simplifying the mental model). I really do think that many users have just normalized the costs of the current system and are writing them off when they should not be.


#58

One thing I think isn’t perfectly captured by this summary is this: we discussed this coupling in terms of Rust’s existing system; that is, right now we have already coupled namespacing, privacy, and file structure. In a totally decoupled system, you could imagine all of these scenarios:

  • A file which is at its own path with its own set of items in scope (today the only option)
  • A file which has its own set of items in scope, but for which its items are directly included into another
  • A file which is just verbatim included into another file

And of course other combinatorics of these three freely moving things. My impression of C++'s system is that these things are relatively decoupled, and I find it extremely confusing.

I would describe the partial coupling of our system today as a leaky abstraction - it partly joins these concepts into one, but not entirely, so that you can manipulate them independently, but in a sort of hacky and imperfect way (facading). This partial coupling is a large part of what leads to confusion about the exact meaning of our keywords - mod, use, even pub.

So I see this proposal as the culmination of the existing, partially implemented system. I see #![internal] as the solution to the major downside of completing that coupling.


#59

I’m still not convinced the described changes are necessary. The cost of the current system is two extra words per file (mod name;), which is negligible. Okay, maybe pub as well, three words.

At the same time drawbacks described by @nagisa are real - you can’t temporarily “comment out” a module, you can’t keep unfinished stuff in the tree. All the directory traversing complexity leaks into the language - we need to somehow guess and ignore separate projects in nested directories and VCS directories/dot directories/etc. It may be okay for Cargo, for which guessing things is half of the job, but I don’t want this in the language.

I have an impression that the module system became too boring and familiar so people are eager to start changing things just because.


#60

One more concrete example for directory traversing. Imagine a crate with a build script build.rs and code in the src directory. If build.rs included src implicitly as a module, that would be a pretty large build script! Supposedly, only some modules (like mod.rs or lib.rs) would include other modules implicitly, but I haven’t seen the rules written out.


#61

Definitely some directories should not be included by default. I would think src and src/bin are good examples.


#62

Both of these are unproductive statements to make. People have described the problems with the current system at length, which is certainly not in terms of “character count.” Moreover, your second statement is baldly assuming bad faith. This reaction makes this conversation much more difficult than it ought to be.


#63

I don’t have really strong feelings about the shape of the module system, but commonly write module subsystems that are organised to support development and expose an API that supports usability, whether by other modules in the same crate or external users.

The proposed system seems like it wouldn’t stop you doing that, so I’m happy with it. I wouldn’t mind some strawman examples of how common patterns would be expressed. I can’t be the only one who feels they aren’t fully grokking the implications yet :smile:


#64

I wonder if this is something we share too; when I started reading @nagisa’s post, I thought “well I use UNIX as my IDE too.” But I always stay at the root of the project, and open files from there.

Many of our users that describe difficulty with the module system describe this cost as much higher than two words; there’s a lot more mental overhead with even remembering that you need them at all, and then when it doesn’t work, figuring out how to fix it. Once you learn this stuff, yeah, it’s low overhead, but before you do, it’s much higher.

The people who have trouble consistently describe it as “foreign” or “alien”, not familiar. If it were familiar they wouldn’t have any trouble!

In fact, this has been one of my running theories about people who struggle with our module system: they assume it works just like the one that they use in their current language, and then hit a hard wall when it doesn’t work.


#66

My main critique on the proposal by @aturon is about the idea of “implicit mod” and “implicit crate”.

  • It seems that the idea is to make the language “simpler” and “easier to understand”. I doubt that will happen, as these systems have big shortcomings, and apparently the idea to circumvent those shortcomings is to introduce complexity that previously wasn’t needed, like @withoutboats’s suggestion to not include some directories by default, such as src and src/bin. Or just take the Rust compiler itself. Its test suites (src/test/run-pass, etc.) consist of single-file crates that all share the same directory. With the new change one would either have to put each of them into their own directory, or add some special flag to the compiler. Both options are bad, as one introduces complexity in the directory layout, and the other introduces complexity in the compiler. And this makes the “it will make things simpler” argument moot.

  • Right now you only have to open lib.rs to find a list of all the crates used. I find this useful as I don’t have to open Cargo.toml as often. I use unix as an IDE and open files via the shell and not from my text editor’s drop-down list for files, so opening additional files incurs an extra cost to me. Also, it gives me a clearer overview. Cargo.toml has different notations for importing features, and scanning the file doesn’t feel as easy as scanning the lib.rs/main.rs, especially as it sometimes mentions a crate name multiple times and in multiple places in the file (for a feature name, for the actual dependency, maybe part of the github url or path if it’s not from crates.io).

  • It’s not just more useful for reading and understanding a program, you can also quickly comment out a module by just commenting out its mod line. With implicit modules, you’d have to open the actual file.

  • If you previously had a crate with a file inside and didn’t use it with mod, or, which is probably more prevalent, if mod sth; is preceded by #[cfg(...)] then including that file unconditionally would constitute a breaking change. If the mod contained non working code, projects that previously worked would break now. Breaking changes should only be done if there is a real and actual benefit to the change, and there is none to any of the proposed ones.

  • The case sensitivity issue pointed out by @nagisa. Both Windows and Mac have case-insensitive but case-preserving file systems. This means that if you store a file as Awesome.rs and include it with mod awesome; then the OS will direct rustc to the correct file. Right now Rust allows non-lowercase mods to be declared, imported and used. So just assuming that a file named Awesome.rs must represent a mod awesome will mean a breaking change. I have thought about it and I haven’t found a way to make it possible to use non-lowercase filenames together with implicit modules, so apparently everyone will be forced to lowercase their module names if they remove mod or use. Uppercasing file names is not something I prefer doing personally, but I guess some people like uppercasing. Also, you’d have to keep code in the compiler around for legacy projects that include uppercase modules via mod (to not break their code).

  • There are proposals out there that suggest replacing mod with use, so that a file only gets parsed/opened if there is a use statement connected to it. I think this will cause ambiguity when reading code. What does use foo; mean? Is it for including a module? Is it for using a crate? I don’t like Python and dynamic languages in general for their ambiguities, and I don’t want them to leak into Rust just to make it easier for users of those languages to come to Rust.
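The #[cfg(...)] point in the list above is easy to show in code. In the sketch below the `cfg` attribute sits on the `mod` item, so only one `imp` module is compiled for any given target. The names `platform`, `imp` and `name` are hypothetical, and inline modules stand in for what would usually be separate files declared with `#[cfg(unix)] mod unix;` and the like.

```rust
mod platform {
    // Exactly one of these three modules is compiled for a given target.
    // The condition is written in source, not encoded in the filesystem.
    #[cfg(unix)]
    mod imp {
        pub fn name() -> &'static str {
            "unix"
        }
    }

    #[cfg(windows)]
    mod imp {
        pub fn name() -> &'static str {
            "windows"
        }
    }

    #[cfg(not(any(unix, windows)))]
    mod imp {
        pub fn name() -> &'static str {
            "other"
        }
    }

    pub use self::imp::name;
}
```

A scheme that discovers modules purely from files would need some other place to record these conditions, which is part of the backwards-compatibility concern raised here.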

Also, Rust 1.0 has been released. Rust is stable now. This means it’s only going to gain features, but not lose them. What this proposal is about is a new feature, an additional thing to learn. Will this make the language simpler? No, by the definition of stability, Rust is only getting more complicated.

If simplicity were really a concern, the only things you do should be to make pub(restricted) adopt <= privacy (I think it uses = atm?), to be consistent with the current system, and stabilize it.

I mean otherwise you are headed towards a mess where you have to support two systems for all eternity, unless the 1.0 promise was a lie.

All the books about Rust printed and written have the old system, all the university courses teach the old system, all the code has the old system.

Add that to the fact that the new system won’t be much of an improvement, and you got needless churn.

When Rust 1.0 came out I thought, great, now I can learn something, and apply that knowledge for the time to come, you know, like it used to be and still is with C, and don’t have to adapt to the newest fad. But it seems I was mistaken and I need to change code constantly if I want to keep it idiomatic.


#67

Please do not include statements like this in your post. We all write them when we feel strongly about our position, but we should strive to replace them with more constructive feedback.

When directed toward me, as these are, they knock the wind out of me, and instead of considering & responding to the other comments you’ve made, I want to close the tab and go do something else.


#68

These statements are not directed towards you as a person, they are about the ideas. I understand that there is some personal effort inside that proposal, but this proposal being met with criticism or rejection doesn’t mean I doubt your potential, or would dislike any new ideas from you.

I include these statements to make my opinion on the matter clear and put it beyond doubt. Otherwise I fear I’m misunderstood; that I liked things that I point out, while I actually don’t.