Revisiting modules, take 3

withoutboats · August 4, 2017, 9:06pm

This is a third module proposal, very similar to the second one but iterating a bit more. It clarifies a few things from the previous proposal and introduces one additional significant change.

The trouble with “everything is an item”

First, a digression.

In Rust, everything is an item except for let statements and expressions. Every module is just a collection of items (and modules are items), and you can access any item in the crate by naming its path. This means that extern crate, use, and mod declarations are all actually just items, which create new names in the current path context that alias names defined elsewhere.

This sounds very simple, and from one angle it is. But I would contend it’s the source of a significant amount of confusion around the module system:

Users often don’t expect extern crate rand; to bring rand into scope, and are confused about why rand::random works in the crate root, but nowhere else.
Users often report confusion about when to use mod and when to use use; they often feel that mod is a “weird use statement” because, from their perspective, they’re using foo by declaring mod foo (not adding it to their build), and it then happens to be in scope because a mod is an item just like a use is.
use being an item is also usually not the behavior people expect. They think of use as bringing names into this scope, not as “mounting” those names to be visible in other scopes. Here’s an example of the surprising behavior that this creates:

use std::io::{self, Read, Write};

fn foo() -> io::Result<()> {
    ...
}

mod bar {
    use io::{Read, Write}; // NOTE: No `std::`
}
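For readers who want to try this, here is a minimal runnable sketch of the same situation. The snippet above relies on Rust 2015 rules, where a use path inside bar starts at the crate root; this version reaches the root's private io import through super:: instead, which compiles in every edition. The copy_all function is a hypothetical addition that only exists to exercise the imports.

```rust
use std::io::{self, Read, Write};

// A private `use` at the crate root is still an item: it binds the name `io`
// in this module, and child modules can reach that binding through a path.
#[allow(dead_code)]
fn foo() -> io::Result<()> {
    Ok(())
}

mod bar {
    // Goes through the parent's private `io` import rather than naming `std`.
    // (Rust 2015 also accepts `use io::{...}` here; `super::` works in every edition.)
    use super::io::{self, Read, Write};

    /// Reads all of `src` and writes it to `dst`, returning the number of bytes copied.
    pub fn copy_all<R: Read, W: Write>(mut src: R, mut dst: W) -> io::Result<usize> {
        let mut buf = Vec::new();
        let n = src.read_to_end(&mut buf)?;
        dst.write_all(&buf)?;
        Ok(n)
    }
}
```

Note that bar never names std at all: it borrows the io binding that the crate root's private use created, precisely because that use is an item.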

As an advanced Rust user, you might appreciate the real brilliance of having these conceptual unifications, and see a certain aesthetic appeal in the way that use io arises out of these orthogonal rules. I certainly do. But as a user trying to figure out why I have to write use std::io; in the crate root, but can just write use io; everywhere else (a discovery I stumbled upon by mistake), this is not a good UX.

I also know there are some users who don’t just appreciate this unification, but consider it extremely important. I’d be happy to hear this position more fully articulated - I haven’t received a clear impression of why this would be so important. But I’d also encourage users to keep an open mind to a system in which:

use is not an item; it brings a name into scope, but does not mount it in the module hierarchy at this location. It does not have a visibility attribute (no pub use)
A new keyword, export, is an item, and does take a visibility attribute.

That’s the biggest new idea in this proposal; now on to the full details.

The proposal

Absolute path hierarchy

The absolute root of the path graph, from which paths beginning with :: are resolved, is not the module of main.rs or lib.rs, but a module which contains these symbols:

One module for each crate passed by --extern, each containing that crate’s module tree (that is, ::std, ::serde, etc.)
A special crate module, which contains this crate (what used to be the root module).

That is, absolute paths are written ::std::iter::Iterator; to get something from another crate, and ::crate::module::Item; to get something from this crate.
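For comparison, a minimal sketch of how Rust as it exists today already writes the two kinds of absolute path in ordinary (non-use) positions: a leading :: to reach another crate, and crate:: for this crate. The ::crate:: spelling proposed here is not valid Rust; module, Item, first_even and make_item are placeholder names.

```rust
mod module {
    /// Placeholder standing in for `::crate::module::Item` from the proposal.
    pub struct Item(pub u32);
}

/// Names a trait from another crate by absolute path, with no `use` at all.
fn first_even(xs: &[u32]) -> Option<u32> {
    let mut it = xs.iter().copied();
    ::std::iter::Iterator::find(&mut it, |x| *x % 2 == 0)
}

/// Names an item from this crate by absolute path, with no `use` at all.
fn make_item(n: u32) -> crate::module::Item {
    crate::module::Item(n)
}
```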

Use statements

use statements are no longer items, and do not mount names, they only import them. That is, items imported with use cannot be accessed from other modules.

use statements take absolute paths beginning at the ::crate:: module, not the true root. So you have:

use module::Item; // an item defined in src/module.rs
use ::std::iter::Iterator; // an item defined in another crate
use self::child::Item; // an item defined in a child module
use super::sibling::Item; // an item defined in a sibling module

The only thing that’s changed in this syntax is the leading :: on imports from other crates (and some sugar, introduced below, will make that nicer).
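A sketch of the four import forms above as they compile in Rust today, assuming placeholder modules named module, sibling, current and child. The only deviation is the first import: in Rust 2018 and later, a crate-relative path like use module::Item is written use crate::module::Item (Rust 2015 accepts that spelling too).

```rust
mod module {
    // Stand-in for "an item defined in src/module.rs".
    pub fn item() -> &'static str {
        "module"
    }
}

mod sibling {
    pub fn item() -> &'static str {
        "sibling"
    }
}

mod current {
    use crate::module::item as module_item; // from this crate's root
    use std::iter::once; // from another crate
    use self::child::item as child_item; // from a child module
    use super::sibling::item as sibling_item; // from a sibling module

    mod child {
        pub fn item() -> &'static str {
            "child"
        }
    }

    /// Lists which module each imported function came from.
    pub fn sources() -> Vec<&'static str> {
        once(module_item())
            .chain(once(child_item()))
            .chain(once(sibling_item()))
            .collect()
    }
}
```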

Because use statements are not items, they do not have their own visibility. Instead, we have a new kind of statement called export statements.

Export statements

export statements work a lot like use statements do today - they both mount and import names, impacting the resolution of relative paths in this module and absolute paths in other modules. There are a few key differences from use statements, designed to improve user experience.

First (and most importantly), the path they take is relative to the current module, not the crate. So you are replacing:

pub use self::child::Item;
// replaced with
pub export child::Item;

However, they can still take absolute paths with ::, just like use statements can (and again, the next section introduces syntactic sugar to make this nicer).

pub export ::crate::module::Item;
pub export ::std::iter::Iterator;
pub export super::sibling::Item;
pub export self::child::Item; // equivalent to not having self::

The last big difference is that export is pub(crate) by default. However, export can take any visibility scope, including pub(self) if you really only want to make this item visible in submodules.
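Today's closest equivalent of export is a visibility-qualified use. A minimal sketch, with placeholder names: parent mounts Item and helper from its private child module, so callers can go through parent without ever naming child. The pub(crate) line corresponds to the default visibility this proposal gives export.

```rust
mod parent {
    mod child {
        /// Placeholder item living in a private child module.
        pub struct Item(pub &'static str);

        pub fn helper() -> &'static str {
            "helper"
        }
    }

    // Today's spelling of what the proposal writes as `pub export child::Item;`:
    // mount `Item` at `parent::Item`, even though `child` itself stays private.
    pub use self::child::Item;

    // A re-export at the visibility the proposal makes the default for `export`.
    pub(crate) use self::child::helper;
}

/// Uses both re-exported names through `parent`, never naming `child`.
fn through_parent() -> (&'static str, &'static str) {
    (parent::Item("item").0, parent::helper())
}
```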

The from syntactic sugar

To avoid having to write :: in use and export statements, we also add the from syntax. from takes a single ident, which is any of these:

The name of one of the --extern crates, including std.
The special symbols crate, self, and super.

Both use and export statements can be prefixed with from, which desugars them to taking the appropriate absolute path:

from std use iter::Iterator; 

from super use sibling::Item;

from tokio export Service;

from crate pub export module::Item;

We could also possibly support a more complex syntactic sugar that takes multiple paths, but I haven’t worked through it fully:

from std use {
    cmp::{min, max},
    collections::HashMap,
    iter::once,
    thread,
}
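Rust today accepts nested import groups (stable since Rust 1.25), which share the std prefix much as the multi-path from form above does. A minimal sketch using exactly those four imports; clamp and count_words are hypothetical helpers that exist only so every imported name is used.

```rust
use std::{
    cmp::{max, min},
    collections::HashMap,
    iter::once,
    thread,
};

/// Clamps `x` into `lo..=hi` using `min` and `max`.
fn clamp(x: i32, lo: i32, hi: i32) -> i32 {
    max(lo, min(x, hi))
}

/// Counts words on a worker thread; `once` appends one extra word at the end.
fn count_words(text: String, last: &'static str) -> HashMap<String, usize> {
    thread::spawn(move || {
        let mut counts = HashMap::new();
        for word in text.split_whitespace().chain(once(last)) {
            *counts.entry(word.to_string()).or_insert(0) += 1;
        }
        counts
    })
    .join()
    .expect("worker thread panicked")
}
```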

Modules

Mod statements would no longer be necessary to pick up a new file in the crate. Instead, rustc would walk the files it knows to walk (see the next section for more info), and mount a module tree from them, possibly before parsing any Rust code.

Files mounted this way would have pub(crate) visibility; if you wish to change that, add an export statement to their parent:

pub export submodule1;
pub(self) export submodule2; // if you are very concerned about using something
                             // from this submodule elsewhere in the crate

Though the names of modules are mounted automatically, they are not imported into their parent, and so they are not visible to relative paths from their parent unless they are imported with use or export. That is, you cannot use the name of a submodule without somehow bringing it into scope through a use or export statement.

Modules of the form mod foo { /* code */ } would still exist, with no change to their semantics.

File lists

rustc will only automatically load modules that are listed through two command-line arguments it receives: --load-files and --ignore-files. It loads every file matching --load-files, minus those matching --ignore-files. Both take a list of filenames and support basic glob expansion.

When building a lib.rs or main.rs, cargo tells rustc to load any files matching src/**/*.rs and to ignore any matching src/bin/**/*.rs (that is, all .rs files in the src directory, except those in the bin directory).
When building a binary in the root of the bin directory, cargo tells rustc to load every file in the bin directory.
When building a binary in a subdirectory of bin, cargo tells rustc to load every file in that directory.
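The load/ignore rules above can be modelled as a small predicate. This is a hypothetical sketch, not anything rustc or cargo implements: the Target enum and would_load function are invented names. It also assumes that a binary directly in src/bin does not load files from bin's subdirectories (those belong to other binaries), while a binary in src/bin/<name>/ loads every .rs file beneath that directory, nested ones included.

```rust
/// Which kind of target a hypothetical cargo invocation is building.
enum Target<'a> {
    /// `src/lib.rs` or `src/main.rs`.
    Root,
    /// A binary sitting directly in `src/bin/`.
    BinRoot,
    /// A binary in its own subdirectory, `src/bin/<name>/`.
    BinDir(&'a str),
}

/// Hypothetical: would `path` (package-relative, `/`-separated) be loaded
/// for `target` under the rules described above?
fn would_load(target: &Target, path: &str) -> bool {
    if !path.ends_with(".rs") {
        return false;
    }
    match target {
        // Load `src/**/*.rs`, ignore `src/bin/**/*.rs`.
        Target::Root => path.starts_with("src/") && !path.starts_with("src/bin/"),
        // Only files directly in `src/bin/` (assumption: subdirectories are other binaries).
        Target::BinRoot => path
            .strip_prefix("src/bin/")
            .map_or(false, |rest| !rest.contains('/')),
        // Everything under this binary's own directory, nested modules included (assumption).
        Target::BinDir(name) => path.starts_with(&format!("src/bin/{}/", name)),
    }
}
```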

This introduces a firmer convention around multi-crate packages. In my survey, most multi-crate packages already conformed to this, though a sizable minority did not (mainly by having both main.rs and lib.rs in the same directory):

For packages with 1 binary and 1 library, the bin should be in src/bin while the library is in src/lib.rs. src/main.rs is only used by packages without a library.
For packages with multiple binaries, they should each sit in their own subdirectory of bin, none should sit in the bin directory directly.

However, some users will probably want to opt out of these defaults for multiple reasons:

They do not want rustc to use glob patterns to load all of the files in their src directory.
They do not want to structure their multi-crate packages in the same way.

For this reason, cargo will expose a way to control these arguments to rustc directly, probably through the Cargo.toml. My hope is that most users won’t do this, though, and it will result in a more consistent package structure across the ecosystem.

briansmith · August 4, 2017, 10:06pm

Files mounted this way would have a pub(crate) visibility

I think there's one question that should be answered, which isn't obvious: Is it actually important for modules to have non-public visibility? Or, does it really only matter that (non-submodule) items within modules have visibility controls?

If visibility on modules is actually important, then I think the default used everywhere else (private) should be the default on modules too. Conversely, if it isn't important (or can be made to be not important, by changing our conventions to be more item-specific) then there's no reason to have explicit visibility controls on modules.

withoutboats · August 4, 2017, 10:10pm

The reason we prefer pub(crate) is that even if a module exports no items, another crate could do something awful like:

from your_crate use module::with::no::items::*;

And now if you delete src/module/with/no/items.rs, that's a breaking change.

However, inside your crate, this is not a breaking change - you just delete that line that wasn't doing anything. I don't understand the reason to make modules less than pub(crate) - the benefit of it is that for the purpose of your own crate, items now say exactly what their visibility is - if they are pub(crate), they can be seen anywhere in the crate for sure.

Some users see this as a downside, but I don't see why you would mark an item pub(crate) that isn't supposed to be pub(crate). I think this is a bit of a holdover from previous patterns where you mark items pub and then make the module private to mean pub(super), but we have pub(super) now.
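A minimal sketch of the contrast being drawn, with placeholder names: the older idiom marks an item pub and relies on a private module to limit its reach, while pub(super) states that limit on the item itself.

```rust
mod outer {
    mod inner {
        // Older idiom: mark it `pub` and rely on `inner` being private to hide it.
        // Nothing on the item says how far it may travel: `outer` could still
        // re-export it with `pub use`.
        pub fn old_style() -> u32 {
            1
        }

        // Restricted visibility: the item itself says only `outer` (the parent
        // of `inner`) and its descendants may use it.
        pub(super) fn new_style() -> u32 {
            2
        }
    }

    /// Both functions are reachable from `outer`, the parent of `inner`.
    pub fn total() -> u32 {
        inner::old_style() + inner::new_style()
    }
}
```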

briansmith · August 4, 2017, 10:19pm

If this is the only issue then I think we should just get rid of the idea of visibility for modules completely, because the problem of people referencing a module with nothing in it hardly seems like an important problem to solve. Then we could just say that visibility is a property of items. I think that would be easier to learn than having different defaults for modules than other items.

aturon · August 4, 2017, 10:31pm

Thank you, @withoutboats! I think this variant has a lot of promise, and I'm feeling really optimistic that we're getting ever closer to a winner.

Here are some kneejerk thoughts:

The from form seems under-motivated here. IMO, if we let you write use ::std::blah no one is going to use the more verbose from form.
You don't spell out the rationale for not bringing submodules into scope automatically. I worry that this will be surprising and unergonomic. What's the upside?
Did you consider making export not use relative paths per se, but rather based on what's in scope, i.e. working the same way as paths in non-use items? I suspect that would be more clear.

Finally, I want to say that I think @briansmith's point has merit:

As we've been going through this module system work, I increasingly feel that the visibility of module names is totally insignificant (as opposed to the items within them). Taking that to its conclusion, though, would in practice require writing more constrained uses of pub than today -- we often lazily write pub items in a private module, when we really should be saying pub(crate) or something even more narrow. I see this as a bit of a wart in Rust code today, but to fix it we'll need to make pub(restricted) more ergonomic IMO.

It's also worth noting that if we remove module visibility altogether, export may no longer be as motivated.

phaylon · August 4, 2017, 10:34pm

I have been wondering in the past (and maybe even read a suggestion about it somewhere) if it would be possible to reset item visibility with a bare pub. E.g.:

pub submodule::SomePublicItem;

which in the context of this proposal would make submodule fully public, but all items in it are only pub(crate) except the SomePublicItem with specific visibility. This could help avoid having to reexport for visibility reasons only.

Just bringing it up since the mount/import split might make this appear less magic. And now that I’m done writing and looking at the newest comments, it might also be relevant in the “visibility for modules” topic.

comex · August 4, 2017, 10:48pm

To echo a point that others have made in previous threads:

I really don’t think it’s necessary to auto-load all .rs files in the directory tree, given all the drawbacks (for users who prefer alternate directory layouts, sometimes have stray files lying around, etc.). It should be sufficient to lazy-load modules when they’re referenced through the absolute path hierarchy; this would cause some confusion with modules that only have impls and not items, but it’s not that bad (such modules aren’t too common and the compiler could have smart diagnostics), and I’d say it’s ultimately less trouble than the alternative.

(signed, someone who prefers alternate directory layouts)

No particular opinion on the rest of the proposal.

arielb1 · August 4, 2017, 11:01pm

This - private use statements being accessible - is actually a fairly new feature introduced by @petrochenkov (item_like_imports). In Rust 1.0, use was like use in this proposal, and pub use was like export (but with an absolute path).

It's actually a "feature" I don't like - it confuses me sometimes: e.g. librustc/ty/mod.rs contains a use ty::substs::Substs;, so you can write use ty::Substs; within any (of the many) submodules of the ty module but not outside of it.

petrochenkov · August 4, 2017, 11:07pm

And from the other angle it's simple as well, because it's objectively simple, and documentation can describe the rules in a few lines. This proposal, on the other hand, immediately starts with introducing more complexity^* (in a very backward-incompatible way).
The premise seems fundamentally wrong: it's oriented toward people discovering the language purely by intuition and taking an invitation to read even basic docs as an insult. The language can't match every guess and expectation inherited from any other language, and the new model won't match them either.

(I won't proceed to the technical details, I didn't fully process even the "take 2" yet.)

^* To clarify, I'm not against introducing this separation in principle, I even suggested it myself in https://github.com/rust-lang/rfcs/pull/1976#issuecomment-301903528 (in a backward-compatible way), the "non-item" imports just shouldn't look like items.

petrochenkov · August 4, 2017, 11:10pm

The design was actually by @Kimundi (originally) and @nrc, and @jseyfried did the implementation, I mostly helped by staying nearby and being supportive.
(Before it was implemented, people actually reported imports being non-items as confusing! (There's no escape from this hell.))

rpjohnst · August 4, 2017, 11:24pm

I like these use defaults. In particular: making private uses no longer visible to child modules is a nice touch. Bringing back intra-crate absolute paths in use is also a win, and is another step closer to backwards compatibility.

I’m not convinced of the mechanism, though- removing pub use and adding yet another keyword (export) with yet another path default (relative) feels like a step back. Could we perhaps implement it as an even narrower level of visibility, only for use items? It might be odd for different item types to have different defaults, but then again submodules don’t tend to rely on private uses of their parents. One exception might be mod tests { use super::*; }- this could use some clarification. (edit re @arielb1: apparently this is how it was until very recently!)
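Under the current rules, the glob in mod tests { use super::*; } picks up the parent's private functions and its private use imports alike, which is part of what would change if use stopped being an item. A minimal sketch, using a plain module named checks (a placeholder) instead of a #[cfg(test)] module so the behavior is visible in an ordinary build; tally is a hypothetical helper.

```rust
use std::collections::HashMap;

/// A private helper at the parent level.
fn tally(words: &[&str]) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for w in words {
        *counts.entry(w.to_string()).or_insert(0) += 1;
    }
    counts
}

mod checks {
    // The glob picks up the private `tally` *and* the parent's private
    // `HashMap` import, because that `use` is an item in the parent.
    use super::*;

    pub fn tally_is_empty_for_no_words() -> bool {
        let expected: HashMap<String, usize> = HashMap::new();
        tally(&[]) == expected
    }
}
```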

I would also prefer to preserve rustc's ability to discover an entire crate’s source files, without --load-files. While both would be rare, explicitly use-ing impl-only files seems much easier than managing globs. Further, globs still have the problem I mentioned in the last thread: deleting a file that’s not referenced anywhere else won’t trigger a rebuild in many build systems.

bluss · August 4, 2017, 11:56pm

The goal to remove mod is good.
Creating a module tree from the file system alone is not desirable, because it’s inflexible and stray files are included even if you didn’t ask for it.
- Instead, it seems ideal to create the module tree by combining explicit use and the file system. A file is not part of the crate unless it is used by the crate root directly or indirectly.
- use determines if a file is included, directory structure determines what the path to the module is in the namespace.
As much as I agree that it’s confusing that non-pub use “mounts” items, maybe that can stay and help solve the other goals with few changes.

For example, extern crate can maybe be removed like this: Add path prefix crate:: that simply resolves a path in an extern crate. (Alternative syntax: extern::). Intended use is through use statements.

withoutboats · August 5, 2017, 12:12am

Modules can't have no visibility of their own because of other constraints we have.

First, the less than or equal to rule of visibility: If you have an item that's marked pub(crate), we guarantee that it is never re-exported at a greater visibility than pub(crate) (and similarly for any other visibility restriction).

Second, we want to support facading, so that items are visible in the public API, but not where they are defined (for example, the flattening in the futures crate discussed in a previous blog post).

The only way to support both of these is to have pub items in a pub(crate) module. So we cannot have modules be public always.

ahmedcharles · August 5, 2017, 12:34am

I sort of have the same feedback to this proposal that I did with part 2.

use std::io::{self, Read, Write};
mod bar {
    use io::{Read, Write}; // NOTE: No `std::`
}

I think the issue with this is that I expect io in bar to refer to ::bar::io not ::io. Hence why I prefer relative use statements and if this read ::io::{Read, Write};, I don’t think it would be as confusing, because it would be obvious that it’s referring to the io in the crate root, which (if you search for io) is obviously added by the use statement. Like @petrochenkov says, having unusual syntax/semantics isn’t the problem, it’s having people learn them that’s the problem. This sort of stuff could be easily documented in the book, but the last time I checked, it wasn’t and I’m not sure why.

The book doesn’t seem to explain name lookup and it waits until after explaining mod, to explain use and it doesn’t have any example where use is not used at the crate root. So, given all of those things, why would Rust name lookup make sense, even to people who read the docs? Why are we trying to change things on the basis of learning Rust when the things we are changing are things that aren’t documented in an accessible way? If I missed the documentation somewhere, please feel free to point it out.

I like the inline mod proposal, because it’s a targeted fix for a targeted problem. I like the from <crate> use <path>; syntax because it removes the need to add external code as items into the root crate while preserving all of the consistency related to how normal use statements work. I suspect relative use statements would help with understanding what use statements refer to, but that’s not a guarantee, it doesn’t really add anything fundamental, and it’s not backwards compatible, so I could take it or leave it.

withoutboats · August 5, 2017, 12:35am

Not convinced this is the right decision. The goal was to make the name appear somewhere at the top of the file if you're going to use it, but for many use cases this is not ideal (e.g. having a lot of use self:: statements is not great).

That only works if submodules are in scope by default (e.g. your previous point), but that sounds fine to me; the only difference then, as far as I can tell, would be that export would be able to see things that use had imported.

My belief is that the from form is more obvious and intuitive than the :: form. Here are the imports from src/ir/mod.rs in chalk:

use cast::Cast;
use chalk_parse::ast;
use lalrpop_intern::InternedString;
use solve::infer::{TyInferenceVariable, LifetimeInferenceVariable};
use std::collections::{HashSet, HashMap, BTreeMap};
use std::sync::Arc;

// the items after this line are at the *bottom* of the file for some reason:

pub mod debug;
mod tls;

pub use self::tls::set_current_program;
pub use self::tls::with_current_program;

Not to pick on Niko or anyone else, but I personally strictly separate my std imports, my extern crate imports, and my imports from other files in the module. So I'm already pretty unhappy with how this is organized. But reorganized to be better sorted, here's how it is with the "unsugared" form:

use ::std::collections::{HashSet, HashMap, BTreeMap};
use ::std::sync::Arc;

use ::chalk_parse::ast;
use ::lalrpop_intern::InternedString;

use cast::Cast;
use solve::infer::{TyInferenceVariable, LifetimeInferenceVariable};

pub export debug;
pub export tls::{set_current_program, with_current_program};

I think this is already much clearer, but I don't think it's intuitive that :: means "an external dependency." As a new user looking at this file, I don't feel like I would be able to figure out why the first four imports are different; I suspect I would get exactly the wrong impression - that :: meant "from the crate root."

So I would prefer if the from sugar were the idiomatic way to import things from other crates:

from std use {
    collections::{HashSet, HashMap, BTreeMap},
    sync::Arc,
}

from chalk_parse use ast;
from lalrpop_intern use InternedString;

use cast::Cast;
use solve::infer::{TyInferenceVariable, LifetimeInferenceVariable};

pub export debug;
pub export tls::{set_current_program, with_current_program};

To me, this seems very easy to interpret!

But several people have suggested out of band that everyone will just prefer the :: form, because it is shorter. I'd really like this not to be the case, but I'm not sure how to prevent it. In order to be a successful improvement, the from syntax needs widespread buy-in. Here are some ideas of how to achieve that:

Make the absolute paths extern:: instead of ::. Leading :: would stop being a thing in this world. I actually kind of like this "aesthetically," because it makes extern/self/super/crate the four keywords that "begin non-relative paths," making them more consistent. Then skipping from no longer saves any typing.
Only support the multiple-path form of use with from, as a carrot to encourage from.
Declare from idiomatic and have rustfmt use it.

withoutboats · August 5, 2017, 12:53am

New variation on the proposal, generated from talking to @scott in IRC:

Changes to absolute paths

We drop the :: form entirely. Instead, all absolute paths begin with one of these keywords:

self and super, which mean what they do today
crate, which refers to the root of this crate.
extern, which contains all of the external crate dependencies. So extern::std::iter::Iterator is the absolute path to Iterator.

Changes to use and export

Use and export cannot take an absolute path. Instead, their fully desugared form always uses from.

// allowed
from std use iter::Iterator;
// not allowed:
use extern::std::iter::Iterator;

However, we do have sugar: without from, use means from crate and export means from self:

use module::Item;
// sugar for
from crate use module::Item;

export child::Item;
// sugar for
from self export child::Item;

RalfJung · August 5, 2017, 1:06am

Wait, that's legal? I thought I always had to import items from the super-crate.

Oh wait, you assume this is at the crate root. (It literally took me more than a minute to realize that this is the reason the example works.) So this is actually use ::io::{Read, Write} inside bar, which indeed makes perfect sense if you buy into this "everything is an item" thing, which I will freely admit I like.

As already expressed previously, I like having a grammar of paths that covers all paths, making e.g. types in other crates nameable for error messages or for use without a use statement.

However, I don't like the fact that in this proposal, use module::Type and const x: module::Type = ... do not use the same rules to resolve module::Type. That's a step backwards compared to take 2. This is exactly the "relative vs. absolute path" problem, which to me is the single most confusing part of the current module system. use, export, and every other place a path occurs should interpret any given path the same way.

I also do not buy the motivation for the from syntactic sugar. From a UI perspective, I would then certainly expect to also be able to write e.g. from std::io use Read, Write; (as indeed other languages do, e.g. Coq -- and also Python, I believe?). However, such sugar is entirely redundant: we already have use std::io::{Read, Write}. Curly braces are currently needlessly limited, but they can be made to scale much better than from. I think it is much more consistent to be able to write e.g.

use ::std::{
  io::{Read, Write},
  cmp::min,
};

or even (with the "four roots")

use extern::{
  std::io::{Read, Write},
  regex,
};

rather than using from here.

I personally do not think "use is an item" is particularly hard to learn, and indeed it is beautiful, but I am also not extremely attached to the idea. If it confuses people, I don't mind that being changed. I suppose the hope is that export sounds more like this item being also available elsewhere, which is plausible. However, writing export foo::Foo will not actually make this item available to other crates, which one might expect from a command named "export". Furthermore, I am concerned that people will think that export is mandatory to export anything, i.e., that non-export things are in fact not exported.

Also, I find the role of use and export to be somewhat confusing -- from what I understand, export is (the new version of) use + being an item? IOW, export is like the current use (well, with changes to what paths are)? I expect the question "when do I use, when do I export, and what is even the difference between the two" will be a very common one.

RalfJung · August 5, 2017, 1:14am

Most of what I wrote in parallel to you posting this still applies, I just wanted to drop one more thing because crate as a keyword came up repeatedly here: I do not find it obvious at all that this means current crate. I know that pub(crate) also does that, but that's a different context. Reading crate::foo::bar, my first inclination is to parse this as "the bar item in the foo crate" -- just like how in URLs, the first item is special for many protocols (think: crate://foo/bar). Maybe that's just my brain being weird, just wanted to warn about possible misunderstandings here.

Also, it would be great if it was possible to have a name for an item that works both in the crate it is defined in, and in other crates. For example, if this crate is called foo, then something like extern::foo could be allowed. That would provide a way to communicate names that are meaningful without knowing which crate they are interpreted in.

withoutboats · August 5, 2017, 1:21am

This is appealing, but isn't it possible to depend on a crate with the same name as the crate under compilation? (And not uncommon, if you have both a bin and a lib form of a crate).

RalfJung · August 5, 2017, 1:38am

Actually, thinking about your latest proposal again, I think this is moving in the right direction. You didn't say this explicitly, but I assume use would be relative to the current crate like paths everywhere else. I mostly like the path syntax, except for crate, but I also agree that ::path looks rather ugly, so crate seems acceptable. Also, self::foo and foo should be equivalent in every context. (You only mentioned absolute paths, so I'm not sure if you implied this.)

However, I think forbidding use extern::std::iter::Iterator is totally the wrong approach. That just destroys all uniformity of the way paths work. I do agree we need some sugar to avoid use extern::std::io::{...}; use extern::std::fmt; .... If that ends up being from rather than curly braces, that seems suboptimal to me, but whatever. But nobody should be forced to use the sugar, that would be just wrong. Your argument above was ":: is unintuitive", which no longer applies now that absolute paths always start with a keyword. So, I'd appreciate if you could give some motivation for why you want to enforce this as a hard error.

(I am not commenting on the other aspects, about file system control etc., as I care much less about them. I just hope in the end it will be possible to have files be anonymous modules or completely flat. )

Oh, I forgot about this. Two crates with the same name in the same crate hierarchy. ugh Can we deprecate that?^^
