The Great Module Adventure Continues


#1

UPDATE: I’ve got a summary comment here for the thread up to comment 76. =)


Hello all,

Welcome to another episode of the Great Modules Adventure! When last we met, our brave adventurers had at long last come to the fabled land, having accepted RFC #2126. There they took a brief respite, and made ready to begin with the Impl Period. In that period, a great much work was done, and indeed the outlines of the module system came to be implemented. But lo there were a few niggling questions to resolve still… and that brings us to this discussion thread.

In less flowery language: during the impl period, we made a lot of progress implementing the “Clarify and streamline paths and visibility” RFC (along with a number of related RFCs). In fact, we made almost too much progress – we implemented a couple of different variants, and I’d like to get a discussion started about what we really want.

Overriding goals

I thought it’d be good start by reminding ourselves of the overriding goals:

  • Make it simple to see where an import is coming from
    • Today, it’s not always obvious what comes from external crates and what is internal.
      • use foo::bar could of course refer to either one.
  • Allow paths that appear in use to also be used in fn bodies
    • A common mistake today is to use paths like std::cmp::min from inside a function.
      • (Not just for beginners; I do this fairly regularly -nmatsakis)
    • To add to the confusion, this works if you are at the crate root, but not otherwise.
    • It’d be nice if we had a guarantee that one could copy a path from a use somewhere else in the program and it would always resolve to the same thing.
  • Maintain backwards compatibility and avoid fallback
    • For implementation reasons, it is strongly preferred if we can avoid any notion of “fallback”
  • Avoid redundant extern crate declarations
    • When using Cargo, adding to Cargo.toml should be enough. One less thing to do wrong.
  • Streamline and clarifiy visibility rules
    • For most crates, we should be able to declare things in one of three modes:
      • private to the module (default)
      • private to the crate (use crate keyword)
      • public to the world (use pub keyword)
    • Lints can check when things are declared as pub but no in fact reachable from the crate root

Outline

I am going to propose two-and-a-half a few different variants.

  • First I will describe “extern paths”, which achieves all of our objectives, but at the cost of verbose syntax.
  • Next, I will toss out various other syntaxes that are equivalent to “extern paths” that have been brought up from time to time.
  • Finally, I’ll discuss the “absolute paths begin with crate name” variant that is closer to what the RFC proposed, but with various technical problems solved. This variant fails to achieve the objective that paths in use should also be usable in fns. (I’ll also try to compare this against what I think the RFC was proposing.)

I’ve also implemented a sample project in the two distinct styles, so you can get a feeling for how they look in practice.

Variant 1: extern paths

A “picture” is worth a thousand words, I suppose. Here is an example source file showing how this scheme works. The key point is that use extern::{..} path is used to select code from other crates, and use crate::{...} to select code from this crate, so the imports section looks like:

use crate::file_read::for_each_line;

use extern::{
    regex::Regex,
    std::{
        env,
        process,
        io::{self, Write}
    },
};

fn main() {
    ...
}

In this formatting structure, I am leaning on the “nested import groups” RFC to create a block of “extern imports” (which fits with many style guides). Of course it’d also be legal to do use extern::std::env as a standalone line.

Details:

  • In this version, absolute paths always begin with a keyword:
    • use crate::foo::bar – selects foo::bar from the crate root
    • use extern::regex::Regex – selects Regex from the crate regex (no extern crate needed)
    • use self::foo::bar – selects foo::bar from the current crate, starting at the current module
    • use super::foo::bar – selects foo::bar from the current crate, starting at the parent module
  • In-scope paths like foo::bar work as they do today
    • As today, they are not permitted in use P or in a pub(in P) positions.
  • Deprecations:
    • Absolute paths like ::foo::bar still work, but are deprecated. They are equivalent to crate:foo::bar.
    • Writing use P where P does not begin with an absoluate path keyword is deprecated but considered equivalent to crate::foo::bar
  • Backwards compatibility:
    • This proposal is fully backwards compatible and does not require opt-in, though all existing code would be using the deprecated style.
    • A rustfix utility could trivially convert paths.

Variant 1b: other syntaxes

There were various other syntaxes proposed around explicit absolute paths. The key idea here is to introduce new syntax for absolute paths. Some other proposals, along with my cutesy names for them:

Proposal  | Extern Crate       | Local Crate
--------- | ------------------ | -----------
 @crate   | use @regex::Regex  | use @crate::foo::bar
 @::      | use @regex::Regex  | use @::foo::bar
 [crate]  | use [regex]::Regex | use [crate]::foo::bar
 []       | use [regex]::Regex | use []::foo::bar
 :        | use regex:Regex    | use crate:foo::bar

Some of these may be ambiguous in expression position, I don’t know. I’m terrible at finding ambiguities. @petrochenkov will have to tell you. =)

Variant 2: absolute paths begin with a crate name

Here is an example source file showing how this scheme works. The key point is that in use foo::bar, foo is assumed to be a crate name, and use crate::foo is used to select from the current crate, so the imports section looks like:

use crate::file_read::for_each_line;

use regex::Regex;

use std::{
    env,
    process,
    io::{self, Write}
};

fn main() {
   ...
}

Details:

  • In this version, absolute paths are not syntactically distinguished, but instead distinguished by where they appear.
    • As today, use P and pub(in P) would be absolute paths, but other paths are in-scope.
  • Absolute paths are assumed to begin with either the keyword crate or a crate name:
    • use crate::foo::bar – selects foo::bar from the crate root
    • use regex::Regex – selects Regex from the crate regex (no extern crate needed)
    • use self::foo::bar – selects foo::bar from the current crate, starting at the current module
    • use super::foo::bar – selects foo::bar from the current crate, starting at the parent module
  • In-scope paths work as today.
    • You would be able to “switch” from an in-scope path to an absolute path by using the :: prefix:
      • e.g., ::crate::foo::bar or ::regex::Regex.
    • Note that crate::foo paths cannot be used in a fn body – rather you type ::crate::foo
      • see the “Niggly parsing ambiguity” section below for an explanation
    • Open question: What should happen with self and super paths? For consistency, I might expect ::self and ::super, but of course the older forms exist and are unambiguous.
  • Backwards compatibility:
    • This proposal requires opt-in. Code like use foo::bar or ::foo::bar changes meaning under this proposal, since foo is assumed to be a crate name.
    • In leadup to the new epoch, we can deprecate paths like use foo::bar where foo is not a crate and offer a rustfix like tool to convert to use crate::foo.
  • How is Variant 2 different from the RFC? Two differences:
    • No use of fallback in name resolution. This is technically challenging. This may be worth revisiting. This is what makes the transition harder.
    • Using ::crate::foo within functons and not crate::foo. This wasn’t clear in the RFC, actually, but I think was the assumption that we would do the latter. This however ran afoul of a niggly syntactic ambiguity, discussed in the next section.

Variant 3: leading ::

(Proposed by @matklad, added as an update)

A kind of blend of Variant 1 and Variant 2 – in this version, absolute paths look like:

  • ::<crate>::path, where <crate> is either the keyword crate (local crate) or the name of a crate (e.g., ::std::cmp::min)
  • self:: and super:: paths expand to ::crate::path::to::self::or::super::module, basically.

A use statement uses absolute paths, so you write use ::std::cmp::min or use ::crate::foo (or paths relative to self and super). Here is an example:

use ::crate::file_read::for_each_line;
use ::regex::Regex;
use ::std::{
    env,
    process,
    io::{self, Write}
};

This has the advantage of having paths in a body be a subset of paths in a use. It also has relatively concise absolute paths (e.g., ::std::fmt::Debug vs extern::std::fmt::Debug), which can be useful.

A niggly ambiguity

One thing that came up is that there is a parsing ambiguity around the crate visibility keyword and the proposed crate::foo paths. In particular, consider these tokens:

struct Foo(crate :: foo);

Is this meant as a field with crate visibility whose type is ::foo? Or a private field with type crate::foo? The various schemes I proposed above lead to different interpretations here, I think.

  • In variant 1 (extern paths), this would be parsed as a private field of type crate::foo. This is because ::foo paths are deprecated, so there is no reason for us to prefer the crate (::foo) interpretation.
  • In variant 2, in contrast, this is a in-scope context, and hence paths cannot begin with crate here. Therefore, we should parse that as crate (::foo) – i.e., a field of type ::foo with crate visibility.
  • In the original RFC, i.e. where you can write crate::foo in an in-scope context, then there is no clear or correct interpretation for this ambiguity, but we could pick arbitrarily.

My personal opinion

I prefer some version of Variant 1, but maybe not extern. The clarity and simplicity of the proposal is very appealing, as it the fully backwards compatible aspect. I love the idea that there aren’t “two kinds” of paths in Rust.

Also, please note that variants 1 and 2 both exist and are usable in the Nightly compiler today. So you can try converting your project to these styles (or make a new project) and see how it feels!


#2

I feel like I must be missing something because variant 1 seems far stronger to me. It’s more backwards compatible, it’s more consistent by not having “two kinds of paths”, deprecating ::foo paths makes the whole thing simpler, and the nested import groups feature completely negates the rightward drift penalty and arguably mostly negates the verbosity penalty.


#3

@nikomatsakis What is the motivation behind the crate and extern keywords? It seems redundant as when you’re writing a program you already know that utils is your module and serde_json is an external dependency so this extra piece of information isn’t really adding to the comprehension of a module. What are the problems you encountered with the current pub(crate) syntax? Are those problems strong enough to warrant promotion to a dedicated crate keyword?


#4

I definitely prefer some form of variant 1. The main thing I think is that the signifier for an absolute path should be very short. It already feels when you use ::std like you are doing something wrong, and that’s only 2 characters.

Looking at the particular syntaxes @nikomatsakis suggested, we can see that the key thing here is that we need a syntax containing the name of the crate which can be unambiguously prefixed to an absolute path. This could mean sigils before the name, around the name, or replacing the :: that otherwise would connect the name to the path.

Here are some syntaxes that I don’t necessarily like, but that “work”. My point in listing them is to demonstrate that the space of possible solutions is very wide:

  • use std://collections::HashMap;
  • use from std collections::HashMap;
  • use std->collections::HashMap;
  • use crate<std>::collection::HashMap;

One I’ve always liked (but others have not) a single colon prefix:

  • use :std::collections::HashMap;

#5

Does it feel wrong because it is genuinely wrong, or because it is just unusual(that is, not familiar)?

I personally use the leading :: form a lot, and now for me std::foo feels wrong, and ::std::foo feels right.


#6

I definitely think this is a part of the dynamic, but I do think being overly lengthy is a problem. For example, the extern syntax makes something like this, which I do often, feel quite bad:

impl extern::std::fmt::Display for MyType {
    fn fmt(&self, f: &mut extern::std::fmt::Display) -> extern::std::fmt::Result {
        // ...
    }
}

#7

To me ::std “feels wrong” mostly because of the many, many weird things in C++ that involve having or not having a leading :: on very specific paths. Of course, Rust doesn’t have things like ADL, so personally I don’t trust my intuition on whether or not that syntax would be “genuinely wrong” for Rust. All I can say for sure is that I haven’t yet seen any alternatives that I like better than extern.


#8

Totally agree about verbosity being a problem!

It seems like ::std::foo and ::crate::bar would be pretty close to the minimal possible verbosity, while still being explicit.


#9

I’m feeling pretty good about Variant 1, honestly. I also hear @withoutboats here, but would expect that I’d reach for use a bit more often.


#10

Unfortunately my understanding is that double colon prefix is out because it already has meaning. :-\

A part of why I like : is that its very similar to ::, though that’s also why other people don’t like it.


#11

I am a huge fan of variant 2, and I’m a bit taken aback by this thread because I thought there was more consensus around that version from previous discussion.

The issue for me with variant 1 is that it “unbalances” the namespace. I think of the crate:: prefix as a stand-in for current_crate_name::, which is important because 1) idiomatic Rust uses multiple crates with the same name (a library and a binary) and 2) it’s like self in that it enables you to rename a crate without a huge diff. (This is also why ::self and ::super don’t really make sense to me- they’re not crates and thus not top-level items in the namespace hierarchy.)

With this in mind, variant 2 places crate:: and some_dependency:: at the same level in the namespace hierarchy, while variant 1 and the status quo place the contents of the current crate one level higher in the namespace hierarchy. Variant 1 avoids some specific pitfalls here, but it feels klugy to me because it doesn’t match other languages which place every library’s namespace at the same level.

One other important point that came up in previous discussion is that std::-in-expressions only works in the root today because of an implicit extern crate std which follows the same behavior as other dependencies. To make it work with extern crate, we could simply add use std to the prelude and let it work everywhere, making it less of a special case without contorting the namespace hierarchy.


#12

Several of the variants seem to resolve this concern by treating crate the same as the name of dependencies - @crate::foo::Bar, [crate]::foo::Bar, :crate::foo::Bar, crate:foo::Bar all treat crate as equivalent to regex or serde.


#13

As far as variant 2’s backwards compatibility and fallback, I do think that should be revisited. If I remember the previous discussion accurately, the only case where currently-valid code would actually change behavior under variant 2 is when you simultaneously a) have a top-level item with the same name as a Cargo.toml dependency and b) either don’t use that dependency, rename that dependency, or put its extern crate only in submodule(s).

This set of cases is small enough that any fallback should be able to check for those cases explicitly, at the definition site of the top-level items. If none exist, which I expect to be the common case, ::foo::/use foo:: can behave the old way with a deprecation warning (as would exist in variant 1 as well). If one exists, ::foo::/use foo:: can resolve to the top-level item rather than the dependency.

This arguably has better characteristics for people who are updating code to the new style. Variant 1 requires all absolute paths to change, either by adding a crate:: or by adding an extern:: (or whatever style from 1b). Variant 2 only requires internal paths to change, by adding crate::, and allows dependency paths to remain as they are.

Edit: This fallback would also enable the less-verbose and more-familiar ::dep/::std to keep working.


#14

That’s true, but they also introduce an entirely new syntax that has the same problem as vanilla variant 1 of requiring every absolute path to change.

IIUC, one of the reasons for these variants is to make absolute paths in expressions feel nicer. Most discussion I’ve seen around this is more specifically to make std:: nicer, which seems to be an issue when you start hitting the boundaries of what’s in the prelude. Like I mentioned previously, putting std itself in the prelude is enough to handle this case, while leaving arbitrary dependency crates to ::dep or, as I prefer, use dep.


#15

Yeah, that’s true that :: is already used in the current system. OTOH, it would be a shame if we have to use a noisier syntax instead while :: is just deprecated and not reused at all.


#16

Are we 100% sure that this is a hard constraint? Could we just break things and epoch out of fallbacks?

“for implementation reasons” does not sound like a solid motivation in itself…

If this is not a hard constraint, I suggest seriously considering ::std::foo / ::crate::foo flavor of variant 1.


#17

I don’t agree with the first goal being important (“Make it simple to see where an import is coming from”).

I’ve ran into almost every other module-related problem that has been discussed (and I appreciate the improvements being made), but never this one:

  1. I don’t think it’s hard to remember what code is yours and isn’t. I know std and serde aren’t mine, but I have frobnicator module. Even for non-trivial projects the files on disk, often always visible in the editor, are a good reminder.
    While you could have both module and a crate with the same name and interchangeably use equivalents of use extern::cookie and use crate::cookie throughout the codebase, this will make all cookie::foo ambiguous on their own (checking top of every file each time is a hassle), so the use syntax isn’t helping enough, and you’re better off using mod mycookie instead.
  2. I think the distinction is actually counter-productive, because it adds friction to graduating modules to their own crates (e.g. if my frobnicator module grows big, I’ll move it to be a crate in my workspace). For projects heavily using workspaces the this-crate/not-this-crate distinction is not useful. It’s almost an irrelevant implementation detail. A useful distinction could be my code vs 3rd party code, but the proposed paths syntax doesn’t indicate that accurately in workspace-based projects.

#18

I think this backward compatibility of Variant 1 is essential. But deprecating every single path that is legal today is extremely disruptive. I think this deprecation should be very very gradual: implement a lint, but make it silent by default at first. Consider only making it warn by default in the next epoch.


#19

I prefer variant 2 more than variant 1 because the current crate and other crates are referenced at the same level. That is to say, conceptually at a high level all crates exist in the same namespace (the package registry), and any particular crate selects a small subset of these crates to reference things from.

In variant 1, an “extra” level is created to reach outwards; this is nice because it helps with backwards compatibility – given that the current system encodes some things that don’t play well with global namespace view.

In variant 2, we put all the crates on a single level, and we have to live with some short-term incompatibility issues to work through in order to get to the more consistent conceptual solution.


#20

So, there are two things going on.


On the implementation complexity:

I was a bit vague there, and in part because I don’t remember the full state of play here. What I do remember from my time trying to under name resolution – particularly in the face of globs, macro expansion, and other things – is that it is a highly complicated, fixed-point sort of procedure.

Consider things like this:

  • We resolve a path like foo::bar to some particular item.
  • We use that resolution to expand a macro; this macro creates a new item that would have taken precedence.
    • (In this case, for example, a macro might introduce an extern crate.)

That said, my understanding of the current resolve algorithm is somewhat out of date. I’ll need to go bring things back into cache to say convincingly.

Let me also quote @petrochenkov here, commenting on the RFC:

The RFC gives the recipe:

First, attempt to resolve the name as an item defined in the top-level module. Otherwise, attempt to resolve the name as an external crate, exactly as we do with extern crate today.

It’s not as simple as that. In presence of globs and macro-expanded items the “attempt to resolve the name” can give indeterminate result. We have to loop through a fixed-point algorithm possibly trying to resolve other imports before we can resolve this one. If only determined resolution failure results in the extern crate fallback, then globs and macros expanded in the crate root may break the fallback mechanism - it will return resolution errors instead. Or we can speculatively search and load extern crates on indeterminacy, which is not desirable. Given that the fallback will have to be supported infinitely, according to the epoch policy, it would be nice if it could be avoided.

Other petrochenkov comments: =)


On the general desirability of writing use std::foo:

But let’s assume that we can make fallback work. Then we still have to decide if we want to. Personally, I am not sure. In particular, it fails to achieve the goal that one can take a path that is used in a use and use it elsewhere, at least on its own. I personally think that’s a worthwhile goal, though of course we have to decide if we find a version and we like, and it’s worth the churn.