Relative paths in Rust 2018

This is a continuation of Relative paths and Rust 2018 `use` statements.

One of the final things that needs to get resolved for Rust 2018 is the precise contours of the module system changes. The current design is described here. We've been wanting feedback and discussion specifically around the details of the path syntax and people's experience with it. That part of the design is summarized as follows:

  • use statements take fully qualified paths.
  • A fully qualified path starts with the name of an external crate, or a keyword: crate, self, or super.
  • Outside of use statements:
    • Fully qualified paths work, and have the same meaning as in use, unless a local declaration has shadowed an external crate name.
    • Paths may also start with the name of any in-scope item, i.e. an item declared or used in the current module.

One anecdotal report from a few folks who have tried out the 2018 preview is that the path changes don't quite go far enough -- they still often find themselves forgetting to write a leading self:: in situations like the following:

// trying to use an item defined in a submodule -- missing `self::`
use foo::Bar;

mod foo;

// a common mistake when trying to bring variants into scope -- missing `self::`
use MyEnum::*;

enum MyEnum {
    Variant1,
    Variant2,
}

These situations are particularly bad in Rust 2015 because the code works without self:: at the top level module, but not elsewhere. Rust 2018's current design helps by making the code not work anywhere. This post proposes a way to make the code work everywhere.
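For comparison, here is a minimal sketch of the same two imports with the leading self:: written out, which is the form the current preview expects. The names geometry, shapes, Circle, Shape, describe and unit_circle are made up for illustration. Everything sits inside a nested module to show that this form does not depend on being at the crate root.

```rust
mod geometry {
    // A submodule declared inline so the example is self-contained.
    pub mod shapes {
        pub struct Circle {
            pub radius: u32,
        }
    }

    // Importing an item defined in a submodule: the leading `self::` is spelled out.
    use self::shapes::Circle;

    pub enum Shape {
        Round,
        Square,
    }

    // Bringing the variants into scope: again with the leading `self::`.
    use self::Shape::*;

    pub fn describe(shape: Shape) -> &'static str {
        // `Round` and `Square` resolve through the glob import above.
        match shape {
            Round => "round",
            Square => "square",
        }
    }

    pub fn unit_circle() -> Circle {
        Circle { radius: 1 }
    }
}

fn main() {
    println!("{}", geometry::describe(geometry::Shape::Round));
    println!("{}", geometry::unit_circle().radius);
}
```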

At the same time, a few folks from the lang team have been exploring a variant of the 2018 design that would help address these issues. That's what I want to talk about here.

A uniform treatment of paths

One of the unsatisfying things about the 2018 design (which is also true of Rust 2015) is that paths work differently in use statements than they do elsewhere. This is perhaps most visible with the self:: issue mentioned above: within a function, you can freely say MyEnum::Variant1, but use MyEnum::* doesn't work in Rust 2018 (since MyEnum is interpreted as a crate name). The mismatch between these two styles of paths is a frequent paper cut, and also makes the language less uniform.

But it turns out that we can alter the 2018 design to make paths work the same way everywhere. In this "uniform" approach, we break down paths as follows:

  • Starts with crate, self, or super: the path is interpreted as starting at the current crate's root module, the current module, or the parent module respectively.
  • Starts with ::: the first name in a path is an external crate name.
  • Starts with an identifier:
    • If the identifier is in scope (e.g. declared or used within the current module), resolve it to the corresponding declaration
    • If the identifier is the same as an external crate name, resolve it to that crate
    • If the identifier is a name in the prelude, resolve it to the corresponding item

This is roughly the Python2 model of paths (more on that later), and the model used by shells and similar programs when doing path searches. Start with what's immediately in scope, and otherwise look at external crates and the prelude. In a sense, this treats the external crates and then the prelude as though they're in scope, just lower-priority than names declared in the current module.

Beyond the uniformity (use and other items all work the same way!), this design also:

  • Retains the benefits of the current Rust 2018 design: makes the top-level module and submodules work the same way; makes referencing external crates more ergonomic (since they don't have to be used in submodules to refer to them)
  • Makes importing from local items more ergonomic, both in the sense of eliminating the common mistake mentioned at the beginning of the post (forgetting to write self::), and also making the paths more concise.
  • Allows arbitrary hoisting: anywhere in your code, if you have paths like a::b::c, you can take any prefix of those paths, such as a::b, hoist it up to a use a::b, and then substitute the last component (b) for the prefix in all your paths (b::c). That's a natural transformation to shorten such paths.
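
The hoisting transformation from the last bullet can be sketched as follows. The modules a and b and the functions c, d, before and after are placeholders taken from (or modeled on) the a::b::c example above, not real APIs. The sketch sits at the crate root; under the uniform proposal the same rewrite would be valid in any module.

```rust
mod a {
    pub mod b {
        pub fn c() -> u32 {
            1
        }
        pub fn d() -> u32 {
            2
        }
    }
}

// Before hoisting: every path spells out the full prefix.
fn before() -> u32 {
    a::b::c() + a::b::d()
}

// After hoisting: the prefix `a::b` is imported once ...
use a::b;

// ... and its last component `b` replaces the prefix in each path.
fn after() -> u32 {
    b::c() + b::d()
}

fn main() {
    println!("{} {}", before(), after());
}
```

Both functions compute the same value; only the spelling of the paths changes.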

On the other hand, relative to the current 2018 design, you can't always tell whether a given use statement is importing from an external crate or a local item (which, notably, is also the case in Rust 2015). The mitigation is that the external crates and the local declarations of a module are all relatively "nearby" things when reading code or keeping it in your head.

What to do about name conflicts?

The explanation of path resolution above includes a series of "if"s for the leading identifier case, but there's a question of what to do when multiple of them apply.

Let's say a path is ambiguous if it starts with a leading identifier, and that identifier could be two or more of: a local declaration, a crate name, or a prelude item.

Outside of use statements, we would resolve ambiguous paths in the following order: local name, external crate, prelude. In other words, much like Rust 2015 except that we add external crate names in front of the prelude. This is a core part of the Rust 2018 design, and is pretty much the universal expectation.

However, within use statements things are trickier, because of potential circularities when macros or glob imports are in play. While it may ultimately be possible to apply the same disambiguation order for use, the implementation is much more challenging, and it's not obvious that it's desirable. So instead, we can make it a hard error to write an ambiguous use statement, and instead recommend using a leading self or :: to disambiguate.
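
To make the two rules concrete, here is a small toy model of them. It is only a sketch of the rules as stated above, not how rustc implements name resolution. The types Candidate and Scope, the helper demo_scope, and the flat name lists are all assumptions made for illustration.

```rust
/// What a leading identifier could refer to (toy model, not rustc's data structures).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Candidate {
    Local,
    ExternCrate,
    Prelude,
}

/// The names visible from one module, reduced to three flat lists.
struct Scope<'a> {
    locals: &'a [&'a str],
    extern_crates: &'a [&'a str],
    prelude: &'a [&'a str],
}

impl<'a> Scope<'a> {
    /// Every candidate that `name` could refer to, in priority order.
    fn candidates(&self, name: &str) -> Vec<Candidate> {
        let mut found = Vec::new();
        if self.locals.iter().any(|n| *n == name) {
            found.push(Candidate::Local);
        }
        if self.extern_crates.iter().any(|n| *n == name) {
            found.push(Candidate::ExternCrate);
        }
        if self.prelude.iter().any(|n| *n == name) {
            found.push(Candidate::Prelude);
        }
        found
    }

    /// Outside `use`: the first match wins (local, then extern crate, then prelude).
    fn resolve_outside_use(&self, name: &str) -> Option<Candidate> {
        self.candidates(name).into_iter().next()
    }

    /// Inside `use`: more than one candidate is a hard error.
    fn resolve_in_use(&self, name: &str) -> Result<Candidate, String> {
        let found = self.candidates(name);
        match found.len() {
            0 => Err(format!("unresolved name `{}`", name)),
            1 => Ok(found[0]),
            _ => Err(format!(
                "`{}` is ambiguous; write `self::{}` or `::{}`",
                name, name, name
            )),
        }
    }
}

/// A hypothetical module that declares `collections` locally while a crate
/// of the same name is also available.
fn demo_scope() -> Scope<'static> {
    Scope {
        locals: &["collections", "MyEnum"],
        extern_crates: &["std", "regex", "collections"],
        prelude: &["Vec", "Option"],
    }
}

fn main() {
    let scope = demo_scope();
    println!("{:?}", scope.resolve_outside_use("collections"));
    println!("{:?}", scope.resolve_in_use("collections"));
    println!("{:?}", scope.resolve_in_use("regex"));
}
```

In this model, a name like collections that is both a local item and a crate resolves to the local item in an expression, but is rejected in a use statement until it is written as self::collections or ::collections.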

Other edge cases

One possibly surprising thing that follows from uniform paths is that this works:

use std::collections;
use collections::HashMap;

What's happening here is that the first use brings collections into scope, and the second use then imports from that in-scope item. (You can also see this as the first use adding a private collections item to the module, and the second use importing it as a relative path).

This is an unavoidable consequence of having a uniform notion of paths. But I'd propose that we include a warn-by-default style lint, suggesting rewriting the above as:

use std::collections::{self, HashMap};
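
As a quick check that the suggested form brings both names into scope, here is a minimal sketch. The helpers word_lengths and sorted_keys are hypothetical and exist only to use each imported name once.

```rust
// One `use` brings in both the module `collections` and the type `HashMap`.
use std::collections::{self, HashMap};

/// Maps each word to its length, using the directly imported `HashMap`.
fn word_lengths(words: &[&str]) -> HashMap<String, usize> {
    let mut lengths = HashMap::new();
    for word in words {
        lengths.insert(word.to_string(), word.len());
    }
    lengths
}

/// Collects the keys in sorted order, reaching `BTreeSet` through the
/// imported `collections` module.
fn sorted_keys(map: &HashMap<String, usize>) -> collections::BTreeSet<String> {
    map.keys().cloned().collect()
}

fn main() {
    let lengths = word_lengths(&["uniform", "paths"]);
    println!("{:?}", sorted_keys(&lengths));
}
```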

Relationship to the current Rust 2018 preview

It's possible to make the current preview forward-compatible with this proposal by simply implementing the hard error on conflicts in use statements. However, the module system changes are a defining part of Rust 2018, so it seems best to instead ship with something closer to the final design if possible.

It should be easy to implement this proposal under a separate feature flag and include it in the feedback cycle we've kicked off -- the whole point of which is to gain experience with the new features and their variants.

A couple alternatives to mention

  • We could of course pursue implementing disambiguation for use statements immediately, but I don't think it makes sense to block Rust 2018 on it (since it's a corner case with easy workarounds, and forward-compatible with adding it later).

  • We may want to avoid re-purposing leading :: as proposed here. We may instead want to preserve the existing behavior of :: from Rust 2015. This is a relatively minor point, and I'd like to focus on the overall thrust of the ideas in this post first.

The Python2 -> Python3 story

Finally, it's well known that Python 2 had a pretty similar story around paths as what's proposed here, and that Python 3 moved to something more like the current 2018 design.

You can read the relevant PEP here, but the thrust of the rationale is:

As Python's library expands, more and more existing package internal modules suddenly shadow standard library modules by accident. It's a particularly difficult problem inside packages because there's no way to specify which module is meant.

Things look quite different in Rust. For one, conflicts are not about the contents of std (which isn't growing much anyway), but rather with explicitly declared external crates. Further, the proposal includes a simple and ergonomic means of disambiguation (:: or self::). And of course, Rust's type checking (and the hard error on conflicts) means that name clashes become evident immediately.


This makes me uncomfortable. It feels fragile. For example, if I have an external crate called "collections" that has its own HashMap module, then both of these compile but mean two different things:

use std::collections;
use collections::HashMap;

and

use collections::HashMap;
use std::collections;

So refactoring (and refactoring tools, such as those in IDEs) would need to analyze use statements before reordering them. This might be unavoidable, but if we required a specific keyword or prefix for external crate usages in all cases, might that prevent this situation entirely? Would the downside of that be worse or better than this sort of ambiguity?

In other words, the above, to get the "collections" crate HashMap would have to be:

use ::collections::HashMap;
use std::collections;

or

use std::collections;
use ::collections::HashMap;

which would both be equivalent to each other, but, would be different from:

use std::collections;
use collections::HashMap;

and

use collections::HashMap;
use std::collections;

would be a compiler error along the lines of, "Unable to resolve collections::HashMap".

Now, this would break non-use prelude paths like:

let x = std::time::SystemTime::now();

which would have to be:

let x = ::std::time::SystemTime::now();

instead. If that is too much of a backwards compatibility issue, then what if the rule were relaxed to allow only standard prelude items to be resolved without the leading ::, instead of both prelude and external crate items? This would mean your rule:

Would instead become:

Would that not get rid of the ambiguous cases? Would it be backwards compatible? Is there some big ergonomic issue that this brings up that outweighs the benefits to a lack of ambiguity?


No, this is not what's being proposed: order does not and will not matter. The proposal would make that situation into a hard error, and you would have to disambiguate your use collections::HashMap statement to either use ::collections::HashMap (for the external crate) or use self::collections::HashMap (for the relative use). This is kind of in between the "oops order matters" and "you always have to use ::" that you describe.


@rpjohnst So you are saying that this:

along with this:

means this:

Doesn't apply? So, if "collections" were an external crate in the Cargo.toml, then this would be a hard error, but if no such crate were referenced in the Cargo.toml, it would be a lint warning?

I guess that's better, but, it still feels fragile (and difficult for the compiler to get right??? @nikomatsakis ??? )

Wait, so this is now full 1path? Wasn’t that previously shot down on principle for making name resolution intractable?

(I mean, even just on its own it needs to be iterated to a fixpoint, but when you bring macros into the mix…)


Yes, this is now full 1path. It doesn’t require full iteration to a fixed point, though, precisely because of the restriction on module/crate name collisions. That simplifies name resolution drastically. (We could potentially support shadowing in the future, but that’d make name resolution harder.)

I really like this because it achieves 1path and is very close to what I’ve proposed in The Great Module Adventure Continues, plus ambiguity rules!

Technical questions/points:

  • Am I correct that foo and self::foo are almost equivalent? self::foo won’t work but foo will only if foo is in the prelude or is an extern crate.

  • Looks like an alternative formulation is “we add extern crates to prelude, overriding existing prelude items”.

  • After this proposal, you’ll almost never write self:: or ::foo, which feels like a nice simplification?

  • Behavior with respect to ::* imports should be specified. Currently, they shadow prelude and are shadowed by anything else.

  • Ambiguity rules are an interesting solution! I think it even might help with IDE performance, because you’ll be able to stop early during transitive import resolution. However, we must specify how they work with the type/value/macro namespaces. Is it an error to use regex when you have fn regex() in scope as well as the regex extern crate?


Right. (Assuming that by your second sentence you mean "the only case where self::foo won't work but foo will is ...")

That's exactly how I think of it, yes.

Right.


This looks like a breaking change, since such paths in 2015 are equivalent to the proposed crate:: (they can refer to any kind of item at the root of the crate). Not having breaking changes was a big selling point of the previous proposals.


As mentioned elsewhere in the proposal, “We may instead want to preserve the existing behavior of :: from Rust 2015.”. That’s the approach I would favor. In particular, it shouldn’t be particularly hard to have :: follow this resolution algorithm:

  • First look at crate::
  • Then look at extern crates
  • Then look at the prelude

That would allow all of ::top_level_module::foo , ::some_crate::bar , and ::std::baz to work.

Huh? The previous proposal (which is currently implemented) implies a huge breaking change on edition switch (but no breakage without edition switch), look at any "2015 -> 2018" transition PR.
This proposal seems to keep the same approach.

EDIT: ::something already means extern::something in edition 2018 in the current implementation.

Exactly!

This proposal seems to make it tractable by declaring any ambiguity an error:

This still sounds a bit fishy, but I think we can at least try implementing this.
E.g. select any suitable resolution during resolving imports (greedy algorithm), then proceed and perform an extra disambiguation checking post-processing stage after import resolution is complete.

The danger is that such a greedy algorithm can select one of multiple possible solutions, and exactly which one is chosen may depend on things like item order in a module, or expansion algorithm details.
Right now we stop import resolution with an error on any possibility of ambiguity, so if resolution succeeds, then it's the only possible solution.
However, if we continue stopping import resolution on any possibility of ambiguity with the "full 1path model", then I suspect we'll very rarely be able to make any progress.

EDIT: Perhaps for imports we can consider only candidates "planted in modules" plus candidates from sets known in advance, like extern crates passed with --extern; then we can make the ambiguity detection sound.
Basically, we can try to approach "1path" (which seems to be the simplest and least confusing mental model) as closely as technically possible while keeping fundamental properties like uniqueness of the found solution.
I need to think about it.


Has it already been considered (a quick glance around didn’t turn up anything) to make use extern:: the prefix for importing stuff from external crates?

In that case we have self, crate, super and extern as “special prefixes” and as extern is already a keyword there should be no conflict with the current rules. It’s also very explicit about pulling in an external dependency but not nearly as verbose as the old extern crate ...; use ...;


This alternative was actually implemented too for experimentation purposes.

#![feature(extern_in_paths)]

use extern::regex::Regex;

fn main() {
    let r = Regex::new("a+");
}

Oh great, apparently my google-fu is really lacking at the moment. Definitely have to do some experimentation around these 2018 epoch related changes.

Can you elaborate a bit more on this? You talk about "soundness" later but I'm not sure what you are trying to avoid.

One thing I could imagine is that we might "incorrectly" run procedural macros -- i.e., because there is an ambiguous path, and one of the options is a procedural macro -- so the choices that resolve makes are observable via side effects.

The other thing I could imagine -- but I don't think can occur -- is that by making some choice, we would wind up inadvertently ruling out the other choices. But I think that as long as we fully expand macros and things, if there was ever an ambiguity, we should always find it eventually and error out -- is that not true?

I think we might need stronger ambiguity rules. Something feels fishy to me about making use different from other paths. Imagine we had this setup (and we are using lexical macro resolution), and that we have two external dependencies named foo and bar:

foo::macro1! { ... }
bar::macro2! { ... }

Now, we are in a bit of a bind. If we execute foo::macro1 first, it might generate a module bar – that module then shadows the external crate bar. But if we execute bar::macro2 first, it might generate a module foo which shadows the external crate foo.

Perhaps these are the soundness concerns that @petrochenkov was referring to.

We can sidestep these by making it an error to shadow from any place.

But that does raise concerns when it comes to the prelude, since we want the ability to expand that.


Don’t we have the same problem already? We have shadowing behavior for ::* imports, and there are use self::foo today.

That is, is this something genuinely new, or is just old behavior, present in more cases?

To be honest I’m not entirely sure =) but I suspect we can have this problem today in some cases.

I think what I am arguing here is that this would be really common: that is, any time that you have two relative macro invocations that come from external crates, you are stuck and cannot proceed – unless I suppose you use an explicit form or pull those paths into use statements, at which point you’re fine, because if executing (say) foo::macro1 created something that shadows bar, it’d be an error anyway. (Note: If we made shadowing a hard error universally, we’d be fine.)

Maybe this is ok – sometimes you get weird errors from the name resolution system saying “you need to disambiguate this path for me to proceed” and then you change to ::foo::macro1 or something. Seems par for the course with things like T: 'a annotations – but it is something we generally try to avoid =)


(Note that I am talking about the Macros 2.0 future here – macro_rules! stuff is still expanded in a weird “pre-name-resolution” pass, as far as I know, that doesn’t really respect lexical name resolution in the usual sense.)