The Great Module Adventure Continues

rpjohnst · February 2, 2018, 5:44pm

The framing of the problem for a lot of the original discussion was around "path confusion," the idea that absolute paths (i.e. use, pub(in), or with variant 3 ::) overlap with in-scope paths in the root module, leading beginners to build incorrect mental models that aren't broken until they go to add a new source file and get confused.

The last line of your example kind of undermines its ability to solve this problem- in the root module, top-level items and dependencies are still in-scope for undeprecated use, at least without further rules around what's deprecated.

Another issue is that there was some level of support for use paths staying absolute by default, both for backwards compatibility and avoiding churn (use std::time, use regex, etc don't have to change at all with the RFC/variant 2), and because a lot of uses cross the module tree, at least in some codebases.

Variant 2 helps here because it makes it immediately clear that use paths are absolute- you can't use something out of a top-level item without prefixing it with crate, because your crate is just one among many.

In that light, I think variant 2 arguably has less breakage already. Variants 1 and 3, as well as your suggestion, all force every path to change. Variant 2 only forces internal paths to change, allowing external paths (which IIRC actually turned out to be the majority based on some numbers gathered in the RFC thread) to remain undeprecated.
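To illustrate the split Variant 2 draws, here is a minimal sketch in which only the internal import gains the crate prefix while the external one keeps its existing form. The modules file_read and app and both functions are hypothetical names chosen for illustration, and the snippet assumes a compiler that accepts crate:: in use paths.

```rust
// Hypothetical internal module, standing in for any top-level item of the crate.
mod file_read {
    /// Counts the lines in `text`.
    pub fn count_lines(text: &str) -> usize {
        text.lines().count()
    }
}

mod app {
    // Internal path: names the local crate explicitly with `crate`.
    use crate::file_read::count_lines;
    // External path: written exactly as it was before.
    use std::collections::HashMap;

    /// Maps each named text to its line count.
    pub fn line_counts(texts: &[(&str, &str)]) -> HashMap<String, usize> {
        texts
            .iter()
            .map(|(name, text)| (name.to_string(), count_lines(text)))
            .collect()
    }
}
```

Only the first use line would need to change when migrating; the std import is untouched, which is the "less breakage" point above.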

(And to avoid retreading some ground from the RFC thread: we really can't fix this problem without forcing some set of paths to change, based on the "path confusion" framing.)

Rantanen · February 4, 2018, 7:56pm

A thing I haven’t seen discussed yet is referring to the local or extern crate with the same path. This is a use case that crops up in proc macros, where you might need to emit tokens, such as “::my_crate::SomeTrait”, and you still want to use that proc macro inside my_crate.

Currently I’ve been working around this by having mod my_crate { pub use ::* } inside my_crate, which allows ::my_crate::MyTrait to work either through the actual my_crate crate or the dummy my_crate module.

If the idiomatic way in the future differentiates between these two scenarios, then whatever the syntax for referring to extern::my_crate should also be able to refer to the local crate.

I’m expecting this to be just an implementation detail. I’m suspecting there are no real technical issues that would prevent considering the current crate as an “extern” crate as well when it comes to resolving crate-specific use statements/paths.

rpjohnst · February 5, 2018, 2:10am

Shouldn’t hygiene solve this, so you can just give the token the correct span to have it always resolve as if it were in my_crate?

Rantanen · February 5, 2018, 8:21am

Now that you mention it, it might.

Truth is, I’m a bit lost when it comes to span sites and proc macros. Mainly because the concept of def_site seems mostly useless for them: The literal def_site is a proc_macro crate, which wouldn’t export anything anyway, so I’ve always considered call_site to be the only sane option for spans that come from proc macros.

I would love to learn more about the hygiene in proc macros, but this topic isn’t really the place to discuss it. I’m happy if the hygiene solves the above scenario and in that case there shouldn’t be any need to consider that scenario when discussing the module syntax.

nikomatsakis · February 5, 2018, 5:40pm

I’ve been sitting on this for a while, and I have to say that “Variant 3” (as proposed by @matklad) is growing on me. This is basically changing two things from today:

- You write use ::std::cmp::min instead of use std::cmp::min
- You write ::crate::foo to select something from the root of your local crate instead of ::foo
  - A key question is whether we can continue (at least in most cases) to support ::foo via a fallback mechanism
    - @petrochenkov suggested this may be possible

Like Variant 1, this meets all the criteria in terms of distinguishing clearly where things are from, allowing paths-in-a-use to be a strict subset of paths-everywhere-else, etc.

It also has the benefit that one can write absolute paths calling out to the standard library, like ::std::fmt::Debug, and they are not unreasonable (no worse than today, basically).

(That said, I do find the leading :: a bit hard on the eyes, I have to say. Maybe that just takes time?)

mark-i-m · February 5, 2018, 6:29pm

I don't really like the leading :: either... My favor is still with Variant 1 -- mostly because I like separating local and external imports. I already do this with blank lines, but having the syntactic help is even better

nikomatsakis · February 5, 2018, 6:34pm

I made this branch of the grep-test repo to illustrate this third style.

matklad · February 5, 2018, 6:58pm

Could you elaborate a bit on what exactly it is you prefer in V1 over V3? At least to me, they seem isomorphic syntactically, and different only in the “how do we get there”.

To be more concrete, here’s how import separation looks in V1:

use crate::file_read::for_each_line;

use extern::{
    regex::Regex,
    std::{
        env,
        process,
        io::{self, Write}
    },
};

And here is the same example with V3:

use ::crate::file_read::for_each_line;

use ::{
    regex::Regex,
    std::{
        env,
        process,
        io::{self, Write}
    },
};

They don’t seem that different from each other…

rpjohnst · February 5, 2018, 7:05pm

If I understand correctly, Variant 2 is basically the same as Variant 3 but where the leading :: is optional in use context?

I also find the leading :: to be quite ugly and redundant. It seems the main reason for it is to make those paths usable in non-use contexts, i.e. “strict subset”? I’m not sure why that’s desirable, given that the other way around doesn’t work, and paths don’t usually move from use to expression context anyway. (Though intuition does, which is the point I guess?)

It kind of feels like trying to simulate the more-traditional “shadowing” style where things from outer modules remain in-scope in inner modules, which IIRC turned out to be really messy to implement in Rust’s case (and is also probably not desirable for discoverability reasons).

Either way, I suspect people coming from other languages already have a fairly good intuition around the existence of a difference between use and expression paths. That’s the whole reason use (and its counterparts) exists, for one thing. Further, the confusion from being able to write dep::foo in the root module but not elsewhere is already solved without the forced leading ::. We can also teach the rule "use for importing full paths, :: for full paths in expressions" a little more clearly given just crate::.

The remaining papercut seems to be “I want to write a full path in this expression; maybe I’ll use it later and maybe I won’t; let’s use the same syntax as use… oh that doesn’t work.” Given that it should no longer confuse people about how things work, I’m not convinced removing that is worth the costs of 1) forcing every use path to change and 2) forcing a redundant leading ::, with or without nested imports.

mark-i-m · February 5, 2018, 7:08pm

Oh, hmm… you are right. I should have thought more carefully about that. I guess it really comes down to aesthetics for me…

nikomatsakis · February 5, 2018, 7:09pm

Maybe so, but I'm not so sure. Perhaps I'm reading too much into this, but I personally do not have a strong intuition -- that is, I regularly copy and paste a path and then get momentarily confused by the resulting ~~years~~ errors. I also think that it is pretty common for people to get confused as to why std::foo works from the root module but not other modules.

That said, if we at least make it so that one must write ::std::foo also in the root module, that should also help. At least it would be consistent everywhere.

I'd be curious to do a survey -- are there other languages where taking the literal characters from a use/import and putting them into the body fails to work (at least most of the time)? e.g., in Java, you write import java.util.ArrayList, but a full path also works in the body. I honestly forget how e.g. Python and Ruby work, been a while.

lqd · February 5, 2018, 7:10pm

Great point! For example, how would you feel about having the colon after the crate?

use crate:file_read::for_each_line;
use std:collections::HashMap;

use regex:Regex,
    std:{
        env,
        process,
        io::{self, Write}
    };

Or the lack of a semi-colon could, as your examples show I believe, mean local to the current crate.

nikomatsakis · February 5, 2018, 7:12pm

FWIW, I don't hate this; it's been growing on me too. I do find the : after the crate a bit confusing, and I suspect it is ambiguous with type ascription. Not sure about the leading : -- maybe that's kind of nice, since what comes after is a crate, not a module. I like that it lets us avoid fallback, and that it kind of "hearkens to" our existing syntax.

(Is it ambiguous? I don't know. =)

matklad · February 5, 2018, 7:24pm

I have pretty much the same experience. However, the point I trip over most of the time is not std (because I always use ::std, even in the main module), but self::. I often forget to add self:: in use paths, and add an unnecessary self in expression paths...
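For anyone who hasn't hit the self:: asymmetry, a minimal sketch of it follows, assuming the current (2015) resolution rules. The modules parent and child and the functions are hypothetical names used only for illustration.

```rust
mod parent {
    pub mod child {
        pub fn answer() -> u32 {
            42
        }
    }

    // In a `use`, the 2015 rules treat the path as absolute, so reaching a
    // module declared right here needs the `self::` prefix.
    use self::child::answer;

    /// Calls the function through the import above.
    pub fn via_import() -> u32 {
        answer()
    }

    /// Calls the function through an expression path, which is resolved
    /// relative to the current module, so `self::` would be redundant here.
    pub fn via_expression() -> u32 {
        child::answer()
    }
}
```

Both functions reach the same item; the only difference is which kind of path needs the prefix.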

nikomatsakis · February 5, 2018, 7:42pm

Yes, I do this too.

rpjohnst · February 5, 2018, 7:49pm

I like this idea a lot, and I'm not sure it's been discussed much. More often I've seen the suggestion to make std::foo work elsewhere by adding use std to the prelude, but since we're already deprecating extern crate this seems just as doable.

It works in C++, Java, and C# because they use the shadowing style we've rejected, allowing unprefixed full paths in expressions; on the other hand Swift, Go, JavaScript, Python, Ruby, and PHP seem to have no way to access the contents of a module without importing it first. Arguably C++ fits in the second category too because #includes are such a different beast from using.

I'm not sure if there even is a widely-used language where you have to prefix absolute import paths- even languages that allow relative paths still either make absolute the default (Python) or fall back on it (PHP). The common case is just overwhelmingly for pulling in dependencies, so in that sense I feel like use ::std would be a far bigger papercut than "can't copy a use path into an expression," especially given how many languages don't support that anyway.

This is interesting. You copy and paste a path from a use into an expression, or from an expression in another module? I'm not sure how you'd wind up doing the former, and I'm not sure how any of this would help the latter- could you clarify? self also feels like something this won't really help, regardless of variant- it's just fundamentally {not ,}needed in {expressions,use}, given our lack of module-level shadowing, no?

nikomatsakis · February 5, 2018, 7:52pm

It's not so much that I literally copy-and-paste, as that I just type the same things in both places without thinking about it, and only realize the mistake when the compiler gives me an error. And usually I stare at the error for a second "what do you mean std::cmp::min is not found?" before I realize what is going on. (For some reason, for me, it's almost always std::cmp::min that I want to call without importing...though std::fmt::Debug is high on the list too.)
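A minimal sketch of that situation, assuming the 2015 resolution rules discussed in this thread; the module inner and the function smaller are hypothetical names for illustration.

```rust
mod inner {
    /// Returns the smaller of two values without importing `min` first.
    pub fn smaller(a: i32, b: i32) -> i32 {
        // Under the 2015 rules, writing `std::cmp::min(a, b)` here is rejected:
        // expression paths are relative to `inner`, which has no item named `std`.
        // The leading `::` makes the path absolute, so it resolves.
        ::std::cmp::min(a, b)
    }
}
```

The same unprefixed std::cmp::min call is accepted in the root module, which is what makes the error surprising when it shows up elsewhere.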

nikomatsakis · February 5, 2018, 7:53pm

Huh. For some reason I thought it was an obvious fallout from the original RFC, but I'm not sure, maybe it was never discussed explicitly.

rpjohnst · February 5, 2018, 8:07pm

Perhaps, given the particular paths that people tend to write without useing them, a better solution would be to tweak the prelude as part of the new epoch? Either adding std (potentially increasing confusion around the root module) or Debug/cmp/etc. (I feel like I do this with std occasionally as well.) Debug in particular is probably extra-confusing because of its unqualified use in #[derive].

Or maybe that’s just opening up too deep a rabbit hole around exactly which things should be added.

Another option might be to tweak the error message- instead of just "Use of undeclared type or module std" we could add a fallback to check for dependencies and top-level items and suggest one of those? Still a papercut if people keep hitting it, but a) quicker to fix and maybe easier to stop running into that way, especially with some RLS quick-fix support to auto-add uses, and b) IMO less of a papercut than virtually every use breaking overnight- those feel more common than full-paths-in-expressions to me.

aturon · February 5, 2018, 11:56pm

I’m also a fan of single colon (or some other special symbol) following the crate. To me, there’s always been an analogy to “drive volumes” here.

In the past people have expressed worry about the syntactic distinction between : and :: being too subtle visually, but:

- Syntax highlighters would almost certainly color the crate name differently
- In epoch 2018, all use statements would begin with a crate name followed by a single colon.

Other than the potential conflict with type ascription syntax (which is not stable), are there other major downsides? ISTM that this syntax achieves the full set of goals here, and helps reinforce the mental separation between the two parts of the path.
