Revisiting Rust’s modules, part 2

So, on the overall proposal: this new direction took me a bit by surprise, but my initial reaction is that it’s brilliant and daring. After reading a lot of the prior discussion, I think I have come to agree that one of the crucial sources of confusion in today’s module system is the distinction between absolute and relative paths. I think you are right to take square aim at that distinction – and I’m glad that checkpoints give us the ability to do so.

I often think about something @wycats once told me about how he teaches today’s module system. Basically, he said something like: he teaches that paths are used to name things in the file, and use is used to bring things from other files into scope. This is not 100% right, but it’s close.

I like that this proposal makes a similar clear distinction about inside and outside the crate. In short, in this proposal, paths are about naming things from inside the crate, and from is about bringing in things from other crates. We no longer have to teach about “paths in use vs other paths” since everything is relative to the current file. That feels clear to me.

What I don’t yet know is how it will feel in practice, I think. For example, here is something that often happens to me:

  • I write in some fn std::cmp::min(...). This gets a compilation error.
  • I get a bit confused, then remember, and write ::std::cmp::min(..). This is kind of ugly.
  • I give up and write a use std::cmp and then cmp::min.
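The last two steps above can be sketched in one file. This is a minimal illustration, not code from the thread: the module name `geometry` and both function names are made up. Under the module system being discussed, a bare `std::cmp::min(...)` inside a non-root module does not resolve, so the sketch shows only the `::std` form and the `use` form; on later editions the bare form compiles as well.

```rust
// Hypothetical module, used only to get away from the crate root,
// which is where the resolution problem described above shows up.
mod geometry {
    // Second attempt: a leading `::` makes the path absolute, so it
    // resolves from inside a nested module without any import.
    pub fn min_absolute(a: i32, b: i32) -> i32 {
        ::std::cmp::min(a, b)
    }

    // Third attempt: import the `cmp` module once, then use the short path.
    pub fn min_imported(a: i32, b: i32) -> i32 {
        use std::cmp;
        cmp::min(a, b)
    }
}
```

Both functions compute the same thing; the only difference is how the path to `min` is spelled.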

This seems to highlight some of the good and bad of the proposal in practice. For example, I can no longer write an “inline” reference to another crate anyway. I’m also going to be using a lot more absolute paths, which I still find kind of an eyesore, but maybe I’ll get used to it – and I suspect that they will look better all collected together in a use. This is probably something we can fix by turning some knobs, anyway.

OK, I got to run, that’s all I have time for for now. =)

Sure.

You work on a feature in a branch A. You build, some files get auto-generated, everything is great, tests pass, you commit. You switch to branch B to work on another feature. But that branch is a bit older and did not yet have one file that was auto-generated, so the build system doesn't know about it, and it leaves it out. You try to build and you get a compile error. If you're using a mono-repo with many languages, you get it somewhere that might not even be an area of your expertise, in a language that you don't know.

Sure, proper build-system hygiene would prevent that, but in practice, it's "oh it's just how things are with language-X". Reworking a company's build system is practically impossible, so people learn to live with that and are constantly annoyed by a language that they don't even use. On top of it: C and C++, with their archaic build systems, don't have that issue.

This issue is very nasty, because it hits unsuspecting people in random (to them) places, as opposed to the opposite problem: when someone forgets to reference a file that should have been used. At least then one fixes their own mistake.

That is a very concrete concern. I can only respond with two possible fixes: #![cfg(test)] at the top of tests.rs itself, or having #[test] work on use test:

#[test] use test;

Personally, I find the whole #[cfg(test)] use tests; a weird boilerplate in itself, that could use some attention in the current usability push. Especially in combination with benches, which I typically implement as a feature flag, so everything gets even weirder.
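For comparison, the boilerplate in question looks roughly like this with today's mod syntax. This is only a sketch: `add` is a placeholder function, and the test module is written inline so the example fits in one file, whereas the discussion here is about a separate tests.rs.

```rust
// Placeholder function under test.
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

// Compiled only when building tests (for example under `cargo test`);
// in a normal build this module does not exist at all.
#[cfg(test)]
mod tests {
    use super::add;

    #[test]
    fn adds_small_numbers() {
        assert_eq!(add(2, 3), 5);
    }
}
```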

But other than tests, are there really many places where someone puts code in a source file and does not reference it at all with any use statement from the rest of the code?

It seems to me we get annoyed by the same thing, just with different previous experiences. :slight_smile:

To be clear, the first would not work: the use test; import would fail because the cfg says there is no test module.

I think this also hints at implementation issues with "we only bring a file in if you use something from it" - we would have to intertwine parsing and macro expansion with name resolution. I suspect this runs into very similar issues to what Patrick Walton talked about regarding having imports fall back to parent modules.

Yes, if the file's "public API" is a method in an impl block, defined on another type. Because impl blocks don't have to be in the same file as the type they are for, this is very possible (the coherence module in chalk is actually an example of this).

That is, it basically looks like this:

use ir::Program;

impl Program {
     pub(crate) fn check_coherence(&mut self) -> Result<()> { ... }
}
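A self-contained version of that pattern, as a sketch: the modules are inlined so it fits in one file, and the `impls` field, the `String` error type, and the duplicate check are invented stand-ins, not chalk's actual code. Note that nothing outside `coherence` ever names an item from it; the module's whole public surface is an inherent method on a type defined elsewhere.

```rust
mod ir {
    // Stand-in for the real program representation.
    pub struct Program {
        pub impls: Vec<String>,
    }
}

mod coherence {
    use super::ir::Program;

    impl Program {
        // Hypothetical check: reject a program that lists the same impl twice.
        pub(crate) fn check_coherence(&self) -> Result<(), String> {
            for (i, name) in self.impls.iter().enumerate() {
                if self.impls[..i].contains(name) {
                    return Err(format!("overlapping impl: {}", name));
                }
            }
            Ok(())
        }
    }
}
```

A caller only needs a `Program` value to reach `check_coherence`, so a scheme that loads a file only when something is imported from it would never see a reason to load `coherence`.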

First, I agree with the deprecation of "extern crate" in favor of discovering the crates from Cargo.toml.

I didn't see anything in the proposals to address the biggest problem I've encountered with the module system: If some module a does impl b::T for S, and I want to call a method on a::S declared in b::T, I have to use b::T, even if I never actually reference the name b::T. I find this to be a much more significant ergonomic issue than what's been discussed so far, because it isn't just a stumbling block for learning Rust, but rather a persistent problem. (See Inconvenience of using functions defined in traits, an ugly workaround, and proposed solution for previous discussion of this issue.)
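Concretely, the situation looks like this. The sketch reuses the names `a`, `b`, `S`, and `T` from the paragraph above; the `client` module, the method `describe`, and its body are made up for illustration.

```rust
mod b {
    pub trait T {
        fn describe(&self) -> String;
    }
}

mod a {
    pub struct S;

    impl super::b::T for S {
        fn describe(&self) -> String {
            String::from("S via b::T")
        }
    }
}

mod client {
    use super::a::S;
    // Needed only so that `.describe()` resolves: without this import the
    // call below is rejected because the trait is not in scope, even though
    // the name `T` is never written again in this module.
    use super::b::T;

    pub fn call_it() -> String {
        S.describe()
    }
}
```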

I agree with this. Everything should continue to be private by default. Controlling what gets exported from a module is a big part of ensuring that the module's API is safe and maintainable. One of the main reasons for splitting a crate into multiple submodules is to minimize the coupling between the different parts of the module, largely by taking advantage of the current private-by-default design.

I agree with this sentiment. At a minimum, I think the proposed mechanism should take into consideration the Cargo.toml include = [ ... ] and exclude = [ ... ] so that only files included in the crate are actually considered. I think this would address most of the concerns regarding problems caused by “implicitness” since those same kinds of problems are similar to the concerns of accidentally including files in a crate's published package.

I find browsing Rust projects to be annoying due to the large number of subdirectories under src/ and the large number of files named "mod.rs". I also think the current situation where we have to move x.rs to x/mod.rs when we add submodule x::y in x/y.rs to be unfortunate; I think we should be able to keep x.rs as-is while adding x/y.rs. I use #[path] on mod declarations to work around this now; if/when we get rid of mod then we should fix this annoyance since there would be no place to put #[path] in that case.

I don't think pub use is a great replacement for mod. pub use isn't intuitive at all as it does two things: it brings the item into the current scope and exports it from the current scope. IMO, even if we had implicitly discovered submodules, pub mod would be clearer than pub use. However, I think declaring submodules with mod is not a large burden and neither is it confusing, so I think retaining mandatory mod declarations is fine; the declarations enhance readability, IMO.
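The two effects of pub use can be seen in a small sketch; the module and function names (`shapes`, `detail`, `area`, `unit_area`) are invented for illustration.

```rust
mod shapes {
    mod detail {
        pub fn area(w: u32, h: u32) -> u32 {
            w * h
        }
    }

    // One declaration, two effects:
    // 1. `area` becomes usable by its short name inside `shapes`;
    // 2. `area` is re-exported, so outside code can write `shapes::area`
    //    even though `detail` itself stays private.
    pub use self::detail::area;

    pub fn unit_area() -> u32 {
        area(1, 1)
    }
}
```

From outside, `shapes::area(2, 3)` works while `shapes::detail::area(2, 3)` does not, which is the export half of what the single line does.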

So, basically, I think getting rid of extern crate seems like a clear win, whereas I would be happy to skip the rest of the proposed changes, especially if that would free up resources for fixing other issues that persist after the initial learning curve.

That's a bad characterisation IMO. Of course, the compiler can figure it out without you. That's not the point; it's to help you read the code. How do you know where some included thing comes from? It's hard to tell looking at the source code alone: you need to inspect both Cargo.toml and the file system to find this out, which is quite time consuming. If it's written out, it's obvious.

Exactly like that? Paths starting with :: are followed by a crate name, and then a path inside that crate.

However, I forgot that inside a module (not at the crate root) there are actually three "roots" to start at: the current module, the current crate, and all crates. I don't have a great idea for the syntax to distinguish these three, unfortunately... Maybe (based on what you said) the current crate is ::crate::...? So

foo // resolve `foo` in current module, taking all the in-scope `use` into account
::crate_name::foo // resolve `foo` in crate `crate_name` (could be the current crate, if that's the name that is given)
::crate::foo // crate is a keyword, always refers to the current crate.

I am not saying this is great concrete syntax; I am just proposing the general abstract idea here. The more I think about it, the more strongly I dislike making references to other crates impossible in the path syntax and possible only in a magic form of use.
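For comparison, the three starting points can all be written as plain paths on a current compiler, though with different spellings from the ones floated above. This is a sketch, not part of the proposal: `crate::` stands in for the suggested `::crate::`, `::std::` plays the role of `::crate_name::`, and the names `limit`, `inner`, and the three functions are hypothetical. It assumes the snippet sits at the crate root.

```rust
// An item at the crate root.
pub fn limit() -> u32 {
    10
}

mod inner {
    // A different item with the same name, local to this module.
    pub fn limit() -> u32 {
        3
    }

    // Root 1: the current module. Resolves to `inner::limit`.
    pub fn from_current_module() -> u32 {
        limit()
    }

    // Root 2: the current crate. Resolves to the crate-level `limit`.
    pub fn from_current_crate() -> u32 {
        crate::limit()
    }

    // Root 3: another crate, named inline without any `use`.
    pub fn from_other_crate() -> u32 {
        ::std::cmp::min(limit(), crate::limit())
    }
}
```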

Doing ::the_crates_actual_name should also work, which would even remove the need for the $crate hack in macros. In fact I think seeing whether this hack is still necessary is a nice test for the "path system" (or whatever you want to call it... this is not the entire module system, just the part of it that is concerned with naming things). No such hack should be necessary. (Well better macro hygiene would also make this unnecessary, but an item really should have a path that works everywhere, no matter whether we are inside the crate that defined it or not.)
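For readers who have not met it, the $crate hack looks like this. A minimal sketch with made-up names (`double`, `double_it!`, `demo`), assuming the snippet sits at the crate root: inside a `macro_rules!` body, `$crate` expands to a path to the crate that defines the macro, so the call resolves whether the macro is used in that crate or from another one.

```rust
// Hypothetical helper that the macro needs to reach.
pub fn double(x: i32) -> i32 {
    x * 2
}

// `$crate::double` names the helper through the defining crate's root,
// so the expansion does not depend on what is in scope at the call site.
macro_rules! double_it {
    ($e:expr) => {
        $crate::double($e)
    };
}

pub fn demo() -> i32 {
    double_it!(21)
}
```

If every item had one path that worked both inside and outside its defining crate, the macro body could spell that path out directly, which is the test described above.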

I don't personally find that confusing (personally, I found the requirement to write extern crate foo; before I could use foo::name; more confusing when I was first learning Rust), but I acknowledge that there's a set of people who do find that confusing. I'd also note that there's a large set of people quoted in the link you posted who find the distinction confusing, so I think there's plenty of confusion to go around in several directions. :slight_smile:

But I don't feel like introducing from addresses that problem; from just feels like extern crate by another name, and not nearly worth using an epoch for.

I do, FWIW, agree entirely with @nikomatsakis that the most valuable thing this proposal does is to fix issues with relative/absolute path confusion. That seems well worth it. Could we do that without introducing from?

I'd argue that it's not important, when inspecting baz.rs, to know whether one of its super modules is an inline module or not. It is more important IMO, when you have a module, to know which of its submodules are inline modules. This would be covered by inline mod as well.

I don't understand how having mod statements prevents this. You've described a situation in which the build system automatically generated code, but then failed to when you switched to a branch. If mod statements are a thing, surely the auto-generator is also inserting those statements, and would still fail to insert those statements and generate this module.

I understand that you have had a bad experience along these lines with other languages' build systems, but it doesn't seem like removing mod statements has this implication to me.

Right, but this is what we started with and we found use ::crate::errors::*; to be pretty burdensome. Are you suggesting that use :: would start from inside this crate, but :: outside of use would start from the multi-crate root?

Right, that's why I said I am not happy about the syntax. :wink:

No, paths should have the same meaning in use and elsewhere. I must be unclear somewhere because I cannot see how this question arises from my proposal:

foo // resolve `foo` in current module, taking all the in-scope `use` into account
::crate_name::foo // resolve `foo` in crate `crate_name` (could be the current crate, if that's the name that is given)
::crate::foo // crate is a keyword, always refers to the current crate.

This applies everywhere paths arise. So :: always moves to the global crate root.

The syntax is of course horrible. I can come up with other horrible syntax that maybe is slightly more ergonomic :wink:

foo // resolve `foo` in current module, taking all the in-scope `use` into account
::foo // Resolve `foo` starting at the current crate's root
@crate_name::foo // resolve `foo` in crate `crate_name` (could be the current crate, if that's the name that is given)

So, :: moves you to the current crate root and @ (or whatever other sigil) moves you all the way to the root of all crates.

Could you point to something concrete for people finding the distinction confusing? From what I gathered, what is confusing is the distinction between mod and extern crate. But that is very different from the question @aturon has been asking. So far I have not yet seen anyone who would expect all external crates and all items in their crate root to be overlapping in the namespace. There also is no precedent for this in any other language (that I know of).

Yes, it is possible, and it will happen, but I don't think that's common. Proving or disproving how frequent/rare it is, is going to be hard.

Anyway, users that do use that pattern have already plenty of boilerplate removed with this proposal, and unfortunately, in this particular case, they would have to remember about referencing the file with use of some kind. The follow-up question is: what are the consequences of forgetting about it, and how hard will it be to fix it. As far as I can tell, it should be very easy to detect and fix, and there can even be lints/warnings helping with it.

I'd like to ask everyone to think in terms of finding a compromise: which things are deal-breakers for us, and which are minor annoyances that we will learn to love after a while. We can't all get all the stuff we care about, and there are many valid viewpoints on what matters, ranging from extreme verbosity and explicitness to complete brevity and having everything deduced from somewhere. It's better to find a deal that improves as many things for everyone as we can, without making their use cases more painful, as opposed to giving up on any improvements.

Which would be flagged by the "unused use" lint. Effectively, that lint would have to be removed. It's a very useful lint though, so that's quite an argument against any such proposal.

Yea, the problem we had was that we found it onerous to write:

use crate::errors::*;
use crate::ir::*;
use crate::solve::{TyInferenceVariable, LifetimeInferenceVariable};
use crate::solve::solver::Solver;

Basically, despite the confusion, it is true that the main thing you import is names from other modules in the crate, and having to write crate:: for all of them can be unpleasant.

We'd like a syntax where all of these things are true:

  • Paths in use statements are the same as local paths
  • You can easily distinguish symbols from other crates from symbols from this crate
  • Paths to other modules in the crate are easy to write when using use statements

Possibly there are other constraints I don't recall.

So with my "extern crate sigil", this would become

use ::errors::*;
use ::ir::*;
use ::solve::{TyInferenceVariable, LifetimeInferenceVariable};
use ::solve::solver::Solver;

IOW, exactly the same as in the proposal of this thread. Furthermore, for imports from std

use @std::cell::Cell;

These are usable as paths, so you can write @std::cmp::min somewhere in the middle of a function without importing anything. And you have a way of importing another crate's root, which from ... use lacks:

use @std;
// Now we can write std::cmp::min. Not sure if that's worth it though...

So, this satisfies all your constraints?

You don't have to remove that lint. If there are no symbols in a module, why complain that nothing is used? If there is a symbol: why wouldn't it be there if it's not referenced anywhere in the first place?

The blog post says:

This probably isn't a real benefit. Consider the case where you're importing foo from whatever. When you type use f then the IDE can often autocomplete the rest, "oo from whatever;" however, if you type instead from w then the IDE can only autocomplete from whatever import , which isn't as helpful.

As an analogy, imagine if Google Maps made you type the state, then the ZIP code, then the city, before you typed the address or business name during a search. It would suck. Thus, in general, we want the most specific name to come first.

Also, an IDE user will generally not manually type any part of a use at all; the IDE can infer them automatically or semi-automatically while one writes code. For example, Gogland does this in a great way when I'm writing Go code. I often don't even bother knowing which specific submodule something is in when I write Go code in Gogland. I think in not too much time, all Rust IDEs will get to that point too.

I see, I think it does! Worth considering as we iterate forward; it's not really self-explanatory, but possibly there's some intersection between this and the from syntax.

Hm I guess one could make that lint smart enough to actually detect uses of functions in the impl block? Indeed, why not.

Right, the syntax is up for debate. Maybe extern:: rather than @?

use extern::std;
// Now I can use std::cmp::min
use extern::std::cmp::min; 
// Now I can use min
// We can also not use `use`
fn foo(...) {
   extern::std::cmp::min(x, y)
}

Without syntax highlighting, this looks like a module called extern, but with syntax highlighting it works quite nicely I think. Hm. It could be ::extern:: but that's getting pretty long.
