[lang-team-minutes] the module system and inverting the meaning of public


#119

I don’t agree with this intuition. The unit of compilation matters a great deal semantically, because it is the window through which the compiler is able to see the world. This is why the orphan rules exist, for example; so that the compiler’s view of coherence can’t be changed by any information not available at compilation time. This is also why crates are required to form an acyclic graph, whereas the internal module graph of a crate can contain cycles. This is why I can pub(in path), but I can’t pub(in [these three crates]).
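To make that last point concrete, here is a minimal sketch of the path-restricted form. The module and function names are invented for illustration. The path has to name an enclosing module of the current crate, which is why there is nothing comparable that names a set of crates.

```rust
// Sketch only: `outer`, `inner`, `helper` and `answer` are hypothetical names.
pub mod outer {
    pub mod inner {
        // Visible anywhere inside `outer`, but nowhere else in the crate.
        pub(in crate::outer) fn helper() -> u32 {
            41
        }
    }

    // `outer` may call `helper` because it lies within the named path.
    pub fn answer() -> u32 {
        inner::helper() + 1
    }
}

// Outside `outer`, only `answer` is reachable;
// calling `outer::inner::helper()` here would be rejected as private.
pub fn from_crate_root() -> u32 {
    outer::answer()
}
```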

Once you’ve made something public - truly public, outside of your crate - all bets are off; every crate that can discover my crate can import this item anywhere. The question you ask about re-export from other crates doesn’t make sense to me - if I can see those other crates, I can also see this crate, their re-exportation doesn’t impact whether or not I can get at this item.

The problem with @kornel’s proposal is that mod would still be used in a frequent but hard to explain niche case, when you have a library and this module’s items are pub but aren’t used elsewhere in the library. This makes a system which seems more complex to me than what we have. However, I think there’s a connection between your endorsement of that idea and our disagreement about that intuition. The reason to base visibility on the declaration on the item, rather than on the use statements, is that the item is declared in this crate, but the use statements could be declared in other crates (hence the need, in some cases, to stand in for those other crates with mod statements).


#120

In my view, just because something is on crates.io doesn’t mean it’s there for anyone to depend on. It may be only for the use of my own crates.

Yeah, looking back I guess my suggestion differs from @kornel’s somewhat. In fact, I confused myself and ended up misstating it slightly. What I really mean :slight_smile: is:

The namespace for regular names and for use paths would be separated. The latter would implicitly include modules based on filename, as well as extern crates.

So: If you’re in lib.rs and you want to make a module foo.rs “public” (i.e. re-export it to your parents, namely other crates), you’d write pub use foo;.

If you don’t want the module to be public, you can use foo; or use foo::bar;, or both. If you use foo::bar, foo.rs is loaded but you don’t get foo in the regular namespace, only bar. So the experience would be very similar to how use works today, in those cases where you can use use. The only names in scope in a given module, regardless of whether it’s a root or not, are those that appear at the ends of use paths; you don’t need to consult the filesystem or Cargo.toml, unlike some other proposals.

And extern crate foo; would become use foo; too. As I said, similar to Python import.
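For contrast, this is roughly what the same thing looks like under the current rules, as I understand them. The module is written inline so the sketch fits in one file; in a real crate it would be foo.rs, pulled in by mod foo;. The name call_both is only there for illustration.

```rust
// Today: the module has to be declared with `mod` before `use` can name anything in it.
mod foo {
    pub fn bar() -> &'static str {
        "bar"
    }
}

// Brings `bar` into scope. `foo` is in scope too, because `mod foo` declared it here.
use foo::bar;

// Both spellings resolve to the same function under the current rules.
pub fn call_both() -> (&'static str, &'static str) {
    (bar(), foo::bar())
}
```

Under the suggestion above, the mod declaration would go away, and a use foo::bar on its own would leave only bar in the regular namespace.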

However, I think there’s a connection between your endorsement of that idea and our disagreement about that intuition.

Yep.


#121

Having Cargo detect whether your directory layout is tricky, and having plain rustc invocations default to the existing semantics in the absence of a flag, does alleviate backwards compatibility concerns. And to be frank, it’s much smarter than anything I came up with. I have some doubts about the accuracy of this detection by Cargo but let’s put that aside until we get into the details.

A much bigger concern I have is that this leaves the existing module system 100% in place and furthermore makes the switch implicit. This may be unavoidable for backwards compatibility, but if so, that’s (again) a downside of this change. Specifically, what concerns me is:

  1. There are two sets of incompatible rules, and which ones are used in a given piece of code is not written down explicitly [1] in the source code or in the Cargo.toml. For example,
  • to dive into a crate’s source code and understand its internal structure, I now have to manually apply whatever heuristic Cargo uses (or diff the file system against the mod lines — if all files are mentioned in mod lines somewhere, then I know it’s an old-style project)
  • if it’s not a Cargo-managed project, I have to go find and look at whatever script builds them (cf. https://github.com/rust-lang/rust-roadmap/issues/12).
  • there’s now a big semantic difference to “normal” projects if I build a small project by hand with rustc foo.rs (e.g., for test purposes), unless I always remember to pass this flag (which I won’t). Such projects are usually single-file but not always.
  2. This split will be around forever, probably even in a hypothetical Rust 2.0, as the old style remains 100% workable and suddenly removing it would probably be too big a breaking change for our tastes. This is, at the very least, aesthetically unsatisfying. It also makes the language as a whole, all the finer details that one needs to keep in mind to language-lawyer or comprehend legacy or pathological code, more complex.
  3. Regarding “How do we teach this”, not only will a comparatively huge amount of code (all multi-file crates in existence) and documentation (including the upcoming second edition of The Book) still be using mod and therefore confuse newbies [2], some (possibly many) projects will stick to the explicit mods. I don’t just mean legacy code dating from before the change and old projects that never bothered to switch over; as long as backwards compatibility is kept and the change is controversial enough, it will be just a stylistic choice. We’ve had some people in this very thread announcing a desire for just that! Therefore, I fear that many Rustaceans will still have to know, and deal with, the “old rules” depending on what projects they are involved with.

[1] By the way, this should not be taken as implying I want an explicit/opt-in switch. That just has different downsides, which I assume are clear enough, as nobody proposed it yet.

[2] I don’t look forward to replacing my standard explanation of why mod is needed and how it differs from use with “oh ignore this it’s legacy cruft” (especially since that won’t help anyone who needs to work with real code that involves mod).


#122

I want to push back on this, for several reasons. First, even the idea that it would be a mess is non-obvious to me. I don’t know if you’ve ever worked with, e.g., Eclipse or NetBeans, but they work exactly this way. You just point the IDE at a source directory and it compiles everything in there. And many makefiles and things in C projects that I’ve used wind up applying rules to $(wildcard foo/*.c), rather than explicitly writing out each .c file by hand.

Second, @petrochenkov is correct as to the implementation challenge; I am not sure that it is even possible to implement it this way, or at least it would be very difficult. Finding a workable definition of our module system is incredibly difficult (thanks to features like pub use and globs). If we didn’t even have a notion of which paths exist and which don’t, that would be extra fun. (I guess we’d have to define more precisely what it means for something to be “used” anyhow…)

But finally, I think this rule isn’t really what I would want anyway, for another reason. I often have Rust files that contain no items that need to be “used” from other places. Think of a file full of #[test] functions, for example, or a file that contains only an impl SomeType { /* inherent methods here */ }. I want those to be compiled in. And, in fact, even when those items might get used elsewhere, I often start by writing some (dead) code and getting it to compile first, and then use it from elsewhere.
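A small sketch of the kind of file I mean, with made-up names (Counter, counter_methods). The module is inlined here, but picture it as its own counter_methods.rs: no use statement anywhere ever names it, yet leaving it out of the build would silently remove the methods.

```rust
// Hypothetical type defined at the crate root.
pub struct Counter {
    count: u32,
}

// Normally a separate file pulled in with `mod counter_methods;`.
mod counter_methods {
    use super::Counter;

    // Only an inherent impl: this module exports no names of its own.
    impl Counter {
        pub fn new() -> Self {
            Counter { count: 0 }
        }

        // Increments the counter and returns the new value.
        pub fn bump(&mut self) -> u32 {
            self.count += 1;
            self.count
        }
    }
}
```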


#123

I agree it doesn’t change the calculus, but it may be worth “rolling in” to the proposal somewhat. I was thinking a useful thing might be to try and convert some existing code into this methodology to get an idea what it will feel like. I might take a stab at it with e.g. Rayon. If we find that it seems overly verbose, then having a shorthand may help.

For these purposes, I think it’s critical that some “type-dependent” members (fields, methods) probably can just be declared pub without the need for further refinement (which would make them as “pub” as the type, basically).


#124

I wrote more on this in another thread: Use-cases for modules not implicitly included in the build

This is how modules work in every language except Go. IDEs that auto-generate use statements may change this, but I haven’t used such IDEs (when I use Java I add use statements manually). I used Xcode, where adding to the project and adding to the build (target) are orthogonal.

Makefiles with wildcards cause problems when there are merge files left by git (foo_BASE_X.c), so such implicit behavior is a problem and it is usually avoided by specifying included files more carefully.

For tests there’s a tests directory. For other “unused” files with side effects there’s mod and use to include them.


#125

These are valid concerns.

On the first point:

I think the rules cargo uses should be very simple. When you say “dive into the source code and understand its internal structure,” I think it should be visible from a combination of looking at the Cargo.toml for named binaries and glancing at the tree of the src directory. I’m imagining very simple rules (remember that the vast majority of projects contain only 1 crate & therefore aren’t at issue).

As to the difference between cargo and rustc, there are two aspects of this. The first is that this is a trade off: we could instead have the flag turn off implicit modules, but that would create a much greater “churn shock” than having the flag turn it on, because anyone using a non-cargo build system will have to fix this if it doesn’t work for them. Not sure which side of this is preferable.

The second is that it is already the case, and it increasingly will be, that compiling a project by hand that is normally managed by cargo is quite challenging. Resolving the dependencies and their transitive dependencies, managing features, and many extensions in the pipeline (such as automatic features / conditional dependencies) rely on cargo to do the work. The cargo layer is only going to grow over time.

On the second and third:

I don’t agree that we wouldn’t remove the old system if we performed a breaking change. I don’t see it as having any special challenges that don’t apply to other things we would be removing in some 2.0 event (such as macro_rules).

I agree that the social aspect of changing this (teaching it, dealing with old documentation being on the internet, most of all convincing users to make the shift) is the most challenging aspect of the change. I would even say that the less constructive responses this proposal has garnered are a demonstration of that challenge.


#126

While walking earlier this evening I had the idea that instead of the internal attribute, facading could be handled in a more direct manner. We could have a hoist attribute, which takes a path as an argument, and for the purpose of other modules’ view of the world puts the item(s) at that path instead of the path they’re defined at. pub use would then only be for ‘convenience re-exports’ like preludes or putting the major types at the top level, rather than for facading.

I’m not sure if this should be an attribute on individual items (hoisting them to the path) or on modules (hoisting all the items in that module to the path).

To give an example:

mod foo {
    mod bar {
        #[hoist(foo)]
        pub struct FooBar;
    }
}

use foo::FooBar;

The first advantage of this is that you now even know where it’s been facaded to when you look at a module; everything is inside the module.

The second advantage is that it supports a style in which mod.rs files stop being necessary. Instead every type in a module is defined in its own submodule, and possibly hoisted to its parent. This alleviates the use self:: situation somewhat; if you have no mod.rs files you’ll never need to do that.

However, I see a very big downside to this: if I hoist a type into a module, do I have to use it to make it visible there? Basically my position is that you shouldn’t use hoisting this way - that is, never actually use the name in the module you’ve hoisted it to. If you do that, that’s basically a ‘come from’ of file organization. Maybe it should be that a hoisted name isn’t visible in the module it’s hoisted to & can’t even be imported into it.
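For comparison, this is roughly how the same facade is written today with pub use. It is only a sketch; the name method and the facade_name function are invented so there is something to call. The re-export sits in the parent module, away from the definition, which is the thing hoist would turn around.

```rust
mod foo {
    mod bar {
        // Defined in `foo::bar`, which is private.
        pub struct FooBar;

        impl FooBar {
            pub fn name(&self) -> &'static str {
                "FooBar"
            }
        }
    }

    // The facade: `foo` re-exports the item under its own path.
    pub use self::bar::FooBar;
}

// Users of the facade only ever see `foo::FooBar`.
use foo::FooBar;

pub fn facade_name() -> &'static str {
    FooBar.name()
}
```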


#127

Regarding #![internal]. It seems very similar to Go 1.4’s internal packages feature. If I understood its proposed behaviour correctly, I doubt that it alone will solve the “I have to check every mod decl to find out whether an item is public” issue, as it seems to apply transitively: when you declare a module as #![internal], all its submodules will be internal too. In Go it’s solved by internal being part of the file path, so you will have to encounter it. To solve it in Rust, you could require those submodules to bear an #![internal] attribute as well, making rustc error when a submodule of an internal module doesn’t contain #![internal]. Alternatively you could require modules to have either #![sub_internal] or #![internal], to make it possible to isolate or not isolate submodules from their internal module parents. I mean it would be quite limiting if you had to isolate all submodules of internal modules from their non-direct parents, so you’d need a #![sub_internal] that changes nothing in itself, other than indicating to readers that the current module is internal.

With #![internal] present in the proposal, the discussion about publicity becomes mostly one about defaults.

I’d like to propose a more conservative approach to the problem that works without implicit modules. What about adding just the #![sub_internal] attribute, without any other modifications to the module system (maybe it can be called #![internal] instead, but with same functionality). And to add a (default-allow at the start, default-error or always-error in Rust 2.0) lint to require it for all modules that are not pub and their submodules. And to make it recommended style to turn that lint into warn/error on a crate level, and to make clippy warn about its missing use.

Regarding #[hoist(...)]. It’s more verbose to write than pub use, especially on keyboards with layouts where [] and () are behind modifiers. Other than that I’ll have to think about it.


#128

I know I’m late to the party but the module system keeps frustrating me to no end. With the rush to register ever more generic crate names like plugin, router and co, it becomes harder and harder to tell local modules apart from externed crates. I had to rename local modules and alias crates to resolve conflicts many times.

I started to resort to using super::foo in my code as a convention to refer to my own modules vs crates, so that I know in a module whether this is supposed to be a crate or a local mod.

So while there might be a lot of things that people want to improve, the one that I think is more important than anything else is to just get crates and modules moved to different spaces. I’m happy to prefix all my imports with self or a crate name if that is necessary.
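A sketch of the convention, with hypothetical local modules named router and plugin (these stand in for my own modules, not the crates of the same name):

```rust
// Local module that happens to share its name with a published crate.
mod router {
    pub fn route(path: &str) -> String {
        format!("routed {}", path)
    }
}

mod plugin {
    pub fn run() -> String {
        // Going through `super::` marks `router` as the local module;
        // a bare leading `router` is then read as an external crate.
        super::router::route("/plugins")
    }
}

pub fn run_plugin() -> String {
    plugin::run()
}
```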


#129

I really liked it when pub only meant that you could refer to the thing through the containing module. I already don’t understand the current rules and they only seem to get trickier. :confused:


#130

You make a good point but there are a few points I want to clarify because they tip the balance a bit.

I think if we did this, we would make it a lint to have a pub item in an internal module that isn’t exported anywhere. So if you see pub, it would still mean pub, but not necessarily at this path. The fact that this is transitive does make it so you can’t be certain this path is valid without checking all of the parent modules. (This is an advantage of the explicit hoisting.) But whereas today pub is used even when you really want pub(super), I think under such a proposal deeply nested internal attributes would be uncommon, and so you would tend to assume that path is valid, and it would usually work.

More importantly, internal only applies to third party crates. Most of the time when I’m dealing with a separate crate, I don’t navigate the source, I read rustdoc. Rustdoc, of course, always tells me a publicly usable path. Where this ‘know the path is valid’ really matters is within a single crate; if I’m working in module X, I can go into module Y and look at the items and know immediately if I can use them in module X. For my own crates, rustdoc is not a great way to find this out, both because I have to regenerate it regularly & because it doesn’t by default document the internal APIs of my crate (whereas the source includes everything and is always available).

This “foreign : rustdoc / local : source” distinction is important because internal doesn’t apply to the same crate - within my crate I can access everything at its true path. So when I say I can look at the source of something and see whether I can use it, I’m mainly speaking within the context of a single crate.

Today it’s very common to break a single project into many crates, but as incremental compilation becomes more usable I expect that to become less and less common because of its implications for coherence. I also think people are less likely to do fancy facades for the crates that are within a project & not exposed outside of it, so internal won’t be a common factor in those cases either.


In terms of the syntax for hoisting being less convenient etc, if we wanted to make this fully first class (though this seems so radical even I am suspicious of it) we could imagine something like this:

mod foo {
    mod bar {
        pub(crate) @ foo struct FooBar;
    }
}

use foo::FooBar;

I’m going to convert a library (I think futures) to this system - using hoisting with this syntax instead of internal, to see exactly how even the most radical idea changes things.


#131

I see, that would work. Although it would stop people from using pub to avoid writing the long pub(crate) or pub(super) they would otherwise have to write. I’d be a big user of #![internal]; I’d add it to all the modules that right now I don’t declare pub, just to make sure they are not exposed, and to make it clear to readers (including future self) that the crate is not exposed. Inside of them, I’d have to write the lengthy pub(crate) for every single item; I’d be literally punished for not exposing something in the public API. I think there should be sugar for pub(crate), like cub or something else which is very short (around three characters). That could solve it, and at least make the new system not hugely more inconvenient for me and most other users to use.
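To show the two spellings side by side as they work today (a sketch; detail, short, explicit and total are made-up names):

```rust
// Private module: nothing in it can be named from outside the crate.
mod detail {
    // Plain `pub`: relies on `detail` being private to stay out of the public API.
    pub fn short() -> u8 {
        1
    }

    // `pub(crate)`: states at the item itself that it never leaves the crate.
    pub(crate) fn explicit() -> u8 {
        2
    }
}

// Both are equally usable from elsewhere in the same crate.
pub fn total() -> u8 {
    detail::short() + detail::explicit()
}
```

With the lint described above, the first form would presumably have to be rewritten as the second in every such module, which is the verbosity I’m worried about.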

To provide you with a personal story, my reasons to split up crates never were around compile times. I have a crate (lewton) whose 90% of uses is together with another crate (ogg) to decode ogg/vorbis files, still I have them separate in order for other things that may be included in the ogg container. I in fact wrote a crate to read ogg metadata from it, which is even used in servo, while lewton, which came from the same project as ogg, is not used (yet xD). And I’m considering splitting up lewton even more, e.g. to put the IMDCT decoder into a separate crate. This would help me writing an opus decoder.

There is this great feature to small crates that you can reuse the code everywhere and in multiple different places.

Another example, minetest. It’s an open source minecraft clone written in C++. It has a map like minecraft, and it stores that map in different kinds of databases. Now there are big servers with big maps, and some of them want to provide a top down map to their users. For this use case, mapper tools were designed (some in python, but the most popular one atm is in C++) that get the map database as input and output the finished map. They had to re-implement parsing of the data format, including the bindings to the database libraries, as well as the deserialisation of the actual chunks (minetest calls them mapblocks). If minetest had been a Rust project, you could simply have put the map format and database interfacing into one crate and the rest of the game into other crates, and published it all on crates.io, and the mappers could simply have used the crate and would have gotten the same code that minetest ran on. Of course, you can do this in C++ as well, but again, all the issues with crates downloading.

So I don’t think crates will get much bigger, and I don’t hope it.

I don’t see though how crates getting bigger will mean that you want to expose some module to the entire crate (or world) and not facade it. If it will affect motivations to facade, it will be the reverse.

I think I’m not a big fan of hoisting. I certainly wouldn’t want it to happen. It would make renaming a module very hard and require you to repeat the module name multiple times. And if you really want to find out where an item is reexported, just rg for the name of the item, or change the name and see where it errors. Of course neither is watertight though due to wildcard reexports. Also, looking at workflows, right now you can look at rustdoc output to get a list of items where you can copy each name into a (pub) use list, they are all nicely condensed, if you want to hoist a couple of items, you’d have to scroll through their implementations to set them.


#132

For comparison, wildcards are not supported in Meson and Ninja and supported with limitations but discouraged in CMake.

The paragraph under the link, applied to Rust, would mean that if we add a new file in the directory, we need to rerun name resolution. So a build system needs to scan the filesystem on each build even if there are no existing files with fresh timestamps.


#133

The current system makes it incredibly easy to add new Rust files in IntelliJ: you type mod foo;, the IDE highlights it because the file is missing, you type alt+Enter and foo.rs is created and focused (as a bonus, a parent module is converted from parent.rs to parent/mod.rs if needed. I think this should be relatively easy to add to Emacs as well :wink:). No need to switch between opened files and a project tree at all; that’s even more convenient than, say, adding a new Java class.

PS: I have no opinion about the general issue, it’s just a fact about the current system which has not been mentioned previously.


#134

Nice. It is possible that IDEs can significantly change the calculus here.

(FWIW, just to defend emacs honor and – perhaps – lighten the mood a bit, the emacs mode has M-x rust-promote-module-into-dir, which transforms foo.rs into foo/mod.rs and which I use all the time. But I confess the IntelliJ treatment is better. :smile:)


#135

It shouldn’t be too hard to add rust-implement-module (strawman name) which discovers the mod xxx; under the cursor, creates xxx.rs if it doesn’t exist, and then switches to it. I imagine that this wouldn’t be a difficult thing to add to almost any scriptable editor’s rust-mode equivalent as well.


#136

I would too, which worries me a lot, because that sounds like a boilerplate needed in basically every file. The default should probably be flipped around, and require an annotation to opt-out of internal.


#137

There are two sensible designs IMO to improve the module system:

  1. Completely unify the physical and logical layers as this tries to do.
  2. Decouple the mechanisms of the two layers entirely. Instead, rely on conventions and the 80-20 rule in idiomatic code.

Rust already has advanced cases where the two layers aren’t coupled, and from the comments above I see that trying to unify them causes friction for these advanced use-cases and complicates the syntax considerably. It requires somehow narrowing down and linting/deprecating non-unified use-cases, which goes counter to backwards compatibility, and Rust will still need to maintain two systems.

I think that instead we can simplify the design by decoupling the two organization layers and instead rely on conventions to have 1 to 1 correspondence between modules and files in the general case (i.e. 80-20 rule).

@nikomatsakis: Java IDEs maintain a project file that lists all the source files AFAIK (Eclipse does that) and do not scan the FS recursively each time. The same info should be added to Cargo.toml and passed explicitly to rustc.

C# has a similar compilation model to Rust, and it shows that the common perception of C++ namespaces is incorrect: C# has exactly the same design without the problems C++ has around it. Those problems have more to do with the redundancy with the old C way (#include files) and the lack of an idiomatic style that teaches how to use namespaces, the C++ stdlib being the worst offender in not leading by example.


#138

Just FYI, you can do this:

#[path="bar.rs"]
mod foo;

So the file structure and the module structure can be as separate as you want.