[lang-team-minutes] the module system and inverting the meaning of public


#112

@withoutboats

“as public as their most public member”

FWIW, implicit modules aren’t necessarily needed to express this. Some special visibility like pub(transparent)/pub(auto)/pub(inferred) on an explicit module declaration would work too. Looks like this visibility can even be calculated easily given some precautions (e.g. pub use a::*; need to make the containing module pub even if the glob doesn’t actually import anything).
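To make the "can even be calculated easily" part concrete, here is a minimal sketch of how such an inferred visibility might be computed. Everything in it (`Vis`, `Member`, `inferred_vis`) is a hypothetical model for illustration, not anything that exists in rustc. The one precaution mentioned above is modeled by counting a glob re-export at its declared visibility, whether or not the glob would import anything.

```rust
// Hypothetical model of "as public as the most public member".
// None of these names come from rustc.

// Variant order matters: the derived `Ord` gives Private < Crate < Public.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Vis {
    Private, // visible only inside the module
    Crate,   // pub(crate)
    Public,  // pub
}

enum Member {
    /// An ordinary item (fn, struct, ...) with its declared visibility.
    Item(Vis),
    /// A glob re-export such as `pub use a::*;`. It counts at its declared
    /// visibility even if the glob ends up importing nothing.
    GlobReexport(Vis),
}

/// The inferred visibility of a module is the maximum over its members.
/// An empty module stays private.
fn inferred_vis(members: &[Member]) -> Vis {
    members
        .iter()
        .map(|m| match m {
            Member::Item(v) | Member::GlobReexport(v) => *v,
        })
        .max()
        .unwrap_or(Vis::Private)
}

fn main() {
    let module = [
        Member::Item(Vis::Private),
        Member::Item(Vis::Crate),
        Member::GlobReexport(Vis::Public),
    ];
    println!("{:?}", inferred_vis(&module));
}
```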


#113

Sure. I believe eliminating the mod-semicolon form has its own advantages.

EDIT: It’s a fair point that we can talk about each of these aspects of the proposal independently though.


#114

At least as major as Macros 2.0 if I’m not mistaken.


#115

That’s a shame. In that case implicit modules would break my builds/my workflow, because I do keep test/temp/not-ready-yet .rs files around.


#116

I like the idea that modules can be as public as their most public item, since that makes “= privacy” the default instead of “<= privacy”. The facade pattern is cool, and we should support it, but it’s weird for all modules to be facades by default.
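For anyone unsure what “<= privacy” looks like in today's system, here is a small sketch. The module names are made up, and `library` just stands in for a crate root. The item is declared `pub`, but its private parent module caps how far it can actually be seen.

```rust
// `library` stands in for a crate root; all names here are illustrative.
mod library {
    // Private module: nothing in it is reachable from outside `library`
    // unless `library` re-exports it.
    mod internal {
        // Declared `pub`, but effectively "at most as visible as `internal`".
        pub fn helper() -> u32 {
            1
        }
    }

    pub mod api {
        // Code inside `library` can still reach the helper.
        pub fn entry() -> u32 {
            super::internal::helper() + 1
        }
    }
}

fn main() {
    // library::internal::helper(); // error[E0603]: module `internal` is private
    println!("{}", library::api::entry());
}
```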

I do like the idea of not requiring a mod foo; declaration to import foo.rs, and relying solely on the use foo::... statements and foo::... symbol paths, since they make the mod foo; redundant.

I do NOT like the idea of making every .rs file in the source tree an implicit auto-imported module whether or not it is ever used or moded. Even if I did like it, I don’t see how it could be backwards compatible.

I do NOT like the idea of replacing extern crate declarations with cargo or Cargo.toml or the --extern flag or some other extralinguistic mechanism. The details of how dependencies get resolved belong to the build system, but declaring those dependencies should stay a part of the core language.

At the moment, I think the biggest problem is that everyone reading this thread seems to have interpreted “the proposal” very differently. I’m honestly not certain which of the points I just listed are actually part of “the proposal” or not. I’m pretty sure I got at least one of them wrong on my initial reading of the post, assuming @withoutboats’s and @nikomatsakis’ interpretations are the correct ones.


I very, very, very strongly like the fact that nobody is arguing for “>= privacy”. That’s the only suggestion that ever worried me, so I’m confident I’ll be happy with the outcome here no matter what it is.


#117

Some more thoughts.

Motivation

Like @nikomatsakis said, the confusion about our module/visibility system extends beyond just new users. While @aturon and I were talking yesterday, he got tripped up about whether an import needed to be prefixed self::. Outside of this discussion, I’ve seen advanced users (including those who have voiced objections to this proposal) getting mixed up by the dual meaning of pub.

What we see with new users is a sharp divergence in how hard it is to acclimate to the basic mental model of our system. Some users find it easy to grasp, some find it very challenging. But I think in its full nuance, our current system is ambiguous and confusing for even very experienced users (including those with strong opinions about whether it is good or not). The most acute example of this is the dual meaning of pub I touched on in a previous post.

Biographically, I began this investigation because of the confusion many new users had about the module system. My goal was not to ‘make Rust easier;’ I was working from a hypothesis: the module system doesn’t operate on significant useful complexity (that is, it’s about organization, not data, behavior, or abstraction). It isn’t doing enough to justify how confusing some users find it.

This isn’t the same as just trying to make Rust easier - it’s about recognizing a system as more complex than it needs to be (in contrast, something like borrowchecking needs its complexity). I want to make explicit & challenge a certain narrative underlying a lot of the comments which I see as really just a polite form of ‘hackers vs newbs’ framing: the self-identification as ‘UNIX users’, the emphasis on error messages as the solution, the unfavorable comparison made in IRC of this proposal to PHP. I’m drafting this post in vim from an Arch Linux system with a tiling window manager, and I am excited about how this will improve my own experience.

As I investigated what was making the module system so difficult for users, I noticed a lot of other issues I really didn’t like about it. I found it to involve a lot of redundant ceremony that seemed to buy us very little. I found I am frustrated that I can’t find out how visible something is without tracing the modules between it and the crate root, and that the information I need to know about a module isn’t contained within it, but partly in its parent. I noticed that the public-in-private issue is an interminable quagmire.

All of these seem to me like real problems. We can disagree on how we should balance these problems against the problems that changing the system would bring - all language design is about trade offs. But to claim that the system proposed has no advantages over the current system, or - even more extremely - to claim that the current system has no flaws, is not the discourse of a meaningful technical discussion.

“Junk files”

I’m fairly convinced that being able to rapidly comment out a module is a useful feature, even though I would never leave junk files lying around for any significant period (like, past a git push). While you’re working, it makes sense to leave files in a poorly typed state, go work on something else, and then come back to it. Certainly you might want to compile everything but the incomplete module in the meantime.

There is another really easy way to comment out a module that is already supported: tag the module #![cfg(ignore)] (where ignore is any feature that doesn’t exist). If this is perceived as too much of a ‘hack’ to recommend, we could easily support it as a first class citizen: #![no_compile] or the like. Similar to the #![internal] attribute, I like very much that such a technique moves the important information to the module, instead of putting it at the parent.

(This solution has problems if you don’t even want your module to be parsed, but they’re surmountable.)
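A minimal sketch of the cfg trick, using inline modules so it fits in one file. The module and function names are made up. I've written it as an outer `#[cfg(ignore)]` on the `mod` item, which I'd expect to behave the same as the inner `#![cfg(ignore)]` form at the top of a module file. Note that recent toolchains may warn about an unknown cfg name when building through cargo.

```rust
// `ignore` is not a cfg that anything sets, so this module is stripped
// before name resolution and type checking. Its body still has to parse,
// but the call to a function that doesn't exist is never checked.
#[cfg(ignore)]
mod not_ready_yet {
    pub fn half_written() -> u32 {
        this_function_does_not_exist()
    }
}

// The rest of the crate compiles as usual.
mod ready {
    pub fn answer() -> u32 {
        42
    }
}

fn main() {
    println!("{}", ready::answer());
}
```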

Backwards compatibility

Backwards compatibility remains a huge open question for this proposal, and like @rkruppe I think we need to start trying to address it. I don’t want to address any specific cases in this post, but I want to lay out a framework for how it can be done.

Basically, the problem seems to be that while most crates work fine with this proposal (that is, they do not have crate trees in overlapping directories), there are many that do not. I went through about 80 binary crates a few weeks ago, and found that about 20 of them had both a library and a binary under src. About half of those followed a pattern that, in my opinion, we could continue to support, but about half of them simply wouldn’t work with this proposal without restructuring the crates.

(The other common pattern of multiple crates in a single directory, aside from binaries which expose a lib, is a directory full of ‘rust scripts,’ like rustc’s tests. I’m not concerned about the particular shapes, though, just the general principle that some crates can’t be migrated to this.)

The basic principle of how to perform the migration seems like this to me:

  1. This feature is turned on by some argument passed to rustc.
  2. If you only have 1 crate root managed by cargo, cargo passes this argument to turn this feature on.
  3. If you have multiple crate roots managed by cargo, cargo recognizes some directory patterns in which it is safe to turn this on. crates.io’s docs will recommend using these patterns from now on.
  4. If you’re not following this pattern, cargo will issue a warning to encourage you to migrate your code.

I don’t think we should have a flag in the Cargo.toml for controlling whether this is on or not. I think it should be based entirely on whether cargo determines that your directory structure matches what it considers correct for this feature.
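To give a rough idea of the kind of check I mean, here is a sketch. It is purely hypothetical: `implicit_modules_safe` is a made-up name, and the rule it implements (no crate root's directory may equal or contain another root's directory) is just one possible reading of "overlapping directories", not what cargo does or would do.

```rust
use std::path::Path;

/// Hypothetical check: implicit modules are treated as safe when no crate
/// root's directory is the same as, or an ancestor of, another root's
/// directory. This is an assumed rule for illustration only.
fn implicit_modules_safe(crate_roots: &[&Path]) -> bool {
    // The directory each root would implicitly pull modules from.
    let dirs: Vec<&Path> = crate_roots
        .iter()
        .map(|root| root.parent().unwrap_or(Path::new("")))
        .collect();

    for (i, a) in dirs.iter().enumerate() {
        for (j, b) in dirs.iter().enumerate() {
            // `Path::starts_with` compares whole path components.
            if i != j && b.starts_with(*a) {
                return false;
            }
        }
    }
    true
}

fn main() {
    let single = [Path::new("src/lib.rs")];
    let lib_and_bin = [Path::new("src/lib.rs"), Path::new("src/main.rs")];
    println!("single root: {}", implicit_modules_safe(&single));
    println!("lib + bin in src: {}", implicit_modules_safe(&lib_and_bin));
}
```

Under this assumed rule a lone `src/lib.rs` passes, while `src/lib.rs` next to `src/main.rs` does not, which matches the cases that would need restructuring.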


#118

This sort of… confuses me semantically. Today, crates and modules are very similar to each other. They both keep some parts of themselves private from their parents while exposing others, and they can both re-export public items from their children. So why does your inquiry go out to the crate root and no further? Why not care equally about re-exports by dependent crates?

Sure, modules are more likely to be ‘internal’, i.e. have an API meant to change freely with the needs of the parent rather than providing any stability guarantee. But there’s no rule that you can’t expose an ‘internal’ API in a crate either; on the contrary, it makes a lot of sense to do so if multiple crates are logically part of the same project and there is some helper code used by all. (This is often when you’d use an = dependency in Cargo.toml, and doing a quick grep of the registry, = dependencies seem to be far from uncommon, appearing in 231 crates out of 6,204 crates that have any dependencies.) The reverse is also possible: modules can be designed to have a self-contained, ‘independent’ API. If nothing else, coherence issues and compilation time can encourage keeping bits of functionality in one crate even if they could be split out into independent crates.

I guess the intuition is that I’m just describing module-like crates and crate-like modules, that crates are meant to play a role as “stability boundaries” even if there are exceptional cases where they don’t. But that conflicts with my stronger intuition that “unit of compilation” is (should be) essentially an implementation detail.

I think my perspective is similar to @kornel’s, because their proposed solution seems most natural to me. Implicit modules shouldn’t be public by default any more than any other item defined in their parent module (i.e. frequently the crate root). “Public” should mean “public to the parent”, and it’s the parent’s job to decide if they want to reexport - regardless of whether we’re in a submodule or in the crate root, in which case the parent is another crate consuming this one. (So I don’t see pub as currently having a “dual meaning”.) This would have the side benefit of allowing implicit modules to be implemented without breaking backwards compatibility, under a scheme where rustc only looked for foo.rs when given use foo. As has been stated, public modules couldn’t be lost because they’d have to appear in a pub use anyway to override the visibility default.

In other words, use would basically work the same as Python import. The meaning would change from “look up this path in the existing module namespace” to “look up this path, and if it can’t be found, try to find it on the filesystem” (edit: that’s not quite what I want actually - see later post). use already treats its path arguments “specially” – as absolute rather than relative – and treating them as potentially not yet known to the compiler is just another kind of specialness. Then, “try to find on the filesystem” is also what extern crate does, broadly speaking, so use could handle crate imports as well – allowing extern crate to be shed from the mental model without the confusion of Cargo.toml/--extern causing things to magically show up in the namespace (as has been discussed in other threads).


#119

I don’t agree with this intuition. The unit of compilation matters a great deal semantically, because it is the window through which the compiler is able to see the world. This is why the orphan rules exist, for example; so that the compiler’s view of coherence can’t be changed by any information not available at compilation time. This is also why crates are required to form an acyclic graph, whereas the internal module graph of a crate can contain cycles. This is why I can pub(in path), but I can’t pub(in [these three crates]).
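A small sketch of the `pub(in path)` form mentioned above, with made-up module names. The path has to name an ancestor module inside the same crate, which is exactly why there is no cross-crate equivalent.

```rust
mod outer {
    pub mod inner {
        // Visible anywhere inside `crate::outer`, but nowhere else.
        pub(in crate::outer) fn helper() -> u32 {
            7
        }

        pub fn public_api() -> u32 {
            helper() + 1
        }
    }

    // Allowed: we are inside `crate::outer`.
    pub fn use_helper() -> u32 {
        inner::helper()
    }
}

fn main() {
    // outer::inner::helper(); // error[E0603]: function `helper` is private
    println!("{} {}", outer::use_helper(), outer::inner::public_api());
}
```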

Once you’ve made something public - truly public, outside of your crate - all bets are off; every crate that can discover my crate can import this item anywhere. The question you ask about re-export from other crates doesn’t make sense to me - if I can see those other crates, I can also see this crate, their re-exportation doesn’t impact whether or not I can get at this item.

The problem with @kornel’s proposal is that mod would still be used in a frequent but hard to explain niche case, when you have a library and this module’s items are pub but aren’t used elsewhere in the library. This makes a system which seems more complex to me than what we have. However, I think there’s a connection between your endorsement of that idea and our disagreement about that intuition. The reason to base visibility on the declaration on the item, rather than on the use statements, is that the item is declared in this crate, but the use statements could be declared in other crates (hence the need, in some cases, to stand in for those other crates with mod statements).


#120

In my view, just because something is on crates.io doesn’t mean it’s there for anyone to depend on. It may be only for the use of my own crates.

Yeah, looking back I guess my suggestion differs from @kornel’s somewhat. In fact, I confused myself and ended up misstating it slightly. What I really mean :slight_smile: is:

The namespace for regular names and for use paths would be separated. The latter would implicitly include modules based on filename, as well as extern crates.

So: If you’re in lib.rs and you want to make a module foo.rs “public” (i.e. re-export it to your parents, namely other crates), you’d write pub use foo;.

If you don’t want the module to be public, you can use foo; or use foo::bar;, or both. If you use foo::bar, foo.rs is loaded but you don’t get foo in the regular namespace, only bar. So the experience would be very similar to how use works today, in those cases where you can use use. The only names in scope in a given module, regardless of whether it’s a root or not, are those that appear at the ends of use paths; you don’t need to consult the filesystem or Cargo.toml, unlike some other proposals.

And extern crate foo; would become use foo; too. As I said, similar to Python import.
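For reference, the scoping half of this already works today. The sketch below uses made-up names and an inline `mod foo` in place of foo.rs. The only part that would be new under my suggestion is that the `use` itself would load foo.rs, with no `mod foo` declaration needed.

```rust
// Today this declaration is required; under the suggestion above it would
// be implied by the `use` below (with the body living in foo.rs).
mod foo {
    pub fn bar() -> &'static str {
        "foo::bar"
    }
}

mod consumer {
    // Brings only `bar` into scope here, not `foo`.
    use crate::foo::bar;

    pub fn call() -> &'static str {
        // Writing `foo::bar()` here would not resolve: `foo` is not a name
        // in this module's scope.
        bar()
    }
}

fn main() {
    println!("{}", consumer::call());
}
```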

However, I think there’s a connection between your endorsement of that idea and our disagreement about that intuition.

Yep.


#121

Having Cargo detect whether your directory layout is tricky, and having plain rustc invocations default to the existing semantics in the absence of a flag, does alleviate backwards compatibility concerns. And to be frank, it’s much smarter than anything I came up with. I have some doubts about the accuracy of this detection by Cargo but let’s put that aside until we get into the details.

A much bigger concern I have is that this leaves the existing module system 100% in place and furthermore makes the switch implicit. This may be unavoidable for backwards compatibility, but if so, that’s (again) a downside of this change. Specifically, what concerns me is:

  1. There are two sets of incompatible rules, and which ones are used in a given piece of code is not written down explicitly [1] in the source code or in the Cargo.toml. For example,
  • to dive into a crate’s source code and understand its internal structure, I now have to manually apply whatever heuristic Cargo uses (or diff the file system against the mod lines — if all files are mentioned in mod lines somewhere, then I know it’s an old-style project)
  • if it’s not a Cargo-managed project, I have to go find and look at whatever script builds them (cf. https://github.com/rust-lang/rust-roadmap/issues/12).
  • there’s now a big semantic difference to “normal” projects if I build a small project by hand with rustc foo.rs (e.g., for test purposes), unless I always remember to pass this flag (which I won’t). Such projects are usually single-file but not always.
  2. This split will be around forever, probably even in a hypothetical Rust 2.0 as the old style remains 100% workable and suddenly removing it would probably be too big a breaking change for our tastes. This is, at the very least, aesthetically unsatisfying. It also makes the language as a whole, all the finer details that one needs to keep in mind to language-lawyer or comprehend legacy or pathological code, more complex.
  3. Regarding “How do we teach this”, not only will a comparatively huge amount of code (all multi-file crates in existence) and documentation (including the upcoming second edition of The Book) still be using mod and therefore confuse newbies [2], some (possibly many) projects will stick to the explicit mods. I don’t just mean legacy code dating from before the change and old projects that never bothered to switch over; as long as backwards compatibility is kept and the change is controversial enough, it will be just a stylistic choice. We’ve had some people in this very thread announcing a desire for just that! Therefore, I fear that many Rustaceans will still have to know, and deal with, the “old rules” depending on what projects they are involved with.

[1] By the way, this should not be taken as implying I want an explicit/opt-in switch. That just has different downsides, which I assume are clear enough, as nobody proposed it yet.

[2] I don’t look forward to replacing my standard explanation of why mod is needed and how it differs from use with “oh ignore this it’s legacy cruft” (especially since that won’t help anyone who needs to work with real code that involves mod).


#122

I want to push back on this, for several reasons. First, even the idea that it would be a mess is non-obvious to me. I don’t know if you’ve ever worked with, e.g., Eclipse or NetBeans, but they work exactly this way. You just point the IDE at a source directory and it compiles everything in there. And many makefiles and things in C projects that I’ve used wind up applying rules to $(wildcard foo/*.c), rather than explicitly writing out each .c file by hand.

Second, @petrochenkov is correct as to the implementation challenge; I am not sure that it is even possible to implement it this way, or at least it would be very difficult. Finding a workable definition of our module system is incredibly difficult (thanks to features like pub use and globs). If we didn’t even have a notion of what paths exist or not that would be extra fun. (I guess we’d have to define more precisely what it means for something to be “used” anyhow…)

But finally, I think this rule isn’t really what I would want anyway for another reason. I often have Rust files that contain no items that need to be “used” from other places. Think of a file full of #[test] functions, for example, or a file that contains only an impl SomeType { /* inherent methods here*/ }. I want those to be compiled in. And, in fact, even when those items might get used elsewhere, I often start by writing some (dead) code and getting it to compile first, and then use it from elsewhere.
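Here is a small sketch of the inherent-impl case, with an inline module standing in for the separate file and made-up method names. Nothing is ever imported from `some_type_methods` by path, yet the rest of the crate depends on it being compiled in.

```rust
pub struct SomeType {
    value: u32,
}

// Stand-in for a file that contains only an inherent impl. No `use`
// anywhere names an item from this module.
mod some_type_methods {
    use super::SomeType;

    impl SomeType {
        pub fn new(value: u32) -> Self {
            SomeType { value }
        }

        pub fn doubled(&self) -> u32 {
            self.value * 2
        }
    }
}

fn main() {
    // The methods are found through the type, not through a module path,
    // so a "load it only if it is used" rule would never see this module.
    println!("{}", SomeType::new(21).doubled());
}
```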


#123

I agree it doesn’t change the calculus, but it may be worth “rolling in” to the proposal somewhat. I was thinking a useful thing might be to try and convert some existing code into this methodology to get an idea what it will feel like. I might take a stab at it with e.g. Rayon. If we find that it seems overly verbose, then having a shorthand may help.

For these purposes, I think it’s critical that some “type-dependent” members (fields, methods) probably can just be declared pub without the need for further refinement (which would make them as “pub” as the type, basically).


#124

I wrote more on this in another thread: Use-cases for modules not implicitly included in the build

This is how modules work in every language except Go. IDEs that auto-generate use statements may change this, but I haven’t used such IDEs (when I use Java I add use statements manually). I used Xcode, where adding to the project and adding to the build (target) are orthogonal.

Makefiles with wildcards cause problems when there are merge files left by git (foo_BASE_X.c), so such implicit behavior is a problem and it is usually avoided by specifying included files more carefully.

For tests there’s a tests directory. For other “unused” files with side effects there’s mod and use to include them.


#125

These are valid concerns.

On the first point:

I think the rules cargo uses should be very simple. When you say “dive into the source code and understand its internal structure,” I think it should be visible from a combination of looking at the Cargo.toml for named binaries and glancing at the tree of the src directory. I’m imagining very simple rules (remember that the vast majority of projects contain only 1 crate & therefore aren’t at issue).

As to the difference between cargo and rustc, there are two aspects of this. The first is that this is a trade off: we could instead have the flag turn off implicit modules, but that would create a much greater “churn shock” than having the flag turn it on, because anyone using a non-cargo build system will have to fix this if it doesn’t work for them. Not sure which side of this is preferable.

The second is that it is already the case, and it increasingly will be, that compiling a project by hand that is normally managed by cargo is quite challenging. Resolving the dependencies and their transitive dependencies, managing features, and many extensions in the pipeline (such as automatic features / conditional dependencies) rely on cargo to do the work. The cargo layer is only going to grow over time.

On the second and third:

I don’t agree that we wouldn’t remove the old system if we performed a breaking change. I don’t see it as having any special challenges that don’t apply to other things we would be removing in some 2.0 event (such as macro_rules).

I agree that the social aspect of changing this (teaching it, dealing with old documentation being on the internet, most of all convincing users to make the shift) is the most challenging aspect of the change. I would even say that the less constructive responses this proposal has garnered are a demonstration of that challenge.


#126

While walking earlier this evening I had the idea that instead of the internal attribute, facading could be handled in a more direct manner. We could have a hoist attribute, which takes a path as an argument, and for the purpose of other modules’ view of the world puts the item(s) at that path instead of the path they’re defined at. pub use would then only be for ‘convenience re-exports’ like preludes or putting the major types at the top level, rather than for facading.

I’m not sure if this should be an attribute on individual items (hoisting them to the path) or on modules (hoisting all the items in that module to the path).

To give an example:

mod foo {
    mod bar {
        #[hoist(foo)]
        pub struct FooBar;
    }
}

use foo::FooBar;

The first advantage of this is that you now even know where it’s been facaded to when you look at a module; everything is inside the module.

The second advantage is that it supports a style in which mod.rs files stop being necessary. Instead every type in a module is defined in its own submodule, and possibly hoisted to its parent. This alleviates the use self:: situation somewhat; if you have no mod.rs files you’ll never need to do that.

However, I see a very big downside to this: if I hoist a type into a module, do I have to use it to make it visible there? Basically my position is that you shouldn’t use hoisting this way - that is, never actually use the name in the module you’ve hoisted it to. If you do that, that’s basically a ‘come from’ of file organization. Maybe it should be that a hoisted name isn’t visible in the module it’s hoisted to & can’t even be imported into it.
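For comparison with the hoist example above, this is how the same public path is produced today: the re-export lives in the parent module, not next to the item. It's a minimal sketch reusing the names from that example.

```rust
mod foo {
    // Private submodule: `foo::bar::FooBar` is not nameable from outside `foo`.
    mod bar {
        pub struct FooBar;
    }

    // The facade: the parent decides where the item appears.
    pub use self::bar::FooBar;
}

use foo::FooBar;

fn main() {
    let _value: FooBar = FooBar;
    println!("FooBar is reachable as foo::FooBar");
}
```

With `#[hoist(foo)]` the information in the `pub use` line would move onto the item inside `bar`.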


#127

Regarding #![internal]. It seems very similar to Go 1.4’s feature to have internal packages. If I understood its proposed behaviour correctly, I doubt that it alone will solve the “I have to check every mod decl to find out whether an item is public” issue, as it seems to apply transitively, so when you declare a module as #![internal], all its sub modules will be too. In Go it’s solved by internal being part of the file path, so you will have to encounter it. To solve it in Rust, you could require those sub modules to bear an #![internal] attribute as well, making rustc error when a sub module of an internal module doesn’t contain #![internal]. Alternatively you could require modules to have either #![sub_internal] or #![internal], to make it possible to isolate or not isolate submodules from their internal module parents. I mean it would be quite limiting if you had to isolate all submodules of internal modules from their non-direct parents, so you’d need #![sub_internal] that has no actual effect, other than indicating to readers that the current module is internal.

With #![internal] present in the proposal, the discussion about publicity becomes mostly one about defaults.

I’d like to propose a more conservative approach to the problem that works without implicit modules. What about adding just the #![sub_internal] attribute, without any other modifications to the module system (maybe it can be called #![internal] instead, but with same functionality). And to add a (default-allow at the start, default-error or always-error in Rust 2.0) lint to require it for all modules that are not pub and their submodules. And to make it recommended style to turn that lint into warn/error on a crate level, and to make clippy warn about its missing use.

Regarding #[hoist(...)]. It’s more verbose to write than pub use, especially on keyboards with layouts where [] and () are behind modifiers. Other than that I’ll have to think about it.


#128

I know I’m late to the party but the module system keeps frustrating me to no end. With the rush to register ever more generic crate names like plugin, router and co, it becomes harder and harder to tell local modules from externed crates apart. I had to rename local modules and alias crates to resolve conflicts many times.

I started to resort to using super::foo in my code as a convention to refer to my own modules vs crates so that I know in a module if this is supposed to be a crate or a local mod.

So while there might be a lot of things that people want to improve, the one that I think is more important than anything else is to just get crates and modules moved to different spaces. I’m happy to prefix all my imports with self or a crate name if that is necessary.


#129

I really liked it when pub only meant that you could refer to the thing through the containing module. I already don’t understand the current rules and they only seem to get trickier. :confused:


#130

You make a good point but there are a few points I want to clarify because they tip the balance a bit.

I think if we did this, we would make it a lint to have a pub item in an internal module that isn’t exported anywhere. So if you see pub, it would still mean pub, but not necessarily at this path. The fact that this is transitive does make it so you can’t be certain this path is valid without checking all of the parent modules. (This is an advantage of the explicit hoisting.) But whereas today, pub is used even when you really want pub(super), I think under such a proposal deeply nested internal attributes would be uncommon, and so you would tend to assume that path is valid, and it would usually work.

More importantly, internal only applies to third party crates. Most of the time when I’m dealing with a separate crate, I don’t navigate the source, I read rustdoc. Rustdoc, of course, always tells me a publicly usable path. Where this ‘know the path is valid’ really matters is within a single crate; if I’m working in module X, I can go into module Y and look at the items and know immediately if I can use them in module X. For my own crates, rustdoc is not a great way to find this out, both because I have to regenerate it regularly & because it doesn’t by default document the internal APIs of my crate (whereas the source includes everything and is always available).

This “foreign : rustdoc / local : source” distinction is important because internal doesn’t apply to the same crate - within my crate I can access everything at its true path. So when I say I can look at the source of something and see whether I can use it, I’m mainly speaking within the context of a single crate.

Today it’s very common to break a single project into many crates, but as incremental compilation becomes more usable I expect that to become less and less common because of its implications for coherence. I also think people are less likely to do fancy facades for the crates that are within a project & not exposed outside of it, so internal won’t be a common factor in those cases either.


In terms of the syntax for hoisting being less convenient etc, if we wanted to make this fully first class (though this seems so radical even I am suspicious of it) we could imagine something like this:

mod foo {
    mod bar {
        pub(crate) @ foo struct FooBar;
    }
}

use foo::FooBar;

I’m going to convert a library (I think futures) to this system - using hoisting with this syntax instead of internal, to see exactly how even the most radical idea changes things.


#131

I see, that would work. Although it would forbid people to use pub to not have to write the long pub(crate) or pub(super) they would otherwise have to write. I’d be a big user of #![internal], I’d add it to all the modules that right now I don’t declare pub, just to make sure they are not exposed, and to make it clear to readers (including future self) that the crate is not exposed. Inside of them, I’d have to write the lengthy pub(crate) for every single item, I’d be literally punished for not exposing something in the public API. I think there should be sugar for pub(crate), like cub or something else which is very short (around three characters). That could solve it, and at least make the new system not hugely more inconvenient for me and most other users to use.

To provide you with a personal story, my reasons to split up crates never were around compile times. I have a crate (lewton) whose 90% of uses is together with another crate (ogg) to decode ogg/vorbis files, still I have them separate in order for other things that may be included in the ogg container. I in fact wrote a crate to read ogg metadata from it, which is even used in servo, while lewton, which came from the same project as ogg, is not used (yet xD). And I’m considering splitting up lewton even more, e.g. to put the IMDCT decoder into a separate crate. This would help me writing an opus decoder.

There is this great feature to small crates that you can reuse the code everywhere and in multiple different places.

Another example, minetest. It’s an open source Minecraft clone written in C++. It has a map like Minecraft, and it stores that map in different kinds of databases. Now there are big servers with big maps, and some of them want to provide a top down map to their users. For this use case, mapper tools were designed (some in Python, but the most popular one atm is in C++) that get the map database as input and which output the finished map. They had to re-implement parsing of the data format, including the bindings to the database libraries, as well as the deserialisation of the actual chunks (minetest calls them mapblocks). If minetest had been a Rust project, you could simply have put the map format and database interfacing into one crate and the rest of the game into other crates, and published it all on crates.io and the mappers simply could have used the crate and would have gotten the same code that minetest ran on. Of course, you can do this in C++ as well, but again, all the issues with crates downloading.

So I don’t think crates will get much bigger, and I hope they don’t.

I don’t see though how crates getting bigger will mean that you want to expose some module to the entire crate (or world) and not facade it. If it will affect motivations to facade, it will be the reverse.

I think I’m not a big fan of hoisting. I certainly wouldn’t want it to happen. It would make renaming a module very hard and require you to repeat the module name multiple times. And if you really want to find out where an item is reexported, just rg for the name of the item, or change the name and see where it errors. Of course neither is watertight though due to wildcard reexports. Also, looking at workflows, right now you can look at rustdoc output to get a list of items where you can copy each name into a (pub) use list, they are all nicely condensed, if you want to hoist a couple of items, you’d have to scroll through their implementations to set them.