[lang-team-minutes] the module system and inverting the meaning of public

IIRC in Rust you can’t make anything more private, you can only make things more public. It would be weird to have modules that are the opposite of that, and only public by default in one of the two modes of creating them.

Rust doesn’t even have a keyword for privacy any more. That’s why #![internal] invents a new name for it, rather than using an existing name like #![pub] to make a private-by-default module public.

Modules implicitly public by default would essentially deprecate the pub keyword, and make using pub instead of pub(crate) a mistake. That’s very unfortunate, because it’s an easy mistake to make, and pub is easier to read and more convenient to write.

OTOH with private by default, implicit modules behave like the rest of Rust. The simplest and shortest pub syntax remains usable and sensible. Nothing is taken away, because pub(crate) and pub mod or (I’m proposing) #![pub] can still be used.

That’s not why #[internal] invents a new name for it. You’re talking to me, no need to make assumptions.

First, not everything in Rust is private by default. In particular, enum variants and their fields are not. Rust still has priv as a reserved word dating from when you could declare an enum variant priv.
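To make the enum point concrete, here is a small sketch (the `shapes` module and its types are made up for illustration). The variants and fields of a `pub` enum are usable from outside the module with no further annotation, while a struct field has to opt in with `pub`:

```rust
mod shapes {
    // Variants and their fields are as visible as the enum itself;
    // there is no way to mark an individual variant or field private.
    pub enum Shape {
        Circle { radius: f64 },
        Square(f64),
    }

    // Struct fields, by contrast, are private unless marked `pub`.
    pub struct Point {
        pub x: f64,
        y: f64,
    }

    impl Point {
        pub fn new(x: f64, y: f64) -> Self {
            Point { x, y }
        }

        pub fn y(&self) -> f64 {
            self.y
        }
    }
}

use shapes::{Point, Shape};

fn area(shape: &Shape) -> f64 {
    match shape {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Square(side) => side * side,
    }
}

fn main() {
    // Both variants can be constructed directly from outside `shapes`.
    let circle = Shape::Circle { radius: 1.0 };
    let square = Shape::Square(2.0);

    // `Point { x: 1.0, y: 3.0 }` would be rejected here: field `y` is private.
    let p = Point::new(1.0, 3.0);

    println!("{} {} {} {}", area(&circle), area(&square), p.x, p.y());
}
```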

However, the proposal is not that modules be public by default, it is that they be as public as their most public member (and only if they don’t have an explicit mod statement). The entire point is that modules would fade into the background in the mental model of most programmers, and not have so firm an independent identity of their own that needs to be manipulated.

Much in the same way that variants of enums now have faded into the background, so that you didn’t even consider them as a ‘thing’ which could have been private by default, but isn’t.

As to why internal isn’t called priv, I chose a separate name to keep the concept simple instead of inviting the possibility of a pub(restricted) system for that attribute. It might play out differently, but at least for now I’m in favor of keeping that concept as minimal as possible (binary, in other words).

That would be fun to integrate into the import resolution algorithm. Currently all names are defined upfront (so you have to run fixed-point algorithm only on items) and then paths are looked up in the definitions. With your scheme every used path will be able to create a new item and potentially affect resolution of all other paths.


I don’t understand why it is so important to make modules “as public as their most public member”. Can you explain the benefit?

It’s how you get the property that an item’s visibility declaration is all you need to witness to know how public it is. Otherwise, you need to trace up the module tree to find out if a module it’s in (or a module that module is in, recursively) is not as public as this item. This would become even more difficult if, as in your proposal, we needed to use pub use statements to find it.
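A minimal sketch of the tracing problem under today's rules (module and function names are invented for the example): the function is declared `pub`, but you only learn that it is not reachable from outside by looking at the `mod` line of its parent.

```rust
mod outer {
    // `hidden` has no `pub`, so it is private to `outer`.
    mod hidden {
        // Reads as fully public when you look at this line alone.
        pub fn helper() -> u32 {
            7
        }
    }

    // The only way the rest of the crate can reach `helper`.
    pub fn call() -> u32 {
        hidden::helper()
    }
}

fn main() {
    // `outer::hidden::helper()` would be rejected here: module `hidden` is private.
    println!("{}", outer::call());
}
```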

I hope that isn’t a major change to the architecture. The logic would be: instead of displaying "unresolved import foo" error, pretend there was mod foo.

OK, so if I understand correctly, you’d prefer pub to be used to export things as public API of the crate, from anywhere, and require use of pub(crate) for non-exported items, so that it doesn’t matter whether modules are public or not.

I see it differently, because I dislike using the long pub(restricted) syntax, so I’d like to continue using only private modules and easier to read and write pub. I also use the facade pattern, because I think shorter paths make small libraries easier to use (and I intentionally break things down into many small flat crates, rather than one large nested one like std).

For me this proposal would be a net-negative, especially if unreferenced files from the filesystem are included as public API of the crate (and they will, because they’re going to be full of pub that I use to mean pub(crate)).

I’d rather continue writing mod foo than have to start adding (crate) to all pub :frowning:
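For readers who haven't seen it, the facade style described above looks roughly like this in a single file (the `parser` module and its functions are hypothetical). Everything inside the private module is written with plain `pub`, and the crate root decides what actually gets exported:

```rust
// Private module: nothing in it is exported unless the root re-exports it.
mod parser {
    // Exported through the facade below.
    pub fn parse(input: &str) -> Vec<&str> {
        input.split_whitespace().collect()
    }

    // Plain `pub`, but effectively crate-internal because `parser` is private.
    pub fn debug_dump(tokens: &[&str]) -> String {
        tokens.join("|")
    }
}

// The facade: a short path for users, chosen at the crate root.
pub use parser::parse;

fn main() {
    let tokens = parse("a b  c");
    println!("{}", parser::debug_dump(&tokens));
}
```

If `parser` became public implicitly, `debug_dump` would be exported too, which is the objection being made here.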

I also want the syntactic sugar for crate fn and probably super fn, but I don’t agree that it changes the calculus of this decision.


OK, I think I’m starting to get what you’ve meant by =.

I thought the pub(crate) proposal meant to ensure that item is definitely not exported (<=). But implicit public modules also make pub mean the item almost definitely is exported (=).

For me only exporting too much is a problem, so I prefer <= over =.

<= is a straightforward addition, and it’s a solid guarantee. The = proposal has negative side effects on the way I use modules, and is not even a solid guarantee (there could be explicit mod foo making it private), so I don’t like the = part, sorry.

Apparently I was rooting for ditching mod for a different reason :slight_smile:

@withoutboats

"as public as their most public member"

FWIW, implicit modules aren't necessarily needed to express this. Some special visibility like pub(transparent)/pub(auto)/pub(inferred) on an explicit module declaration would work too. Looks like this visibility can even be calculated easily given some precautions (e.g. pub use a::*; need to make the containing module pub even if the glob doesn't actually import anything).
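As a toy model of "as public as their most public member" (this is only an illustration, not how rustc represents visibility, and `Visibility` and `inferred_module_visibility` are names made up here), the inferred visibility is just the maximum over the members:

```rust
// Ordered from least to most visible; the derived `Ord` follows declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Visibility {
    Private, // no annotation
    Crate,   // pub(crate)
    Public,  // pub
}

// Hypothetical rule: a module is as public as its most public member.
// A module with no members is treated as private in this sketch.
fn inferred_module_visibility(members: &[Visibility]) -> Visibility {
    members.iter().copied().max().unwrap_or(Visibility::Private)
}

fn main() {
    let members = [Visibility::Private, Visibility::Crate, Visibility::Public];
    println!("{:?}", inferred_module_visibility(&members));
    println!("{:?}", inferred_module_visibility(&[Visibility::Private, Visibility::Crate]));
}
```

The glob precaution mentioned above would amount to counting a `pub use a::*;` as a `Public` member regardless of what it ends up importing.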

Sure. I believe eliminating the mod-semicolon form has its own advantages.

EDIT: It's a fair point that we can talk about each of these aspects of the proposal independently though.

At least as major as Macros 2.0 if I'm not mistaken.

That’s a shame. In that case implicit modules would break my builds/my workflow, because I do keep test/temp/not-ready-yet .rs files around.

I like the idea that modules can be as public as their most public item, since that makes “= privacy” the default instead of “<= privacy”. The facade pattern is cool, and we should support it, but it’s weird for all modules to be facades by default.

I do like the idea of not requiring a mod foo; declaration to import foo.rs, and relying solely on the use foo::... statements and foo::... symbol paths, since they make the mod foo; redundant.

I do NOT like the idea of making every .rs file in the source tree an implicit auto-imported module whether or not it is ever used or moded. Even if I did like it, I don’t see how it could be backwards compatible.

I do NOT like the idea of replacing extern crate declarations with cargo or Cargo.toml or the --extern flag or some other extralinguistic mechanism. The details of how dependencies get resolved belong to the build system, but declaring those dependencies should stay a part of the core language.

At the moment, I think the biggest problem is that everyone reading this thread seems to have interpreted “the proposal” very differently. I’m honestly not certain which of the points I just listed are actually part of “the proposal” or not. I’m pretty sure I got at least one of them wrong on my initial reading of the post, assuming @withoutboats’s and @nikomatsakis’ interpretations are the correct ones.


I very, very, very strongly like the fact that nobody is arguing for “>= privacy”. That’s the only suggestion that ever worried me, so I’m confident I’ll be happy with the outcome here no matter what it is.


Some more thoughts.

Motivation

Like @nikomatsakis said, the confusion about our module/visibility system extends beyond just new users. While @aturon and I were talking yesterday, he got tripped up about whether an import needed to be prefixed self::. Outside of this discussion, I’ve seen advanced users (including those who have voiced objections to this proposal) getting mixed up by the dual meaning of pub.

What we see with new users is a sharp divergence in how hard it is to acclimate to the basic mental model of our system. Some users find it easy to grasp, some find it very challenging. But I think in its full nuance, our current system is ambiguous and confusing for even very experienced users (including those with strong opinions about whether it is good or not). The most acute example of this is the dual meaning of pub I touched on in a previous post.

Biographically, I began this investigation because of the confusion many new users had about the module system. My goal was not to ‘make Rust easier;’ I was working from a hypothesis: the module system doesn’t operate on significant useful complexity (that is, it’s about organization, not data, behavior, or abstraction). It isn’t doing enough to justify how confusing some users find it.

This isn’t the same as just trying to make Rust easier - it’s about recognizing a system as more complex than it needs to be (in contrast, something like borrowchecking needs its complexity). I want to make explicit & challenge a certain narrative underlying a lot of the comments which I see as really just a polite form of ‘hackers vs newbs’ framing: the self-identification as ‘UNIX users’, the emphasis on error messages as the solution, the unfavorable comparison made in IRC of this proposal to PHP. I’m drafting this post in vim from an Arch Linux system with a tiling window manager, and I am excited about how this will improve my own experience.

As I investigated what was making the module system so difficult for users, I noticed a lot of other issues I really didn’t like about it. I found it to involve a lot of redundant ceremony that seemed to buy us very little. I found I was frustrated that I can’t find out how visible something is without tracing the modules between it and the crate root, and that the information I need to know about a module isn’t contained within it, but partly in its parent. I noticed that the public-in-private issue is an interminable quagmire.

All of these seem to me like real problems. We can disagree on how we should balance these problems against the problems that changing the system would bring - all language design is about trade-offs. But to claim that the system proposed has no advantages over the current system, or - even more extremely - to claim that the current system has no flaws, is not the discourse of a meaningful technical discussion.

“Junk files”

I’m fairly convinced that being able to rapidly comment out a module is a useful feature, even though I would never leave junk files lying around for any significant period (like, past a git push). While you’re working, it makes sense to leave files in a poorly typed state, go work on something else, and then come back to it. Certainly you might want to compile everything but the incomplete module in the meantime.

There is another really easy way to comment out a module that is already supported: tag the module with #![cfg(ignore)] (where ignore is any feature that doesn’t exist). If this is perceived as too much of a ‘hack’ to recommend, we could easily support it as a first class citizen: #![no_compile] or the like. Similar to the #![internal] attribute, I like very much that such a technique moves the important information to the module, instead of putting it at the parent.

(This solution has problems if you don’t even want your module to be parsed, but they’re surmountable.)
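A minimal single-file sketch of the cfg trick, using the outer-attribute form on inline modules so it fits in one file (in a real project the inner form `#![cfg(ignore)]` would go at the top of the module's own file). The module names are invented, and recent toolchains may emit an `unexpected_cfgs` warning for a cfg name nobody declares:

```rust
// `ignore` is never set, so this module is stripped before name
// resolution and type checking. It still has to parse.
#[cfg(ignore)]
mod unfinished {
    pub fn answer() -> u32 {
        "not a number yet" // would be a type error if this module were compiled
    }
}

mod finished {
    pub fn answer() -> u32 {
        42
    }
}

fn main() {
    // Everything except the stripped module builds and runs.
    println!("{}", finished::answer());
}
```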

Backwards compatibility

Backwards compatibility remains a huge open question for this proposal, and like @hanna-kruppe I think we need to start trying to address it. I don’t want to address any specific cases in this post, but I want to lay out a framework for how it can be done.

Basically, the problem seems to be that while most crates work fine with this proposal (that is, they do not have crate trees in overlapping directories), there are many that do not. I went through about 80 binary crates a few weeks ago, and found that about 20 of them had both a library and a binary under src. About half of those followed a pattern that, in my opinion, we could continue to support, but about half of them simply wouldn’t work with this proposal without restructuring the crates.

(The other common pattern of multiple crates in a single directory, aside from binaries which expose a lib, is a directory full of ‘rust scripts,’ like rustc’s tests. I’m not concerned about the particular shapes, though, just the general principle that some crates can’t be migrated to this.)

The basic principle of how to perform the migration seems like this to me:

  1. This feature is turned on by some argument passed to rustc.
  2. If you only have 1 crate root managed by cargo, cargo passes this argument to turn this feature on.
  3. If you have multiple crate roots managed by cargo, cargo recognizes some directory patterns in which it is safe to turn this on. crates.io’s docs will recommend using these patterns from now on.
  4. If you’re not following this pattern, cargo will issue a warning to encourage you to migrate your code.

I don’t think we should have a flag in the Cargo.toml for controlling whether this is on or not. I think it should be based entirely on whether cargo determines that your directory structure matches what it considers correct for this feature.
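The four steps above could be modelled roughly as follows. This is purely a sketch of the decision, not anything Cargo does today: `Decision` and `decide` are hypothetical names, and since the recognized directory patterns are left unspecified, they are passed in as a predicate rather than guessed at.

```rust
#[derive(Debug, PartialEq)]
enum Decision {
    // Exactly one crate root: turn the feature on.
    Enable,
    // Several crate roots, but in a layout considered safe.
    EnableForRecognizedLayout,
    // Anything else: keep the old rules and warn the user to migrate.
    WarnAndKeepOldRules,
}

// `layout_is_recognized` stands in for whatever directory patterns
// would end up being accepted; it is an assumption of this sketch.
fn decide(crate_roots: &[&str], layout_is_recognized: impl Fn(&[&str]) -> bool) -> Decision {
    if crate_roots.len() <= 1 {
        Decision::Enable
    } else if layout_is_recognized(crate_roots) {
        Decision::EnableForRecognizedLayout
    } else {
        Decision::WarnAndKeepOldRules
    }
}

fn main() {
    let single = ["src/lib.rs"];
    let mixed = ["src/lib.rs", "src/main.rs"];

    println!("{:?}", decide(&single, |_| false));
    println!("{:?}", decide(&mixed, |_| true));
    println!("{:?}", decide(&mixed, |_| false));
}
```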


This sort of… confuses me semantically. Today, crates and modules are very similar to each other. They both keep some parts of themselves private from their parents while exposing others, and they can both re-export public items from their children. So why does your inquiry go out to the crate root and no further? Why not care equally about re-exports by dependent crates?

Sure, modules are more likely to be 'internal', i.e. have an API meant to change freely with the needs of the parent rather than providing any stability guarantee. But there's no rule that you can't expose an 'internal' API in a crate either; on the contrary, it makes a lot of sense to do so if multiple crates are logically part of the same project and there is some helper code used by all. (This is often when you'd use an = dependency in Cargo.toml, and doing a quick grep of the registry, = dependencies seem to be far from uncommon, appearing in 231 crates out of 6,204 crates that have any dependencies.) The reverse is also possible: modules can be designed to have a self-contained, 'independent' API. If nothing else, coherence issues and compilation time can encourage keeping bits of functionality in one crate even if they could be split out into independent crates.

I guess the intuition is that I'm just describing module-like crates and crate-like modules, that crates are meant to play a role as "stability boundaries" even if there are exceptional cases where they don't. But that conflicts with my stronger intuition that "unit of compilation" is (should be) essentially an implementation detail.

I think my perspective is similar to @kornel's, because their proposed solution seems most natural to me. Implicit modules shouldn't be public by default any more than any other item defined in their parent module (i.e. frequently the crate root). "Public" should mean "public to the parent", and it's the parent's job to decide if they want to reexport - regardless of whether we're in a submodule or in the crate root, in which case the parent is another crate consuming this one. (So I don't see pub as currently having a "dual meaning".) This would have the side benefit of allowing implicit modules to be implemented without breaking backwards compatibility, under a scheme where rustc only looked for foo.rs when given use foo. As has been stated, public modules couldn't be lost because they'd have to appear in a pub use anyway to override the visibility default.

In other words, use would basically work the same as Python import. The meaning would change from "look up this path in the existing module namespace" to "look up this path, and if it can't be found, try to find it on the filesystem" (edit: that's not quite what I want actually - see later post). use already treats its path arguments "specially" – as absolute rather than relative – and treating them as potentially not yet known to the compiler is just another kind of specialness. Then, "try to find on the filesystem" is also what extern crate does, broadly speaking, so use could handle crate imports as well – allowing extern crate to be shed from the mental model without the confusion of Cargo.toml/--extern causing things to magically show up in the namespace (as has been discussed in other threads).


I don't agree with this intuition. The unit of compilation matters a great deal semantically, because it is the window through which the compiler is able to see the world. This is why the orphan rules exist, for example; so that the compiler's view of coherence can't be changed by any information not available at compilation time. This is also why crates are required to form an acyclic graph, whereas the internal module graph of a crate can contain cycles. This is why I can pub(in path), but I can't pub(in [these three crates]).
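For reference, a small example of what `pub(in path)` allows today (module names are made up): visibility can be restricted to an ancestor module inside the current crate, and there is no equivalent syntax that names other crates.

```rust
mod engine {
    pub mod parts {
        // Visible anywhere inside `crate::engine`, and nowhere else.
        pub(in crate::engine) fn torque() -> u32 {
            300
        }
    }

    // Allowed: `report` lives inside `crate::engine`.
    pub fn report() -> u32 {
        parts::torque()
    }
}

fn main() {
    // `engine::parts::torque()` would be rejected here: `torque` is private
    // outside `crate::engine`.
    println!("{}", engine::report());
}
```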

Once you've made something public - truly public, outside of your crate - all bets are off; every crate that can discover my crate can import this item anywhere. The question you ask about re-export from other crates doesn't make sense to me - if I can see those other crates, I can also see this crate, their re-exportation doesn't impact whether or not I can get at this item.

The problem with @kornel's proposal is that mod would still be used in a frequent but hard to explain niche case, when you have a library and this module's items are pub but aren't used elsewhere in the library. This makes a system which seems more complex to me than what we have. However, I think there's a connection between your endorsement of that idea and our disagreement about that intuition. The reason to base visibility on the declaration on the item, rather than on the use statements, is that the item is declared in this crate, but the use statements could be declared in other crates (hence the need, in some cases, to stand in for those other crates with mod statements).

In my view, just because something is on crates.io doesn't mean it's there for anyone to depend on. It may be only for the use of my own crates.

Yeah, looking back I guess my suggestion differs from @kornel's somewhat. In fact, I confused myself and ended up misstating it slightly. What I really mean :slight_smile: is:

The namespace for regular names and for use paths would be separated. The latter would implicitly include modules based on filename, as well as extern crates.

So: If you're in lib.rs and you want to make a module foo.rs "public" (i.e. re-export it to your parents, namely other crates), you'd write pub use foo;.

If you don't want the module to be public, you can use foo; or use foo::bar;, or both. If you use foo::bar, foo.rs is loaded but you don't get foo in the regular namespace, only bar. So the experience would be very similar to how use works today, in those cases where you can use use. The only names in scope in a given module, regardless of whether it's a root or not, are those that appear at the ends of use paths; you don't need to consult the filesystem or Cargo.toml, unlike some other proposals.

And extern crate foo; would become use foo; too. As I said, similar to Python import.

However, I think there's a connection between your endorsement of that idea and our disagreement about that intuition.

Yep.


Having Cargo detect whether your directory layout is tricky, and having plain rustc invocations default to the existing semantics in the absence of a flag, does alleviate backwards compatibility concerns. And to be frank, it's much smarter than anything I came up with. I have some doubts about the accuracy of this detection by Cargo but let's put that aside until we get into the details.

A much bigger concern I have is that this leaves the existing module system 100% in place and furthermore makes the switch implicit. This may be unavoidable for backwards compatibility, but if so, that's (again) a downside of this change. Specifically, what concerns me is:

  1. There are two sets of incompatible rules, and which ones are used in a given piece of code is not written down explicitly [1] in the source code or in the Cargo.toml. For example,
  • to dive into a crate's source code and understand its internal structure, I now have to manually apply whatever heuristic Cargo uses (or diff the file system against the mod lines: if all files are mentioned in mod lines somewhere, then I know it's an old-style project)
  • if it's not a Cargo-managed project, I have to go find and look at whatever script builds it (cf. https://github.com/rust-lang/rust-roadmap/issues/12).
  • there's now a big semantic difference to "normal" projects if I build a small project by hand with rustc foo.rs (e.g., for test purposes), unless I always remember to pass this flag (which I won't). Such projects are usually single-file but not always.
  2. This split will be around forever, probably even in a hypothetical Rust 2.0, as the old style remains 100% workable and suddenly removing it would probably be too big a breaking change for our tastes. This is, at the very least, aesthetically unsatisfying. It also makes the language as a whole, all the finer details that one needs to keep in mind to language-lawyer or comprehend legacy or pathological code, more complex.
  3. Regarding "How do we teach this", not only will a comparatively huge amount of code (all multi-file crates in existence) and documentation (including the upcoming second edition of The Book) still be using mod and therefore confuse newbies [2], some (possibly many) projects will stick to the explicit mods. I don't just mean legacy code dating from before the change and old projects that never bothered to switch over; as long as backwards compatibility is kept and the change is controversial enough, it will be just a stylistic choice. We've had some people in this very thread announcing a desire for just that! Therefore, I fear that many Rustaceans will still have to know, and deal with, the "old rules" depending on what projects they are involved with.

[1] By the way, this should not be taken as implying I want an explicit/opt-in switch. That just has different downsides, which I assume are clear enough, as nobody proposed it yet.

[2] I don't look forward to replacing my standard explanation of why mod is needed and how it differs from use with "oh ignore this it's legacy cruft" (especially since that won't help anyone who needs to work with real code that involves mod).


I want to push back on this, for several reasons. First, even the idea that it would be a mess is non-obvious to me. I don't know if you've ever worked with, e.g., Eclipse or NetBeans, but they work exactly this way. You just point the IDE at a source directory and it compiles everything in there. And many makefiles and things in C projects that I've used wind up applying rules to $(wildcard foo/*.c), rather than explicitly writing out each .c file by hand.

Second, @petrochenkov is correct as to the implementation challenge; I am not sure that it is even possible to implement it this way, or at least it would be very difficult. Finding a workable definition of our module system is incredibly difficult (thanks to features like pub use and globs). If we didn't even have a notion of what paths exist or not that would be extra fun. (I guess we'd have to define more precisely what it means for something to be "used" anyhow...)

But finally, I think this rule isn't really what I would want anyway for another reason. I often have Rust files that contain no items that need to be "used" from other places. Think of a file full of #[test] functions, for example, or a file that contains only an impl SomeType { /* inherent methods here */ }. I want those to be compiled in. And, in fact, even when those items might get used elsewhere, I often start by writing some (dead) code and getting it to compile first, and then use it from elsewhere.
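To illustrate the inherent-impl case with inline modules standing in for separate files (the names are invented): no `use` statement or path anywhere mentions `circle_methods`, yet the program only compiles if that module is included. Under a rule that loads a file only when some `use` names it, the equivalent circle_methods.rs would be skipped.

```rust
mod shape {
    pub struct Circle {
        pub radius: f64,
    }
}

// Nothing refers to `circle_methods` by name; it only contributes
// inherent methods to a type defined elsewhere.
mod circle_methods {
    use super::shape::Circle;

    impl Circle {
        pub fn diameter(&self) -> f64 {
            2.0 * self.radius
        }
    }
}

fn main() {
    let c = shape::Circle { radius: 1.5 };
    // Resolves only because `circle_methods` was compiled in.
    println!("{}", c.diameter());
}
```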
