I think there are two things. The first is that there is always going to be a semantic gap, because directories and files can have names that are invalid identifiers in the language, and the file system is intrinsically tied to the OS while the language is independent of it. I don’t think that’s bad.
I also think that sometimes it makes sense to have a module per file and sometimes it doesn’t; in some weird cases I might even want directories that don’t create modules!
I can think of an interesting solution that lets us benefit from the best of both worlds (though it doesn’t yet consider backwards compatibility or its implications).
- In a crate directory’s root, a cargo.toml file is added that explains how to import external crates. They are exposed as modules on the root.
- The crate directory (the one containing the cargo definition) maps its contents into the root module.
- Files and directories whose name does not start with `_` create a new module with their name, and their contents go inside that module.
- Files and directories whose name starts with `_` are implicit: their contents are added to the enclosing module (the root module, for entries directly in the crate directory).
- (For backwards compatibility we’d need special cases for `lib.rs` and `mod.rs`.)
- The root namespace is always `pub(world)`.
- By default the modules generated by files and directories are `pub(crate)` (except for the root module). Files and directories prefixed with `pub-` are `pub(world)`, and their module name ignores the prefix. You can make a directory `foo` into a `pub(world)` module by naming it `pub-foo`, and a file `bar.rs` into a public one by naming it `pub-bar.rs`; the module names stay the same even though the filenames change.
- Names with a leading `_` don’t create modules, so combining them with `pub-` makes no sense; the two prefixes are mutually exclusive, and `pub-_baz.rs` is an invalid filename. A name like `_pub-foo.rs` could be taken as a valid implicit file, since it doesn’t create a module, but I think we should make it an error, since it can be interpreted in ambiguous ways.
- For backwards compatibility we’ll probably have to make things `pub(world)` by default and have you opt out, though I’d rather follow the convention used in code, where things are private unless marked `pub`.
- `mod <name> {}` always creates a module within a file. Declaring modules to pull in other files is dropped entirely, because it’s hard to keep the file structure and those declarations in sync (and files are explicit enough on their own, IMHO).
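To make the naming rules above concrete, here is a minimal sketch of how a compiler might classify a single file or directory name. Everything in it is hypothetical: `Entry`, `classify`, and the error messages are illustrative names, `pub(world)` is the proposal’s term rather than real Rust syntax, and the `lib.rs`/`mod.rs` compatibility special cases are left out.

```rust
/// What a file or directory name turns into under the proposed rules (hypothetical).
#[derive(Debug, PartialEq)]
enum Entry {
    /// `_name`: contents are spliced into the enclosing module.
    Implicit,
    /// `name`: a new `pub(crate)` module called `name`.
    CrateModule(String),
    /// `pub-name`: a new `pub(world)` module, still called `name`.
    WorldModule(String),
}

/// Classifies one file or directory name (hypothetical helper).
fn classify(name: &str) -> Result<Entry, String> {
    // Drop a trailing `.rs` so the file `bar.rs` and the directory `bar` behave alike.
    let stem = name.strip_suffix(".rs").unwrap_or(name);
    if let Some(rest) = stem.strip_prefix("pub-") {
        if rest.starts_with('_') {
            // `pub-_baz.rs`: the two prefixes are mutually exclusive.
            return Err(format!("`{name}`: `pub-` and `_` cannot be combined"));
        }
        return Ok(Entry::WorldModule(rest.to_string()));
    }
    if let Some(rest) = stem.strip_prefix('_') {
        if rest.starts_with("pub-") {
            // `_pub-foo.rs`: technically implicit, but ambiguous, so reject it.
            return Err(format!("`{name}`: ambiguous `_pub-` prefix"));
        }
        return Ok(Entry::Implicit);
    }
    Ok(Entry::CrateModule(stem.to_string()))
}

fn main() {
    for name in ["_lib.rs", "foo.rs", "pub-baz.rs", "pub-bar", "pub-_baz.rs", "_pub-foo.rs"] {
        println!("{name:>12} => {:?}", classify(name));
    }
}
```

Note that the module name never includes the `pub-` prefix, which is what keeps `pub-bar.rs` and `bar.rs` pointing at the same module name.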
So then we can have a directory that looks like the following:
```text
my_rust_crate
|- cargo.toml            // Imports crate crt
|- _lib.rs
|- foo.rs
|- pub-baz.rs
|- pub-bar
|  |- _big_function.rs
|  |- _other_big_def.rs
|  |- _small_stuff.rs
|- _fizz
   |- buzz.rs
   |- _impl.rs
```
This creates the equivalent namespace:
```rust
pub(crate) mod crt { // Optionally this module would be inside a `pub(crate) mod extern` block.
    // Contents of the crt crate.
    // This actually works differently because of the special rules for crates;
    // making the crate pub(world) would have to happen at the toml level.
}
// Contents of _lib.rs
pub(crate) mod foo {
    // Contents of foo.rs
}
pub mod baz {
    // Contents of pub-baz.rs
}
pub mod bar {
    // Contents of pub-bar/_big_function.rs
    // Contents of pub-bar/_other_big_def.rs
    // Contents of pub-bar/_small_stuff.rs
}
pub(crate) mod buzz {
    // Contents of _fizz/buzz.rs
    // Weird edge case: the _fizz directory has its contents dumped into the root,
    // but buzz.rs still creates its own module.
}
// Contents of _fizz/_impl.rs
```
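As a rough check that the rules produce the namespace above, the sketch below maps each path in the example crate directory to the module it ends up in. It assumes a hypothetical helper `module_of` that works on `/`-separated paths relative to the crate directory; visibilities are returned as plain strings because `pub(world)` is not real Rust.

```rust
/// Maps a path relative to the crate directory to (module path, visibility).
/// Hypothetical sketch of the proposal; returns `None` for the manifest.
fn module_of(path: &str) -> Option<(Vec<String>, &'static str)> {
    if path == "cargo.toml" {
        return None;
    }
    let mut module = Vec::new();
    // Visibility of the innermost module created so far; the root is pub(world).
    let mut vis = "pub(world)";
    for component in path.split('/') {
        let stem = component.strip_suffix(".rs").unwrap_or(component);
        if stem.starts_with('_') {
            // Implicit entry: its contents land in the enclosing module.
            continue;
        }
        match stem.strip_prefix("pub-") {
            Some(name) => {
                module.push(name.to_string());
                vis = "pub(world)";
            }
            None => {
                module.push(stem.to_string());
                vis = "pub(crate)";
            }
        }
    }
    Some((module, vis))
}

fn main() {
    let files = [
        "cargo.toml", "_lib.rs", "foo.rs", "pub-baz.rs",
        "pub-bar/_big_function.rs", "pub-bar/_other_big_def.rs", "pub-bar/_small_stuff.rs",
        "_fizz/buzz.rs", "_fizz/_impl.rs",
    ];
    for file in files {
        match module_of(file) {
            Some((path, vis)) if path.is_empty() => println!("{file:<28} -> crate root ({vis})"),
            Some((path, vis)) => println!("{file:<28} -> {vis} mod {}", path.join("::")),
            None => println!("{file:<28} -> (crate manifest)"),
        }
    }
}
```

Running it should place `_lib.rs` and `_fizz/_impl.rs` at the crate root, the three `pub-bar/_*.rs` files in `bar`, and `_fizz/buzz.rs` in its own `pub(crate)` module `buzz`, matching the edge case noted in the namespace.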
Also, I’d make all paths relative to the current namespace instead of global, and have a single way of writing an absolute path (probably starting it with `::`). As long as you don’t shadow `std` (there should be lints preventing that), you can always write `use std::...` and it’ll do the right thing.
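One way to read this path rule is sketched below. It is an assumption on my part that a path whose first segment isn’t defined in the current module falls back to the root; that fallback is what would let `use std::...` keep working as long as nothing local shadows `std`. The `resolve` function and its `local_names` parameter are hypothetical.

```rust
use std::collections::HashSet;

/// Resolves a `use` path under the proposed rules (hypothetical sketch):
/// - `::a::b` is always absolute, starting at the crate root;
/// - otherwise, if the first segment names something in the current module,
///   the path is relative to the current module;
/// - otherwise (assumption) it falls back to the root, so `std::...` works
///   unless something local shadows `std`.
fn resolve(current: &[&str], local_names: &HashSet<&str>, path: &str) -> Vec<String> {
    if let Some(abs) = path.strip_prefix("::") {
        return abs.split("::").map(str::to_string).collect();
    }
    let segments: Vec<String> = path.split("::").map(str::to_string).collect();
    if local_names.contains(segments[0].as_str()) {
        // Relative: prepend the current module path.
        current.iter().map(|s| s.to_string()).chain(segments).collect()
    } else {
        // Fallback to the root (assumed behavior).
        segments
    }
}

fn main() {
    // Inside hypothetical module `bar`, which defines a child item `helper`.
    let current = ["bar"];
    let locals: HashSet<&str> = ["helper"].into_iter().collect();
    println!("{:?}", resolve(&current, &locals, "helper::run"));    // relative
    println!("{:?}", resolve(&current, &locals, "std::mem::swap")); // falls back to root
    println!("{:?}", resolve(&current, &locals, "::foo::Thing"));   // explicit absolute
}
```

The shadowing lint mentioned above would then amount to flagging any local item named `std`, since it would silently change what `std::...` resolves to.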
Now some caveats and random thoughts.
- Open question: is it worth being limited to OSes that support file and directory names starting with `_`? Does this make sense?
- There’s also the problem of two mods with the same name: either a file and a directory with the same name, or `foo.rs` next to `pub-foo.rs`. The former isn’t as bad, because we could implicitly treat them as the same module (though to keep surprises low I’d prefer it to be an error); the latter is very problematic, because it creates two mods with the same name but different visibility rules.
- Alternative: could we do something like Go and use the capitalization of the first letter to decide between `pub(world)` and `pub(crate)`? This would avoid the problem of two mods with the same name but different visibility. Windows filenames are not case-sensitive, though.
- It might not be intuitive for new users that files and directories starting with `_` are not modules. I think the solution is to make module names starting with `_` invalid. Then, when someone tries to use a file starting with `_` as a module, we can recognize the probable mistake and teach them about implicit files and directories.
- This means that we’d have a more complex identifier story for crates, but at least we wouldn’t need magic filenames.
- Open question: does `_.rs` make sense? Does a directory named `_` make sense?
- Backwards compatibility causes some headaches, and we’ll need exceptions for some of the current conventions, i.e. `lib.rs`/`mod.rs` (even though they aren’t needed any more).
- It might not be intuitive that your library isn’t exporting everything. It’s also weird that root implicit files don’t need `pub` to expose their contents, which might lead new users to use a flat collection of implicit files. What we want is for implicit files to be the exception rather than the rule: generally, new files should keep their items in their own module to keep things sane in the long run, and we want to make that the easier path.
- This might mean that for file/directory modules we want the inverse of the code convention: `pub(world)` by default, with a filename prefix marking the exceptions. Maybe change the prefix from `pub-` to `local-`.
- This makes a few keywords somewhat redundant; off the top of my head: `mod foo;` (it’s now implied by the file structure) and `extern crate` (now defined in cargo.toml).
- `mod name {}` is still useful when you logically want a new namespace but pragmatically want things in the same file. If anything, it’s even more important with implicit files. For example, I might want an implicit file to contain a struct and its impls, plus some helper functions, so I’d “hide” the helpers inside a `mod type_name_internal {}` to avoid polluting the shared namespace.
- `pub use` is still there, but it should only be needed for the crazier cases, so hopefully it becomes a lot less common.
- I’m implicitly leaving out the `src/` directory. I think tests could fit within a module (though this might be more complex). If that turns out not to work, we can easily map tests to something else.
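The same-name problem from these caveats is easy to detect mechanically. Here is a minimal sketch, assuming a hypothetical `conflicts` function that receives the entry names of a single directory and reports every module name claimed by more than one entry, which covers both `foo.rs` next to a `foo` directory and `foo.rs` next to `pub-foo.rs`.

```rust
use std::collections::HashMap;

/// Reports module names produced by more than one entry in the same directory
/// (hypothetical check). Names are compared case-sensitively.
fn conflicts<'a>(entries: &[&'a str]) -> Vec<(String, Vec<&'a str>)> {
    let mut by_module: HashMap<String, Vec<&'a str>> = HashMap::new();
    for &entry in entries {
        let stem = entry.strip_suffix(".rs").unwrap_or(entry);
        if stem.starts_with('_') {
            continue; // implicit entries don't create modules
        }
        // `pub-foo` and `foo` both claim the module name `foo`.
        let name = stem.strip_prefix("pub-").unwrap_or(stem);
        by_module.entry(name.to_string()).or_default().push(entry);
    }
    let mut out: Vec<_> = by_module.into_iter().filter(|(_, v)| v.len() > 1).collect();
    out.sort(); // HashMap order is unspecified; sort for stable output
    out
}

fn main() {
    let dir = ["foo.rs", "pub-foo.rs", "bar.rs", "bar", "_impl.rs", "baz.rs"];
    for (module, entries) in conflicts(&dir) {
        println!("error: module `{module}` is defined by {entries:?}");
    }
}
```

Treating every clash as an error, rather than merging a file and a directory of the same name, matches the low-surprise option preferred above. On case-insensitive file systems a real check would also need to compare names case-insensitively; this sketch does not.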
Just as a final exercise, here’s how you’d do the directories for some of the examples from the article.
The future module would look something like this:
```text
future
|- cargo.toml
|- _future
|  |- pub-future        // Here goes whatever stays in the futures::future mod.
|  |  |- _and_then.rs
|  |  |- _flatten.rs
|  |  |- ...
|  |- _future.rs        // This contains the Future structure, with the namespace futures::Future
|  |- pub-poll_fn.rs    // This is a separate module
|  |- ...
|- _poll.rs
|- pub-poll.rs
|- ...
```
This lets us split the functionality across separate files and expose each piece where it belongs. It’s still somewhat complicated, but it seems a bit clearer where the contents of each file end up just by looking at the file structure.
Of course, the question that stands out (IMHO) is: does it even make sense to keep the same structure for futures if this is changed? I would have to think more about this.