I have worked on a C++ code base that is a few million lines of code. I don't know how that compares to glibc, but I found that the files on disk and the lists in CMakeLists.txt (paired with Ninja as the backend) were often out of sync. Going to a glob-based approach has been a big step in the right direction, making things more consistent.
Back in the day, Visual Studio 2005 couldn't handle the (at the time somewhat smaller) code base; you had to open bits and pieces of it. We then went to Linux and Qt Creator, which handled the size well. Now most people are using clangd + VS Code, and it works well unless you are also compiling at the same time (in which case the fzf-style functionality is a bit laggy, but still usable).
I don't know if rust-analyzer scales equally well; I haven't worked on projects that large in Rust yet.
So I think your fears are way overblown, but the only way to know is to test it.
At least for CMake, my thoughts are in this response on CMake's Discourse. I have a long-standing task to write up a blog post for why globs are not worth their weight in my view.
My considerations are independent of the size of the codebase. There are situations where files with a matching extension exist but are not actually part of the build. One notable case is Git conflict resolution. In C++, when I'm fixing conflicts, I like to build the file in question to test that my conflict resolution is working while I'm working on it. During conflict resolution, Git drops files beside the one in question for editors to work on, and they share its extension to help with any language detection support editors may have. Build systems which glob will end up adding these to the build unnecessarily and, if you're not doing a specific .o build, that can cause a lot of unnecessary build noise (and certainly linker errors due to the inevitable duplicate symbols). This is less applicable to Rust, because the difference in TU size makes it a lot harder to test out the conflict resolution of a specific file.
The second is that globbed files don't have a presence in the diff. A naïve git diff won't show untracked files when you ask "what diffs do you have?" during debugging. When there's no visibility into a file being included in the build, it can make these "debugging at a distance" issues harder. Given Rust's "pit of success" preferences, this is an easily avoidable pothole IMO.
With generated sources, the code generator should just be explicit; it knows what it is doing anyway. Removing files that were previously present but are no longer from the build tree is not always trivial (especially if they are cfg-dependent). This is analogous to the previous point, except that VCS tracking of generated files isn't even there to make them appear in a git status output.
Yes, and in all cases I found, those were bugs: leftover files, or gtest-based unit tests that should have been executed but never were. Having a single source of truth is better.
Strange, I remember them getting .cpp.rej, .cpp.orig, etc.? I'm not at my computer, so I can't double-check. I usually resolve conflicts before attempting to build, so our workflows are different.
I don't understand what you mean by this. git status will show untracked and modified files separately, but how does that affect debugging? You will need to expand on what you mean here, maybe with a concrete example.
Yes, I agree on that. Generated files are quite different from manually written files. Ideally the build system should keep track of which files are generated and add them in (and they should live in the build tree, of course). For C++ build systems this is trivial, since the TU is a file: CMake or Bazel can just be told to include the generated files in the library. For Rust and Cargo, that is less clear, especially with the current build script design.
This last point made me realise something though: in C/C++, what source files to include is up to the build system; the compiler doesn't care (as long as it can resolve includes). This is unlike Rust or Python with their module systems (I'm excluding C++ modules; I have yet to see them used in a real project). I think this makes for an important distinction: Cargo can do globs on crates at a workspace level, and no one is complaining about that. Why is that?
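For reference, this is the workspace-level globbing I mean: member crates can be picked up by pattern in the top-level Cargo.toml (the paths here are just illustrative):

```toml
# Top-level Cargo.toml of a workspace
[workspace]
members = ["crates/*"]            # every crate under crates/ is a member
exclude = ["crates/experimental"] # specific paths can be opted back out
```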
CMake does not natively support globs for add_subdirectory as far as I know, but the language is powerful enough that you could emulate it. Bazel will auto-discover BUILD files in the directory tree. Either works, but why list something that doesn't need to be listed? I think Bazel does this specific thing slightly better.
FWIW, I am a CMake developer. I have also seen these things. I believe that there can be tools to diagnose such things[1], but they have to be advisory: a tool can't easily understand that $<$<PLATFORM:Windows>:windows_impl.cpp> as a target source is a mention of the file even if it doesn't actually end up getting used in a non-Windows build (the generator expression at least has hope; if (WIN32) configure-time logic is much harder). I suppose one could instead just #if out the entire source file, but then you tend to run into linker diagnostics about objects without symbols.
I found this release note from Git 2.2.0 mentioning the scheme. .orig and .rej are from git apply. The "enhanced" names are used during git rebase, git merge, or git cherry-pick.
Setup:
- user works on repo which uses globbing
- adds a new file to test/save/store something
- runs a build
- something breaks
Yes, git status will show the file, but git diff won't. So when debugging a situation like this, it looks like a "clean" source tree to a plain git diff (when debugging I want the actual diff, not a list of files, so I start there). With explicit source listings, the file doesn't matter unless the build is informed about it (which leaves a diff to hint at the situation).
Globs are even less applicable here. The build system needs to know even more about them to handle them correctly:
- files which provide modules must be handled specially so that BMI-providing commands can be reliably provided (modules have only added at least 3 new file extensions to the zoo we had rather than moving towards any actual "official" extension)
- like with headers, which modules are "public" versus "private" is not immediately obvious from the filename, so some other signaling is necessary
- if you tell the build system which files cannot import modules, you can save the extra work needed to deal with them (scanning and dynamic dependency discovery)
That's news to me, but maybe that's because it is far more likely to leave an experimental source file "lying around" than an entire crate? A stray file is a :w blarg.rs typo away; cargo new has a much higher Hamming distance to anything that might happen otherwise.
I have plans for making file sets of all files that matter to targets, but there are still corner cases: sources that are .. include'd into documentation but not actually compiled. With file sets telling CMake all files that "matter" to a target, you can lint on files that do not "belong" to any target. ↩︎
Yes, there always need to be escape hatches if you are doing unusual and advanced things. But we shouldn't let those cases stand in the way of the 99.9% normal cases.
Also, I consider conditionally including files in compilation an anti-pattern. Better options: target-specific static libs, or just #ifdef out the file on the other platforms. I don't remember seeing the linker warning you mention, but if that is the case, that is unfortunate.
I don't think I have ever used git apply. Merge, rebase, and cherry-pick I do very often.
I admit that I generally use a GUI for git, and it shows everything clearly. I like SmartGit myself, but it isn't open source. But I have heard good things about other options like lazygit.
And git diff is usually overwhelming anyway; I generally have a large number of changed files.
Now, a way to make globs more useful is to add exclude blocks, like Bazel does: Functions | Bazel
You can specify "all cpp files, except those in platform/" and then add the platform-specific files to the list returned by glob. Generally, a declarative approach allows smarter build systems. Cargo could usefully lint if you forgot a source file in your tree, for example (I don't know if it already does); a sketch of the idea follows below. Bazel could also do this with some effort. And CMake, well, it is Turing complete, so good luck?
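To illustrate, here is a minimal sketch of what such a "forgotten source file" lint could look like. It assumes a flat src/ layout and plain textual `mod foo;` declarations; a real lint would parse the code and walk the module tree from the crate root instead.

```rust
use std::collections::HashSet;
use std::fs;

// Warn about .rs files under src/ that no `mod` declaration mentions.
// Sketch only: handles a flat layout and `mod foo;` / `pub mod foo;` lines.
fn main() -> std::io::Result<()> {
    let mut declared = HashSet::new();
    let mut stems = Vec::new();

    for entry in fs::read_dir("src")? {
        let path = entry?.path();
        if path.extension().and_then(|e| e.to_str()) != Some("rs") {
            continue;
        }
        stems.push(path.file_stem().unwrap().to_string_lossy().into_owned());
        for line in fs::read_to_string(&path)?.lines() {
            let line = line.trim().trim_start_matches("pub ");
            if let Some(rest) = line.strip_prefix("mod ") {
                // Take the identifier after `mod`, stopping at `;`, `{`, etc.
                let name: String = rest
                    .chars()
                    .take_while(|c| c.is_alphanumeric() || *c == '_')
                    .collect();
                declared.insert(name);
            }
        }
    }

    for stem in &stems {
        if stem != "main" && stem != "lib" && !declared.contains(stem) {
            println!("warning: src/{stem}.rs is not declared by any `mod` statement");
        }
    }
    Ok(())
}
```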
Microsoft has LNK4221 which is even better because it is apparently order-sensitive (empty objects only warn after the first empty object on the command line). I recall seeing it on macOS too but I can't find the warning in the public ld64 source.
rust-analyzer puts an LSP diagnostic on line "0" that the file is not part of the module tree. Cargo itself is silent.
File sets make things far more declarative. But yes, there is a lot of historical baggage to tote around in the meantime.
Ah, it could be a 3-way diff thing. If you're not using a 3-way diff it could just be doing .orig and .rej.
glibc is currently about 1.85 million lines of code. I don't have GCC or Firefox checkouts on this computer.
rust-analyzer is in fact the most recent IDE-like thing I tried and found to ruin my battery life.
That's the other thing - in the codebases I work on it is normal to have source files in the source tree, checked into version control even, that are not supposed to be part of the build. I gave examples in the previous discussion. It's also very common to have files that are part of the build but only in some specific situation, such that their mere presence in a directory is insufficient information. Glibc again provides a good example: each subdirectory is an unsorted bag of files that may go into the main library, the secondary libraries, bundled utilities, and/or tests. (This is intentional. The idea is to make it easy to find all the files that are relevant to a specific cluster of functions, usually all those declared in a particular standard header.) Sometimes a single file is compiled multiple times with different compilation profiles for different uses. Thus, at least some of the files have to be explicitly mentioned to the build system, and at that point it's been my experience that it's easier to review changes under the policy that all files must be explicitly so mentioned.
I don't want to hold up glibc as an example of good project organization -- quite the opposite! However, one must keep in mind that every big, messy project was a small, less messy project once, and that the affordances of the build system -- what it makes easy, and what it makes hard -- will guide the project as it grows. Rust can encourage good project organization, or not. It is my contention that the existing requirement of explicit mod blah for every module encourages good project organization, and that even a highly restricted opt-in "every file in this directory is a submodule" mechanism is liable to do more harm than good.
This is argument by assertion. You're the one advocating a change. The onus is on you to provide actual evidence, not just claims that evidence exists, or claims that other people's everyday experience is an edge case.
This is all rhetorical gatekeeping on your part.
My claim is perfectly testable as @Vorpal already mentioned - we can do an experiment and test adoption of my approach. We cannot falsify your claim that a basic auto-discovery mode is unnecessary.
Your everyday experience is by definition an edge case:
- You are bringing examples from a different ecosystem that in themselves aren't representative of said ecosystem, nor have you demonstrated how they transfer to Rust.
- Most C/C++ devs aren't working on GCC. There are a lot of regular C/C++ libraries that have just a simple list of files in one folder.
I've spent a decade of my career working on financial software in C++.
Having a proper package manager (Conan in our case) is a game changer and it solves all the issues you raise.
We have hundreds of small git repos that make all the tree structure shenanigans you talked about obsolete. They all have a simple small set of files, and some older ones have maybe a few sibling dirs at the root level. That's it. We don't need complex tree structures because we can manage complexity via Conan's package dependencies.
Going back on topic (Rust): we have a ubiquitous package manager in Cargo, thus mostly eliminating the need to grow complex tree structures per crate.
IDK if this would count as evidence (it's only one anecdotal data point), but when I was a total Rust newbie I found it surprising that doing use something (or use ./something, or use crate::something) failed when I had created an src/something.rs file. The usual autocomplete that I got for other external modules (crates in Rust parlance, but this was before I knew the difference) when doing use <cursor here> didn't help either.
I just tried it, and the error message still isn't very helpful. It just states no `something` in the root. If it had a code recommendation like help: if you want to import something.rs as a module, add a `mod something` statement to the root main.rs file, I think we could argue that this is already a beginner-friendly error and that the existing tooling is guiding new Rust programmers in the right direction. But as it stands today, I wouldn't be surprised if newcomers are still getting confused by this.
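For concreteness, here's a minimal sketch of the situation (the exact error code and wording may vary by compiler version):

```rust
// src/something.rs exists and contains:
//     pub fn hello() { println!("hello"); }

// src/main.rs, first attempt:
// use something::hello; // error[E0432]: unresolved import `something`
//                       // note: no `something` in the root

// The fix is a `mod` declaration in the crate root, which is what actually
// adds src/something.rs to the module tree:
mod something;
use something::hello;

fn main() {
    hello();
}
```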
IDK what the best solution for this is. But since many people probably come from languages where imports are basically a glorified eval(read("some/path.ext")), I think it'd be a good idea to detect that pattern and nudge beginners in the right direction.
If this sounds like a good idea, I can help by filing an issue and trying to implement something along those lines.
Also, I'd personally really like to have something like mod * or an equivalent opt-in (or opt-out) option in Cargo.toml. I think removing friction for the common case of using the source files inside a source directory is a good idea.
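To make that concrete, here is roughly the shape I imagine. To be clear, everything in this sketch is hypothetical: neither the syntax nor the Cargo.toml key exists today.

```rust
// HYPOTHETICAL sketch -- none of this exists today.

// Option 1: an explicit opt-in glob in the crate root (src/lib.rs):
mod *; // declare every sibling .rs file as a module

// Option 2: an opt-in switch in Cargo.toml instead (hypothetical key):
// [package]
// auto-mod = true
```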
I work on a large C++ code base in the embedded Linux domain (control code for industrial vehicles). We have in the high tens of thousands of C++ files.
We used to list files by hand, but as already stated: that was error prone. Now we use globs.
I think there are two cases in total of conditional compilation in our entire code base, both in the platform abstraction layer. Just two cpp files that exist in a Linux and a legacy-RTOS version (that we migrated from a few years ago, but there are some old devices still out there using that).
So, the overwhelmingly common case is "all files should be included".
It would be interesting to pull down the top 100 crates on crates.io and see how many would work with a file-system-based approach. Would that be evidence enough for you? If so, maybe that is something that @yigal100 can do. Because you haven't specified what would count as sufficient evidence yet, and without that your request is impossible. You need to clearly define your goalposts and stick to them.
IMO if the problem is "new Rust users are confused that making src/something.rs and then writing use something::* doesn't work", the better solution is to make the error message more helpful, not to change the entire language.
Personally, I could tolerate an explicit mod *; statement to automatically read direct child modules from the directory structure, but with the idea that it's something to be rarely used in niche cases and not the default for code (like how I currently treat use some::path::*;). But I wouldn't want it to be viewed as the normal thing to do, and I definitely don't want it to do anything implicit or recursive.
I don't have random .rs files lying around next to the code that are meant as notes, but I do sometimes have things like #[cfg(target_os = "foo")] mod foo; and I'd prefer a world where people don't assume, without reading any code, that because the foo.rs file exists, it's a module that can always be accessed.
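For example, a minimal sketch of the kind of thing I mean (the target name and file names are just for illustration):

```rust
// src/lib.rs
// src/foo.rs and src/foo_fallback.rs both sit in the source directory, but
// which one (if either) is compiled depends on the target; their mere
// presence on disk tells you nothing about whether `crate::foo` exists.
#[cfg(target_os = "foo")]
mod foo;

#[cfg(not(target_os = "foo"))]
#[path = "foo_fallback.rs"]
mod foo;
```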
Explicit is better than implicit, and IMO saving a few lines of typing isn't worth making these changes. (That is what all this is about, right? Just saving you/your autocomplete from typing the extra mod ..; line when you add a new module? From what some of y'all are saying, it sounds like you're talking about saving much more work than that)
In that line of thinking, would an automated tool to add/remove "mod" statements be a workable solution for the large projects?
Basically, an external opt-in glob import, with controlled updates.
rust-analyzer definitely knows when a file is not part of the module tree. It could probably offer a fixit to add a line to the "appropriate" place (assuming the immediate parent is part of the tree).
I always do it in the opposite direction. I add a mod something; and rust-analyzer has a quick action to create the file for me. Which feels like less work than actually creating the file manually.
I think a better error message would be great, and there is ample precedent for doing things we wouldn't otherwise do when the result "only" affects diagnostics. So please do file an issue.
Personally, I very much appreciate there being an explicit chain of custody from the crate root to everything that is included in the build. It doesn't really feel like "multiple sources of truth" to me -- there's only a single source of truth, and it is the .rs files themselves. The file system is merely the substrate that holds the data.
use isn't the import though, mod is. Even those languages still have some form of explicit imports, don't they?
If people are bothered by mod+use, I wouldn't mind some combined syntax that does both of these or so. But the acts of bringing an item into the namespace, and of including a module in compilation, are just fundamentally different -- including a module can have effects even when nothing is added to the namespace, e.g. via traits. I think we are better off trying to teach people about this rather than trying to paper over misunderstandings until they inevitably surface somewhere else later.
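To make that difference concrete, here is a minimal self-contained sketch (an inline module standing in for a file):

```rust
trait Double {
    fn double(&self) -> i32;
}

struct Num(i32);

// Compiling this module has an observable effect even though no `use` ever
// names anything inside it: it provides the `Double` impl for `Num`.
mod impls {
    impl crate::Double for crate::Num {
        fn double(&self) -> i32 {
            self.0 * 2
        }
    }
}

fn main() {
    // No `use impls::...` anywhere: `mod` included the code in compilation,
    // and trait impls apply crate-wide once compiled.
    println!("{}", Num(21).double()); // prints 42
}
```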
[I'm going to be offline all of next week so this is probably going to be my last message in this thread for a while.]
@Vorpal I'm interested in different evidence regarding what you're saying than what @yigal100 is saying.
As I understand it, what you (@Vorpal) are saying is that you have experience with large codebases that don't involve a lot of conditional compilation, and in that context a feature along the lines of "treat every .rs file in this directory as a submodule" is useful and reduces errors. I'm contrasting this with my experience of large codebases that do involve a lot of conditional compilation, in which context that feature is liable to introduce errors. But both of our experiences are with C(++), not Rust, so neither of us can be sure it all carries over.
I think the best way to move this discussion forward constructively would be with some kind of feasibility study. Suppose we do introduce mod *; to Rust. Alternatively, suppose we introduce a rule that if foo is not already in scope then use foo::<anything> implies mod foo; right above it. Either way, pencil out an implementation, then pick two big hairy Rust projects, one with and one without a lot of conditional compilation (the stdlib comes to mind for "with"), and attempt to convert them to use of the new feature, and report on what problems are encountered.
Now, @yigal100 made a specific claim that evidence already exists that demonstrates that "the current setup is confusing and a stumbling block for people who want to adopt Rust." I don't know what they were thinking of, but I think it's entirely fair for me to ask them to present that evidence, rather than just asserting that it exists.
If I were to set out to prove or disprove that explicit mod statements are confusing and a stumbling block, I would treat it as an exercise in CS education research. I would widen the research question to "What aspects of the Rust language are confusing stumbling blocks for people, and at what levels of understanding of the language?", because going in with preconceived notions of what's a stumbling block is a great way to miss something important. I'd find a bunch of people who teach Rust regularly, and poll them for what they think are the stumbling blocks. I wouldn't mention any specific aspects of the language to begin with, I'd just ask for a list and for how they know. Then I would put together some sort of standardized test of understanding, including all the things that the teachers mentioned and maybe some more besides, and administer it to a lot of people at all levels of experience with Rust, and see what came out.
I'm emphatically NOT claiming that large and complex code bases that rely heavily on conditional compilation should combine that with a mod *; approach. By all means, this is where being more explicit may be warranted.
This is analogous to the difference between someone commuting to work who just wants to put the car into drive and not think about it vs. a formula race driver who absolutely must use a manual gearbox for total precision and control of the vehicle.