Rust has two kinds of BMI-like files, rmeta and rlib. Both can be used to provide metadata during compilation. In contrast to clang's pcm, rmeta files cannot be used for linking, while rlib files can be linked directly. An rlib file contains the rmeta plus the generated object code, hence it takes longer to generate.
In your terminology, rmeta would be the surfaces and rlib the implementations. Note that you cannot compile surfaces in parallel, since they can re-export other surfaces. (The same issue exists in C++.)
In general, three workflows are possible:

1. Build all rmetas first, sequentially along the dependency chain, then stop. Later, build all rlibs in a second pass, in parallel.
2. Build the rlibs sequentially along the dependency chain.
3. Build an rmeta during compilation and signal to the build system that it is ready, then continue on to generate the rlib. The build system can start building downstream dependents once the rmeta is generated. (This is the strategy currently used by cargo.)
I don't know where pcm files sit in this picture, but I suspect they are somewhat closer to rlib files, limiting you to the second strategy.
Thanks. I am unsure about re-exports. ISO C++ modules also has export import foo;. Does that require the BMI of foo to build the BMI of my module?
Or does it just introduce a new dependency for all consumers of my module on foo?
pcm resp. BMI files are more like rmeta files. They contain an AST.
It's about more than just re-exports. In C++ you can't even parse a module without the BMIs for its imports; in Rust you can parse but you can't typecheck. So the dependency graph imposes ordering constraints on building BMIs/rmetas in both languages.
I'm not familiar with what C++ build systems do today, but C++ compilers tend to support building a BMI first as a separate step from building an object file. This means workflows 1 and 2 are both feasible. Workflow 3 is trickier because it involves a single compiler invocation producing multiple artifacts at different times, which is not something build systems have traditionally supported, but the C++ language itself doesn't prevent it.
While true, GCC and MSVC do not. Whether this is actually faster in practice requires data that is not available, because Real World™ projects don't support modules because build systems don't support modules because compilers don't support extracting dependencies reliably[1]. Once benchmarks can compare the both-at-once and separate-compilation approaches, we can see what matters more. Note that remote execution and process launch time are not negligible once the work units get small enough, so there might not be one answer for all cases either.
Yes you can (well, at least as far as module dependency discovery needs; if this weren't possible, building ISO C++ would be an unsolvable problem). This is possible precisely because modules cannot export macros that could influence later #if conditions. Header units are a whole different beast, but at least (handwaving) their contents can largely be determined from the header file itself.
Note that it also becomes unsolvable if there is ever a __has_module builtin, a preprocessor-time macro analogous to __has_include.
Of the 3 implementations, only one supports BMI creation separate from object file creation.
Note that there are tools like build2 which play a much more active role in the build, where they can communicate with the compiler directly about what is needed and provided in various places. However, this has the problem that the number of compiler processes you can have launched to discover that one module everyone is waiting for (assuming it even exists) can be unbounded. Not everyone has the RAM for that style of build execution. ↩︎
It depends on the implementation. GCC copies the contents into the BMI so that only directly consumed BMIs need to be known about, but MSVC needs to know where the transitive closure of imported modules (re-export'd or not) exist when consuming any module.
For example, when compiling C, it needs to be told where A and B are despite only knowing about B directly:
a.cppm:
    export module A;
b.cppm:
    export module B;
    import A;
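Concretely, the MSVC invocations might look something like this (illustrative command lines only; /reference maps a module name to its IFC file, and the file names here assume MSVC's default .ixx extension for module interfaces):

```
cl /std:c++latest /c a.ixx
cl /std:c++latest /c b.ixx /reference A=A.ifc
cl /std:c++latest /c c.cpp /reference B=B.ifc /reference A=A.ifc
```

Note the last line: c.cpp only imports B, yet A's IFC must still be supplied.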
I'm well aware of clang-scan-deps. I'm suspicious of its applicability outside of clang emulation however. Basically, what is it going to report for this code:
I don't believe there's going to be a single tool that is suitable for this unless C++ ends up going to a compiler mono-culture (I put those odds at…0%).
Just to note, I'm an author of the dependency spec that is being worked on for communicating these things with build systems. Also note that you need to run it during the build in the general case; you cannot just scan and commit it to the source tree.
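For a rough idea of what the scanner communicates per translation unit, here is a minimal sketch of the JSON dependency format (field values are illustrative, for a module B that imports A; the real format has more fields):

```json
{
  "version": 1,
  "revision": 0,
  "rules": [
    {
      "primary-output": "b.o",
      "provides": [
        { "logical-name": "B", "is-interface": true }
      ],
      "requires": [
        { "logical-name": "A" }
      ]
    }
  ]
}
```

The build system reads these rules after scanning and orders compilations so that every "requires" entry is satisfied by some other rule's "provides" before that TU is compiled.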
_MSC_VER is intrinsic to the MSVC compiler and contains its version number as its value (if defined). I really don't think clang-scan-deps is going to be able to answer __has_builtin questions for arbitrary compilers. Even if it does, clang-scan-deps will need to be strictly newer than your compiler because it can't guess the future at all. I don't think it's a viable solution for scanning other than as an alternative to the clang compiler it ships with that you'd be using anyway.
Yes, I meet with these folks regularly to discuss C++ modules implementation progress.
This is simply not true: that example does not have any imports, so of course it does not need any of them to be present for parsing. But in general, if you don't have the symbols from imports, you can't parse C++; this is just a general property of the language, and it applies to #include as well.
Right, you can uncover the dependency graph, but you can't actually parse the module and produce a BMI until you've built BMIs for its dependencies. Dependency discovery had to be carefully designed to support this, because the rest of the language grammar does not.
MSVC also supports separate BMI creation with /ifcOnly.
Have you tried running that command? It doesn't work: Compiler Explorer
C++ cannot be parsed without knowledge of the declarations in scope. The standard examples are things like x * y (if x is imported from bar, you have to know the contents of bar to know whether to parse that as a declaration or expression), or template metaprogramming (because "declaration or expression?" can depend on arbitrary template instantiation, so again you have to know the contents of bar).
Indeed. The rules around macro expansion not applying to the module-related keywords, disabling of line continuation, and other minor exceptions are there for dep scanning.
Oh, I had missed that. Does it support IFC to OBJ? Or does that need compilation from source again (MSVC would still need the transitive import IFC set, so still bad for distributed compiles)?
From what I understand, MSVC does not support IFC -> OBJ, as IFCs only include the bodies of inline functions rather than everything needed to generate code. The IFCs on disk also still depend on each other in the same way as the source files, to avoid duplicating imported declarations.
Does Clang support distributing the PCM->object stage without the transitive import PCM set? Or do you still need those, just not the source?