Rust has two kinds of BMI-like files, rmeta and rlib. Both can be used to provide metadata during compilation. In contrast to clang's pcm, rmeta files cannot be used for linking, while rlib files can be linked directly. An rlib file contains the rmeta plus the generated object code, hence it takes longer to generate.
In your terminology, rmeta would be the surfaces and rlib the implementations. Note that you cannot compile surfaces in parallel, since they can re-export other surfaces. (The same issue exists in C++.)
In general, three workflows are possible:

1. Build all rmetas first, sequentially along the dependency chain, then stop. Later, build all rlibs in a second pass, in parallel.
2. Build the rlibs sequentially along the dependency chain.
3. Build an rmeta during compilation and signal to the build system that it is ready, then continue on to generate the rlib. The build system can start building downstream dependents once the rmeta is generated. (This is the strategy currently used by cargo.)
I don't know where pcm files sit in this picture, but I suspect they are somewhat closer to rlib files, limiting you to the second strategy.
Thanks. I am unsure about re-exports. ISO C++ modules also has export import foo;. Does that require the BMI of foo to build the BMI of my module?
Or does it just introduce a new dependency for all consumers of my module on foo?
pcm resp. BMI files are more like rmeta files. They contain an AST.
It's about more than just re-exports. In C++ you can't even parse a module without the BMIs for its imports; in Rust you can parse but you can't typecheck. So the dependency graph imposes ordering constraints on building BMIs/rmetas in both languages.
I'm not familiar with what C++ build systems do today, but C++ compilers tend to support building a BMI first as a separate step from building an object file. This means workflows 1 and 2 are both feasible. Workflow 3 is trickier because it involves a single compiler invocation producing multiple artifacts at different times, which is not something build systems have traditionally supported, but the C++ language itself doesn't prevent it.
While true, GCC and MSVC do not. Whether this is actually faster in practice requires data that is not available, because Real World™ projects don't support modules because build systems don't support modules because compilers don't support extracting dependencies reliably[1]. Once benchmarks can compare the both-at-once and separate-compilation approaches, we can see what matters more. Note that remote execution and process launch time are not negligible once the work units get small enough, so there might not be one answer for all cases either.
Yes you can (well, at least as far as module dependency discovery needs; if this weren't possible, building ISO C++ would be an unsolvable problem). This is possible precisely because modules cannot export macros that could influence later #if conditions. Header units are a whole different beast, but at least (handwaving) their contents can largely be determined from the header file itself.
Note that it also becomes unsolvable if there is ever a __has_module builtin, a preprocessor-time macro analogous to __has_include.
Of the 3 implementations, only one supports BMI creation separate from object file creation.
Note that there are tools like build2 which play a much more active role in the build, where they can communicate with the compiler directly about what is needed and provided in various places. However, this has the problem that the number of compiler processes you can have launched to discover that one module everyone is waiting for (assuming it even exists) can be unbounded. Not everyone has the RAM for that style of build execution. ↩︎
It depends on the implementation. GCC copies the contents into the BMI so that only directly consumed BMIs need to be known about, but MSVC needs to know where the transitive closure of imported modules (re-export'd or not) exist when consuming any module.
For example, when compiling C, it needs to be told where A and B are despite only knowing about B directly:
a.cppm:
    export module A;
b.cppm:
    export module B;
    import A;
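Concretely, the MSVC invocations might look something like this (illustrative command lines only; /reference maps a module name to its IFC file, and the file names here assume MSVC's default .ixx extension for module interfaces):

```
cl /std:c++latest /c a.ixx
cl /std:c++latest /c b.ixx /reference A=A.ifc
cl /std:c++latest /c c.cpp /reference B=B.ifc /reference A=A.ifc
```

Note the last line: c.cpp only imports B, yet A's IFC must still be supplied.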
I'm well aware of clang-scan-deps. I'm suspicious of its applicability outside of clang emulation however. Basically, what is it going to report for this code:
I don't believe there's going to be a single tool that is suitable for this unless C++ ends up going to a compiler mono-culture (I put those odds at…0%).
Just to note, I'm an author of the dependency spec that is being worked on for communicating these things with build systems. Also note that you need to run it during the build in the general case; you cannot just scan and commit it to the source tree.
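For a rough idea of what the scanner communicates per translation unit, here is a minimal sketch of the JSON dependency format (field values are illustrative, for a module B that imports A; the real format has more fields):

```json
{
  "version": 1,
  "revision": 0,
  "rules": [
    {
      "primary-output": "b.o",
      "provides": [
        { "logical-name": "B", "is-interface": true }
      ],
      "requires": [
        { "logical-name": "A" }
      ]
    }
  ]
}
```

The build system reads these rules after scanning and orders compilations so that every "requires" entry is satisfied by some other rule's "provides" before that TU is compiled.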
_MSC_VER is intrinsic to the MSVC compiler and contains its version number as its value (if defined). I really don't think clang-scan-deps is going to be able to answer __has_builtin questions for arbitrary compilers. Even if it does, clang-scan-deps will need to be strictly newer than your compiler because it can't guess the future at all. I don't think it's a viable solution for scanning other than as an alternative to the clang compiler it ships with that you'd be using anyway.
Yes, I meet with these folks regularly to discuss C++ modules implementation progress.
This is simply not true: that example does not have any imports, so of course it does not need any of them to be present for parsing. But in general, if you don't have the symbols from imports, you can't parse C++; this is just a general property of the language, and it applies to #include as well.
Right, you can uncover the dependency graph, but you can't actually parse the module and produce a BMI until you've built BMIs for its dependencies. Dependency discovery had to be carefully designed to support this, because the rest of the language grammar does not.
MSVC also supports separate BMI creation with /ifcOnly.
Have you tried running that command? It doesn't work: Compiler Explorer
C++ cannot be parsed without knowledge of the declarations in scope. The standard examples are things like x * y (if x is imported from bar, you have to know the contents of bar to know whether to parse that as a declaration or expression), or template metaprogramming (because "declaration or expression?" can depend on arbitrary template instantiation, so again you have to know the contents of bar).
Indeed. The rules around macro expansion not applying to the module-related keywords, disabling of line continuation, and other minor exceptions are there for dep scanning.
Oh, I had missed that. Does it support IFC to OBJ? Or does that need compilation from source again (MSVC would still need the transitive import IFC set, so still bad for distributed compiles)?
From what I understand, MSVC does not support IFC -> OBJ, as IFCs only include the bodies of inline functions rather than everything needed to generate code. The IFCs on disk also still depend on each other in the same way as the source files, to avoid duplicating imported declarations.
Does Clang support distributing the PCM->object stage without the transitive import PCM set? Or do you still need those, just not the source?