What is pimpl? It's a C++ idiom for drastically improving compilation speed of large projects: https://en.cppreference.com/w/cpp/language/pimpl.
I believe that Rust doesn't have a direct analogue of this idiom, but that it really needs one. I am actually quite surprised that I have never noticed it being brought up during compilation time discussions. However, I am not an expert in this area, so I can only ask someone more knowledgeable to write an RFC.
Let me try to explain the problem, and possible solutions in Rust and C++.
Let's say we are writing a compiler, and hacking on a lexer! The lexer is the lowest level of the compiler pipeline:
lexer <- parser <- type checker <- emitter <- CLI
If we make each of the phases a separate compilation unit, touching the lexer would lead to the recompilation of the whole world. (I have totally thought this all up while staring at the "Building ... 153/157: rustc" line after modifying a single line in the lexer).
We can do slightly better in Rust if we are willing to jump through a number of hoops.
First, we split the lexer crate into lexer_api, which contains an object-safe trait Lexer, and lexer, which depends on lexer_api and provides a concrete implementation.
Then, we make sure that parser only depends on lexer_api, and uses dyn Lexer internally.
Finally, in the top-level CLI crate we tie the knot and inject the concrete lexer from lexer into the pipeline.
lexer<---------------------------------------------+
| +-------------------------------------------|
| | |
v V |
lexer_api <- parser <- type checker <- emitter <- CLI
With this setup, changes to lexer only affect the CLI.
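To make the shape of this split concrete, here is a minimal single-file sketch in which modules stand in for the crates. The Token variants, the ConcreteLexer type, and the parse and run functions are hypothetical names invented for illustration; only the lexer_api / lexer / parser / CLI layering comes from the description above.

```rust
// Modules stand in for crates; in a real build each would be a separate crate.
mod lexer_api {
    // Hypothetical token type, just enough to make the sketch compile.
    #[derive(Debug, PartialEq)]
    pub enum Token {
        Ident(String),
        Number(u64),
    }

    // Object-safe trait: the only thing the parser knows about lexers.
    pub trait Lexer {
        fn next_token(&mut self) -> Option<Token>;
    }
}

mod lexer {
    use super::lexer_api::{Lexer, Token};

    // Concrete implementation; depends on lexer_api only.
    pub struct ConcreteLexer {
        words: std::vec::IntoIter<String>,
    }

    impl ConcreteLexer {
        pub fn new(src: &str) -> ConcreteLexer {
            let words: Vec<String> = src.split_whitespace().map(str::to_owned).collect();
            ConcreteLexer { words: words.into_iter() }
        }
    }

    impl Lexer for ConcreteLexer {
        fn next_token(&mut self) -> Option<Token> {
            let word = self.words.next()?;
            Some(match word.parse::<u64>() {
                Ok(n) => Token::Number(n),
                Err(_) => Token::Ident(word),
            })
        }
    }
}

mod parser {
    use super::lexer_api::{Lexer, Token};

    // The parser sees `dyn Lexer` and never names the concrete type.
    pub fn parse(lexer: &mut dyn Lexer) -> Vec<Token> {
        let mut tokens = Vec::new();
        while let Some(token) = lexer.next_token() {
            tokens.push(token);
        }
        tokens
    }
}

// The "CLI" level: the only place that names both the parser and the concrete lexer.
fn run(src: &str) -> Vec<lexer_api::Token> {
    let mut lexer = lexer::ConcreteLexer::new(src);
    parser::parse(&mut lexer)
}
```

Note that nothing inside the parser module can produce a working lexer on its own: any caller, including a parser test, has to go through something like run, which is the level where the knot is tied.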
However, this is not a great solution, especially in comparison with the C++ one, which I'll show in a second.
The biggest problem is that, while we compile parser separately from lexer, we can't actually use it until we tie the knot!
So, for example, parser tests would still need to be recompiled.
Another problem is that we need a lot of ceremony here: duplicating crates is no fun.
So, how would this look in C++? In C++, one would have a lexer.h and lexer.cc.
A simple version would look like this:
// lexer.h
#include <cstddef>
#include <optional>
#include <string>
// Token is assumed to be declared elsewhere.
struct lexer {
    lexer(std::string src);
    std::optional<Token> next_token();
private:
    std::string src_;
    std::size_t pos_;
};
// lexer.cc
#include "lexer.h"
#include <utility>
lexer::lexer(std::string src)
    : src_(std::move(src))
    , pos_(0)
{}
std::optional<Token> lexer::next_token() {
    // hack hack hack
}
Now, crucially, the parser would depend only on lexer.h and not on lexer.cc, so editing the body of next_token does not recompile the parser.
The private fields still live in the header, though. So, for example, if we want to add a new field to the lexer, we need to change lexer.h and recompile the parser.
This setup could be improved with the pimpl idiom:
// lexer.h
#include <memory>
#include <optional>
#include <string>
// forward declaration, the crux of the pattern
struct lexer_impl;
struct lexer {
    lexer(std::string src);
    // declared here, defined in lexer.cc where lexer_impl is a complete type
    ~lexer();
    std::optional<Token> next_token();
private:
    std::unique_ptr<lexer_impl> pimpl_;
};
// lexer.cc
#include "lexer.h"
#include <cstddef>
#include <utility>
struct lexer_impl {
    std::string src;
    std::size_t pos;
    lexer_impl(std::string src)
        : src(std::move(src))
        , pos(0)
    {}
    std::optional<Token> next_token() {
        // hack hack hack
    }
};
lexer::lexer(std::string src)
    : pimpl_(std::make_unique<lexer_impl>(std::move(src)))
{}
lexer::~lexer() = default;
std::optional<Token> lexer::next_token() {
    return pimpl_->next_token();
}
Now, we don't specify the number of fields (or any other implementation details of lexer) in the .h file, so we can change lexer_impl as we like, and none of the other stages of the compiler would be recompiled.
Note that, unlike with the lexer_api crate, we still have only one compilation unit for the lexer. We also don't need to tie the knot: parser tests could just construct the lexer, they need only the .h file for this! In other words, the knot is tied by the linker in the end, and not by the compiler.
I would say that this setup seems like a massive improvement over Rust, both in terms of ergonomics (single compilation unit!) and in terms of build scalability (we don't need to defer lexer creation until the top-level compilation unit).
Now, it could be argued that a sufficiently advanced incremental compiler will find pimpls for us automatically, so we are all set! I don't think it's that easy: note how in the C++ case it's the build system that notices that there's no dependency between parser and lexer.cc, so the compiler is simply not invoked for many compilation units. Additionally, because of this the build can be easily distributed across many machines, use binary caches, etc. In other words, C++ pimpl works with separate compilation, while incremental compilation sort-of doesn't: it's about putting all the puzzle pieces into a single place.
Additionally, I believe that it's important to give the programmer manual
control over compilation firewalls. Specifically, the programmer should notice
immediately if a change to code pokes a pimpl barrier (compare with
auto-vectorization vs explicit SIMD).
Here's my proposed "design" for Rust pimpl (this really should be written by someone who can codegen).
- Define a type as public_abi if:
  - it is pub
  - it is a non-pointer member of a public_abi type
  Specifically, pub struct S { pimpl: Box<A> } alone does not make A a public_abi, while pub struct S { pimpl: Option<A> } does.
- Introduce a #[private_abi] attribute, which is applicable to types, and which emits a warning if a type turns out to be public_abi.
- Additionally, #[private_abi] could be applied to a module, which is a shorthand for marking all of the types in the module and its submodules.
- When producing .rlib or .rmeta files, include a hash of all public_abi types in them.
- Teach cargo to not rebuild a crate if no dependency changed this hash.
- Make sure that this actually leads to linkable crates.
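The public_abi rule and the hash can be played with in a toy model. This is only a sketch of the idea, not how rustc or cargo track anything: TypeDef, Field, public_abi, abi_hash and lexer_crate are hypothetical names, types are identified by plain strings, and a pub(crate) type is assumed to count as non-pub for the purposes of the rule.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, BTreeSet};
use std::hash::{Hash, Hasher};

/// How a field refers to another type in this toy model.
#[derive(Clone, Hash)]
enum Field {
    /// Stored inline (plain `A`, `Option<A>`): the owner's layout depends on it.
    ByValue(&'static str),
    /// Stored behind a pointer (`Box<A>`): the owner only sees a pointer.
    ByPointer(&'static str),
}

#[derive(Clone, Hash)]
struct TypeDef {
    is_pub: bool,
    fields: Vec<Field>,
}

type Crate = BTreeMap<&'static str, TypeDef>;

/// All `pub` types, plus everything reachable from them through by-value fields.
fn public_abi(krate: &Crate) -> BTreeSet<&'static str> {
    let mut result = BTreeSet::new();
    let mut stack: Vec<&'static str> = krate
        .iter()
        .filter(|(_, def)| def.is_pub)
        .map(|(name, _)| *name)
        .collect();
    while let Some(name) = stack.pop() {
        if !result.insert(name) {
            continue; // already visited
        }
        if let Some(def) = krate.get(name) {
            for field in &def.fields {
                // Pointer fields are deliberately not followed.
                if let Field::ByValue(inner) = field {
                    stack.push(*inner);
                }
            }
        }
    }
    result
}

/// Hash over the names and definitions of the public_abi types only.
fn abi_hash(krate: &Crate) -> u64 {
    let mut hasher = DefaultHasher::new();
    for name in public_abi(krate) {
        name.hash(&mut hasher);
        krate.get(name).hash(&mut hasher);
    }
    hasher.finish()
}

/// A two-type crate: a pub `Lexer` wrapping a non-pub `pimpl::Lexer`,
/// either boxed or stored inline.
fn lexer_crate(pimpl_fields: Vec<Field>, boxed: bool) -> Crate {
    let link = if boxed {
        Field::ByPointer("pimpl::Lexer")
    } else {
        Field::ByValue("pimpl::Lexer")
    };
    let mut krate = Crate::new();
    krate.insert("Lexer", TypeDef { is_pub: true, fields: vec![link] });
    krate.insert("pimpl::Lexer", TypeDef { is_pub: false, fields: pimpl_fields });
    krate
}
```

In this model, adding a field to pimpl::Lexer leaves abi_hash unchanged when the wrapper holds it behind a pointer, and changes the hash when the wrapper stores it inline, which is the distinction the Box<A> versus Option<A> example is drawing.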
With this setup, the original lexer problem would be solved like this, without introducing any additional crates:
pub struct Lexer(Box<pimpl::Lexer>);
impl Lexer {
    pub fn new(src: String) -> Lexer {
        Lexer(Box::new(pimpl::Lexer::new(src)))
    }
    pub fn next_token(&mut self) -> Option<Token> {
        self.0.next_token()
    }
}
#[private_abi]
mod pimpl {
    pub(crate) struct Lexer {
        src: String,
        pos: usize,
    }
    impl Lexer {
        pub(crate) fn new(src: String) -> Lexer {
            Lexer { src, pos: 0 }
        }
        pub(crate) fn next_token(&mut self) -> Option<Token> {
            // hack hack hack
        }
    }
}
See also Compilation firewalls in Rust