Request for an RFC: pimpl for Rust

What is pimpl? It's a C++ idiom for drastically improving compilation speed of large projects: https://en.cppreference.com/w/cpp/language/pimpl.

I believe that Rust doesn't have a direct analogue of this idiom, but that it really needs one. I am actually quite surprised that I have never noticed it being brought up during compilation time discussions. However, I am not an expert in this area, so I can only request someone more knowledgeable to write an RFC :slight_smile:

Let me try to explain a problem, and possible solutions in Rust and C++.

Let's say we are writing a compiler, and hacking on a lexer! Lexer is the lowest level of the compiler pipeline:

lexer <- parser <- type checker <- emitter <- CLI

If we make each of the phases a separate compilation unit, touching a lexer would lead to the recompilation of the whole world. (I have totally though this all up while staring at the "Building ... 153/157: rustc" line after modifying a single line in lexer).

We can do slightly better in Rust if we are willing to jump through a number of hops. First, we split lexer crate into lexer_api, which contains an object-safe trait Lexer, and lexer, which depneds on lexer_api and provides a concrete implementation. Then, we make sure that parser only depnds on lexer_api, and uses dyn Lexer internally. Finally, in the top-level CLI trait we tie the knot and inject the concrete lexer from lexer into the pipeline.

lexer<---------------------------------------------+
  |    +-------------------------------------------|
  |    |                                           |
  v    V                                           |
lexer_api <- parser <- type checker <- emitter <- CLI

With this setup, changes to lexer only affect the CLI. However, this is not a great solution, especially in comparison with C++ one, which I'll show in a second.

The biggest problem is that, while we compile parser separately from lexer, we can't actually use it until we tie the knot! So, for example, parser tests would still need to be recompiled. Another problem is that we need a lot of ceremony here: duplicating crates is no fun.

So, how this would look in C++? In C++, one would have a lexer.h and lexer.cc. A simple version would look like this:

// lexer.h
struct lexer {
    lexer(std::string src);

    std::optional<Token> next_token();

private:
    std::string src_;
    std::size_t pos_;
};

// lexer.cc
#include "lexer.h"

lexer::lexer(std::string src)
    : src_(std::move(src))
    , pos_(0)
    {}

std::optional<Token> lexer::next_token() {
    // hack hack hack
}

Now, crucially, the parser would depend only on the lexer.h and not on the lexer.cc. So, for example, if we want to add a new field to the lexer, we need to change lexer.h and recompile the parser.

However, the setup could be improved with a pimpl idiom:

// lexer.h

// forward declaration, the crux of the pattern
struct lexer_impl;

struct lexer {
    lexer(std::string src);

    std::optional<Token> next_token();

private:
    std::unique_ptr<lexer_impl> pimpl_;
};

// lexer.cc
#include "lexer.h"

struct lexer_impl {
    std::string src;
    std::size_t pos;

    lexer_impl(std::string src)
        : src_(std::move(src))
        , pos_(0)
        {}

    std::optional<Token> lexer::next_token() {
        // hack hack hack
    }
};

lexer::lexer(std::string src)
    : pimpl_(std::make_unique<lexer_impl>(std::move(src)))
    {}

std::optional<Token> lexer::next_token() {
    return this->pimpl_.next_token();
}

Now, we don't specify the number of fields (or any other implementation details of lexer) in the .h file, so we can change lexer_impl as we like, and none of the other stages of the compiler would be recompiled.

Note that, unlike with the lexer_api crate, we still have only one compilation unit for the lexer. We also don't need to tie the knot: parser tests could just construct the lexer, they need only .h file for this! In other words, the knot is tied by the linker in the end, and not by the compiler.

I would say that this setup seems like a massive improvement over Rust, both in terms of ergonomics (single compilation unit!) and in terms of build scalability (we don't need to defer lexer creation until the top-level compilation unit).

Now, it could be argued that a sufficiently advanced incremental compiler will find pimpls for us automatically, so we are all set! I don't think it's that easy: note how in C++ case it's the build system that notices that there's no dependency between parser and lexer.cc, so the build is simply not invoked for many compilation units. Additionally, because of this the build can be easily distributed across many machines, use binary caches, etc. In other words, C++ pimpl works with separate compilation, while incremental compilation sort-of doesn't: it's about putting all the puzzle pieces into the single place. Additionally, I believe that it's important to give a programmer manual control over compilation firewalls. Specifically, the programmer should notice immediately if a change to code pokes a pimpl barrier (compare with auto-vectorization vs explicit SIMD).

Here's my proposed "design" for Rust pimpl (this really should be written by someone who can codegen).

  • Define type as public_abi if:
    • it is pub
    • it is a non-pointer member of a public_abi type

Specifically, pub struct S { pimpl: Box<A> } alone does not make A a public_abi, while pub struct S { pimpl: Option<A> } does.

Introduce a #[private_abi] attribute, which is applicable to types, and which emits a warning if a type turns out to be public_abi. Additionally, #[private_abi] could be applied to a module, which is a shorthand for marking all of the types in the module and its submodules.

When producing .rlib or .rmeta files, include a hash of all public_api types into it. Teach cargo to not rebulid a crate if no dependency changed this hash. Make sure that this actually leads to linkable crates :slight_smile:

With this setup, the original lexer problem would be solved like this, without introducing any additional crates:

pub struct Lexer(Box<pimpl::Lexer>);

impl Lexer {
    pub fn new(src: String) -> Lexer {
        Lexer(Box::new(pimpl::Lexer::new(src)))
    }

    pub fn next_token(&mut self) -> Option<Token> {
        self.0.next_token()
    }
}

#[private_abi]
mod pimpl {
    pub(crate) Lexer {
        src: String,
        pos: usize,
    }

    impl Lexer {
        pub(crate) fn new(src: String) -> Lexer {
            Lexer { src, pos: 0 }
        }

        pub(crate) fn next_token(&mut self) -> Option<Token> {
            // hack hack hack
        }
    }

}

See also Compilation firewalls in Rust

5 Likes

Note: request for request pun in the title is not intended :sweat:

1 Like

This still seems like it ought to be much easier to achieve by improvements to rustc's incremental compilation than with a dedicated core language feature. I know there's a paragraph about that in your post, but AFAIK the fact that C++'s pimpl relies on separate compilation instead of incremental compilation is more a historical accident than a design decision with any technical reasoning behind it. Or in other words, "incremental compilation" in C++ is done by the build system, but only because it's never been done by the compiler (though that might change with C++ modules?) and it was really easy for everyone to hack up their own per-file incr. comp. using filesystem timestamps.

But I don't know the details of Rust incr. comp. either. Reading https://github.com/rust-lang/rust/issues/47660 gives me the impression that we're also doing per-file incr. comp., and anything finer-grained would be https://github.com/rust-lang/rust/issues/48211. Could anyone confirm that? And even if that's true, I would have already expected making Lexer.rs and LexerPimpl.rs separate files (instead of separate crates) to be enough to make Rust's incr comp not recompile all Lexer users when the pimpl changes (assuming it's a true pimpl, i.e. not referenced in any way besides non-generic calls from Lexer.rs going through a runtime indirection). Does anyone know for sure that it isn't already doing that, or how we could test that, and if not what's preventing it?

7 Likes

I feel like the motivation extends beyond types, to even free functions:

#[inline(never)]
pub fn really_expensive_algorithm(a: &[f64], b: &[f64]) -> Vec<f64> {
    ...
}

one day I would like for it to be possible that modifying this monomorphic, #[inline(never)] function (or anything else that is only used by that function) does not require a recompilation of all downstream crates; just a relinking of the final binaries.

7 Likes

I think the rule for functions is simpler though: function affects ABI iff it is public. A type can affect ABI even if it is private.

As I understand it, the benefits of the pimpl pattern are twofold, with the second being an unintended but useful side effect:

  • ABI stability in the face of private changes. The API is just the .h, which is just the forward declaration of the pimpl type, the API/ABI type which is just a pointer to some unknown (but we promise it exists) blob, and definitions of functions that just delegate to that same blog (we promise they're there).
  • Compilation speed of downstream users. This falls out of the above, as they only care about the thin public wrapper, and we provide the precompiled binary blob it links to. This speed also benefits hugely from the default of dynamic linking.

One small but notable issue for Rust here: we have ?Sized types, so a pointer to the pimpl type is not a guaranteed size. It might be *const (), or it might be (*const (), PointerMetadata). So the sizedness of the pimpl type is also ABI-public.

What this proposal boils down to is identifying the "ABI public" part of a crate, and only recompiling downstream users when the upstream ABI changes. This doesn't have to be a public change, it can just be a change in what rmeta is emitted and skipping downstream recompilation (of debug binaries only?) if the rmeta is only trivially different (i.e. just metadata-metadata such as version, hash, and such; not actual ABI information).

Currently, downstream compilation units are recompiled even if all you do is touch the upstream crate. We obviously can do a little better, even if it's not full on making the pimpl pattern a "viable" downstream incremental boost. Once we have some of this unchanged-ABI downstream compilation reuse, it might make sense to then add more controls for making sure private types really are ABI private as well as API private.

The real advantage from pimpl is the ABI stability, meaning you can just relink against a changed upstream binary. Any discussion has to be careful about what ABI guarantees are being made, and what is compiler-private, what is guaranteed for the outside world.

I'm not saying we can't do it, just making sure the details are acknowledged. I'd love for things to get better along this axis.

And if it helps us get towards a viable #[repr(C)] struct Foo { pimpl: Box<FooImpl> } path for stable ABI, even better!

4 Likes

Nitpick: The pimpl pattern per se is suboptimal and is basically a hack to work around a limitation of C++. Specifically:

  • It's suboptimal from a performance perspective because there's an unnecessary double indirection. The caller of next_token passes a pointer to lexer as this, and the lexer in turn contains a pointer to the lexer_impl.
  • It's suboptimal from an ergonomics perspective because you need to define two structs and write a wrapper for each method.

The limitation in question is that C++ requires you to put a class's methods and its fields (including private ones) in the same class definition. In C, there are no methods, but you can write:

struct lexer;
struct lexer *lexer_new();
token lexer_next_token(struct lexer *lexer);

This uses only one struct and no wrapper functions. And there's no double indirection: the caller passes a pointer directly to the implementation struct.

Users of the header don't see the fields of struct lexer. They don't know its size, so they can't allocate lexers on the stack or evaluate sizeof(struct lexer) But they can still reason about pointers to struct lexer, and call functions like lexer_new that do the allocation for them. This is somewhat similar to the concept of an unsized type in Rust.

Translating this into Rust, perhaps you could write something like

#[publicly_unsized]
struct Lexer {
    // fields...
}

For code outside the crate, the compiler would refuse to establish Lexer: Sized, so you could not call size_of::<Lexer>() or pass Lexer to generic type parameters that require Sized. You might also be forbidden from passing or returning Lexer by value, as is the case for the C equivalent, although it might be possible to allow it as part of unsized_locals (it would translate to a dynamically-sized stack allocation).

In return, this would guarantee that code generation in client crates does not depend on the layout of Lexer (except for the offsets of public fields, if any are present), so private fields could be arbitrarily added, removed, or changed without requiring them to be recompiled.

This would be a superset of the functionality proposed in the original post. If, say, you wanted to expose a sized type so client crates wouldn't have to care about unsized types, you could still have a wrapper struct containing a pointer to an inner "impl" struct; you would just mark the inner struct as #[publicly_unsized].

Another alternative

We could just bite the bullet and have the equivalent of header files. Something like:

pub extern struct Lexer;

impl Lexer {
    pub fn new(src: String) -> Box<Lexer>;
    pub(crate) fn next_token(&mut self) -> Option<Token>;
}

...where the layout of Lexer, and the implementations of the methods, would be in separate file.

That would maximize the programmer's control over the compilation firewall, as well as the compiler's ability to perform separate compilation. But... supposedly Rust doesn't need header files. :wink:

20 Likes

In C++, this idiom requires the separation of declarations and definitions for the involved functions and types. As a long-time C++ user myself, I find the argument quite unconvincing that this would be a "massive improvement over Rust in terms of ergonomics"; for me, the exact opposite is true.

The "funny" (insofar as one is willing to describe such hacks as humorous) part is that the PIMPL idiom was created exactly to fight the proliferation of header files and the way they are included and unnecessarily (re-)compiled multiple times upon building each related compilation unit. Since Rust lacks header files, it simply doesn't suffer from the same problem.

Furthermore, in C++, the act of decoupling dependence analysis of the pimpl from the wrapping type (or rather, eradicating it) opens up the possibility for misuse, causing sizes to change accidentally and publicly, and ultimately leading to memory corruption. I believe this could be solved in Rust but that would require a less manual design.

Finally, I believe that codifying an idiom as a language feature should have a significantly higher barrier to entry, as it can fundamentally change the way we write code. As incremental compilation in Rust is nowhere near a solved problem, we should instead focus on improving it and making dependence analysis transparently smarter instead.

20 Likes

What are you talking about? There is nothing non-type-safe about pimpl.

It would be nice. Will it ever happen, though?

There's also the goal of increased parallelism for clean builds. In theory this can also be done transparently, but it's a separate feature from incremental compilation, and seemingly even more distant.

Personally, I'd like an explicit compilation firewall not only as an optimization, but also as a first step toward that forbidden phrase, "stable ABI". Yes, yes, I know.

2 Likes

See also, wherein I sketch what a minimal "private impl" "stable ABI" could look like:

This sounds very much like the approved extern type stuff; I think one could just use that (albeit unsafely) to try this out

1 Like

Indeed, it is similar – no coincidence, since extern type is designed to bind to C APIs that use this pattern :slight_smile:

Hmm... One issue with my original suggestion has to do with specialization. I suggested that "the compiler would refuse to establish Lexer: Sized" outside of the crate defining Lexer, but allow it within the crate; call this a "pseudo-unsized type". But suppose that some downstream code defines a trait with a default impl for all types, plus a specialized impl for Sized types. Which impl would be used for Lexer? If it uses the specialized impl, that creates a backdoor way for code to obtain the type's size, which is supposed to be impossible. If it uses the generic impl, you run the risk of incoherence.

Possible solutions:

  1. Arguably the simplest solution is to treat Sized bounds like lifetime bounds, where specialized impls are "not supposed to" be able to test for such bounds.

    Currently, specialized impls can test for lifetime bounds at the cost of unsoundness, which is preventing specialization from being stabilized. There have been some ideas for how to fix this, but the exact approach is yet to be determined, and it doesn't seem to be a priority at present. Without knowing the approach, I can't judge the feasibility of extending it from lifetime bounds to Sized bounds. It wouldn't be necessary to implement this before implementing "pseudo-unsized types", since specialization is already unsound and we wouldn't be making it much worse. But it would be necessary to know that it can be implemented down the line; we don't want to paint ourselves into a corner, design-wise.

  2. The next simplest solution is to just treat Lexer as !Sized everywhere. After all, once unsized_locals is fully implemented, !Sized won't be that much of a hindrance: you'll be able to create, pass and return unsized types by value, and APIs like Box::new will be able to drop their Sized bounds. As I said before, unsized_locals normally involves a dynamically-sized stack allocation, but inside the defining crate it could be optimized into a normal statically-sized allocation. The downside is that the defining crate couldn't use things like Vec<Lexer>, which depend on all instances of the type having the same size (a property that holds here, but not for dynamically-sized types in general).

  3. Another approach would be some mechanism to create a wrapper type, like (spitball)

    opaque type Lexer = LexerImpl;
    

    where Lexer and LexerImpl would be different types from a type system perspective, but &Lexer and &LexerImpl could coerce to each other, and maybe there could be some magic to forward trait impls and inherent methods between one and the other. If you think about it, there's a lot of resemblance to how trait objects work, so it wouldn't be completely unprecedented. Still, it's unergonomic and confusing to have to deal with two different types, even with magic.

3 Likes

extern type essentially imports an opaque (unsized & FFI-safe) type for use in the current compilation module. This is complementary to this discussion - how to export an opaque type (same restrictions apply).

Based on common best practices in C/C++ where this kind of situation is usually handled by a typedef (especially in C), why can't Rust simply follow suit and allow the same pattern with a public type alias to a private type? This is currently frowned upon (private-in-public is warn by default) but this seems like a valid use case for such a pattern.

In other words, we could allow:

pub(crate) struct FooImpl {...} // internal type details
pub type Foo = FooImpl; // external opaque type (same restrictions as extern type)

This will be used in concert with the extern type syntax to import Foo in client crate.

1 Like

Not sure about that. From what I've seen, most C and C++ libraries do this by relying on forward declaring structs/classes in header files. The header may then add a typedef for that struct, but it's "just" a typedef – both the typedeffed and original names are equally usable in both the library implementation and in clients.

But perhaps you're referring to a pattern like the one LLVM's C API uses, where the publicly-exposed types are a lie:

typedef struct LLVMOpaqueModule *LLVMModuleRef;

In this case, struct LLVMOpaqueModule is not actually defined anywhere. Instead, LLVMModuleRef pointers actually point to instances of the class llvm::Module, and all C API functions which accept LLVMModuleRefs have to cast them to llvm::Module * before using them. I believe LLVM uses this approach because C does not support namespaces.

Even here, though, the use of a typedef is merely incidental. struct LLVMOpaqueModule * and LLVMModuleRef are equivalent types, and the "real" type, llvm::Module *, doesn't appear in the header at all.

Well, I'd say that a public type alias to a private type doesn't naturally imply the desired semantics, because trait impls (such as Sized or any other impls that the implementation crate wants to hide from the client) don't have visibility. I guess they could be treated as a special case, and it wouldn't be a breaking change since such aliases are currently forbidden; still, it seems to me more like a different feature which should have different syntax.

Ditto here – extern type already does something different from what's needed. At least, client crates shouldn't need to know whether a struct is opaque or not and thus shouldn't need anything but use. On the other hand, if we create a header-file-like construct (either a separate file within a crate, or a separate crate), it would need some syntax to 'forward declare' a struct (in another post I used extern struct as an example). I suppose extern type could be repurposed for that, but I think we want at least slightly different semantics from what extern type currently does.

Wouldn't trait objects allow to achieve the same separation of the interface from the implementation which would result in separation if compilation units? A special type could be introduced to use dyn Trait in dev mode and ImplTrait in production.

Trait objects are one possible workaround (rustc uses them in some places as such), but they're somewhat limited. For example, in the following example, both methods could be supported for opaque types but are not object safe:

impl Foo {
    fn new() -> Box<Foo>;
    fn takes_another(foo: &Foo);
}

Also, compilation firewalls are useful to speed up even release builds.

1 Like