Prevent source parse with proper bytecode

Whenever a Cargo project is first built, all of its dependencies have to be compiled from source into an internal format. Subsequent builds then don't need to build the dependencies again (a.k.a. incremental compilation).

What if Rust supported its own bytecode format that retains all the language semantics, allowing Cargo to skip building dependencies from source by sharing bytecode through crates.io? For example, unlike LLVM bitcode or WebAssembly, Rust's bytecode format would store conditional compilation attributes (#[cfg]), env!(...), features, macros (including nested concat!, env!, include!, include_bytes!, etc.), module items... everything from Rust. The build phase could then import and export bytecode.
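To make "retains #[cfg]" a bit more concrete, here is a minimal sketch of how items might be stored with their conditions unevaluated and filtered only at import time. Every name in it (Cfg, BuildConfig, Item, active_items) is hypothetical and made up for illustration; nothing like this exists in rustc or Cargo today.

```rust
// Hypothetical data model; none of these types exist in rustc or Cargo.
use std::collections::HashSet;

/// A `#[cfg(...)]` predicate kept unevaluated in the bytecode.
#[derive(Debug, Clone, PartialEq)]
enum Cfg {
    Feature(String),
    TargetOs(String),
    Not(Box<Cfg>),
    All(Vec<Cfg>),
    Any(Vec<Cfg>),
}

/// The configuration that is only known when the final build happens.
struct BuildConfig {
    features: HashSet<String>,
    target_os: String,
}

impl Cfg {
    /// Evaluates the stored predicate against the actual build configuration.
    fn eval(&self, cfg: &BuildConfig) -> bool {
        match self {
            Cfg::Feature(name) => cfg.features.contains(name),
            Cfg::TargetOs(os) => cfg.target_os == *os,
            Cfg::Not(inner) => !inner.eval(cfg),
            Cfg::All(list) => list.iter().all(|c| c.eval(cfg)),
            Cfg::Any(list) => list.iter().any(|c| c.eval(cfg)),
        }
    }
}

/// One module item as it might be stored in the bytecode.
#[derive(Debug, Clone, PartialEq)]
struct Item {
    name: String,
    /// `None` means the item is unconditional.
    cfg: Option<Cfg>,
    /// Documentation comments could be retained for IDEs and rustdoc.
    doc: Option<String>,
}

/// Applies conditional compilation when the bytecode is imported.
fn active_items<'a>(items: &'a [Item], cfg: &BuildConfig) -> Vec<&'a str> {
    items
        .iter()
        .filter(|item| item.cfg.as_ref().map_or(true, |c| c.eval(cfg)))
        .map(|item| item.name.as_str())
        .collect()
}
```

The point of the sketch is only that the published artifact carries the predicate itself, so the same bytecode can serve every feature set and target.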

So supporting a proper Rust bytecode would speed up the first build. I'm not sure how much it'd help, because in some ways the bytecode has to be verified just like the sources... but there are many differences: it'd avoid parsing Rust syntax, strip indentation and more (though it might retain documentation comments for use by IDEs and rustdoc).

build.rs

If a crate has a build.rs, its build.rs should execute before its bytecode is reused. The bytecode must be able to reuse build.rs artifacts... so include! and several other macros have to stay unresolved at the bytecode level.

When macros clearly don't rely on build.rs artifacts, they can be resolved at the bytecode level...
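Roughly, the decision could look like the sketch below. The MacroCall and Resolution types and the classify function are hypothetical, and the rule itself is a conservative assumption on my part: env! always waits for the real build, and include-style macros wait whenever the crate has a build.rs or the path is computed from env! (as in the common include!(concat!(env!("OUT_DIR"), "/generated.rs")) pattern).

```rust
// Hypothetical classification; not an actual rustc or Cargo API.
#[derive(Debug, Clone, PartialEq)]
enum MacroCall {
    /// `concat!` applied to literals only.
    ConcatLiterals(Vec<String>),
    /// `env!("NAME")`.
    Env(String),
    /// `include!`, `include_str!` or `include_bytes!`.
    /// `path_uses_env` is true when the path is built from `env!`.
    Include { path_uses_env: bool },
}

#[derive(Debug, PartialEq)]
enum Resolution {
    /// Can be expanded once, when the bytecode is produced.
    AtPublish,
    /// Must stay unresolved in the bytecode until the actual build.
    AtBuild,
}

/// Assumed conservative rule: anything that may depend on the build
/// environment or on build.rs output is deferred to the actual build.
fn classify(call: &MacroCall, has_build_script: bool) -> Resolution {
    match call {
        MacroCall::ConcatLiterals(_) => Resolution::AtPublish,
        MacroCall::Env(_) => Resolution::AtBuild,
        MacroCall::Include { path_uses_env } => {
            if *path_uses_env || has_build_script {
                Resolution::AtBuild
            } else {
                Resolution::AtPublish
            }
        }
    }
}
```

A smarter analysis could resolve more at publish time, but deferring too much only costs speed, while resolving too early would produce wrong code.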

This doesn't prevent compilation exactly

Dependency crates are still incrementally compiled, except their source isn't parsed. Only build.rs is parsed, maybe.

Other uses

Importing and exporting a proper bytecode has another advantage: it allows manipulating Rust programs in other ways, including embedding them in other software without going through low-level WebAssembly or LLVM bitcode.

It isn't clear whether the main point of this idea, avoiding source recompilation, is worthwhile; it might improve parsing speed if the bytecode format is more compact and easier to parse than Rust syntax.

The parsing phase does not take much time. Testing on tokio (75kLoC, 35kSLoC):

> cargo rustc -- -Ztime-passes
time:   0.001; rss:   40MB ->   43MB (   +3MB)  parse_crate
time:   0.106; rss:   48MB ->   97MB (  +49MB)  expand_crate
time:   0.106; rss:   45MB ->   97MB (  +52MB)  macro_expand_crate

Maybe the first two steps could be done before publishing, though I'm not entirely sure what expand_crate does. The third step already applies conditional compilation, so it must be delayed until the actual build. So that's either 1ms or 107ms that could be saved (less whatever the load time for the custom serialization is).

expand_crate is a subpass of macro_expand_crate. That is, the code structure is roughly sess.time("macro_expand_crate", || { /* stuff */ sess.time("expand_crate", expand_crate); /* stuff */ }); where both /* stuff */ parts are very quick.
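To show why nested timers report nearly identical numbers, here is a small self-contained sketch. Timings is a hypothetical stand-in written for this post, not rustc's session type, and the sleep merely simulates the expensive inner pass.

```rust
use std::cell::RefCell;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a timing session; not rustc's `Session`.
struct Timings {
    records: RefCell<Vec<(&'static str, Duration)>>,
}

impl Timings {
    fn new() -> Self {
        Timings { records: RefCell::new(Vec::new()) }
    }

    /// Runs `f` and records how long it took under `name`.
    fn time<R>(&self, name: &'static str, f: impl FnOnce() -> R) -> R {
        let start = Instant::now();
        let result = f();
        // The borrow happens only after `f` returns, so nesting is safe.
        self.records.borrow_mut().push((name, start.elapsed()));
        result
    }

    /// Looks up the recorded duration of a pass by name.
    fn get(&self, name: &str) -> Option<Duration> {
        self.records
            .borrow()
            .iter()
            .find(|(n, _)| *n == name)
            .map(|(_, d)| *d)
    }
}

/// Mirrors the nesting described above: a quick outer pass wrapping
/// one expensive inner pass.
fn run_passes(t: &Timings) {
    t.time("macro_expand_crate", || {
        // quick setup would go here
        t.time("expand_crate", || {
            std::thread::sleep(Duration::from_millis(20)); // simulated work
        });
        // quick teardown would go here
    });
}
```

The outer measurement always includes the inner one, so the two 0.106 figures in the output above overlap and should not be added together.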
