More granular `build.rs` companion: `links.rs`

The problem


I'd like to kick-start the discussion around one aspect which makes "cargo check does not compile/codegen machine code nor involve linkage" a blatant lie, and which makes cargo check/clippy, whether invoked by the user or by rust-analyzer, slower than it could be:

  1. Every build.rs in the dependency tree MAY generate Rust code which may affect cargo check-ing;

  2. Therefore, every build.rs "build crate" in the dependency tree needs to be:

    • checked (which is sensible),
    • compiled (and linked as a binary),
    • and executed,

    before the main crate of its encompassing package (usually a [lib] crate) can itself be checked and have its "Rust metadata" generated, which must in turn happen before each downstream dependent can itself be checked, and so on.

  3. Notably, many build.rs scripts out there stem from the "conventional …-sys packages", i.e., from Rust packages wrapping some C library (or some otherwise FFI-bridgeable library) through:

    • a build.rs crate which often, for convenience, compiles a bundled version of this library from scratch (often producing a C library), and then emits linkage directives to Cargo so that the final binary artifact links against that library,
    • (and a [lib] crate shim which declares, in Rust, the C-y signatures exposed by that library, so that dependent Rust code knows how to call into it).

Now, taking a step back and looking at all of this, we can see the inefficiency: we're compiling and executing build.rs scripts, many of which in turn compile C libraries, just so that those libraries are available to Cargo should some cargo build/run/test/bench occur.

But in the context of a cargo check or cargo clippy or cargo doc, there is no such build/run/test/bench occurring, resulting in:

  • at best, having unnecessarily front-loaded the time and resources involved in compiling part of the final artifacts (e.g., in the case of some cargo check/clippy prior to running some cargo build/run/test/bench);
  • at worst, having done it completely unnecessarily. For instance, rust-analyzer never runs cargo build, neither for its flycheck diagnostics nor for its code navigation utilities. When it uses a separate target/ dir (as is advisable for users wishing to concurrently run cargo … on their own), any such C compilation artifacts have been produced in vain.
    • even worse, sometimes the rust-analyzer environment is not configured well enough for the C compilation to even succeed (e.g., an improperly set CC or CMAKE env var, or whatnot). In such a case, rust-analyzer diagnostics end up tainted by that failure.

For the remainder of the post, I'll refer to such build.rs scripts as "conceptual links.rs build scripts":

  • build.rs scripts doing both Rust code generation and C compilation and/or linkage could be split, at least conceptually, into a pure code-generating part (the "conceptual build.rs") and a mere compile-and/or-link part (the "conceptual links.rs").
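As a minimal sketch of that conceptual split, assuming a hypothetical foo-sys package, here is a build.rs whose two halves live in separate functions. The names generate_bindings and link_directives, the FOO_LIB_DIR env var with its /usr/local/lib fallback, and the generated FOO_ABI_VERSION constant are all made up for illustration; a real …-sys script would more likely generate its declarations with bindgen and compile the bundled C sources (e.g., with the cc crate) right before emitting the link directives.

```rust
//! Hypothetical `foo-sys` build.rs, split into its two conceptual halves.
use std::{
    env, fs, io,
    path::{Path, PathBuf},
};

/// "Conceptual build.rs": Rust code generation. `cargo check` genuinely needs this,
/// since the [lib] crate would do `include!(concat!(env!("OUT_DIR"), "/bindings.rs"));`.
fn generate_bindings(out_dir: &Path) -> io::Result<PathBuf> {
    let path = out_dir.join("bindings.rs");
    fs::write(&path, "pub const FOO_ABI_VERSION: u32 = 1;\n")?;
    Ok(path)
}

/// "Conceptual links.rs": linkage directives, only relevant to final artifacts.
/// Compiling the bundled C sources would also belong to this half.
fn link_directives(lib_dir: &str) -> Vec<String> {
    vec![
        format!("cargo:rustc-link-search=native={lib_dir}"),
        "cargo:rustc-link-lib=static=foo".to_string(),
    ]
}

fn main() -> io::Result<()> {
    // Cargo always sets OUT_DIR when running a build script.
    let out_dir = PathBuf::from(env::var_os("OUT_DIR").expect("OUT_DIR not set"));
    generate_bindings(&out_dir)?;

    // Hypothetical knob pointing at a prebuilt libfoo.
    println!("cargo:rerun-if-env-changed=FOO_LIB_DIR");
    let lib_dir = env::var("FOO_LIB_DIR").unwrap_or_else(|_| "/usr/local/lib".to_string());
    for directive in link_directives(&lib_dir) {
        println!("{directive}");
    }
    Ok(())
}
```

With today's Cargo, both halves run on every cargo check, even though only the first one can affect type-checking.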

Palliatives / working around it with the current tools

  • Magic RUSTC_WRAPPER strings

    For instance, rust-analyzer sets RUSTC_WRAPPER=rust-analyzer by default, as part of its configuration:

    • (Emphasis mine.)

    I imagine this means that conceptual links.rs build scripts wishing to be friendly to such a situation would be expected to include the following bail-out:

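    //! "Conceptual links.rs" logic within a build.rs

    use std::path::Path;

    fn main() {
        // Cargo forwards `RUSTC_WRAPPER` to build scripts. Its value may be a full
        // path to the `rust-analyzer` binary rather than the bare name, so compare
        // the file stem (this also strips a `.exe` suffix on Windows).
        let under_rust_analyzer = ::std::env::var("RUSTC_WRAPPER").is_ok_and(|wrapper| {
            Path::new(&wrapper)
                .file_stem()
                .is_some_and(|stem| stem == "rust-analyzer")
        });
        if under_rust_analyzer {
            eprintln!("\
                `rust-analyzer` environment detected: \
                 skipping compilation/linkage of C dep `libfoo`\
            ");
            return;
        }
        // actual logic here
    }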
    

    Whilst it is nice that this exists, it is a pity that package authors need to think of doing this. And judging from this search, it doesn't seem to be done much in the wild.

  • links = "libname" "abuse"

    It turns out that there is one area of Cargo which is links-aware: the package.links Cargo.toml field.

    "Conventionally …-sys" crates are expected to declare themselves as such by setting this entry to the name of the C/FFI library they wrap.

    From there, an end workspace (.cargo/config.toml) depending on such a package can opt into "hijacking"/bypassing/circumventing the build.rs of such a package:

    To clarify (because, imho, these two sections are not very well explained by the reference):

    1. Say there is some C library libfoo;

    2. There will probably be some ::foo-sys Rust package wrapping it;

    3. With a build.rs potentially compiling and definitely emitting linkage directives against such a library;

    4. 👉 ::foo-sys would thus be expected to declare some package.links = "libfoo" directive/entry in its Cargo.toml package manifest 👈

    5. Any user of this package (of its [lib] crate), whether a direct user or some transitive downstream dependent, can then, within their own workspace, override its (compilation, if any, and) linkage setup for the libfoo C library with the following:

      [target.<TARGET_TUPLE>.libfoo]
      rustc-link-lib = ["foo"]  # This is `-lfoo`, to find `libfoo.{so,dylib,a}` on Unix, `foo.{dll,lib}` on Windows.
      rustc-link-search = ["/path/to/foo"]  # This is `-L/path/to/foo`.
      

      i.e.,

      [target.<TARGET_TUPLE>]
      libfoo.'rustc-link-lib' = ["foo"]
      libfoo.'rustc-link-search' = ["/path/to/foo"]
      

      i.e.,

      [target.<TARGET_TUPLE>]
      libfoo = { 'rustc-link-lib' = ["foo"], 'rustc-link-search' = ["/path/to/foo"] }
      

      which boils down to:

      [target.<TARGET_TUPLE>]
      libfoo = { … }
      

    💡 Setting this effectively disables all execution of ::foo-sys' build.rs script 💡

    Granted, this is a handy trick for actual cargo build/run/test/bench compilations set up in a way where the desired C library can be found, already compiled, at some predictable path (this is the very point of the feature!).

    But it turns out that this can be (ab)used to make certain, specially-crafted, cargo check/clippy commands skip these build.rs scripts:

    1. It turns out that setting an empty libfoo = {} "object" suffices to trigger this; or, oddly enough, embedding a dummy key-value pair within it works too:

      libfoo = { 'build.rs' = 'skip' }
      # or
      libfoo.'build.rs' = 'skip'
      
    2. To add such a setting on the fly / only for specific commands, cargo accepts single-line .cargo/config.toml additions through its --config flag, like this:

      RUST_HOST_TUPLE="$(rustc --print host-tuple)"
      cargo <check/clippy> … \
          --config "target.'${RUST_HOST_TUPLE}'.libfoo.'build.rs'='skip'"
      
      • (An empty object cannot be set this way, presumably so that the approach stays additive; hence the dummy key-value pair.)

    Limitations of .cargo/config.toml links overrides

    • When used on the fly for cargo check/clippy, it's hacky and cumbersome: it requires knowing every potential …-sys crate in the dep tree and the links name each one uses, multiplied by the "matrix" of every target tuple which may be involved. It can also, ironically, backfire if the cargo command is followed by an actual cargo build, since the adjusted .cargo/config.toml may then require recompiling everything. It might thus be advisable to only use this in conjunction with a custom --profile (or an otherwise separate target/ dir), so that these cached artifacts are properly insulated from the ones involved in actual builds; e.g., within the rust-analyzer config:

      • Relevant snippet from my rust-analyzer config
        "rust-analyzer.cargo.extraArgs": [
            "--config", "target.'aarch64-apple-darwin'.rocksdb.'build.rs'='skip'",
            "--config", "target.'aarch64-apple-darwin'.snappy.'build.rs'='skip'",
            "--config", "target.'aarch64-apple-darwin'.titan.'build.rs'='skip'",
        ],
        "rust-analyzer.cargo.targetDir": true,
        
    • Even when used legitimately, as per the rust-bindgen example above, bypassing the build.rs currently also means that its code generation part, if any (such as that of generating "FFI headers" in Rust) will also be skipped (unless the package were to split itself into two, one with a package.links + build.rs, and the other, with just the build.rs…).
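The "matrix" part of that cumbersomeness can at least be scripted. Here is a small sketch that expands a set of target tuples and links names into the --config argument pairs used above; skip_build_script_args is a hypothetical helper, and the tuples and names in main are placeholders (the names being taken from the rust-analyzer snippet). It still assumes you already know every links name in your dependency tree.

```rust
/// Hypothetical helper: expands every (target tuple, links name) combination into a
/// pair of `--config` arguments overriding (and thus skipping) that build script.
fn skip_build_script_args(targets: &[&str], links_names: &[&str]) -> Vec<String> {
    let mut args = Vec::new();
    for target in targets {
        for name in links_names {
            args.push("--config".to_string());
            // The dummy `'build.rs' = 'skip'` key keeps the override non-empty.
            args.push(format!("target.'{target}'.{name}.'build.rs'='skip'"));
        }
    }
    args
}

fn main() {
    let args = skip_build_script_args(
        &["aarch64-apple-darwin", "x86_64-unknown-linux-gnu"],
        &["rocksdb", "snappy", "titan"],
    );
    // One `--config "…"` pair per line, ready to paste after `cargo check`.
    for pair in args.chunks(2) {
        println!("{} \"{}\"", pair[0], pair[1]);
    }
}
```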


The suggested solution

Would be for Cargo to fully acknowledge and embrace this package.links with a companion build script (the "conceptual links.rs build script"), and reïfy this concept:

  1. Allow package.links to be set to an object rather than just a string, with at least:

    • a name entry, set to its identifying name;
    • a script entry, pointing to some .rs file.
      • Optionally, it could be set to true to default to "links.rs"; and/or having such a file alongside a Cargo.toml and a links = "libfoo" kind of entry could be tantamount to all this.
    [package]
    links = { name = "libfoo", script = "links.rs" }
    
  2. Such a links.rs-specified Rust file shall:

    • be treated the way a build.rs currently is:

      • involving build-dependencies,
      • living in the build/host universe (e.g., w.r.t. the Cargo resolver or whatnot);
      • being bypassable through links overrides (with the build.rs being the one bypassed as a fallback, for retro-compatibility).
    • but only involved:

      • for actual compiling cargo commands, such as build/run/test/bench; i.e., it would be skipped when running cargo check/clippy/doc;
      • invoked at the tail end of the compilation pipeline.

      Or, more precisely, it would be deemed to have no dependents in Cargo's pipelining other than the compilation/linkage of the very final artifact(s) (which cargo check/clippy/doc do not produce).
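To make the intended scheduling concrete, here is a toy model, assuming a single package whose build.rs has been split into a code-generating build.rs and a compile-and-link links.rs. It is not Cargo's actual unit graph or API: Step, Command and steps_for are hypothetical names, and cross-package pipelining is ignored. It only encodes the rule above, namely that the links script is scheduled solely for commands producing final artifacts, right before the final link.

```rust
/// Toy model of the per-package steps; not Cargo's real unit graph or API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Step {
    CompileBuildScript,
    RunBuildScript,
    CheckLib,   // metadata only, as for check/clippy/doc
    CompileLib, // full codegen of the [lib] crate
    CompileLinksScript,
    RunLinksScript,
    LinkFinal, // final binary/test artifact
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Command {
    Check,
    Clippy,
    Doc,
    Build,
    Run,
    Test,
    Bench,
}

impl Command {
    /// Whether the command ends up producing (and linking) final artifacts.
    fn produces_final_artifacts(self) -> bool {
        matches!(self, Command::Build | Command::Run | Command::Test | Command::Bench)
    }
}

/// Steps scheduled for the package under the proposed links.rs semantics.
fn steps_for(cmd: Command) -> Vec<Step> {
    use Step::*;
    // The code-generating build.rs still runs first: the lib may `include!` its output.
    let mut steps = vec![CompileBuildScript, RunBuildScript];
    if cmd.produces_final_artifacts() {
        // links.rs only feeds the final link, so it is deferred to the tail end.
        steps.extend([CompileLib, CompileLinksScript, RunLinksScript, LinkFinal]);
    } else {
        steps.push(CheckLib);
    }
    steps
}

fn main() {
    use Command::*;
    for cmd in [Check, Clippy, Doc, Build, Run, Test, Bench] {
        println!("{cmd:?}: {:?}", steps_for(cmd));
    }
}
```

In this model, check/clippy/doc only compile and run the code-generating build.rs before emitting metadata, whereas build/run/test/bench additionally append the links script and the final link.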
