Build.rs use cases and stories sought!

Holy shit, that is exactly what I needed. Thanks!

Now to wait for impl... or help out on it.

Would it be feasible to standardize a binary calling convention, similar to pkg-config, that would find, for example, openssl?

You could have all your “find openssl” logic in a crate which produces a binary that takes the same args or works in a similar way as pkg-config.

Then it would be up to the overall build system to either use the real pkg-config or to use the rust crate.

That’s skipping the hardest part. If we had a bug-free universal tool, which already knows how to build/configure every library on every platform, then sure, we could standardize on it.

We already have pkg-config on some of the platforms, and it still requires workarounds in build.rs due to its limitations and bugs in package definitions (especially on macOS, where it gives answers that are right for Homebrew, but not necessarily right for Rust).

Also, pkg-config doesn't actually do everything that the build.rs script is doing. Even if it didn't need to work around configuration bugs, build.rs would still have to decide which features of the crate to enable or disable based on the version of openssl that's present. Something still has to generate the cargo:rustc-cfg=osslconf="(x)" definitions.


That is fine; pkg-config has an argument for getting the version of the found dependency. Obviously the build.rs script, or some sort of generated Rust code, is needed for Rust-specific things, which is totally fine. My problem with build.rs is that part of what it does makes it incredibly difficult to integrate into other build systems.

It doesn’t have to be universal. My point was that a tool like “openssl-config” could be compiled. It would be a Rust project, and it would do exactly the “find openssl” part of the build.rs. It would just follow the calling convention of pkgconf so that an external build system could use it. It would be a less opaque solution than what we have today.

But who provides that openssl-config (and libfoo-config and libz-config-in-context-of-libpng-config)? For which platforms?

If Rust/Cargo, then it’s like giving maintenance of all crates-io sys crates to the Cargo team, effort comparable to making a new Linux distro, except for all platforms.

If package maintainers, then that’s exactly what build.rs compiles to, with a layer of indirection.


The main thing I dislike about build.rs is that it hides dependency information - cargo is constrained to building an executable then running it for its side-effects. The executable can express some limited dependency info in the form of “rerun if X changes”, but that’s pretty coarse.

build.rs is used for a few distinct things:

  1. generate Rust source code for use later in the build process - bindgen and parser generators are common examples
  2. capture some environment for version/build id (time, git hash, etc)
  3. build an external library for ffi use - libjpeg/openssl/etc.

The first case has well defined dependency information, and generates deterministic output. I’d assume it’s reasonably straightforward to make cargo understand this case (and maybe the existing mechanisms already do).

The second is inherently non-deterministic, since a timestamp will change all the time, and a git hash can change even if none of the code going into the executable has changed. Build systems generally have to special case these to avoid absurdities like “always rebuild because now has changed”.

The third is particularly awkward, since cargo doesn’t understand non-Rust dependencies, and nor should it. The approach I’ve suggested in the past is to split this into two parts: a declarative “I depend on openssl”, and some custom code which implements “here’s how to find/configure/build openssl”. This allows the two parts to be decoupled, so you can have different implementations for the second part for different environments, while giving cargo the abstracted information it needs.
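Cargo already has a small piece of this split today: the `links` manifest key is the declarative half ("I depend on openssl"), while build.rs supplies the imperative "here's how to find it" half. A minimal sketch of a hypothetical foo-sys crate:

```toml
# Cargo.toml of a hypothetical foo-sys crate.
# `links` declares the abstract native dependency;
# build.rs implements the find/configure/build logic.
[package]
name = "foo-sys"
version = "0.1.0"
links = "openssl"
build = "build.rs"
```

Because the dependency is named via `links`, Cargo also allows an outer environment to override the build script entirely for that library (via build-script override tables in a `.cargo/config` file), which is one existing mechanism for decoupling the two halves.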

So really, I think build.rs should be split up into at least 3 separate mechanisms to handle these cases (and maybe more that I’m overlooking now).

Like there isn't for any other language out there, including C itself!

What Cargo's build.rs does is just the same as what Autotools' configure.ac, CMake's CMakeLists.txt or Meson's meson.build does.

Since there is no standard for defining dependencies across platforms and/or build systems, you end up with installation instructions in the readme.

That post is written too much from the point of view deep inside Meson. If you were trying to integrate something built with Meson into a larger project built with, say, Maven, you'd be cursing in just the same way. When mixing build systems, neither can see what is going on in another one, no matter which ones you choose.

Instead the closest-to-practical approach when building heterogeneous suites of software is to use a meta-build system like Yocto's BitBake or BuildStream that treats each part, with its own build system, as a little black box, declares dependencies between them, and then lets each build in a sandbox with the dependencies already declared. But you can do this for a specific system, not for all of them, because they are too different.

I would probably go a bit further and suggest that build.rs should be able to define any number of generation steps with dependencies and outputs. That way if multiple components need to be generated, only the ones that have changed inputs can be re-run.

Though we could also simply have a library take care of that and use it from build.rs.

Anybody who compiles the current timestamp into a build deserves having everything rebuilt every time. A git hash or the timestamp of the latest modified file are fine, because they only change if some file did.

I don't see any point in having the declarative part. You agree that Cargo shouldn't understand non-Rust dependencies, so it won't be using it itself, and unless a standard for dependency declaration emerges, every meta-build/package-management system has to declare the dependencies itself anyway.

Cargo has to know something about it. Otherwise the code is just saying "I'm reaching out and pulling a dependency from thin air". The interface to the second part should not only be "build this thing", but also "does it need rebuilding?".

The idea is that you have the abstract requirement, and then an arbitrary number of specific implementations: for example, one for macOS, one for Windows, one for Nix, one for Debian, one for Fedora, one for building from source, etc. Sure, you could have a single monolithic chunk of logic handling everything, but that's hard to manage and maintain (especially since each platform could have different maintainers).

In my particular case, I'm working out how to embed the entire Cargo build process in a larger build system managed by Buck, which does know how to build all kinds of things, so Cargo's "I need openssl" dependency would map to a Buck dependency on an openssl target; it would then be up to Buck to manage the whole build (both Rust and non-Rust), using the crates resolved by Cargo.

The more I think about it, the more I like the idea of a helper library.

  1. Cargo should only need to be able to detect whether build.rs output has changed and then whether any of the generated files did.
  2. A library could then take it from there, implementing caching for the configuration, dependency-checking logic for individual parts of code generation, and a special utility to write-only-if-different, so the generated code does not need to be recompiled if it didn't actually change (CMake's configure_file does this for exactly this reason).

That way:

  1. Cargo itself can be kept simple and only the crates that need some special logic will pay the price and
  2. crates will be able to take advantage of the improved logic without forcing dependency on the newest cargo.

Yeah part of the problem is that Cargo has no formal idea of what output(s) build.rs has. It just runs it, and later on some .rs file will have an include!(...) or something to get a file that the build script left at a particular path. If a build script had a formal way to (at least) declare its outputs, then Cargo has more to work with. But it also needs some kind of cheap way to tell whether the script needs rerunning.

That says that we’re using build.rs, in the case of generated code, for two purposes - telling Cargo when to recompile build.rs and generating code.

To be clear, my ideal would be to eliminate build.rs entirely, and have separate (new) mechanisms to deal with each of these cases.

Understandable, but let's focus on initial steps that can be done without breaking older code and crates. Remember, stability is important for Rust.

To summarize, in terms of the code generation story, there’s two functions that build.rs is currently used for:

  • Generating code, which is typically inserted into the crate via the include_str! macro.
  • Telling Cargo when to rerun the code generation (via build.rs)

Sound accurate? Any complicating factors for this use case?

build.rs needs to report the dependencies that should be checked to find whether rebuild is needed. But this is a list of files (and possibly environment variables or other things). It has no use for abstract thing like “openssl”, because it won't know how to check whether it changed anyway.

The point of checking whether dependencies changed is to avoid needless work, so it should itself be fast, and that should mean just checking some timestamps, not running any helpers.

I don't think logic at this level should be inside Cargo. As for most existing software components, the logic inside the crate's build system should only be looking up the files it needs, using hints from the environment.

The logic for Debian, Fedora or macOS Homebrew should go in the packaging specification, which is wrapped around cargo. The same for distributions or suites built with BitBake or BuildStream. And while Windows is short of similar tools, software is usually distributed in binary form there, and the builder can use a Jenkinsfile or such.

Yes, but what needs the dependency in this case is Buck, not Cargo. In your project it makes little difference which file you declare it in, and for another project that uses BitBake it won't be much help, because the dependency probably needs to be specified differently there. And it may even be project specific—at work we have related projects, and some APIs are implemented in a different library in each, so each has its own .bbappend files that override the dependencies. The base build defined in Cargo shouldn't care. Its author wouldn't even know.

Having declarative inputs and outputs for scripts sounds like a good idea. For example, build phases in Xcode require this information.

Cargo currently has a way of specifying inputs, but that’s done only during the script’s run, and it has a fatal flaw: declaring any input disables Cargo’s defaults, so libraries like pkg_config can’t use it without risk of breaking build.rs scripts.

Split into code generation and library search is probably a good idea as well:

  • Code generation is a relatively simple task, and isn’t replaceable (i.e. it’s always crate-specific).
  • Library search is a hairy mess, and on some platforms/distributions it is replaceable.

Perhaps this could be as easy as formalizing the -sys crates: if the crate has a -sys name and has the links attribute, then assume its build.rs searches for the library named by links, and the build script can be skipped if you know better how to link that lib. But implementations would also need to provide directories for headers (the related cargo:root/cargo:include convention).


In this regard, it would be nice to split all of Cargo into configure and build steps similar to how Autoconf, CMake or Meson work, i.e. not only the custom parts generated by build script, but also steps built into cargo would be first written out as steps with listed dependencies and outputs and then executed. And one of the steps with listed dependencies would be re-running the configure step.

This would make it easier to integrate with other build tools, as the other tool could take advantage of knowing the dependencies and re-running the steps only as needed, and it would also speed up the build, because checking the timestamps can be done really quickly and the configure script would only be re-run if needed (library detection is normally not re-run without explicit request in Autoconf or CMake).

In fact, I am thinking it would make sense to create something like ninja for Rust—possibly almost a port, including some compatibility. Maybe even just a wrapper for it at first. This could first be made available as a utility for build.rs custom steps, to test and polish it, and then integrated into cargo itself.

Personally, I think what Cargo most needs is the equivalent of ./configure --help: a way to list all the options you can pass to configure the build.

For example, openssl-sys's build.rs contains a bunch of magic autodetection which I might not want to use – but I don’t actually have to. If I pass OPENSSL_LIB_DIR and OPENSSL_INCLUDE_DIR, most of that detection is short-circuited in favor of just using the provided directories. If I have an outer build system that wants to be responsible for finding OpenSSL itself, I can configure it to always pass those variables – or if I can’t, that’s the fault of that build system for not being flexible enough. But how do I know which variables to pass? Well, it is mentioned in the documentation for the openssl crate, and if it were just openssl, that might be good enough. But that doesn’t scale. If my code has a large dependency tree, I don’t want to go through each crate in the tree and scour the docs and/or source code for any references to magic environment variables (especially since the set of supported variables might change over time). Much better if I can run a command to get a comprehensive list of all the options for the whole tree.

It would also be nice to have a standardized convention for a way to tell build.rs scripts “please don’t try to autodetect any dependencies, just fail if I haven’t specified them explicitly”.
