Direct support for pkg-config in Cargo


#1

These days, I’ve been playing with static Musl builds with native dependencies. The more I get to know all kinds of *-sys crates I’ve been trying to get to build, the more clear it has become that the build.rs system isn’t ideal at all: it’s diverse in a bad way, and that makes it error-prone and confusing. Everybody and their cousins are coming up with their build scripts, 90% of which all do the same thing: tell Cargo the name of the library to use, a path where to find it, and possibly how to link it (dynamic/static). Usually they are getting this data from two sources: from environment variables if they are set, and/or from pkg-config, if that’s present.
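To make the "same thing" concrete, here is a minimal sketch of what most of these build scripts boil down to, using only the standard library. The function names `pkg_config_libs` and `cargo_directives` are made up for illustration, and the flag handling is deliberately simplistic (it only understands `-L<dir>` and `-l<lib>` written without a space).

```rust
use std::process::Command;

/// Ask the `pkg-config` binary for linker flags. Returns `None` if the
/// binary is missing, exits with an error, or prints non-UTF-8 output.
fn pkg_config_libs(name: &str) -> Option<String> {
    let output = Command::new("pkg-config")
        .arg("--libs")
        .arg(name)
        .output()
        .ok()?;
    if !output.status.success() {
        return None;
    }
    String::from_utf8(output.stdout).ok()
}

/// Translate `-L<dir>` / `-l<lib>` flags into the lines a build script
/// prints for Cargo. `link_kind` is "dylib" or "static".
fn cargo_directives(flags: &str, link_kind: &str) -> Vec<String> {
    let mut out = Vec::new();
    for flag in flags.split_whitespace() {
        if let Some(dir) = flag.strip_prefix("-L") {
            out.push(format!("cargo:rustc-link-search=native={}", dir));
        } else if let Some(lib) = flag.strip_prefix("-l") {
            out.push(format!("cargo:rustc-link-lib={}={}", link_kind, lib));
        }
        // Any other flag (e.g. -pthread) is ignored in this sketch.
    }
    out
}

// In a build.rs, `main` would do roughly:
//     if let Some(flags) = pkg_config_libs("libpq") {
//         for line in cargo_directives(&flags, "dylib") {
//             println!("{}", line);
//         }
//     }
```

Nearly every *-sys crate reimplements some variation of these two steps, which is where the inconsistencies creep in.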

Now, if most of the scripts are doing the same thing, possibly with implementations that are buggy or lacking features (I have already sent some PRs adding options to pass static linking flags and to use pkg-config), there is a clearly beneficial alternative: Cargo should use pkg-config directly to cover the majority of the cases where build.rs is currently used. I think that if there was a single field in Cargo.toml, like pkg-config="libpq", to tell Cargo how to get the linker flags and lib dir from pkg-config, most of the *-sys crates would simply use that.

Thoughts? I think this would be one fairly simple step toward simplifying the process and improving the ergonomics of linking native dependencies. Currently the process involves checking the docs/sources of every *-sys crate to see how they are configured. (If they even can be configured.) Note that I’m not saying that we should abolish build.rs scripts for good, just that a unified, simple alternative would do for the majority of the cases.


#2

Can this be done with a crate for finding libraries properly? So that all those diverse build scripts would be reduced to:

// `proper_pkg_config_wrapper` is a hypothetical crate name.
extern crate proper_pkg_config_wrapper;

fn main() {
    proper_pkg_config_wrapper::find_library_properly("libfoo");
}

#3

Currently, there is the crate pkg-config that does precisely this. The problem is that not everybody is using it, even if they should.


#4

My fear would be this making matters even worse for Windows users.

I’ve thought a few times that what would be nice is some build crate where you just provide information about a library, and it does the rest. Not just the pkg-config name, but something like:

find_lib! {
    version: "1.2.3",
    unix: {
        pkg_config_name: "libblah",
        try_apt: ["libblah", "libblah-dev"],
    },
    windows: {
        nu_get: "cURL",
    },
    source: {
        sources: [ { url: "https://libblah.org/releases/blah-$VER.tar.gz", sha1: "hexhex" } ],
        build_kind: "i-dont-even-know-what-this-would-need",
    },
    binary: {
        "x86_64-pc-linux-gnu": { url: "https://libblah.org/releases/blah-$VER-bin.tar.gz", sha1: "morehex" },
        "i686-pc-windows-msvc": { url: "https://third-party.org/blah.$VER.zip", sha1: "yougettheidea" },
    },
}

Of course, I’ve never done more than think about it because good grief that’s going to be painful.

All that said, I think some kind of consistency would be nice, even if it only extends to the basics.


#5

I’m not sure how that would make things worse, if the crates currently check the dirs from pkg-config anyway? Cargo could also allow overriding the lib dir with an env var, which would cover the other common case. (There is the var RUSTFLAGS, but there could be, of course, a specialized variable.)
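A rough sketch of the precedence I have in mind: an explicit override from the environment wins, and the pkg-config result is only the fallback. The `LibDir` type and the function names are invented for this example, and the variable name passed in (say, `LIBPQ_LIB_DIR`) is hypothetical too.

```rust
use std::env;

/// Where the library directory came from.
#[derive(Debug, PartialEq)]
enum LibDir {
    Override(String),
    PkgConfig(String),
    NotFound,
}

/// A non-empty override always wins; otherwise fall back to whatever
/// pkg-config reported, if anything.
fn resolve_lib_dir(env_override: Option<String>, pkg_config_dir: Option<String>) -> LibDir {
    match (env_override, pkg_config_dir) {
        (Some(dir), _) if !dir.is_empty() => LibDir::Override(dir),
        (_, Some(dir)) => LibDir::PkgConfig(dir),
        _ => LibDir::NotFound,
    }
}

/// Convenience wrapper that reads the override from the real environment.
fn lib_dir_for(var: &str, pkg_config_dir: Option<String>) -> LibDir {
    resolve_lib_dir(env::var(var).ok(), pkg_config_dir)
}
```

On a system without pkg-config, the second argument would simply be `None`, so the override remains the only route, which is no worse than what the crates do today.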

By the way, I also toyed with the idea of a system that would support downloading and building packages that use autoconf and automake, but that would be quite limiting, since not all projects use those build systems. That would need to be a separate tool, like the macro you envisioned.


#6

I agree that the build.rs system is often subideal, but my suspicion is that it’s a recognition that there is no one-size-fits-all solution when it comes to C build systems/dep management, hence why the pkg-config crate is just another citizen of that ecosystem. If crates could/should look at pkg-config and aren’t, that seems like a solvable problem (given time), but I’m not sure that making it first-class in Cargo will help - if the crate authors didn’t use the pkg-config crate in the first place to identify dep config (for whatever reason), what additional motivation would be provided to make it happen in the new world?

On another note, you do acknowledge in the OP that this isn’t a one-size-fits-all approach…but there are a fair number of ‘individual’ build systems out there that could also do with some love. One thing that I would like to see for build.rs files (that would help regardless of use of pkg-config) is some way to document the environment variables that can be used to alter a build - these often end up staying undocumented in the build.rs, as there’s no good way to expose the information. This way you’d be able to see at a glance whether a *-sys crate uses the pkg-config crate etc. because you’d see that the PKG_CONFIG_* variables influence the openssl (or whatever) build.
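As a sketch of what such documentation could look like if a build script exposed it as data: the `EnvVarDoc` struct and `render_help` function below are hypothetical, not an existing Cargo or crate API.

```rust
/// One environment variable a build script consults (hypothetical schema).
struct EnvVarDoc {
    name: &'static str,
    help: &'static str,
}

/// Render a plain-text listing of the variables that influence a crate's build.
fn render_help(crate_name: &str, vars: &[EnvVarDoc]) -> String {
    let mut out = format!("{} build is influenced by:\n", crate_name);
    for var in vars {
        // Left-align the variable name in a fixed-width column.
        out.push_str(&format!("  {:<24} {}\n", var.name, var.help));
    }
    out
}
```

With something like this, a tool could print the listing for every *-sys crate in the dependency tree instead of sending you off to read each build.rs.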


#7

It’s going to make it worse because pkg-config doesn’t exist on Windows. So now you have a feature that works great on unix but Windows users are left out as a matter of convenience.


#8

Ah, I see, that is indeed a valid concern if it leads to Windows users being neglected.

That brings to mind: is there any unified way of finding libraries and compiler flags on Windows? It says in Wikipedia that pkg-config is available on Windows, but the problem is that it works by reading *.pc files that contain package information in $PREFIX/lib/pkgconfig, and I doubt the build systems that are common on Windows support saving that information. Is the Windows ecosystem so fundamentally scattered that a “unification” of probing the libraries isn’t even possible? I remember from the days I did some C# that there was some “global assembly cache” but that might be just for the .NET binaries, not native libs? I don’t really know a lot about Windows.

Edit: Let’s limit the discussion to the two toolchains Rust supports on Windows: MSVC and GNU toolchains. I think the GNU toolchain supports pkg-config on Windows, though I’m not sure if it’s enabled by default (when using MSYS2? MinGW? Cygwin?), plus there is the problem of these toolchains being installed in unknown prefixes. The real problem is MSVC though, which I know nothing of.


#9

*laughs hysterically*

Does that answer your question? :P Honestly, most of the time, I just want a directory into which I can dump the necessary files. I do not want to have to muck about with batch files and managing the correct set of environment variables on a per-project basis because they all want mutually exclusive sets of libraries and tools and refuse to support any kind of local, file-based configuration.

Also, another thing I forgot: some bindings will only work with specific versions of libraries. Aside from nix (and presumably the GNU knock-off), my understanding is that unix package managers don’t generally allow for installation of specific versions of libraries, just the current one that’s in the repository. In that sense, even with pkg-config, you probably want some fallback mechanisms anyway, since an important point for Cargo is the ability to do reproducible builds.

Regarding your edit: it does, until it doesn’t. I mean, MSYS2 has pkg-config, but I’ve come across numerous libraries that use pkg-config, but not on Windows, even with the GNU toolchain. Or, heck, I’ve found libraries that do, but the Rust bindings don’t for some reason.

… now that I think about it, I hate crates with native dependencies. I really do.

Edit: Oh! And let’s not forget libraries that use different names on Windows. *goes to get a bottle of whiskey*


#10

Anyone want to guess what libeay32.dll is for?


#11

IMO the whole story for external dependencies has some big flaws.

  • Environment variables tend to be necessary for locating dependencies, yet don’t trigger rebuilding when changed. This is hard to fix because these variables are often used indirectly; e.g. in the case of pkg-config, the pkg-config program itself uses PKG_CONFIG_PATH. Yet you don’t want to rebuild when any random environment variable changes…

  • Relatedly, there’s no way to get an upfront list of what external dependencies are going to be required by crates in the current Cargo dependency tree, nor what environment variables, paths, etc. are going to be used to locate them (and thus how to override with a different copy of the dependency). The best you can do is figure out which *-sys crates are in use and look in their documentation, or perhaps source.

Consequences:

  • Reproducibility is hard.

  • Build instructions for source distributions of applications are hard. (Anecdotally, when I see a tool on GitHub, if its build instructions require installing some language environment I don’t usually use, that’s already a big turnoff for me wanting to use it. That’s unavoidable. But if I do get over that hump, and install the environment, only to get a build failure… that really sucks. Since it’s an unfamiliar environment, I probably have no idea how to fix it, and I’ve already wasted enough time trying to get the damn thing built…)

  • I’d like to see Cargo maintain a binary cache across build trees - both globally on a given system and even online. Like many source-based system package managers (such as Homebrew and Nix), crates.io could automatically compile binaries, and cargo could download them in preference to source if there’s a match for the version, target OS, enabled features, (for Rust) compiler version, etc. This would dovetail with the idea of Cargoizing libstd: you definitely don’t want to have to recompile it for every project, but rather than having some special mechanism just for std, it could be cached and downloaded the same way as any other package. And if there was no match, like if you’re compiling for an unusual OS, it would rebuild it just like any other package. Such a mechanism would make compile times so much less of a burden.

…but you can’t do that, at least not effectively, if anything with a build.rs can quietly depend on whatever bits of external state it wants.

What I’d like to see instead:

  • A centralized mechanism for locating dependencies, as proposed here. But rather than just being for pkg-config, I’d like it to be a fairly extensible mechanism for dependencies and configuration options in general.

  • A cargo configure command that, like ./configure, allows specifying all the dependencies up front - and has a --help that lists exactly which native dependencies the build needs (including all dependent crates) and how to specify them. (This could also be optionally used for targets and features, to avoid the need to repeat it every time you run cargo build.)

  • An optional key to indicate that your build.rs is “pure”, meaning that all external inputs are fetched through the centralized mechanism rather than ad-hoc. To avoid mistakes, a build.rs run this way would be run with a clean environ (including PATH) and, where available, sandboxed.

  • Possibly there should be a way to provide arbitrary Rust scripts to run at “configure time” (in an unclean environment), which would output a list of library files to use, options, etc., but not do any of the work of actually using those files. Cargo would then (a) allow overriding any of these settings in cargo configure and (b) be able to say that if the configure script had the same output, and the files in question have the same content, as observed previously, build output can be reused. (This would be useful for a global cache but too slow to run on every cargo build; for that, maybe the script could also output a list of environment variables, paths, etc. which would trigger a rerun - less reliable but faster.)


#12

I don’t really think the interface of ./configure is very beautiful. I’d much prefer having some toml file you edit, which, unlike configure, doesn’t create intermediate files that store state.


#13

That could work too, as long as it’s possible to list the available settings (i.e. no stuffing random keys in the toml file and wondering whether they’re having any effect or not).


#14

Personally I’d like to see this sort of thing mature as build-dependency libraries before considering adding it to cargo. (Stuff about build.rs purity is a separate issue in my mind.)


#15

We can make Cargo aware of environment variable dependencies in the same way that it’s aware of file and directory dependencies.
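For illustration, build scripts declare file dependencies by printing `cargo:rerun-if-changed=PATH`, and Cargo also understands a `cargo:rerun-if-env-changed=VAR` line for environment variables. A small sketch of a helper that produces both kinds of lines (the helper name `rerun_directives` is made up):

```rust
/// Build the lines a build script would print so that Cargo reruns it when
/// any of the given environment variables or files change.
fn rerun_directives(env_vars: &[&str], files: &[&str]) -> Vec<String> {
    let mut lines = Vec::new();
    for var in env_vars {
        lines.push(format!("cargo:rerun-if-env-changed={}", var));
    }
    for file in files {
        lines.push(format!("cargo:rerun-if-changed={}", file));
    }
    lines
}
```

This still leaves the indirect case open: the build script has to know that the `pkg-config` program reads `PKG_CONFIG_PATH` and list that variable itself.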


#16

Good discussion. My original suggestion to support pkg-config was born of frustration with the arbitrariness of the current build configuration. I think any kind of improvement would be nice to have. I see now that the original suggestion to support pkg-config was a bit of a stretch from the portability viewpoint.

However, having build.rs scripts state up front which configuration options they provide and which environment variables they look at, and then having Cargo call them with a clean environment containing those and only those variables, would indeed help the situation. That would help with re-buildability (invalidation of caches), build reproducibility (a clear declaration of which things in the environment matter and which don’t) and ease of configuration (the ability to print the configuration options for the user). Of course, if we don’t go for full sandboxing, the build.rs scripts are still free to do any crazy stuff they want, but even without that, just pre-declaring the env vars and calling the script with a clean env would be a nice speed bump.
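A minimal sketch of the "clean env plus declared variables" part, assuming the declared list is somehow available to the caller. The function names are invented here; only `Command::env_clear` and `Command::envs` are real standard-library calls.

```rust
use std::collections::BTreeMap;
use std::process::Command;

/// Keep only the variables a build script declared up front.
fn filter_env(
    declared: &[&str],
    ambient: &BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    ambient
        .iter()
        .filter(|(key, _)| declared.contains(&key.as_str()))
        .map(|(key, value)| (key.clone(), value.clone()))
        .collect()
}

/// Prepare a command that starts from an empty environment and receives
/// only the declared variables.
fn sandboxed_command(
    program: &str,
    declared: &[&str],
    ambient: &BTreeMap<String, String>,
) -> Command {
    let mut cmd = Command::new(program);
    cmd.env_clear().envs(filter_env(declared, ambient));
    cmd
}
```

The ambient map would typically be built from `std::env::vars()`. Note that clearing PATH as well means the script cannot find external programs unless PATH is itself declared.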

As for the “ickyness” of env vars, there could be an .env file (I believe that’s the commonly-used name for purposes like this) that Cargo reads to set the variables. The .env files could be added to .gitignore by default in new Cargo projects, as they are and should be environment-specific.
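Reading such a file would not take much. Here is a sketch that assumes the simplest possible format, one `KEY=VALUE` pair per line with `#` comments, and no quoting or escaping rules; the function name is made up and the variable names in the usage example are only illustrative.

```rust
/// Parse `KEY=VALUE` lines, skipping blank lines, `#` comments, and lines
/// without an `=`. Whitespace around keys and values is trimmed.
fn parse_env_file(contents: &str) -> Vec<(String, String)> {
    contents
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| line.split_once('='))
        .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
        .collect()
}
```

For example, a file containing `PKG_CONFIG_PATH=/opt/lib/pkgconfig` and `OPENSSL_STATIC = 1` on separate lines would yield those two pairs, in order.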


#17

Building this as a separate tool until everything is mature and agreed upon is a great idea; maybe I’ll give it a shot.

Something that reads from a config file (maybe called build.toml?) and then passes the variables to the library crates.

It could also have a mode to generate such a build.toml file with appropriate comments, similar to how there is a build.toml.example in rustbuild (that one is manually generated though).


#18

I’ve written a crate metadeps, which lets you specify pkg-config dependencies declaratively in Cargo.toml metadata, like this:


[package.metadata.pkg-config]
foo = "1.2.3"