Futher extensions to `cfg(target_family)`, and concerns about breakage

In build.rs file, it's generally a mistake to do cfg!(target_foo = "thing"), ad it will test the value on the host instead of the target. This means code that does this is broken when cross compiling, which is unfortunate (but extremely common).

(While this is confusing at first, it's also hard to argue it is not the correct way for this to work. If when run from a build script it worked some other way, then, say, #[cfg(target_arch = "x86_64)] might be insufficient to soundly guard use of core::arch::asm! for x86_64 code).

Instead, I believe we're we're supposed to use the values provided by cargo for build scripts, such as CARGO_CFG_<cfg>. That is:

let target_foo_is_thing = std::env::var("CARGO_CFG_TARGET_FOO")
    .map_or(false, |v| v == "thing");

This properly checks the target's value for target_foo rather than the host's. All good? Well, no. this is still possibly not quite right.

This may work now, and it may stay that way, or it may break under different compilation settings. Consider the case of cfgs which may be used with multiple values, like target_feature. cfg!(target_feature = "abc") and cfg!(target_feature = "cde") both evaluate to true, so what does CARGO_CFG_TARGET_FEATURE use? It uses all of the values, separated by commas. So, for cfg keys which may hold multiple values, you must account for this and write your test in a slightly different manner, for example:

let have_static_crt = std::env::var("CARGO_CFG_TARGET_FEATURE")
    .map_or(false, |v| v.split(",").any(|v| v == "static-crt"))

In fact, the statement "for cfg keys which may hold multiple values" may be too conservative. Somewhat recently, the target_family cfg changed from being single-valued to multi-valued, as wasm has been added as a target_family, and now is returned on certain targets which are also unix; for example, wasm32-unknown-emscripten comes through in CARGO_CFG_TARGET_FAMILY as "unix,wasm". This was done without much discussion, and was celebrated by many, and several people have expressed desires to add even more target_families, for cases like x86 (x86/x86_64), darwin (macOS/iOS/tvOS), linux (linux/android), and so on.

I think we should be careful here. I believe the change from unix to unix,wasm on those targets was probably a breaking change, probably an accidental one (I looked and I couldn't find this being mentioned at all in the discussion about the new target family, but perhaps I missed it). In practice, we almost certainly "got away" with it for a few reasons:

  1. Most build scripts use cfg!() or #[cfg()] for this, which would keep working in this case, despite being incorrect for cross compilation.
  2. The targets which changed are the emscripten targets, which are rare.
  3. Use of CARGO_CFG_UNIX is slightly more common from what I can tell thant CARGO_CFG_TARGET_FAMILY (but the latter does seem to get used)

Alternatively, I suppose we could take the stance that actually all of the cfg values may gain multiple values in the future. If we do this, we should document this fact more clearly than we do. (I also think it will lead to annoying breakage when running some build scripts which suddenly do the wrong thing). I don't know if I think this is useful, since many of these clearly cannot take multiple values, but it's unclear how we would clarify this fact.

Anyway, I don't really have a point with this, I just saw someone mention more target_family values as a desirable goal recently, and wanted to bring this up for discussion. I personally am concerned it would cause subtle breakage, and that it should be done carefully at the very least.

6 Likes

Maybe build scripts should have access to a dedicated API for these things, instead of having to parse environment variables? (like how the proc-macro crate is only available in proc macros)

1 Like

When we wrote the tectonic_cfg_support crate, which looks like it is impacted by the introduction of comma separated multiple values, we explicitly used a macro_rules! macro because proc-macro relies on the target supporting dynamic linking which isn't available for all targets our build.rs should run on.

macro_rules isn't able to faithfully emulate cfg, even for all the cfg's we actually used, in regards to multiple variable argument recursive calls (of the form any(all(...), any(...), ...). In which case we fell back to a builder style API.

So just wanted to note that depending on procedural macros may not really be an option.

At one point, I started on target_cfg.rs ยท GitHub in a -sys crate, of mine, but decided it was overkill and ended up git stashing it into the void before finishing it. (It's pretty messy, and I guess I didnt save the last changes I had for it, because the match arms in __internal::cfg_test aren't all finished).

It's possible but not trivial. You need a recursive tt muncher, which are time-consuming to write, and annoying to maintain.

I'll also note that many people are hesitant to add additional dependencies in build.rs, both to keep builds fast and to reduce risk of supply chain attacks. An addition to the stdlib crates for this case is viable but is a large addition, I think.

It also might not really solve things, unless the environment variables were deprecated at the same time. I'd frankly prefer we just come to a policy on this kind of thing. In particular, I don't think in the future we should add new target_family entries to existing targets without being very aware of the breakage.

(And worth noting this breakage in many cases would be invisible to crater, which does not check other targets).

One option to make it explicit that all cfg(foo = bar) variants are potentially multi-valued would be to add a trailing comma CARGO_CFG_TARGET_ARCH="x86-64,", hopefully people would then parse these as multi-valued rather than var("CARGO_CFG_TARGET_ARCH") == Some("x86-64,").

This would break every build script that currently isn't broken with regards to cross compilation. I've written a lot of code that would break from this, due to assuming that there's only one target_os or target_arch. (I also don't know that it's sensible to make those be multi-valued, I think I would strongly oppose it unless there's some very reasonable case I am missing)

That said, it would be good new cfgs, especially ones that could become multi-valued eventually. For other cases, I think a better solution would be for us to generally consider this kind of change to be breaking, and rather than adding many values for existing single-valued cfgs like target_family, we can just add a new multi-valued entry.

For example, for the target_family case, we could decide not to add new values on targets other than the current crop (e.g. limit the weirdness to the emscripten targets), and add a new multi-valued entry of target_category. I think for that, either we would follow your suggestion and have it be comma-suffixed, or alternatively, populate it with a sufficient initial list of values on common targets โ€“ For example CARGO_CFG_TARGET_CATEGORY might be something like unix,linux,x86 on x86_64-unknown-linux-gnu, and might be windows,x86,msvc on x86_64-pc-windows-msvc (and so on). The values are not really the point, just that most targets would have a multiple values, so folks wouldn't make that sort of mistake -- alternatively the comma-suffix probably would work too.

That said, I think a std::bikeshed::CargoBuildEnv or something would be a good idea for various reasons... and could give us a lot of flexibility on this sort of thing in the future. That said, I have very little hope of that kind of thing stabilizing, for various reasons. (And to be clear, I don't think it would provide as much value as a 3rd party crate -- at least not in terms of protecting us more against breaking changes, and encouraging people to write correct build scripts, etc).

Cargo could provide a cargo_build_script crate separate from rustc.

Even more interesting... https://doc.rust-lang.org/cargo/reference/environment-variables.html#environment-variables-cargo-sets-for-build-scripts

  • CARGO_CFG_TARGET_FAMILY=unix โ€” The target family, either unix or windows .

The one place you can check what the cargo buildscript environment variables are specifically says that only unix or windows are allowed values. This isn't true, and clicking through to the #[cfg] docs makes this clear, but the env var docs are wrong here.

In general, the docs aren't clear on what keys are allowed to be multi-valued.

I know having it as a 3rd party crate isn't ideal, but I did publish build-rs to put a typed interface to the environment variables and cargo: println!s. It should properly have all potentially multi-valued inputs returning a split Vec, but the docs aren't super clear on what's allowed to be multi-valued.

3 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.