Reconsider how merging works for "string or array of strings" Cargo configurations

Using build.rustflags as an example (Configuration - The Cargo Book) — this configuration option is documented as having "Type: string or array of strings", with the following being treated equivalently by Cargo:

# .cargo/config.toml
[build]
rustflags = ["-C", "target-cpu=native"]
# .cargo/config.toml
[build]
rustflags = "-C target-cpu=native"

Except, surprisingly they are not equivalent!

The first one can only be appended to by other configurations. It cannot be overwritten. The second one can only be overwritten by other configurations. It cannot be appended to.

With rustflags = [...]:

$ cargo check --config='build.rustflags=["--cfg=repro"]' --verbose  # APPEND
     Running `/path/to/bin/rustc ... -C target-cpu=native --cfg=repro`
$ cargo check --config='build.rustflags="--cfg=repro"' --verbose  # FAIL
error: failed to merge --config key `build` into `.cargo/config.toml`

Caused by:
  failed to merge key `rustflags` between .cargo/config.toml and --config cli option

Caused by:
  failed to merge config value from `--config cli option` into `.cargo/config.toml`: expected array, but found string

With rustflags = "...":

$ cargo check --config='build.rustflags="--cfg=repro"' --verbose  # OVERWRITE
     Running `/path/to/bin/rustc ... --cfg=repro`
$ cargo check --config='build.rustflags=["--cfg=repro"]' --verbose  # FAIL
error: failed to merge --config key `build` into `.cargo/config.toml`

Caused by:
  failed to merge key `rustflags` between .cargo/config.toml and --config cli option

Caused by:
  failed to merge config value from `--config cli option` into `.cargo/config.toml`: expected string, but found array

This distinction between append-only rustflags and replace-only rustflags seems like too much subtlety to attach to what, to the user, almost certainly seems like an insignificant syntactic difference between writing flags as an array of strings vs space-separated string.

Most recently, this has caused problems for me in GitHub - dtolnay/cargo-docs-rs: Imitate the documentation build that docs.rs would do which needs to set --cfg=docsrs and --extern-html-root-takes-precedence in build.rustdocflags. If the user arbitrarily chose to use string syntax in their config.toml, then they get a confusing failure because I cannot append to it. At best, I could reimplement all of Cargo's configuration merging logic in my subcommand, but this definitely doesn't seem like something I should have to do in order to append flags.

Should we make rustflags = ["-C", "target-cpu=native"] and rustflags = "-C target-cpu=native" actually equivalent, by giving them both append-only behavior?

12 Likes

While I agree this is too much for user to learn the details, this is somehow documented: Configuration - The Cargo Book. People may already rely on this to unset rustflags from config as a workaround for rust-lang/cargo#8687.

Documentation doesn't make this intuitive, though. We can't break existing workflows, but we may be able to provide alternatives, such as by providing always-append sources of rustflags (e.g. RUSTFLAGS_APPEND or --rustflags-append=), providing ways to explicitly override/suppress rustflags from various sources where desired, or changing this over an edition boundary.

4 Likes

It would be nice to have explicitly appending rustflags option.

Regardless of that, I'd prefer Cargo to move away from using rustflags, and create first-class features and config properties for commonly used flags. target-cpu = "native" could be a config option on its own, or a property of a build profile.

4 Likes

Josh's idea is pretty much what we do in Buck2. We have 2 configuration values associated with each Rust toolchain: rustc_flags and extra_rustc_flags, with the idea that rustc_flags would typically originate from files checked into the repo and contain things like:

  • -Csymbol-mangling-version=
  • -Clinker-flavor=
  • -Cforce-frame-pointers=
  • -Cembed-bitcode=
  • -Zmacro-backtrace
  • -Zllvm_module_flag=

whereas extra_rustc_flags would typically be supplied by a command line argument or other ephemeral configuration. For example our docs.rs-equivalent that publishes documentation of all Rust targets in the repo performs builds with --cap-lints configured by way of extra_rustc_flags.

3 Likes

That would be nice. But you said "commonly used flags". Unless every single flag is mapped, there will always be a need for rustflags. And it should work as well as it can work.

Especially when dealing with embedded in Rust. I have had to configure a lot of strange flags there (especially around linking or the linking driver in use).

This is being tracked in Tracking Issue: Exposing RUSTFLAGS in cargo ¡ Issue #12739 ¡ rust-lang/cargo ¡ GitHub

5 Likes

This becomes especially interesting around the unsound codegen options. Rustc necessarily has some codegen options which are, strictly speaking, unsafe, as they make unchecked assumptions about the target runtime environment. The canonical example is than enabling target CPU features is (or at least can be) unsound if run on a CPU without those features[1]. So far Cargo avoids addressing this theoretical unsoundness hole by just not exposing it except via the "I know what I'm doing" knob that is setting RUSTFLAGS directly, but first-class Cargo support means needing to explicitly acknowledge any potentially unsound flag usage.

-Ctarget-cpu has a fairly straightforward solution — test target features before main (or on dylib load) to generate a guaranteed crash before execution of any code compiled with an assumption of feature availability. I don't know off the top of my head what other rustc options are technically unsound. I'm 95% certain other flags exist that pass on a soundness proof to the person distributing/running the binary. (E.g. the implicit assumption that any linked life before main is safe independent of runtime environment.)

"I'm going to run this binary blob" (hopefully) doesn't have UB at the OS level, but it absolutely can cause UB at the program level when language-known properties about the host don't hold. People already familiar with native builds for C or C++ are hopefully aware of this, but for people who don't have any unmanaged background and don't touch unsafe can absolutely be burnt by this[2].


  1. Specifically, Rust (and LLVM) declare it UB to run code in an environment with improperly set target features, as optimizations are allowed to introduce usage of those features (e.g. for auto-vectorization). And even then, IIRC is_x86_feature_detected! has an optimization to skip hitting the feature detection cache if the feature is globally statically available. (I know I've seen a proposal for it to try to skip the check if during monomorphization it's locally statically enabled by a use of #[target_feature]. I don't think we do that currently, as it is in conflict with bottom-up optimization/inning order, and discussion leaned towards using a separate macro to avoid changing the behavior of the current one, but it's desirable for rustc-managed multiversioned function.) ↩ī¸Ž

  2. It's not uncommon to see "use -Ctarget-cpu=native for release builds as a performance tip without any further caveats, especially on x86_64 targets where the default v1 target is behind a still reasonable assumption for an audience of "running newly acquired software" systems to have. But if a less aware user follows this advice and then runs the binary on an older CPU than was used to do the build — easily accomplishable if the build was done on "latest" CI and then run from an older local machine — they're opening themselves up to UB (in the best case, an illegal instruction fault, but in a worst case, processor level nondeterminism from decoding the "malformed" instruction encoding). ↩ī¸Ž

1 Like