Config.sub and rustc => Rustc Accepted Targets

(CC: @zackw)

Based on discussion on the IRLO thread Expand targets acceptable to rustc, and on Zulip, a few weeks ago I compiled a list of targets that are supported by rustc (at any tier), which are not accepted by config.sub (used by projects that use autotools, including compilers such as gcc, and treated as normative by various projects of mine, such as https://crates.io/crates/target-tuples, and GitHub - LightningCreations/lccc: Lightning Creations Compiler Frontend for various languages). Of the 167 total targets supported by rustc, 52 are not accepted by config.sub (I have not done any idempotency checks, so of the remaining 115, some may not be in canonical form, though I foresee this as less of an issue provided the details of the target are reasonably related to those of the canonical form).

The link to the unsupported target list, as well as the script used to compile the list, is: Rustc Targets · GitHub.
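The script from the gist is not reproduced here. As a rough sketch of the kind of check described above, assuming rustc is on PATH and a copy of config.sub is available locally (the `./config.sub` path and all function names are placeholders made up for illustration), something like this would list the rejected targets and also do the idempotency check that was skipped:

```python
import subprocess


def rustc_targets(rustc="rustc"):
    """Return every target name printed by `rustc --print target-list`."""
    out = subprocess.run(
        [rustc, "--print", "target-list"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def config_sub(target, script="./config.sub"):
    """Return config.sub's canonical name, or None if it rejects the target.

    "./config.sub" is a placeholder path, not something from the thread.
    """
    proc = subprocess.run(["sh", script, target], capture_output=True, text=True)
    return proc.stdout.strip() if proc.returncode == 0 else None


def rejected_targets(targets, canonicalize=config_sub):
    """Targets for which the canonicalizer returns None."""
    return [t for t in targets if canonicalize(t) is None]


def non_canonical(targets, canonicalize=config_sub):
    """Accepted targets whose canonical form differs from the input."""
    result = []
    for t in targets:
        canon = canonicalize(t)
        if canon is not None and canon != t:
            result.append((t, canon))
    return result
```

Calling `rejected_targets(rustc_targets())` would then give the list of rejected names. The counts it produces depend on the rustc and config.sub versions in use, so they need not match the 167/52 figures above.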

This could, potentially, have issues with compatibility, like those raised in Expand targets acceptable to rustc, though from a different angle: it may not be possible to compile non-Rust code for those platforms if only the rustc target is known, even if the platform does have support in a major C/++ compiler under a different target name. This could also pose issues with non-rustc implementations which either determine the valid targets as autoconf --targets (which are passed through config.sub), or which canonicalize targets according to a subset of the config.sub rules. The former may be an issue with gcc-rs (if anyone working on the project could comment as to whether it is or is not, that would be ideal), and the latter with my own implementation, lccc (link above). The most foreseeable issue would be that two implementations disagree on the name for a particular target they both otherwise support, so that both interior tooling (such as build scripts) and exterior tooling (such as configuration scripts/files) written for one without consideration of the other will not be portable across implementations (and dealing with such incompatibilities can be difficult).

If you're mixing target configuration via GNU config.sub (for example, GCC, including GCC/Rust, or generally GNU Autotools) with non-GNU config.sub (for example, Rust), then yes, you'll need some kind of (bidirectional) mapping table for target plus any relevant other flags that may be implicit in the target string.
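As a minimal sketch of what such a bidirectional table could look like (the two entries are illustrative assumptions, not an agreed mapping, and the helper names are made up), names that are not in the table are passed through unchanged:

```python
# Rust target name -> GNU (config.sub-style) name.
# Illustrative entries only; a real table would be much larger and might
# also carry extra flags that are implicit in one of the two names.
RUST_TO_GNU = {
    "wasm32-wasi": "wasm32-unknown-wasi",
    # Assumed pairing: the usual mingw-w64 triplet for this Rust target.
    "x86_64-pc-windows-gnu": "x86_64-w64-mingw32",
}

# The reverse direction is derived from the same data, so the two
# directions cannot drift apart.
GNU_TO_RUST = {gnu: rust for rust, gnu in RUST_TO_GNU.items()}


def to_gnu(target):
    """Translate a Rust target name to its GNU name, if one is listed."""
    return RUST_TO_GNU.get(target, target)


def to_rust(target):
    """Translate a GNU target name to its Rust name, if one is listed."""
    return GNU_TO_RUST.get(target, target)
```

Deriving the reverse table this way only works while the mapping is one-to-one; as soon as several GNU names correspond to one Rust name, the reverse lookup has to pick one of them.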

@zackw, would it make sense to host this translation table in config.sub, with a new mode/command-line flag? (If that makes sense?)

@InfernoDeity, I'm not sure I got what exactly your actual concern or suggestion is?

Well, it's intended more as an open-ended discussion of "What can/should we do about this" and "Are there any other issues that may occur" than as a specific suggestion (I probably could have clarified this in the initial post).

I'm not sure this solves the issue. As mentioned, I work with a number of things that purposefully accept and normalize targets like config.sub does (though only a subset, because some of the more interesting forms are fun to parse). If I need to go from a TARGET name to a config.sub name, or from a config.sub name to a TARGET name, I'd have to know whether I have to translate it. This may or may not be or become reasonable, but it breaks down if a build script doesn't consider that the answer may be different from what it expects (for example, if it assumes it's targeting gcc-rs, and thus that the target doesn't need to be translated, and then it's run on rustc).

Re-using the same example: that build script already needs to consider whether it's going to invoke GCC/Rust, gccrs vs. Rust, rustc, to use the respective command-line flags etc., and it then correspondingly has to use the original vs. translated target name.

Or, if the build script doesn't (need to) know all this, then a wrapper such as cargo has this logic incorporated?

For example:

  • Project A, using GNU Autotools, contains some Rust-language pieces. The configure script checks whether GCC/Rust is to be used and if yes, uses the config.sub target for gccrs, otherwise translates that one to Rust target for rustc use.
  • Project B, using Rust cargo. Invoking the usual cargo command, that one internally checks whether GCC/Rust is to be used and if yes, translates the Rust target to config.sub target for gccrs, otherwise uses the Rust target for rustc.

(This isn't implemented yet, but that's how I'd assume it would work.)

Here, I'm referring to a cargo build script, so it's invoked by cargo. I guess gcc-rs, which has a more gcc-like CLI, isn't a good example. In lccc, however, I plan to provide a rustc-like CLI, so that CLI (invoked as rustc or lcrustc) should be a drop-in replacement for rustc, except that it doesn't parse targets, because parsing targets once is more than enough (and the more times targets are parsed, the more points of failure). Another issue would be if there is more than one reverse mapping. For example, x86_64-unknown-linux-gnu could map identically or could map to x86_64-pc-linux-gnu (even more if we consider exact target names, not just canonical target names). This has issues if the build script needs to invoke other toolchains such as a C compiler or linker (and yes, you can have two toolchains for functionally identical targets that are completely different. Stage 0 of Linux From Scratch produces a toolchain for the phony x86_64-lfs-linux-gnu target to distinguish it from the host toolchain).

I have experience with configuring autotools for use with rust. One issue this has is that this isn't really future-compatible. If you are just testing for gccrs, then that will fail if a new implementation shows up that uses autoconf targets. Currently, my script, which does do some modification to adapt --host and --build targets to rustc targets, modifies the names according to a few hardcoded rules and does trial compilations until one is accepted.

Currently cargo just asks $RUSTC if it accepts a particular target (similar to what I do in rust-autotools, but not repeated with modifications). Ideally, this would be sufficient, as cargo likewise cannot predict for some arbitrary $RUSTC whether it is accepting rustc-style targets, or config.sub-style targets.
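A simplified sketch of such a probe is below. It is not cargo's actual invocation; it assumes that asking the compiler for `--print cfg` with a given `--target` fails when the target is not accepted, and the function name and the injectable `run` parameter are made up for illustration:

```python
import subprocess


def rustc_accepts(target, rustc="rustc", run=subprocess.run):
    """Return True if `<rustc> --print cfg --target <target>` exits with 0.

    `run` defaults to subprocess.run and can be swapped out for testing.
    """
    proc = run(
        [rustc, "--print", "cfg", "--target", target],
        capture_output=True, text=True,
    )
    return proc.returncode == 0
```

The probe only answers whether this particular $RUSTC accepts the name as given. It says nothing about which naming scheme the compiler uses, which is the limitation described above.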

gccrs has the same commandline interface as gcc. This means that cargo can't run it directly anyway. @CohenArthur is working on cargo-gccrs which is a wrapper around cargo that translates rustc arguments to gccrs arguments. This wrapper can translate from rustc target to config.sub target too. By the way gcc is not a cross-compiler and as such there is no need for any such translation anyway afaict.

I'd assume there are at least some things it would need to know about the target anyway, so that when invoked with the cargo --target option it can either select the correct compiler in the wrapper (if available), or fail gracefully (rather than try to invoke x86_64-pc-linux-gnu-gcc-rs for --target aarch64-pc-windows-gnu). Though, in that case, it would probably be better not to translate the targets, as config.sub=>rustc=>config.sub is quite lossy as mentioned, and someone may have really wanted x86_64-unknown-linux-gnu rather than x86_64-pc-linux-gnu.

RUSTC is a Rust compiler, with a Rust compiler interface, supporting Rust targets. Cargo will invoke it like a Rust compiler. If some non-Rust compiler provides RUSTC / rustc, but doesn't support Rust targets, that will make it incompatible with the expectations of tools invoking RUSTC / rustc, including cargo.

A rustc-compatible wrapper for some other compiler will need to translate targets to what that compiler expects, and in any case that compiler will need to present such targets to Rust code cfg directives as Rust targets to avoid breaking conditional compilation. So on balance, such a compiler will really need to understand and present Rust targets internally anyway.

Separately from that, there's the question of target aliases, and what toolchain to invoke by default for things like linking or building C code. For that, we may need to improve our autodetection to try to find compatible tools by default.

As I mentioned that may be undesirable because the user may not want a particular target from the rust=>internal mapping. If it maps the rustc target to some particular config.sub target, then uses that mapped target to find other pieces of the toolchain, or make decisions (such as whether the target is a cross-target), it will be wrong in the general case. To limit rust implementations to assuming potentially incorrect defaults, with limited portability to override that default, imo, is unreasonable. If I pass --host x86_64-lfs-linux-gnu to configure or --target x86_64-lfs-linux-gnu to cargo, my expectation as the user is that the compiler invoked is for that target or the compilation fails. Note that rustc already breaks this assumption, because, on my system at least, for x86_64-unknown-linux-gnu it uses host tools for x86_64-pc-linux-gnu (See Expanding Targets Acceptable to Rustc). It seems like a poor idea to force such limitations on implementations otherwise capable of circumventing it (also, less importantly, but as I said, I have zero need to parse targets twice in lccc. The more places I do so, the more likely one of those places is wrong and I won't catch it).

That would be easier to do, and much less error-prone and assumption-breaking than if you have to go rust=>config.sub=>rust, or config.sub=>rust=>config.sub=>rust (which becomes even more error-prone if the crate author proceeds to add another =>config.sub step).

This is exactly what I was getting at: a Rust compiler needs to understand Rust targets on input, and needs to provide them to Rust code later, so on balance it'll be easiest if it just understands Rust targets internally rather than having to translate on both ends.

But a compiler could choose to translate on both ends, if it really wanted to use some other target representation internally; that just seems excessively painful and error-prone.

To illustrate the issue, I'll provide two examples using lcrustc (the rustc CLI for lccc), and rust-autotools. Case 1, --build x86_64-pc-linux-gnu, no --host:

  1. configure assigns host to be x86_64-pc-linux-gnu, and host_alias to an empty string, according to its internal rules for the target variables.
  2. LCRUST_PROG_RUSTC matches lcrustc in PATH against AC_PATH_PROGS for various versions of $RUSTC (or RUSTC is preset to lcrustc).
  3. Target detection, i.e. how to pass $host to $RUSTC as the target. The first 4 cases fail: lcrustc is neither prefixed by $host_alias nor $host, so it's not a static cross-compiler (a gcc-like), --target $host_alias fails, and, because of requirements on targets, --target x86_64-pc-linux-gnu (currently) fails. Eventually, the rule for x86_64-*-* is matched, and the target is converted to x86_64-unknown-linux-gnu, which succeeds. --target x86_64-unknown-linux-gnu is placed in RUSTFLAGS, and rustc_host_target is set to x86_64-unknown-linux-gnu.
  4. The compiler is invoked with --target x86_64-unknown-linux-gnu when LCRUST_PROG_RUSTC checks whether it works. It has to map the target to an internally acceptable target. Knowing that unknown is generally not what the user wanted, and that most x86 systems are PCs, it swaps the vendor again before parsing and canonicalization and passing that target to the frontend.
  5. The frontend matches the target to the cfg directives, and sets the fields to target_arch="x86_64", target_vendor="unknown", target_os="linux" and target_env="gnu" (and all the others).

(Discourse is failing to format the 6th item) As a result, we have done a round trip through both targets, matching them twice. In this case, the result is fine, because the exact target that the compiler uses matches the target the user provided, so the link step will invoke x86_64-pc-linux-gnu-ld or x86_64-pc-linux-gnu-ar as necessary.

Case 2, --host x86_64-lfs-linux-gnu:

  1. configure assigns host_alias to be x86_64-lfs-linux-gnu. The target canonicalizes as x86_64-unknown-linux-gnu and that is put into host.
  2. intermediate AC_PATH_PROGS
  3. Target matching happens. First 3 cases fail as above, but the fourth, matching $host directly, works. Note that it failed and skipped over $host_alias which is the third case.
  4. lcrustc parses and maps to x86_64-pc-linux-gnu as the exact target name.
  5. Link steps invoke x86_64-pc-linux-gnu-ld. Uh oh, we just invoked a host tool instead of a cross tool.
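The target-detection step in both cases can be sketched roughly as follows. This is a simplification rather than the actual rust-autotools macro: the two prefix checks for a static cross-compiler are left out, the fallback rule is generalized from the x86_64-*-* rule to any architecture, and `accepts` stands in for a trial compilation. All names are hypothetical.

```python
def pick_rustc_target(host_alias, host, accepts):
    """Return the first candidate target that `accepts` reports as usable.

    Order tried: the user-supplied alias (if any), the canonical host,
    then the canonical host with its vendor field forced to "unknown".
    Returns None if nothing is accepted.
    """
    candidates = []
    if host_alias:
        candidates.append(host_alias)
    candidates.append(host)
    parts = host.split("-")
    if len(parts) >= 3:
        # Simplified fallback rule: replace the vendor with "unknown".
        candidates.append("-".join([parts[0], "unknown"] + parts[2:]))
    for candidate in candidates:
        if accepts(candidate):
            return candidate
    return None


# Stand-in for a compiler that only knows the rustc-style name.
known = {"x86_64-unknown-linux-gnu"}

# Case 1: --build x86_64-pc-linux-gnu, no --host.
case1 = pick_rustc_target("", "x86_64-pc-linux-gnu", known.__contains__)

# Case 2: --host x86_64-lfs-linux-gnu, canonicalized by configure.
case2 = pick_rustc_target(
    "x86_64-lfs-linux-gnu", "x86_64-unknown-linux-gnu", known.__contains__
)
```

Both cases end up with x86_64-unknown-linux-gnu, so by the time the compiler sees the target, the pc and lfs vendors are indistinguishable. That is the information loss that leads to the wrong linker being invoked in Case 2.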

The issue with something like lccc, and presumably with gcc-rs, is that it isn't just a Rust compiler. It's also a C/++ compiler that needs to understand the targets that have become the de facto standard for that purpose. Rust targets aren't sufficiently granular to use internally for the entire project, and at some point I need to convert internally to the xlang targets so I can query the target properties (which are defined in one place to avoid having two places that don't agree). As noted, that process is very lossy. I also need to eventually map the Rust target into the exact GNU target to invoke the following steps (which, as demonstrated, is never going to be generally correct and is prone to using the wrong toolchain).

If I had the time to engage with this mess properly, which I really really don't, I would be advocating for convergence between the Rust names and the GNU names. There's no reason why each project needs its own set of names nor why they should be subtly different from each other, only a lack of past coordination (and an ongoing lack of manpower).

This is what I think is the best idea: it supports implementations that want to be more general while not causing issues during interconversion steps (which I have demonstrated will be present regardless of whether the implementation uses Rust or GNU names internally).

I'd certainly like to see that, but it's not obvious how that could happen without either breaking existing code on the Rust side or breaking existing code on the autotools side. Both sets of names have code depending on them.

Also, you mentioned in a previous thread that autotools has some limitations on what it can use as a canonical target name, and in particular it sounds like a canonical target name for autotools must have at least three components, which is not a requirement for Rust. How ingrained is that assumption, and would it be possible to fix?

A lot of things doing matching on canonicalized targets will match three or four components. A good example would be these lines of shell script from rust-autotools: https://github.com/chorman0773/rust-autotools/blob/027ada3cf0a73372345f0a6bd89983cf0b71acfa/m4/lcrust_prog_rustc.m4#L117..L129. Notably it matches *-*-* (which validly matches *-*-*-* as well), and strips out the second component (precisely because rustc has some targets that don't have a vendor component). Both gcc and binutils also match against 3 and 4 component targets, where the second component is assumed to be the vendor.
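To make the matching behaviour concrete, here is a small sketch. It is not a transcription of the linked lines; it approximates a shell `case` glob with Python's `fnmatchcase`, and the function name is made up:

```python
from fnmatch import fnmatchcase


def strip_vendor(target):
    """Drop the second component of a target with three or more components.

    The pattern "*-*-*" also matches four-component names, because "*"
    is allowed to match text that contains dashes.
    """
    if fnmatchcase(target, "*-*-*"):
        parts = target.split("-")
        return "-".join([parts[0]] + parts[2:])
    # One- and two-component names have no vendor field to strip.
    return target
```

With this, `strip_vendor("x86_64-pc-linux-gnu")` gives `"x86_64-linux-gnu"` and `strip_vendor("wasm32-unknown-wasi")` gives `"wasm32-wasi"`, while a two-component name such as `"wasm32-wasi"` is returned unchanged.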

I think the easiest thing would be to attempt to unify rust targets with non-canonical names, but ensure that canonically-equivalent targets are equivalent rust targets or not accepted by the implementation (but implementations should be free to accept them without restriction beyond the equivalence). This would allow implementations to internally canonicalize targets without breaking anything. The only thing this should break is code relying on exhaustively matching $TARGET, but rustc is already free to add new targets.

We'd certainly have to keep old names around for backward compatibility, but I'd like to think there is enough wiggle room to adjust what the designated canonical form of each equivalence class is, on both sides.

It's pretty deeply ingrained. For instance the wasm32-wasi name that, IIRC, you gave as an example of a Rust-side two-component canonical name, becomes wasm32-unknown-wasi when run through config.sub, even though config.sub has no special knowledge of either wasm32 or wasi.

config.sub itself could be changed relatively easily to not do that; the problem is all the programs out there (e.g. a majority of all autoconf-generated configure scripts) that parse the output of config.sub and assume that there will always be at least two dashes.

I can see a way forward, involving adding a --novendor option to config.sub and config.guess, and I think it would be well received among the interested parties on the GNU side of things, but that's one of many projects that I have neither time nor funding to pursue until 2022 at the earliest.

(I apologize for dropping the thread on Zulip, it was happening much too synchronously for me that week.)

That sounds really promising. That would help unify new targets around simpler names, while automatically adding a compatibility version (e.g. firstcomponent-unknown-secondcomponent for two-component names, or for that matter firstcomponent-unknown-unknown for one-component names if we also allow those). Having wasm32-wasi as the canonical name and wasm32-unknown-wasi as a "compatibility" alias seems reasonable.
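The compatibility form described here is mechanical enough to sketch. The function name is made up, and this follows only the rule stated in this post, not anything config.sub currently implements:

```python
def compat_name(target):
    """Expand a one- or two-component name to a three-component one.

    first            -> first-unknown-unknown
    first-second     -> first-unknown-second
    anything longer  -> returned unchanged
    """
    parts = target.split("-")
    if len(parts) == 1:
        return f"{parts[0]}-unknown-unknown"
    if len(parts) == 2:
        return f"{parts[0]}-unknown-{parts[1]}"
    return target
```

For example, `compat_name("wasm32-wasi")` yields `"wasm32-unknown-wasi"`. The sketch treats the second of two components as the OS unconditionally; deciding whether a two-component name is really arch-os rather than arch-vendor is not addressed here.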

It's not entirely clear what it would mean to "keep old names around for backwards compatibility" if they're not the canonical name anymore. We can certainly have any number of aliases for a target, but any given target in autotools can only have one canonical name (which some software depends on), and any given target in Rust can only have one canonical name (which some software depends on).

I think addressing targets like wasm32-wasi might make sense, and the --novendor solution sounds like a good approach to that, while preserving compatibility with existing software that expects names to always match *-*-*.

But that only handles the case of targets that autotools doesn't yet support at all, rather than targets that autotools and Rust both currently support but with different names. For instance, consider all the -windows-gnu and -windows-msvc targets. It's not obvious how we could align on any one naming for those targets without either breaking scripts that expect the autotools names or breaking Rust code that expects the Rust names. In theory either one could have an opt-in mechanism to change to a new canonical name, but a new canonical name effectively means a new target, with all the migration pain that implies. It's not clear if that'd be worthwhile.

I wonder if we might be better off just having a "canonical autotools name" and "canonical Rust name" for each target, and being able to ask both autotools and Rust for whichever one your build process needs to know.

... which is what I'd suggested at the very beginning of this thread. :wink:

I don't follow, will you please clarify your point here: how is GCC "not a cross-compiler" and thus "there is no need for any such translation"?