Expand targets acceptable to rustc

Right, and the question is whether we have any hope of actually supporting that. Rust's compilation model is radically different from what your usual autotools... hack expects. I think that concerns like "what should target vendor be" aren't ultimately useful, because no one uses targets like that in the C world AIUI.

I honestly feel like it would make more sense to just build a GCC-flavored rustc driver specifically for people who want to slot rustc into some legacy build system that doesn't handle new toolchains well. Things like Clang's target parser would live there without exposing users of the "native" frontend to it. (Not that I'm saying this is easy, I just think this is a more principled solution than making rustc itself more GCC-like.)

RalfJung
June 21

mcy:

If you're asking what they accept for --target

Yes. Since autotools seem to be part of the argument, and autotools was presumably written to match gcc/clang here, I figured it would make sense for Rust to follow what those compilers do. Certainly if they would differentiate between "pc" and "unknown", I would consider it a mistake for rustc to treat them as equivalent.

Note that gcc does not have a --target option in the driver. It rather uses the --target configure option, which runs through config.sub (but in some cases, such as when searching for tools, it uses the exact input).

Now you say that full compatibility with gcc/clang is likely not desirable. That's fair, but the entire thread here (as I understand it) is about better compatibility with "other tools using target triples", so -- how compatible with gcc/clang do we need to be to make things work well?

Currently, the goal is not to accept the full list (which can be absolutely fun to parse), but a reasonable subset that has wider compatibility (which currently includes x86_64-pc-* where x86_64-unknown-$system is accepted, and also all i?86 from i386-i686), so as to avoid certain issues I've observed in working with such tools and rustc. The absolute maximum I'm seeking is to accept some meaningful subset of targets parsed by my target-tuples crate.

Also, gcc-like support may become inevitable, at least for some subset. config.guess producing x86_64-pc-linux-gnu on my system currently holds for gcc-rs, and if the exhaustive list of targets needs to be specified, then either that will need to be defined, or I wouldn't be able to compile gcc-rs in a default configuration.

We don't design things on the basis of "inevitability", and we should not take non-Rust implementations as fait accompli that we can't improve on. Rust features and Rust behavior are designed collaboratively upstream, and will continue to be.

I don't see any fundamental reason that would prevent handling various reasonable target aliases. We don't need to support any random name autotools comes up with, but we could support cases that substantially improve compatibility with specific distributions. For instance, we already know that some Linux distributions specifically build for x86_64-pc-linux-gnu and would find it convenient if that alias existed.

Still, I think that if an exhaustive list of targets is defined, it should not be incompatible with implementations that (at configuration time) use config.guess to determine the host (and default target) triple. If I'm a user on a system rustc supports, and gcc-rs or lccc respectively builds in default configurations on, I'd expect that I can configure the rust frontend on that default configuration and not have it fail. I don't particularly want the answer to "why doesn't the rust frontend work on my system" to be "because rust says it's not allowed to". This also isn't necessarily something that can be fixed, because the toolchain does have to be the one given by the exact target, and the first thing I'd try is ${CMAKE_C_COMPILER} -dumpmachine (I'm going to recuse myself from this particular line of discussion, because it's getting somewhat difficult to respond rationally).

While I do agree with the fact we don't need to support every possible name, because that would just be an absurd undertaking (config.sub is somewhere between a masterpiece and absolute heckery), something more general than a special case may be warranted here, especially since there are multiple ways of doing it (and, at least for me, all seem reasonable). Would it be a good idea to design the mechanism now, to work out those design decisions, while only specifically defining the special cases?

With my "de facto maintainer of autoconf" hat on, and without expressing any particular opinion on the specific case that sparked this thread:

  • The VENDOR component of a GNU "canonical system name" is vestigial. It exists because there was a brief period in the late 1980s/early 1990s where there were a bunch of Unix workstation vendors all putting out kit based on the Motorola 68xxx CPU, with operating systems identifying themselves as "Unix System V", but providing different C-level programming environments. So m68k-foonly-sysv and m68k-barley-sysv could have been meaningfully different. Nobody does this anymore, because the marketing value of giving your company's operating system its own name became obvious almost immediately.

    If you look at how GCC's configure.ac, for instance, processes canonical system names, you will see that most of the time it looks only at $(host|target)_cpu or at $(host|target)_os, and the rest of the time it uses shell glob patterns that ignore the vendor field.

    I would recommend documenting that #[cfg(target_vendor = ...)] is supported for completeness and (as far as the Rust Project knows) is not useful for distinguishing between any two currently supported systems.

  • It is reasonable for Rust not to bother supporting all of the names understood by config.sub. However, if Rust accepts system names A and B as valid, and config.sub maps A to B, then the entire Rust ecosystem MUST also map A to B, except for the very rare circumstances where it is desirable to preserve A exactly as input by the user (e.g. configuring GCC with --target=riscv-linux will cause it to look for an assembler named riscv-linux-as rather than riscv-unknown-linux-gnu-as).

    If you don't do this it will break all kinds of "downstream" processes that expect they can run any system name through config.sub and treat the result as the canonical representative of an equivalence class. If I understand OP's original request correctly (which I'm not sure I do), it is a report of exactly this type of breakage.

  • Just to avoid confusion, it would be nice if the Rust toolchain used the terms "build", "host", and "target" the same way Autoconf does: the "build" system is the system that the program is being compiled on, the "host" system is the system that the program will run on, and the "target" system is the system that the program will generate code for. A program that is not some type of compiler has no use for the "target system" knob.
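
The second bullet above could be sketched in Rust as a small resolver that keeps both spellings: the canonical name for everything that inspects the target, and the exact user input for the rare tool-lookup case. This is only an illustration. `ResolvedTarget`, `resolve`, and the `ALIASES` table are hypothetical names, and the single alias pair is the `riscv-linux` example from the bullet, not a statement about what config.sub or rustc actually accept.

```rust
/// A target as requested by the user, paired with its canonical form.
#[derive(Debug, PartialEq)]
struct ResolvedTarget {
    /// Exactly what the user typed; kept only for tool lookup.
    requested: String,
    /// Representative of the equivalence class; what all other code should see.
    canonical: String,
}

/// Hypothetical alias table: (alias, canonical name).
const ALIASES: &[(&str, &str)] = &[("riscv-linux", "riscv-unknown-linux-gnu")];

/// Map an accepted name to its canonical name, remembering the original input.
fn resolve(requested: &str) -> ResolvedTarget {
    let canonical = ALIASES
        .iter()
        .find(|(alias, _)| *alias == requested)
        .map(|(_, canon)| *canon)
        .unwrap_or(requested); // names without an alias entry are their own canonical form
    ResolvedTarget {
        requested: requested.to_string(),
        canonical: canonical.to_string(),
    }
}

fn main() {
    let t = resolve("riscv-linux");
    // cfg values and build scripts would see `canonical`;
    // only tool lookup would consult `requested`.
    println!("requested = {}, canonical = {}", t.requested, t.canonical);
}
```

Keeping the two fields separate makes it hard to hand the non-canonical spelling to code that should only ever see the canonical one.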

Rust currently uses/abuses target_vendor to distinguish between Windows UWP and normal Windows targets (e.g. x86_64-uwp-windows-msvc vs. x86_64-pc-windows-msvc). Better ways to do this have been discussed but I don't think anything has been decided yet.
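
For readers who have not seen it in use, this is roughly how Rust code can branch on the vendor field. A minimal sketch: the function name `vendor_kind` is made up for this example, and only the `uwp` and `pc` values mentioned above are checked.

```rust
/// Reports which vendor field this crate was compiled for,
/// distinguishing only the two values discussed above.
fn vendor_kind() -> &'static str {
    if cfg!(target_vendor = "uwp") {
        "uwp"
    } else if cfg!(target_vendor = "pc") {
        "pc"
    } else {
        "other"
    }
}

fn main() {
    println!("compiled for vendor: {}", vendor_kind());
}
```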

Not exactly, since it's not that rustc accepts x86_64-linux-gnu and treats it as x86_64-unknown-linux-gnu instead of x86_64-pc-linux-gnu. Rather, it's that despite being supplied the latter as the (canonicalized) host, I have to pass the former to rustc (and, currently, I do so by "fixing" the tuple, swapping pc for unknown for x86_64 and i?86). Though it is foreseeable that fix could result in that exact issue when used with a "less" smart $RUSTC that tries to invoke <target>-ld, if it doesn't accept the canonical host directly (or isn't prefixed with that name), in which case the --target $host is passed verbatim.
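
To make the "fixing" step concrete, here is a rough Rust rendering of that workaround. It is a sketch under assumptions, not the actual macro (which is shell): `fix_vendor_for_rustc`, `is_ix86`, and the `ACCEPTED` list are hypothetical, and the list is a small illustrative subset rather than the compiler's real target list. The swap is only applied when the compiler does not take the tuple as given but does take the `unknown` spelling.

```rust
/// Illustrative subset of names the compiler is assumed to accept.
const ACCEPTED: &[&str] = &[
    "x86_64-unknown-linux-gnu",
    "i686-unknown-linux-gnu",
    "x86_64-pc-windows-msvc",
];

/// True for the `i?86` family: i386, i486, i586, i686.
fn is_ix86(arch: &str) -> bool {
    matches!(arch, "i386" | "i486" | "i586" | "i686")
}

/// Swap `pc` for `unknown` on x86_64 and i?86, but only when that
/// turns a rejected tuple into an accepted one.
fn fix_vendor_for_rustc(tuple: &str, accepted: &[&str]) -> String {
    if accepted.contains(&tuple) {
        return tuple.to_string();
    }
    // Split into cpu, vendor, and "everything else" (os plus environment).
    let mut parts = tuple.splitn(3, '-');
    if let (Some(arch), Some("pc"), Some(rest)) = (parts.next(), parts.next(), parts.next()) {
        if arch == "x86_64" || is_ix86(arch) {
            let candidate = format!("{}-unknown-{}", arch, rest);
            if accepted.contains(&candidate.as_str()) {
                return candidate;
            }
        }
    }
    // Anything else is passed through untouched.
    tuple.to_string()
}

fn main() {
    println!("{}", fix_vendor_for_rustc("x86_64-pc-linux-gnu", ACCEPTED));
}
```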

Thanks for the feedback, @zackw!

That seems like a good idea, in general. Ignoring for a moment cases that currently use that vendor field to distinguish target variants (e.g. UWP Windows targets), we may want to attempt to deprecate the use of target_vendor for that, and consider introducing a separate mechanism for distinguishing targets that would otherwise have the same architecture and os. We have https://github.com/rust-lang/rfcs/pull/2992/ , which defines target_abi, such that a -linux-gnueabi target has target_env="gnu" and target_abi="eabi". That might work for UWP targets as well, and potentially other target variants.

@zackw Has there been any discussion elsewhere about conventions regarding things like gnueabi, and splitting that logically into gnu and eabi, with the latter being treated as a separate component with a separate name (e.g. "ABI" rather than "env"/"environment")?

This is exactly the behavior I'd like to see as well, with my lang and cargo hats on. If we map A to B, as far as any Rust code can tell it should always appear that the user passed B.

At least today, we currently have a fairly specific expectation for how we find the target linker, based on the target name; for any given target we'll tend to look for a specific linker by name. (I think there's plenty of room to improve that autodetection.) If we map target A to target B as an alias, and it might potentially be reasonable for a toolchain prefixed with A- to exist rather than B-, does it seem potentially reasonable to assume that either should be acceptable? And in particular, does it seem potentially reasonable to just automatically look for B- as the canonical name and then fall back to looking for the toolchain under an alias prefix, whether the user specified A or B?

I would tend to consider it problematic to have a system with both riscv-linux-ld and riscv-unknown-linux-gnu-ld pointing to programs with non-identical behavior. If we can ignore that case, we could just always look for riscv-unknown-linux-gnu-ld and then riscv-linux-gnu-ld and then riscv-linux-ld, using the first one we find.

Unfortunately, I think that terminology difference may be permanent at this point. This distinction didn't tend to arise within the Rust community, largely I suspect because Rust doesn't have a need to distinguish between toolchains that "target" different systems. One Rust toolchain can target many different systems; you don't need a different Rust toolchain for different targets. As a result, we only ever needed to distinguish "host" (the system you're running on) and "target" (the system you're compiling for), and that terminology has become pervasive throughout the Rust community.

When cross-compiling lfs, the first stage toolchain uses the target name x86_64-lfs-linux-gnu (assuming you're building on x86_64, of course), which canonicalizes to x86_64-unknown-linux-gnu (which may be the host system tuple). That's a legitimate use case where canonically-equivalent target names will have functionally different toolchains. If rustc was cross-compiled in that stage (to avoid having to use binaries in BLFS), then you'd have to force rustc to find x86_64-lfs-linux-gnu-ld for the target linker.

While this may be true of rustc, this may not be true of other implementations of rust. I would assume, for example, that gcc-rs, like the rest of gcc, is a single-target compiler. However, I wouldn't argue that this should affect the terminology. It could easily be argued that when talking about host vs. target, it is referring to that of $RUSTC, not of the code being built by $RUSTC ($RUSTC here being anything that looks like rustc, so rustc, another rust compiler with the same CLI, or a wrapper around one that has a different CLI). Although it is slightly annoying that I would need to set TARGET to @rustc_host_name@ and HOST to @rustc_build_name@ (those are special variables set in configure to the names passed to $RUSTC via --target for the host system and the build system respectively), and would probably need a comment along the lines of

Yes this is correct. TARGET in rust refers to the target the code is being generated for (which is the autoconf host target) and HOST refers to the target the code is being generated on (which is the autoconf build target)

in case anyone reading the code looking to file helpful issues thinks it's incorrect. It's not annoying enough that I'd want it changed though (especially since I know that would be an impossible battle to fight).
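
The renaming described in that comment can be written down as a tiny conversion. This is just an illustration of the terminology mapping; the struct and function names are invented for the example, and Autoconf's own "target" is left out because it only matters when the program being built is itself a compiler.

```rust
/// Systems in Autoconf terms.
#[derive(Debug, Clone, Copy)]
struct AutoconfSystems<'a> {
    build: &'a str, // system the program is compiled on
    host: &'a str,  // system the program will run on
}

/// The same two systems in the terms the Rust community uses.
#[derive(Debug, Clone, Copy, PartialEq)]
struct RustSystems<'a> {
    host: &'a str,   // system the compiler runs on
    target: &'a str, // system the code is generated for
}

/// Autoconf "build" becomes Rust HOST; Autoconf "host" becomes Rust TARGET.
fn to_rust_terms<'a>(ac: AutoconfSystems<'a>) -> RustSystems<'a> {
    RustSystems {
        host: ac.build,
        target: ac.host,
    }
}

fn main() {
    let ac = AutoconfSystems {
        build: "x86_64-pc-linux-gnu",
        host: "aarch64-unknown-linux-gnu",
    };
    println!("{:?} -> {:?}", ac, to_rust_terms(ac));
}
```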

I'm surprised to see that neither the -pc- nor the -unknown- version of this name is converted to the other; config.sub gives me back both unchanged. Looking at the actual code, it seems most values of the vendor field will be returned unchanged; it only alters the field when it is absent (e.g. <cpu>-linux as an input) or when it knows a (cpu, vendor) or (os, vendor) pair is an alternative name for something (e.g. alpha-digital-dgux becomes alpha-dec-dgux).

Arguably x86_64-unknown not being turned into x86_64-pc is an oversight and inconsistency with the behavior of various other cases, and I'll bring it up on the mailing list specifically for config.{sub,guess}, but backward compatibility might prevent any change.

Not that I'm aware of. That conversation would be much more likely to occur on the GCC or Binutils mailing lists than any of the Autoconf-related mailing lists, and I haven't followed the toolchain lists in many years.

Putting these two things together I suggest the simpler behavior of looking first for the exact name given by the user, second for the canonical name, and not trying to look for a "more general" name by dropping components. I feel like dropping components is asking for trouble; for instance it's conceivable that riscv-linux-gnu-ld will link with GNU libc, whereas riscv-linux-musl-ld will link with musl libc, and the user made riscv-linux-ld a symlink to one or the other for unrelated reasons and then forgot they did that.
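
As a sketch of that lookup order (exact name first, canonical name second, nothing more general), assuming the tool is named `<target>-ld` as in the examples in this thread. `linker_candidates` and `find_linker` are hypothetical helpers, and the `exists` callback stands in for whatever PATH search a real implementation would do.

```rust
/// Candidate linker names, in priority order: the exact name the user
/// gave, then the canonical name. No components are ever dropped.
fn linker_candidates(requested: &str, canonical: &str) -> Vec<String> {
    let mut names = vec![format!("{}-ld", requested)];
    if canonical != requested {
        names.push(format!("{}-ld", canonical));
    }
    names
}

/// Return the first candidate for which `exists` reports true.
fn find_linker(
    requested: &str,
    canonical: &str,
    exists: impl Fn(&str) -> bool,
) -> Option<String> {
    linker_candidates(requested, canonical)
        .into_iter()
        .find(|name| exists(name.as_str()))
}

fn main() {
    // Pretend only the canonically named tool is installed.
    let installed = |name: &str| name == "riscv-unknown-linux-gnu-ld";
    let found = find_linker("riscv-linux", "riscv-unknown-linux-gnu", installed);
    println!("{:?}", found);
}
```

Note that `riscv-linux-gnu-ld` is never tried here, which is the point of the suggestion: a shorter name that happens to exist is not silently picked up.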

Yeah, if we could do it all over again from scratch I would imagine we would make GCC have a runtime-swappable back end the same way LLVM does. Back when I worked for CodeSourcery we actually started working in that direction but it's a huge overhaul and we ran out of funding.

I can see that it would be very hard to change what #[cfg(target_os = ...)] means now, and the autoconf/rustc inconsistency is only going to trip up people doing cross compilation, which is finicky anyway. So maybe just document it carefully and call it good?

Even in that case, though, having the build, host, target distinction can be useful. In lccc, the autoconf script overloads the --target option to set the default compiler target, acting equivalent to -DLCCC_DEFAULT_TARGET cmake option.

That sounds entirely reasonable to me. We can make Rust's linker detection prioritize the exact target string provided. We could then tell build scripts to use that linker.

I just don't want to tell build scripts the exact target string and let them guess, because then the guesses they use may vary from one build script to another; if we guess the linker and provide that linker to other tools then we get more consistency. We don't want different build scripts implementing their own different equivalents of config.sub, all of them slightly different. So I think we should always provide Rust code and build scripts with the canonical name, and then also tell them which tools they should use for that target, and those tools may potentially be detected using the non-canonical name the user provided.

If this were a project someone had the bandwidth to spend time on, I'd happily sponsor it, and encourage others to do the same. This is one of the biggest things I wish for when I'm using GCC for cross-compilation. I'd like to be able to cross-compile C as easily as I can cross-compile Rust.

An initial prototype might start out by aiming for a multiple-target version of libgccjit, which could initially avoid the complexity of handling target specs and other frontend issues.

Documentation will definitely help. But also, if you're planning on adding support for Rust in another build system, there's one specific mistake we made in Cargo that we're currently working on correcting, that I'd encourage you to not make in the first place. With Rust, because it can be told which target to build for rather than assuming a specific target based on the system it was built for, always tell it what target to build for, even if you're building for the same system you're running on. In other words, all compiles should be semantically treated as cross-compiles, and you should never care or depend on what target the Rust compiler defaults to. If we'd done that from the beginning in Cargo, we'd have avoided several problems that we're now in the process of slowly changing (with associated backwards-compatibility concerns).
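
In a build tool written in Rust, that advice amounts to never constructing a compiler invocation without `--target`. A minimal sketch using `std::process::Command`; the helper name `rustc_command` and the file name are placeholders, and the command is only built and printed here, not run.

```rust
use std::process::Command;

/// Build a compiler invocation that always names its target explicitly,
/// even when the target is the system we are running on.
fn rustc_command(rustc: &str, target: &str, source: &str) -> Command {
    let mut cmd = Command::new(rustc);
    // Pass the target unconditionally; never rely on the compiler's default.
    cmd.arg("--target").arg(target).arg(source);
    cmd
}

fn main() {
    let cmd = rustc_command("rustc", "x86_64-unknown-linux-gnu", "hello.rs");
    println!("{:?}", cmd);
}
```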

Oh, that is interesting. Currently, the macro I wrote, LCRUST_PROG_RUSTC, doesn't attempt to set --target in RUSTFLAGS for $RUSTC when --host isn't specified (specifically, the check is that x$host_alias isn't identically equal to x, a common way to check if a variable is empty, that I've seen). Of course, it then subsequently checks if a) $RUSTC works and can compile a simple program, and b) (when --host isn't specified) whether it can run that simple program (similar to what AC_PROG_CC does). Should I make it pass --target regardless (currently, the logic skips 3 invocations of rustc on my system, as it tries to determine the actual string it needs to pass)?

Also, would this be true if the invoked rustc was prefixed with the target (specifically, does rustc extract the target from argv[0], like what clang does)? The macro currently assumes that if $RUSTC is prefixed with one of the normal targets (host or host_alias), that it doesn't need to pass --target.

I probably won't change this particular logic, even if this isn't rustc's default behaviour, as a user could easily have configured a rustc (or other rust compiler with rustc's cli) that defaults to a particular target, and installed it with that name, or could be using a compiler that does use the prefix, and the user always knows better than the build script. However, it would be good to know if rustc actually does so.

To clarify something, I'm suggesting this not because it fundamentally changes the behavior of the Rust compiler, (though I can imagine ways the compiler's behavior would benefit from it being done pervasively), but rather because doing so provides much more hygiene in the build tool and avoids ever conflating a build for the system you're running on with a build for the system you're targeting even if they happen to be building for the same Rust target.

I personally would, yes. You already know what the desired target platform is (based on prior invocation of config.guess if not supplied as an argument), and if you tell rustc explicitly, you avoid depending on whatever rustc's own default might theoretically be. You could also export both RUSTC and HOST_RUSTC (names naturally subject to bikeshedding; I'm ), so that if a user needs to build a proc macro or build script or something else intended to run on the system they're building on, they have a separate tool with which to do so. (Likewise, RUSTFLAGS needs a different version for host and target, even if they're the same, because the desired flags might not be the same. This is something that'd be desirable to work out for Cargo.)

As one of many potentially interesting possibilities: consider a system that happens to have two architectures installed, such as 32-bit i686 and 64-bit x86_64. Suppose the user invokes your build system expecting to build for 32-bit, but happens to have a 64-bit rustc installed. (Suppose they're building a very large program, and they need a 64-bit tool to avoid running out of address space.) That 64-bit rustc may know how to build 32-bit binaries, but it might default to the same architecture it was built for. But if you're invoking the build script in 32-bit mode, you likely want to do your build for a 32-bit target unless you specify otherwise. In particular, if you do have a 32-bit C compiler installed but don't have a 32-bit Rust compiler installed, I think the most useful behavior is to build for 32-bit anyway, rather than building incompatible object files and then giving inscrutable errors when trying to link them together. The architecture rustc happens to be compiled for is an implementation detail that shouldn't determine your target unless you have no better means to determine what target you want to compile for.

rustc doesn't care what name it gets invoked by:

/tmp$ cat hello.rs 
fn main() {
    println!("Hello world");
}
/tmp$ ~/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/rustc hello.rs -o hello && file hello && ./hello
hello: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=e767329f8afa26862a973c0bd052bfc2aa52d961, for GNU/Linux 3.2.0, with debug_info, not stripped
Hello world
/tmp$ bash -c 'exec -a i686-unknown-linux-gnu-rustc ~/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/rustc hello.rs -o hello' && file hello && ./hello
hello: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=e767329f8afa26862a973c0bd052bfc2aa52d961, for GNU/Linux 3.2.0, with debug_info, not stripped
Hello world
/tmp$ bash -c 'exec -a some-unknown-target-rustc ~/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/rustc hello.rs -o hello' && file hello && ./hello
hello: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=e767329f8afa26862a973c0bd052bfc2aa52d961, for GNU/Linux 3.2.0, with debug_info, not stripped
Hello world

(exec -a sets the program name, so it's useful to test running a program as if invoked by a different name.)

I have a hard time imagining a benefit to prefixing rustc with a target, unless it was just to work around a build system that can't deal with a toolchain that supports multiple targets. And if that were the reason, I'd just suggest producing tiny wrapper scripts that invoke the compiler with appropriate arguments. But ideally, I'd instead prefer to have the build system understand how to invoke the Rust compiler for different targets.

Indeed, and this is already done if the script invokes a different macro, with basically duplicated logic. The shell variable is called RUSTC_FOR_BUILD for consistency with the analogous CC_FOR_BUILD, though, rather than HOST_RUSTC (which could get confusing in autotools land). Likewise, flags are set in RUSTFLAGS_FOR_BUILD.
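
For a build tool written in Rust rather than m4, the same split might look like the following. It is a sketch only: `Tool`, `Toolchain`, and `RunsOn` are invented names, the variable names are the ones mentioned above, and falling back from RUSTC_FOR_BUILD to RUSTC is an assumption of this example. The flags deliberately do not fall back, since the desired flags for the two systems may differ.

```rust
/// A compiler invocation: program plus flags.
#[derive(Debug, Clone, PartialEq)]
struct Tool {
    program: String,
    flags: Vec<String>,
}

/// Which system (in Autoconf terms) the artifact will run on.
#[derive(Debug, Clone, Copy)]
enum RunsOn {
    Host,
    Build,
}

struct Toolchain {
    for_host: Tool,
    for_build: Tool,
}

impl Toolchain {
    /// `lookup` abstracts over the environment so the sketch is testable.
    fn from_lookup(lookup: impl Fn(&str) -> Option<String>) -> Toolchain {
        // Split a flags variable on whitespace; unset means no flags.
        let flags = |name: &str| -> Vec<String> {
            lookup(name)
                .map(|s| s.split_whitespace().map(String::from).collect())
                .unwrap_or_default()
        };
        let rustc = lookup("RUSTC").unwrap_or_else(|| "rustc".to_string());
        Toolchain {
            for_build: Tool {
                // Assumption: fall back to $RUSTC when RUSTC_FOR_BUILD is unset.
                program: lookup("RUSTC_FOR_BUILD").unwrap_or_else(|| rustc.clone()),
                flags: flags("RUSTFLAGS_FOR_BUILD"),
            },
            for_host: Tool {
                program: rustc,
                flags: flags("RUSTFLAGS"),
            },
        }
    }

    /// Pick the tool for an artifact based on where it will run.
    fn select(&self, runs_on: RunsOn) -> &Tool {
        match runs_on {
            RunsOn::Host => &self.for_host,
            RunsOn::Build => &self.for_build,
        }
    }
}

fn main() {
    let tc = Toolchain::from_lookup(|name| std::env::var(name).ok());
    // Build scripts and proc macros run on the build system.
    println!("{:?}", tc.select(RunsOn::Build));
    println!("{:?}", tc.select(RunsOn::Host));
}
```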

Well, the specific case here was to deal with single-target rust compilers like gcc-rs (which get prefixed with the --target name). The special case is also more generally useful, because it means that a user could, for example, invoke configure with --host x86_64-linux-gnu and RUSTC=x86_64-linux-gnu-myrustc, and passing in --target x86_64-linux-gnu is redundant if myrustc is either single target or that uses the program name to determine the target (lccc does, primarily for compatibility with clang and with autotools searching for prefixed cross-tools).
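
Expressed in Rust instead of shell, the prefix special case is roughly the check below. This is a hedged sketch of the heuristic as described in this thread, not the macro's actual code; `needs_target_flag` is a hypothetical name. As the transcript earlier in the thread shows, rustc itself ignores the name it is invoked by, so the heuristic only helps with other compilers that share rustc's CLI.

```rust
use std::path::Path;

/// Decide whether `--target` must be passed explicitly. Returns false only
/// when the compiler's file name starts with `<host>-` or `<host_alias>-`.
fn needs_target_flag(rustc: &str, host: &str, host_alias: &str) -> bool {
    // Look only at the file name, so "/usr/bin/foo-rustc" is treated as "foo-rustc".
    let name = Path::new(rustc)
        .file_name()
        .and_then(|n| n.to_str())
        .unwrap_or(rustc);
    let prefixed = [host, host_alias]
        .iter()
        .filter(|t| !t.is_empty()) // an unset alias must not match everything
        .any(|t| {
            name.strip_prefix(*t)
                .map_or(false, |rest| rest.starts_with('-'))
        });
    !prefixed
}

fn main() {
    let rustc = "/usr/bin/x86_64-linux-gnu-myrustc";
    let needed = needs_target_flag(rustc, "x86_64-pc-linux-gnu", "x86_64-linux-gnu");
    println!("pass --target for {}? {}", rustc, needed);
}
```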

Well, certainly, cargo can do that. And for more general cases, cmake is likewise capable, though cmake doesn't come with the tools to deal with targets (to my absolute dismay), and invoking rustc through cmake is somehow even more of a pain than it is through a Makefile. Though in this case, it is a simple matter of changing the detection logic (and removing the host-prefix special case, though, as I say, I probably won't).

A couple issues with this, though:

  • It wouldn't work for less used binutils (unless cargo wants to export everything), or for custom target-specific tools (cargo definitely cannot be exhaustive).
  • Knowing the particular linker won't benefit tools that indirectly invoke the linker, unless it knows how to get it. -fuse-ld for gcc-likes doesn't allow you to set the absolute path to a linker IIRC (clang tried, but I think it broke something and got reverted. gcc certainly doesn't). Like target-specific tools, I highly doubt cargo could ever be exhaustive.

In my case, all of my (rust) build scripts would run targets through target-tuples, which has a goal of consistency with config.sub. However, I can see the potential issue with other crates doing the same.

Another incompatibility I found: rustc does not accept i686-w64-mingw32 or x86_64-w64-mingw32 (32 and 64-bit mingw targets), which should be functionally equivalent to i686-pc-windows-gnu and x86_64-pc-windows-gnu respectively. The major difference between this case and the linux case is that config.sub rejects the latter case (it doesn't know the windows-gnu).
It's probably a good idea to ask if windows-gnu is the same as mingw32 (w64 apparently is just something that can, probably, be swapped with pc on those targets).

Accepting some of those as aliases seems reasonable, yeah.
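
If such aliases were accepted, the mingw case could be as small as the mapping below. This is a sketch that takes the equivalence proposed in the thread at face value; whether windows-gnu and mingw32 really are the same is the open question raised above, and `mingw_to_rust_target` is a hypothetical helper, not an existing rustc or cargo function.

```rust
/// Map `<arch>-w64-mingw32` to the corresponding `<arch>-pc-windows-gnu`
/// name, for the two architectures mentioned in the thread only.
fn mingw_to_rust_target(tuple: &str) -> Option<String> {
    let arch = tuple.strip_suffix("-w64-mingw32")?;
    match arch {
        "i686" | "x86_64" => Some(format!("{}-pc-windows-gnu", arch)),
        // Other architectures are left alone rather than guessed at.
        _ => None,
    }
}

fn main() {
    for name in ["i686-w64-mingw32", "x86_64-w64-mingw32", "x86_64-pc-linux-gnu"] {
        println!("{} -> {:?}", name, mingw_to_rust_target(name));
    }
}
```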