Pre-RFC: Cargo Target Features

I’ve been playing with AVX lately, and I experienced first-hand that building a program that uses non-standard instructions with Cargo is not as easy as it could be. I would like to propose a way to specify required and optional target features in Cargo.toml. I don’t know if such a small change actually requires an RFC, but I am requesting comments on this, so here we go:


  • Feature Name: cargo_target_feature
  • Start Date: 2016-03-22
  • RFC PR: (leave this empty)
  • Rust Issue: (leave this empty)

Summary

Add a required-target-features field to Cargo.toml where libraries can declare their required target features, and a target-features field where top-level projects can declare the target features that they want to enable.

(Target features here refer to the -C target-feature codegen option, not to the [features] section of Cargo.toml.)

Motivation

Special instructions that require target features to be enabled can often provide a significant performance win on modern CPUs. Projects that wish to take advantage of this (e.g. audio and video encoders or decoders, cryptography libraries, scientific computing libraries) should not be harder to compile and use than any other project.

Detailed design

A project can use a target feature in two ways: optional, or required. When a target feature is required, the code relies on the corresponding instructions being available. An example would be a matrix manipulation library where the algorithms are tuned specifically for the AVX instruction set.

It is also possible for projects to take advantage of target features when they are available and gracefully fall back otherwise. The #[cfg(target_feature = "...")] attribute can be used for this. An example would be a vector math library that can use AVX when available, fall back to SSE if that is available, or fall back to serial operations otherwise. A different example would be a crypto library that has an AES implementation based on the AES instructions, and a fallback implementation in pure Rust.
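
A minimal sketch of such a compile-time fallback, assuming a hypothetical function `sum`. Both bodies are placeholders written in plain Rust: the "AVX" variant only mimics the eight-lane shape of a real kernel and uses no intrinsics. Which variant exists is decided when the crate is compiled, not when it runs.

```rust
/// Selected when the crate is compiled with `-C target-feature=+avx`.
/// Placeholder body: accumulates in eight lanes, as an AVX kernel would.
#[cfg(target_feature = "avx")]
pub fn sum(xs: &[f32]) -> f32 {
    let mut lanes = [0.0f32; 8];
    let chunks = xs.chunks_exact(8);
    let tail = chunks.remainder();
    for chunk in chunks {
        for (lane, x) in lanes.iter_mut().zip(chunk) {
            *lane += *x;
        }
    }
    lanes.iter().sum::<f32>() + tail.iter().sum::<f32>()
}

/// Fallback used when AVX is not enabled at compile time.
#[cfg(not(target_feature = "avx"))]
pub fn sum(xs: &[f32]) -> f32 {
    xs.iter().sum()
}
```

Callers just write `sum(&values)` and never see which variant was compiled in.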

When compiling, only the top-level project knows what system it is going to run on, so the top-level project should control which target features are enabled. If one of the dependencies requires a certain target feature, it should not be enabled silently: this might unintentionally make the resulting binary incompatible with some systems. This leads to the following design:

  • Libraries should declare their required target features in the required-target-features field under [package]. The type of this field is a list of strings. Example:

    [package]
    required-target-features = ["sse", "sse2"]
    
  • The top-level project should declare the target features to build with in the target-features field under [package]. The type of this field is a list of strings.

  • Cargo will verify that the union of all required-target-features in all dependencies is a subset of target-features of the top-level project. If this is not the case, building will fail with an error message indicating which target features must be enabled, and by which crates they are required.

  • The target-features declared in the top-level project will be passed to the compiler for all crates that need to be compiled. For instance, if in Cargo.toml the following features are enabled:

    [package]
    target-features = ["sse", "sse2"]
    

    Then the compiler will be invoked with:

    $ rustc ... -C target-feature=+sse,+sse2 ...
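
The check and the flag construction described in the list above could look roughly like this. It is only a sketch of the proposed behaviour, not code from Cargo: the function names `check_required_features` and `rustc_args`, and the representation of dependencies as `(crate name, required features)` pairs, are assumptions made for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Checks that every feature required by a dependency is enabled by the
/// top-level project. On failure, returns each missing feature mapped to
/// the crates that require it, which is what the error message would list.
pub fn check_required_features(
    enabled: &[&str],
    dependencies: &[(&str, &[&str])],
) -> Result<(), BTreeMap<String, Vec<String>>> {
    let enabled: BTreeSet<&str> = enabled.iter().copied().collect();
    let mut missing: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for (krate, required) in dependencies {
        for feature in required.iter() {
            if !enabled.contains(feature) {
                missing
                    .entry(feature.to_string())
                    .or_default()
                    .push(krate.to_string());
            }
        }
    }
    if missing.is_empty() {
        Ok(())
    } else {
        Err(missing)
    }
}

/// Builds the codegen arguments that would be passed to rustc for every crate.
pub fn rustc_args(enabled: &[&str]) -> Vec<String> {
    if enabled.is_empty() {
        return Vec::new();
    }
    let list: Vec<String> = enabled.iter().map(|f| format!("+{}", f)).collect();
    vec!["-C".to_string(), format!("target-feature={}", list.join(","))]
}
```

With `target-features = ["sse", "sse2"]`, `rustc_args` yields `-C` followed by `target-feature=+sse,+sse2`, matching the invocation shown above.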
    

Drawbacks

None that I can think of.

Alternatives

  • RUSTFLAGS support just landed. It is possible to set RUSTFLAGS to -C target-feature=+feature. This can even be done from .cargo/config. However, this mechanism is not intended for required target features of a crate, and it spreads build configuration over two files, instead of keeping it central in Cargo.toml.

  • It is possible to call cargo rustc -- -C target-feature=+feature, but this will only pass the target feature to the top-level project, not to dependencies. This makes e.g. the simd crate completely useless. Furthermore it is not very ergonomic to use.

  • In the proposed design, instead of failing to build if a required target feature is not declared, it could be enabled implicitly. This will make it easier to build, but it will also make it easier to accidentally produce binaries that are incompatible with some systems.

Unresolved questions

  • The [package] section feels like the wrong section to put target-features and required-target-features under. Other codegen options are currently under [profile.*], but target features generally do not differ per profile (you want to debug the same code that fails in release).

    Perhaps they should go under [target.<triple>]? This makes sense because the target features depend on the architecture. But what would putting target features in [target.cfg(target_feature = "...")] do? And how to indicate that only one target is supported?

  • Should target-features default to required-target-features? This can reduce boilerplate for libraries.

  • Should there be a way to couple target features to features in the [features] section?

  • The current design only allows passing target features as -C target-feature=+feature. Is there any use case for passing -C target-feature=-feature?

  • Should there be a way for crates to advertise their optional target features?

  • What about target-cpu?


I haven’t contributed anything significant to Cargo before, but I am willing to attempt to implement this.

Feedback would be appreciated!

For things like AVX and AES-NI in x86-only crates, I oppose this proposal and would strongly prefer library crates to do runtime feature detection using CPUID. No extra work for Cargo or the end application, and gives you wide binary compatibility as a perk. AFAIK, no recent user-mode x86 extension has provided fundamentally new capabilities, so you can always provide a fallback using older instructions and/or pure Rust code.

This doesn’t work for SSE2, because SSE2 affects the ABI, but at this point I think it’s reasonable to assume that by default (as Rust currently does).

This also doesn’t work as well for the supervisor-mode extensions that do provide new capabilities, but you can still have a runtime check and return an error.

If you plan on exporting very short functions that do this, it would be a good idea to cache the relevant feature flag in a static AtomicUsize (no mut is needed for an atomic), since CPUID is a serializing instruction. Also, this runs into trouble if you have a multi-socket system with CPUs of different families/generations, because the CPUID could run on the newer CPU and the process could then be rescheduled to the older one, but runtime feature testing is common enough in non-Rust programs that that ship sailed long ago.
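
A sketch of that caching, under some assumptions: the names `AVX_STATE`, `detect_avx` and `has_avx` are made up for this example, and the actual query uses the standard library's `is_x86_feature_detected!` macro, which was added to Rust after this discussion. As far as I know that macro already caches its result internally, so the explicit cache here is purely illustrative.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Cache states. A plain `static` suffices: atomics have interior mutability.
const UNKNOWN: usize = 0;
const ABSENT: usize = 1;
const PRESENT: usize = 2;

static AVX_STATE: AtomicUsize = AtomicUsize::new(UNKNOWN);

/// Performs the actual (comparatively slow) query on x86 targets.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn detect_avx() -> bool {
    is_x86_feature_detected!("avx")
}

/// On other architectures AVX does not exist.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn detect_avx() -> bool {
    false
}

/// Returns whether AVX is usable, querying the CPU only while the cache
/// is still empty.
pub fn has_avx() -> bool {
    match AVX_STATE.load(Ordering::Relaxed) {
        PRESENT => true,
        ABSENT => false,
        _ => {
            let present = detect_avx();
            // Racing threads may both query, but (assuming all CPUs in the
            // machine are alike) they store the same answer.
            AVX_STATE.store(if present { PRESENT } else { ABSENT }, Ordering::Relaxed);
            present
        }
    }
}
```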

So, what can we do to Rust to facilitate and encourage runtime feature testing?

Finally, remember that not all the world is x86, and we should design Rust to facilitate using the pure-Rust option on non-x86 CPUs (and I don’t just mean ARM – I’m also talking about the architecture neither of us has heard of that will be widely deployed 15 years from now.)

You would still need to pass the -C target-feature=+feature, otherwise LLVM will not be able to compile the code, even if at runtime you decide not to use the feature. If you want to do runtime feature detection that is great, but this is up to the author to decide. If you know beforehand that your program is only going to run on a specific system, then it does not make sense to implement a fallback.

You do bring up an interesting point though, because runtime feature detection means that a target feature can be required for compilation (i.e. you must pass -C target-feature=+feature to be able to compile at all), without making the resulting binary incompatible with systems that do not sport the feature.

Rust tries very hard to turn runtime errors into compile errors. On the other hand, with the runtime check you can at least print a nice error message. If the instructions are used unconditionally, the program will simply crash. If you do know the system you are going to run on, then you might not want to pay the cost of a runtime check.

I think that we cannot rule out either static features or dynamic detection; the proposal should be able to facilitate both.

I happened to pick x86_64 examples here, but the proposal is not specific to any architecture. Do you see any issues here?

I think that this can all be done by libraries. For instance, a crypto crate could query the CPU features and dispatch on the features in an encrypt function, and users of the crate automatically get the most efficient implementation without worrying about anything.
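
Roughly, such a crate could pick an implementation once and call it through a function pointer afterwards. This is a sketch only: `encrypt`, `encrypt_portable` and `encrypt_accelerated` are hypothetical names, the XOR body is a stand-in and not real cryptography, and `OnceLock` and `is_x86_feature_detected!` are standard library items that postdate this thread.

```rust
use std::sync::OnceLock;

type EncryptFn = fn(&mut [u8], u8);

/// Placeholder for the pure-Rust fallback (XOR is NOT real encryption).
fn encrypt_portable(data: &mut [u8], key: u8) {
    for byte in data.iter_mut() {
        *byte ^= key;
    }
}

/// Placeholder for a version built on the AES instructions. It has the
/// same body here so that the example runs everywhere.
fn encrypt_accelerated(data: &mut [u8], key: u8) {
    for byte in data.iter_mut() {
        *byte ^= key;
    }
}

/// Queries the CPU and picks the best available implementation.
fn select_impl() -> EncryptFn {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("aes") {
            return encrypt_accelerated;
        }
    }
    encrypt_portable
}

/// Public entry point: users never see which implementation runs.
pub fn encrypt(data: &mut [u8], key: u8) {
    static IMPL: OnceLock<EncryptFn> = OnceLock::new();
    let f = *IMPL.get_or_init(select_impl);
    f(data, key)
}
```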

This only works well for relatively large functions though. For things like adding f32x8s from the simd crate, you want these to be inlined; feature detection for every add is far too expensive. Then detection should happen at a higher level, but now it gets complex quickly: ideally the simd crate would provide a fallback implementation of Add and the user of the crate should not have to duplicate any code. But it is the user’s code that will have to be compiled twice.

Thanks for posting this! I’ve certainly had thoughts on this before, and I’ll try to transcribe them for now as well.

Cargo works pretty hard to make it possible that the top-level package has complete control over the build in terms of flags and codegen. For example the [profile] sections of dependencies are ignored so the top-level decides things like codegen-units, optimization level, debuginfo, etc. I’m not sure that we want to encourage a required list of target features because this is the opposite of this pattern. The top-level package no longer has that level of control and you may not realize that you shouldn’t have been generating AVX instructions until too late.

Along those lines I might be hesitant to move on this just yet and prefer to keep it under wraps as long as possible. Support for RUSTFLAGS landed recently which should allow passing -C target-feature=... to all compilations, but beyond that I agree with @sorear that the best practice here anyway is to write code which compiles and works without these target features enabled, and it seems like Cargo should encourage that.

This is not true. TSX and SGX come to mind. It is possible to write a library that defines new synchronization primitives that can only be used if TSX is enabled. Thus, the dependency on TSX should trickle up the dependency tree to the final build product. Likewise, using a library that depends on SGX without SGX is either not possible or removes the security guarantees that that library was aiming to provide.

This is why I propose that the top-level project specifies target-features, and the top-level project and dependencies get compiled with these features. But authors might want to write code that depends on a certain target feature, and if they do not provide fallback code, that target feature is definitely required — there is no getting around that. Hence, when one of the dependencies requires a feature that was not selected by the top-level project, Cargo should refuse to build.

Without target features we are discarding a lot of the work that went into making CPUs faster over the past 20 years and programs will utilise only a fraction of the available computing power. Of course it would be nice if authors also write fallback code, but that is for the authors to decide. Not all crates compile/work on every OS either.

In the meantime, those dependencies could do something like:

#[cfg(not(target_feature = "avx"))] const E: MUST_BE_COMPILED_WITH_AVX = ();

Hm, true, it is indeed in the project section! I think for those purposes, however, it may be more appropriately configured in [profile.foo], as that's already where configuration like opt-level and debuginfo is located.

Yeah, it's definitely true that Cargo needs to support this in some form, but there's a question of influence here too. For example if this requires RUSTFLAGS, then that's possible, but Cargo is still nudging towards doing the "proper thing" by adding a fallback. In that sense it's largely just a question of how ergonomic this should be.

I find this a pretty interesting question as well, actually. I've been curious in the past whether there are a select few situations where we could do something like -C target-cpu=native to take advantage of the host CPU.

I agree. My only concern with this is that you’d have to duplicate the features for every profile. (See also the first point under unresolved questions.) Or is there actually something like an “all” profile?

I see your point. Crates could still provide a specialised implementation behind #[cfg(target_feature="...")], optionally one that falls back to the regular version after a runtime check. Or authors will just add a fallback that panics, or one that is a compile error as suggested by @jethrogb. Is there something we can do to encourage adding a fallback, as opposed to discouraging using target features at all?

A slightly related issue: if you went to the trouble of implementing specialised functions that use target features, but Cargo then discourages using them, or the default is to use the fallback version, that would be a shame. Especially if you do support runtime feature detection, so there is no harm in enabling the target feature. So you add “put this snippet in your .cargo/config to use the fast version” to the readme, and everything that depends on that crate now needs to add that to their readme as well …

In my experience with C++, just doing that doesn’t magically make the code faster. Perhaps LLVM can vectorize a few loops or use a more efficient instruction here and there, but the real win comes from being able to do 4/8/16 operations at the cost of one, and this often requires invasive changes to the code. (Actually, my experience is that target-cpu=native produces a binary that crashes with an illegal instruction, but that is likely an LLVM bug.)

I’ve thought about writing an RFC for something similar, however my approach is different. Rather than using an attribute in Cargo.toml, the code for a library would use a compile-time assert to ensure that any features it requires are available.

Example:

#![cfg_assert(target_feature = "avx", "This library requires AVX support")]

This can also be used for OS or architecture-specific code:

#![cfg_assert(target_os = "linux", "This library only works on Linux")]

#![cfg_assert(any(target_arch = "x86", target_arch = "x86_64"), "This library only works on x86")]

I don’t think it’s Cargo’s job to figure out exactly what features a library requires. With my approach any incompatible features (as defined by the top level project) will simply result in a compilation error when building dependencies along with a nice error message from the library author.
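
For comparison, the standard `compile_error!` macro, which was added to Rust after this discussion, gives roughly this effect when combined with `#[cfg]`. The sketch below tests SSE2 instead of AVX only because SSE2 is enabled by default on x86_64, so the snippet builds as written; `built_with_sse2` is a hypothetical helper added for illustration.

```rust
// Fails the build with a readable message when the feature is missing.
// A library tuned for AVX would test `target_feature = "avx"` instead.
#[cfg(all(target_arch = "x86_64", not(target_feature = "sse2")))]
compile_error!("This library requires SSE2 support");

/// Exposes the same compile-time fact as a value, for illustration.
pub fn built_with_sse2() -> bool {
    cfg!(target_feature = "sse2")
}
```

As with the proposed cfg_assert, the top-level project still decides which features are enabled; the library only refuses to build when they are missing.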

This is a really neat idea! Then the required-target-features field can be removed entirely.

@Amanieu that’s a fascinating idea! It somewhat plays into the idea of static assertions as well, but it seems like a great way at least to me to indicate that a library only works on some platforms or with some features.

@ruuda yeah unfortunately there’s no [profile.all] “profile” for this, and in general I’ve felt a little uncomfortable with profiles recently (for reasons like this). I do also wonder if the “speed bump” that Cargo would provide of requiring RUSTFLAGS is just too much of a speed bump to promote usage of these sorts of features throughout the ecosystem.

One avenue to take is to greatly improve the runtime detection story. One problem today is that even if you do runtime detection, there’s no way to force a function to be compiled with AVX support unless the entire crate has support. In theory your runtime detection would delegate to a number of differently compiled functions, but the story there in Rust is pretty bad.

I guess all in all my feelings are:

  • If we can not worry about required-features in favor of something like @Amanieu’s proposal, that sounds great!
  • If we can put features in [profile.*] sections (like opt-level), then that meshes well with what we’ve got going today. It’s somewhat unergonomic, but it’s arguably unergonomic for doing something like “turn on debug assertions in all modes” as well, so a solution there would solve both problems.
  • Eventually we can bolster the runtime-detection story, although we may not quite be there just yet.

There is one more issue with supporting runtime feature detection: for the specialised code to be compiled, the target feature must be enabled. But when the target feature is enabled, LLVM is free to use those features everywhere. Or am I missing something? For instance, if you enable SSE then LLVM will be free to use SSE registers, even if there are no simd types or intrinsics in the code. So the final executable will still not run on every machine.

So to properly support runtime feature detection, more fine grained control would be needed in the compiler. It seems that there is no easy way to do this with Clang or GCC either, except for setting the target features per translation unit.

I believe more recent versions of LLVM support code generation attributes per-function so we can tell LLVM to enable SSE for just one function but not the entire program. There’s currently no way in Rust itself to specify this, however.
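
For what it is worth, Rust later gained a `#[target_feature(enable = "...")]` attribute that enables a feature for a single function. A sketch of how it combines with runtime detection follows; the function names are hypothetical. The shared body is marked `#[inline(always)]` so that it gets compiled into each caller with that caller's features, which is one way to have the same code "compiled twice" without duplicating it in the source.

```rust
/// Shared body, written once. Inlined into each caller and compiled with
/// that caller's target features.
#[inline(always)]
fn add_assign_body(dst: &mut [f32], src: &[f32]) {
    for (d, s) in dst.iter_mut().zip(src) {
        *d += *s;
    }
}

/// Copy in which the compiler may use AVX; the rest of the crate is
/// still compiled for the baseline target.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn add_assign_avx(dst: &mut [f32], src: &[f32]) {
    add_assign_body(dst, src)
}

/// Safe entry point: detect at runtime, then dispatch.
pub fn add_assign(dst: &mut [f32], src: &[f32]) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was just confirmed at runtime.
            unsafe { add_assign_avx(dst, src) };
            return;
        }
    }
    add_assign_body(dst, src)
}
```

The detection happens once per call to `add_assign`, not once per element, so the cost is amortized over the whole slice.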

If you plan on exporting very short functions that do this, it would be a good idea to cache the relevant feature flag in a static AtomicUsize, since CPUID is a serializing instruction.

I'll just leave this here:

<plug type="shameless"> crates.io: Rust Package Registry </plug>

I filed this issue about target-cpu=native, which is a bit different (simpler!?) because it’s about opportunistic exploitation of the target’s features, i.e. code that does not require but greatly benefits from features that rustc does not enable by default on x86-64, like SSE4.2, AVX, etc.

If you support target-cpu=native, is there any harm in supporting all target CPUs?

How about using ifuncs on Linux to select the best implementation at runtime?

Another issue that I ran into with RUSTFLAGS: if you want to build for a target CPU with different features than your host CPU, build scripts get built with RUSTFLAGS too, which means they might not run on your system and you can’t build. So if the flags can be set from Cargo.toml, they should not be applied to build scripts.
