Conditionally optimizing for low-resource targets

Some previous discussion around this issue: Idea: Allow to query current optimization level using #[cfg(opt_level="..")]

In cryptographic implementations, it's often quite easy to trade performance for various things like smaller code size, lower stack usage, etc.

The question is how to gate such optimizations. Based on the past discussion, there was a lot of opposition to allowing any sort of introspection of -Os/-Oz for this (e.g. via something like a #[small] attribute).

At the same time, we have people complaining that something like a compact cargo feature is an inappropriate use of cargo features.
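For concreteness, the cargo-feature approach in question looks roughly like this sketch (the `compact` feature name and these functions are hypothetical): the default build uses a precomputed table, while the compact build recomputes values and drops the table from the binary.

```rust
#[cfg(not(feature = "compact"))]
mod imp {
    // Fast path: a precomputed table trades data/code size for speed.
    static SQUARES: [u32; 256] = {
        let mut t = [0u32; 256];
        let mut i = 0;
        while i < 256 {
            t[i] = (i as u32) * (i as u32);
            i += 1;
        }
        t
    };

    pub fn square(x: u8) -> u32 {
        SQUARES[x as usize]
    }
}

#[cfg(feature = "compact")]
mod imp {
    // Compact path: recompute each value; smaller binary, slower hot loops.
    pub fn square(x: u8) -> u32 {
        (x as u32) * (x as u32)
    }
}

pub use imp::square;
```

The objection is that the end application, not intermediate libraries, should be the one deciding which of these two bodies gets compiled.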

An alternative, which I'm not sure has been discussed, is making this a property of individual targets, so that it doesn't need to be explicitly configured by the end user.

What's the best way to solve this, now or prospectively?


This seems like a specific case of a more general problem with cargo features: there's a major use case they don't support, and cargo should provide that functionality somehow, but currently doesn't. This use case is an enumerated set of mutually exclusive options which do not affect the surface API, where the decision should be made only by the end user building the final binary. This could be things like optimizing for speed vs. code size vs. something else, which backend to use behind a shared interface, etc.

cargo features aren't good for that use case because they can't be mutually exclusive, and they can be turned on by libraries. But cargo doesn't provide good support for that use case through any other mechanism, so a lot of people use features for this (possibly with other hacks, like cfg'd compile_error! macros), causing problems.
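The compile-error hack mentioned here looks roughly like this (the `fast` and `compact` feature names are hypothetical): invalid combinations are rejected at compile time, and with neither feature enabled the crate falls back to a default.

```rust
// Reject mutually exclusive feature combinations at build time.
#[cfg(all(feature = "fast", feature = "compact"))]
compile_error!("features `fast` and `compact` are mutually exclusive");

// Pick an implementation label based on the enabled feature.
#[cfg(feature = "compact")]
const IMPL: &str = "compact";

#[cfg(not(feature = "compact"))]
const IMPL: &str = "fast";
```

This works, but it only fails the build after an incompatible feature set has already been unified by cargo; it cannot make the resolver avoid the conflict.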


Correct me if I'm wrong or missing something, but isn't this essentially what "target features" do? Granted there aren't "general" target features like small or anything, but enabling avx vs sse will affect the code size and speed.
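For reference, that existing mechanism is queryable today via `cfg(target_feature = "...")`, which reflects whatever the target baseline or `-C target-feature=...` enables. A minimal sketch (AVX2 chosen as a familiar example; the function name is invented):

```rust
// Selected at compile time from the enabled target features.
#[cfg(target_feature = "avx2")]
fn simd_lane_bytes() -> usize {
    32 // 256-bit AVX2 vectors
}

#[cfg(not(target_feature = "avx2"))]
fn simd_lane_bytes() -> usize {
    16 // 128-bit SSE2 baseline on x86_64, or a non-x86 target
}
```

Which branch is compiled depends on the build flags, which is exactly the "decided per target, not per crate" behavior being discussed.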

Target features do behave that way, and we do leverage them for CPU intrinsics for cryptographic algorithms.

However, the question here is about portable Rust code which does not make use of CPU intrinsics: code where obvious/trivial time/memory or code size tradeoffs are possible, and where we want the application to decide whether to opt into a slower but more compact implementation.


Right, but there’s no technical reason that target features can’t also serve this purpose; it’s just that they currently don’t. So in my mind we should ask and figure out “why not target features?” first, before thinking of proposing a new mechanism to do this.


target_feature is generally considered to be unsafe. I think it's reasonable to split this out into a new config that's like target_feature, but doesn't carry the unsafety baggage that target_feature has. That's my initial response to "why not target features?" but to be honest I haven't given this enough thought to have a serious opinion.


Mutually exclusive or subtractive features would be very handy for cases like alloc or no_alloc, to allow or disable use of a system allocator. In some cases alloc is a feature, because it adds functionality, like new functions and traits that cannot be used without a system allocator; in other cases no_alloc is required, to bring in dependencies like arrayvec for a stack-allocated vector rather than Vec. A concrete example of this problem is nom, which uses my crate lexical-core, which can optionally use Vec or ArrayVec, gated through a no_alloc feature. Meanwhile, nom has an alloc feature which adds numerous functions that require a system allocator. Since we cannot have mutually exclusive options or subtractive features, there's no good way for nom to disable lexical-core's no_alloc feature when alloc is enabled. The alternative would be making arrayvec a mandatory dependency of lexical-core, which defeats the purpose.
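A hedged sketch of the alloc / no_alloc split described above (the `Buffer` type and its 64-byte capacity are invented for illustration; a real no_alloc build would use something like arrayvec::ArrayVec rather than this hand-rolled stand-in):

```rust
#[cfg(feature = "alloc")]
type Buffer = Vec<u8>; // growable, requires a system allocator

#[cfg(not(feature = "alloc"))]
struct Buffer {
    data: [u8; 64], // fixed capacity on the stack
    len: usize,
}

#[cfg(not(feature = "alloc"))]
impl Buffer {
    fn new() -> Self {
        Buffer { data: [0; 64], len: 0 }
    }

    fn push(&mut self, byte: u8) {
        self.data[self.len] = byte; // panics if capacity is exceeded
        self.len += 1;
    }

    fn as_slice(&self) -> &[u8] {
        &self.data[..self.len]
    }
}
```

The trouble described above is that once any crate in the graph enables `alloc`, cargo's additive feature unification gives every dependent the Vec-backed type, with no way to subtract it back out.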

My understanding is target_feature is pretty much a thin wrapper around a similar concept in LLVM, so I'm not sure it's the right tool for the job here.

What are you proposing concretely? Something like this?

#[cfg(all(target_feature = "compact"))]

If so, what would something like this do:

#[target_feature(enable = "compact")]

While that might be nice, I don’t really see how a new option would be able to avoid the same pitfalls, because this feature would still have to interact with target features, and you could easily run into the same issues as you would if you used target features. Rustc doesn’t really know whether a feature can generate UB, because it’s mostly passed directly to the codegen backend; so even if we knew it worked for LLVM, it could generate UB in Cranelift, or GCC, etc.

I would again go back to this being about current usage, not about what it can do. The perception of target-feature as LLVM-specific will have to change eventually as new codegen backends are introduced which don’t share the same feature set. In rust-gpu, for example, we already use target features for capabilities and extensions in SPIR-V. Rustc already handles target features itself in the generic SSA backend rather than in LLVM, and has translation code converting Rust target features to LLVM features.

I'm sorry, I'm not following. Why would this feature still have to interact with target-features? If I have a crate named my_crate, and it provides two mutually exclusive features foo and bar, why would target-features have to be involved at all?

The discussion isn’t about mutually exclusive cargo features; as boats said, that’s just what people are currently doing. I feel like that’s separate from this discussion of codegen features like compact, and a compact feature has to interact with target features, because which instructions are available is dictated by the target features, so how compact you can make the code depends on what else you’ve enabled.

For what it's worth, in the specific use case I have in mind this isn't the case. The code in question is fully portable, however there are optimizations that can do the following at the cost of performance:

  • cut the total stack size required in half (with a corresponding halving of performance, i.e. a time/memory tradeoff, TMTO)
  • reduce overall code size by ~20%
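To make the first bullet concrete, here is a generic checkpointing illustration of a time/memory tradeoff (not the actual cryptographic code; `f`, the constants, and `STRIDE` are invented): store only every second intermediate state of an iterated function, then recompute the missing states on demand, roughly halving storage in exchange for extra recomputation.

```rust
// An arbitrary iterated step function (an LCG, purely for illustration).
fn f(x: u64) -> u64 {
    x.wrapping_mul(6364136223846793005).wrapping_add(1)
}

// Keep one checkpoint per STRIDE states; STRIDE = 2 halves the storage.
const STRIDE: usize = 2;

// Record checkpoints while walking the chain of n states from `seed`.
fn checkpoints(seed: u64, n: usize) -> Vec<u64> {
    let mut cps = Vec::with_capacity(n / STRIDE + 1);
    let mut x = seed;
    for i in 0..n {
        if i % STRIDE == 0 {
            cps.push(x);
        }
        x = f(x);
    }
    cps
}

// Recover state i by replaying f from the nearest earlier checkpoint.
fn state_at(cps: &[u64], i: usize) -> u64 {
    let mut x = cps[i / STRIDE];
    for _ in 0..(i % STRIDE) {
        x = f(x);
    }
    x
}
```

Larger strides push the tradeoff further in the same direction: 1/k of the memory for up to k-1 extra `f` calls per lookup.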