Pre-RFC: Mutually-excusive, global features

epage · September 25, 2023, 1:35pm

This is very hand-wavy and very much targeted at brainstorming. At this stage, what I suspect will be most useful will be

Describing where features are not working well or where are you using --cfg
Planning for how it would look with this proposal instead
Evaluating how well this proposal meets your needs. What works well and what doesn't?

I do not expect to have the time to see this through the RFC process and implementation. However, I will be willing to help shepherd this effort if someone is willing to write the RFC and drive this to implementation.

Feature Name: global-features
Start Date: 2023-09-20
RFC PR: rust-lang/rfcs#0000
Rust Issue: rust-lang/rust#0000
Edit history

Summary

Cargo features are targeted at direct dependents while some decisions need to be made by the top-level crate (usually a [[bin]]). This is currently worked around by tunneling features up or relying on environment variables.

This RFC proposes an alternative to features called globals that are targeted at decisions that affect the entire final artifact.

[package]
name = "git-tool"
version = "0.0.0"
set-globals = { sys = ["static"] }

[dependencies]
git2 = "0.18"

[package]
name = "git2"
version = "0.18.5"

[dependencies]
git2-sys = "0.18.5"

[package]
name = "git2-sys"
version = "0.18.5"

[globals.sys]
description = "External library link method"
values = ["static", "dynamic", "auto"]
default = "auto"

Differences from features

Setting them applies to transitive dependencies
Enumerations, rather than "present" / "not-present"
No unification

We suspect this will work best for modifying behavior while features will work best for extending APIs.

Motivation

Use cases

A crate author offers optional optimizations, like using parking_lot
- Currently solved by using a feature like in tokio
- This requires callers to either enable it directly, re-export the feature, or have applications directly depend on tokio and enable it.
-sys crates need to make the decision of whether to statically link a vendored version of the source of dynamically link against a system library.
- Currently this is solved by a variety of means, usually by dynamically linking if the system library is available and then falling back to the vendored copy unless a vendored feature is enabled.
- This requires callers to either enable it directly, re-export the feature, or have applications directly depend on the -sys crate and enable it.
- To properly represent this, we need three states: "dynamic", "static", "auto"
- See also Internals: Pre-RFC Cargo features for configuring sys crates
Enable alloc or std features across the stack (rust-lang/cargo#2593)
Multiple backends
- flate2 has multiple backends
- Inkwell supports multiple LLVM backends
- mlua supports multiple backends and Lua versions
- async-std vs tokio runtimes
Module-level parameters
- Inline-size control in kstring

Why are we doing this? What use cases does it support? What is the expected outcome?

Guide-level explanation

TODO

Reference-level explanation

Packages may declare what globals they subscribe to:

[package]
name = "git2-sys"
version = "0.0.0"

[globals.sys]
description = "External library link method"
values = ["static", "dynamic", "auto"]
default = "auto"

[target.'cfg(any(global::sys = "dynamic", global::sys = "auto"))'.build-dependencies]
pkg_config = "0.3.27"
[target.'cfg(any(global::sys = "static", global::sys = "auto"))'.build-dependencies]
cc = "1.0.83"

Global names follow the same naming rules as cargo features
Global values may be any UnicodeXID continue character
TODO: Add deprecation support

These can then be referenced in the global namespace in the source, like this build.rs

#[cfg(any(global::sys = "dynamic", global::sys = "auto"))]
fn dynamic() -> Option<Target> {
    // ...
}

#[cfg(any(global::sys = "static", global::sys = "auto"))]
fn static() -> Option<Target> {
    // ...
}

fn auto() -> Option<Target> {
    #[cfg(any(global::sys = "dynamic", global::sys = "auto"))]
    if let Some(target) = dynamic() {
        return Some(target);
    }

    #[cfg(any(global::sys = "static", global::sys = "auto"))]
    if let Some(target) = static() {
        return Some(target);
    }

    None
}

References by this crate to global names/values in cfgs and cargo:rustc-cfg build script directive will be validated based on RFC 3013 (rustc and cargo)

Workspace globals may also be specified and a package may inherit them:

[workspace.globals.sys]
description = "External library link method"
values.enum = ["static", "dynamic", "auto"]
default = "auto"

[globals.sys]
workspace = true

[globals.zlib]
description = "zlib implementation"
values = ["miniz", "zlib", "zlib-ng"]
default = "auto"

This initial design has no overridable fields when inheriting, unlike inheriting workspace dependencies
Like workspace dependencies, cargo new will not automatically inherit workspace.globals

Packages may configure globals for use when built as a root crate

[package]
set-globals = { sys = ["static"], zlib = ["miniz"] }
# may also be inherited from the workspace
# set-globals.workspace = true

Globals are arrays in case two packages are subscribed to that global with disjoint valid values
When building multiple root crates (cargo check --workspace), it is a compilation error if they disagree about set-globals (i.e. no unification is happening like it does for features)

Also these may be set:

On the command-line as --globals sys=dynamic
As a config setting

When resolving dependencies (the feature pass), fingerprinting of packages, an when compiling the packages, workspace.set-globals will only apply to a package if the value is valid within the schema for that package, otherwise the default will be used if any. When multiple values are valid, the first will be selected. Any unused set-globals will be a warning.

cargo <cmd> --globals (no argument) will report the globals available for being set within the current package along with their current value and valid values.

SemVer

SemVer Compatible

Adding a global
Adding a global value

Context-dependent for whether compatible or not

Changing a global default
Removing a global
Removing a global value

Drawbacks

TODO

Rationale and alternatives

This couples mutually exclusive features with global features because

Mutually exclusive features provides design insight into what we should do for global features
Scoping mutually exclusive features to global features solves the feature-unification problem

Global features take a value, rather than allowing relying on presence like normal cfgs (see RFC 3013)

Less brittle for future evolution
Easier to unify for a distributed schema
Removes the complexity in declaratively defining presence and value instances

package.set-globals is a package setting, with conflict errors, rather than a workspace setting because people will likely have binaries that serve very different roles within the same workspace (different target platforms, host vs wasm, etc).

The loose schema is used to allow it to be declared in a distributed fashion without packages failing due to definition conflicts. Alternatively, we could make these parameters on individual packages. The downside is then we'd need a patch-like table for specifying the exact dependency in the tree to set the parameters. This scheme is brittle in handling upgrades.

Unused set-globals is not an error so that changing dependencies is not a breaking change.

Alternatives

Native support for controlling `std` / `alloc` / `core`

See Internals: Pre-Pre-RFC: making std-dependent Cargo features a first-class concept

Native cargo support for `-sys` crates

Instead of a convention around global features, cargo could have built-in flags for controlling -sys crates and could ship some native support that would remove some boilerplate and ensure consistency.

Downsides

Longer time frame: this would require more investigation and experimentation.
Doesn't handle all use cases

See

Internals: Direct support for pkg-config

Automatic feature activation

Too many spooky effects

See

Mutually exclusive features conflict via an identifier

Some systems call this "provides" and others a "capability".

Downsides

No way to unify these, the decision needs to be at the top-level crate.
Linux distributions that use this implicitly wire the packages together by dropping files in place with compatible names while Rust/Cargo need explicit wiring, requiring the choice to be bubbled up the stack. See instead "facade crates".

See

Mutually exclusive features that instantiate distinct copies a dependency

While there might be a place for a variant of this idea, most motivating cases are dealing with wanting one version of the dependency.

See rust-lang/cargo#2980

See also Internals: module-level generics

`log` / logger split

Some cases can be modeled like log where a trait is defined and an API is exposed for regisering a specific implementation.

If the interface crate is able to also depend on the implementations, it could also initialize the global with a default on first use if its unset.

Benefits

Works today
Flexible on what state an implementation can be initialized with

Downsides

Doesn't work for all use cases
Imperative wiring together of APIs in the top-level crate
Some level of overhead
Supporting a default requires building all implementations since the choice is at runtime

Facade crates

On internals, the idea was proposed to bake-in support for the log pattern so it can be done at compile time.

Downsides

Doesn't work for all use cases like -sys crates

Prior art

Gentoo USE flags

USE flags are booleans (set or unset) that are defined through a form of layered config, with some defaulted to on. Users may then enable more or disable some of the defaulted flags. This can be done globally or on a per-package basis. Packages then check what is enabled and conditionalize their builds off of them.

Bazel

Features may "provide" a feature and it is an error to "provide" a feature more than once.

Gradle

This seems work similar to Bazel but instead of "providing" a feature, you declare a capability.

See also Handling mutually exclusive dependencies

Unresolved questions

Target-specific set-globals
Profile-specific set-globals (for custom performance / debugging experience, see Conditional compilation: Cfg option for debug vs. release Cargo(-esque) profile · Issue #14974 · rust-lang/cargo · GitHub)
Naming, especially for package.set-globals and default
Whether values.enum is needed vs just values
Can we make RFC 3013 work with the root cfg namespace being an open set of fields while global:: namespace is a closed set?

Future possibilities

Loosen error on conflicting globals

Most of the time, libs may be selected as part of the root crates (cargo check --workspace) but effectively aren't a root crate. Can we loosen the requirement on these so they don't need the same set-globals as others in the workspace, making it easier to build?

`cfg_str!`

Expose value set for the current package as a &'static str

let foo = cfg_str!("global::foo");

Error for cfgs that have a name without a value

Additional `globals.*.values` validation rules

Currently, we only support validation based on a set of possible values.

Potential fields for the globals.*.values include:

globals.foo.values.bool = true
globals.foo.values.range = "8-23"
globals.foo.values.path = "ImplForTrait"
globals.foo.values.expr = "some_func()"
globals.foo.values.any = true (makes values an open set, predantic warning available when it falls back to this)

(not a tagged variant to make future evolution easier)

Multi-valued globals

Extend the definition of a global to include a multi field. Instead of new assignments overwriting, they append.

josh · September 25, 2023, 1:57pm

Another potential option, rather than a global namespace: we could attach global features to a specific crate, and then for common things like sys we can attach them to a dedicated crate that only provides features; then, -sys crates can depend on that crate to make use of that feature, and application crates can depend on that crate and reference that feature. This has the advantage of allowing any coordinating set of crates to have global features, and not requiring us to maintain a single global:: namespace.

Another advantage of this: you can easily notice if a -sys crate doesn't make use of the global feature, because it doesn't have the dependency.

(Something like zlib, for instance, seems like it makes more sense attached to libz-sys.)

epage · September 25, 2023, 2:18pm

When I had considered per-package parameters, I was going in the direction of preventing ambiguity which lead to needing package spec IDs which you then need to update whenever you update your dependencies, making it more brittle as well as not very ergonomic.

Whats also implicit in what you said is that you'd need to be able to query the state of a parameter of your dependency. I was trying to make these built-in cfgs only affect packages that explicitly mention them as this narrows the scope for fingerprinting for better cache reuse I feel like I'd need to do some deeper stepping through of how this affects resolving of dependencies and parameters and the validation of parameters.

djc · September 25, 2023, 2:21pm

For TLS there has long been a proliferation of features for configuration:

native-tls vs rustls
For rustls, using webpki-roots or rustls-native-certs (and we'll soon start recommending another option)
Current alpha releases of rustls allow the caller to pick a CryptoProvider impl to provide cryptographic primitives (for which ring will likely still be the default in the first release)

Given that TLS and the backend for cryptographic primitives are security-sensitive choices, Cargo features (where choices can be influenced via "spooky action at a distance" from other crates in your dependency graph) have not been a great fit. We've recently discussed the requirements for FIPS certification, where a binary must be shown to only use FIPS certified cryptographic primitives, for example.

One alternative is typically to allow the application to provide their own rustls::ClientConfig or ServerConfig to a library, but that in turn means that rustls becomes a public dependency of every library that needs TLS, which has its own downsides in terms of managing semver updates.

Another interesting point in the design space here is something like an ability to specify that a feature must not be enabled. As in, the application can specify !feature and this would require Cargo to raise an error if anything in the dependency graph tries to enable that feature.

epage · September 25, 2023, 2:25pm

This reminds of me another potential use for globals: asserting on a condition. This requires buy-in from all cryptographic libraries but if you set fips = "true", then it could be a compiler error to enable a feature for non-fips code. Similar for a no_std and no_alloc asserting that std parts of packages shouldn't be unintentionally enabled.

kpreid · September 25, 2023, 2:27pm

What if globals were instead set in [profile]? Advantages of this:

Precedent for a package having global build settings that only apply if the package is the build root is that they go in [profile]
Packages can provide multiple profiles with different globals, either dev vs release or custom named profiles

djc · September 25, 2023, 2:29pm

FWIW I don't much like the notion of a centrally managed list of globals. Maybe another option here would be for crates to opt-in to features based on whether another crate is present in the dependency graph: if the serde crate (with the derive feature) is being built anyway, enable the serde feature.

epage · September 25, 2023, 2:30pm

In the past, that had been effectively rejected, see Eager optional dependencies · Issue #2593 · rust-lang/cargo · GitHub. Doesn't mean we can't revisit it.

epage · September 25, 2023, 2:32pm

One case where I suspect profile falls flat is when the global is tied to the target, like a workspace with an application and then wasm plugins. This is why I was focusing on package and target tables.

kpreid · September 25, 2023, 3:10pm

That's true, but workspaces already fail to support that case well — they can't currently cleanly handle crates that won't compile on one of the two platforms due to platform-specific bindings libraries, because check/build will fail on one or the other. Perhaps there might be a solution that will address both. With the status quo, you have to specify some extra options to get the two builds to succeed, so one of them might as well be a --profile.

the8472 · September 25, 2023, 3:41pm

preview features on stable seems related to this since toggling those may have API implications on crates that optionally expose them.

ratmice · September 27, 2023, 1:44pm

Apologies if this is excessively long and uninteresting.

When trying to expand rust support for seL4, I'm not sure exactly what the current state of multi-config bindings for self+rust are within the last year or so, this is based upon some past work on the API specification such that kernel configuration and the resulting library API's can then be translated into cfgs. I believe that there has been work subsequent work on generating libseL4 in rust, from the API specification but I haven't had time to track it yet.

Typically, many rust bindings just supported a single kernel config (Many didn't exist prior to the addition of the mcs kernel config, so beyond architecture the only thing they had to deal with was whether to enable debug). seL4 uses a decent number of what were historically c preprocessor definitions for configuring the kernel's syscalls and the libsel4 API, these were all abstracted out into a somewhat language neutral xml file which now can be translated into either c preprocessor definitions or rust cfg.

There are basically 2 layers to this:

affects the availability of syscalls in the kernel in the kernel proper.
subsequent availability of API in system level libraries, and the syscall stubs generated from the above.

The latter of which tends to "infect" downstream crates as they use API dependent upon specific cfg's

Essentially 2 kernel config 'master', or 'mcs' (beyond arch) and a flag for enabling debug which all affect the syscall availability Here are some examples:

syscall configuration
Negative reasoning about kernel config in the library portion of the API^[1] example.

In general the goal has been to when generating code for these, generate the appropriate #[cfg(...)] directives. The RFC hasn't sunk into my brain fully to feel comfortable responding to the 2nd or 3rd questions as to how this would work with global features (partially just because of lack of experience since we don't have a lot of experience with e.g. the translation of the libseL4 API into rust, nor any subsequent crates which depend upon it) . But it seemed worth mentioning as it relies on cfg quite heavily in a cross-crate fashion.

This should perhaps change to whitelisting the configs that enable it rather this current than negative logic, but serves as an example use of logic in the API specification that get translated to cfg. This basically defines a type TCBConfigure only if we aren't using the CONFIG_MCS option. ↩︎

epage · September 27, 2023, 1:47pm

Is there a reason you are using cfg rather than features for allowing dependents to control which parts of an API are available?

ratmice · September 27, 2023, 1:56pm

Yes, syscalls are mutually exclusive. e.g. in that first link describes a system call Reply in the api-master configuration not present in the api-mcs configuration and in the other direction lots of syscalls e.g. NBWait not available in the api-master. That is to say at the very bottom of the tower there are a few of mutually exclusive things which tend to permeate upwards.

Edit: That is to say the kernel itself is built to one configuration api-mcs or api-master. Though we can generate a single source syscall-stubs crate which can then be compiled for any configuration of the kernel the syscall-stubs crate must be compiled to match the selected kernel configuration to avoid sending unknown syscalls.

It might be easier to just say it is similar to std::os where there are all of the platform specific std::os sub-modules. if there were we would theoretically have something like std::os::sel4_api_master and std::os::sel4_api_mcs. then std::os::sel4_debug would be a separate module which may/may not be present on either of those. Except not in std, currently there aren't target triples for it, and I fear doing so might lead to a combinatoric explosion of triples for all the various configurations.

bjorn3 · September 27, 2023, 2:51pm

If you have different syscall interfaces using a different target makes sense. Especially if you want to support libstd in the future as a single pre-compiled libstd is shipped per (tier 1/2) target without any option to configure which syscall interface should be used.

ratmice · September 27, 2023, 3:13pm

There isn't enough support to implement libstd it could implement core (minus include_bytes like macros, etc), But it only gives you enough to where you could go and implement a version of standard on top of it, by adding allocators, filesystems on top of it. But it's pretty hard to see it fitting in well with the std ecosystem.

Edit: I think the actual closest comparison here is probably the core::arch submodules with an associated target_family=sel4. From there it gives you enough to build a target_os.

Vorpal · September 27, 2023, 10:42pm

(Side note: Maybe it is just me, but it still seems like an exceedingly bad idea to have two mutually exclusive syscall interfaces for the same OS, as that would result in splitting the already small ecosystem in two. Perhaps if you immediately on top had an abstraction layer (like libc) and had everything use that, then it could work. Still seems like duplicated effort though compared to focusing on one syscall interface.

Userspace really should be talking to the OS via some sort of stable API and ABI, be that syscall like on Linux, or libc like on most other OSes.)

ratmice · September 27, 2023, 11:19pm

(We're kind of getting off into the weeds, as this has little to do with the actual RFC), but yes it "isn't ideal" to put it mildly but there are reasons for it. One of them has been verified and will be legacy once the other is verified e.g. api-mcs is equivalent to something like like nightly for the time being.

Edit: one thing that makes it less troublesome than on ordinary OS's is the kernel + bin targets + libraries are all built in a single build tree similar to a cargo workspace, which generates a single system image file. So to mix syscall ABI's you'd have to mess up the build system pull targets out of the system image to manually. It is basically assumed if anyone is distributing binaries separate from the build generated system image, they need to pick one or deal with the fallout. Most systems using it are completely static.

epage · October 16, 2023, 2:20pm

One thing people bring up from time to time (like with public/private dependencies is the idea of having unique instances of packages.

Another spin on mutually-exclusive features (that doesn't solve feature propagation) is that we support parametrized packages. We have package parameters that are declared with an enum + default. Each unique combination of these parameters creates a unique instance of a package in the tree. We still do dependency and feature unification within that combination of parameters.

I feel like we'd want both globals and parameterized packages. Maybe there is even a way to unify these features so the control for parameters can be local or global, depending on different circumstances.

bascule · October 16, 2023, 4:19pm

Overall this looks like a great proposal!

I have one question/suggestion about this syntax:

#[cfg(any(global::sys = "static", global::sys = "auto"))]

One thing that's a bit tricky about cfg attributes right now is (AFAIK) there is no way to query for them being unset to easily allow for a default. I'm not sure what that'd actually look like in the context of this approach which somewhat resembles cfg attributes (empty string maybe?) but it would be nice to have a solution.

Topic		Replies	Views
Pre-RFC: Cargo mutually exclusive features cargo	39	4204	December 22, 2024
Pre-RFC: cfg - Aligning feature configuration between Cargo & rustc tools and infrastructure	8	846	June 17, 2022
[Pre-RFC] Rust and features (cargo) discoverability documentation	1	745	March 25, 2019
Mutually exclusive feature flags? cargo	13	5952	March 25, 2019
[Pre-RFC] `std` aware Cargo language design	4	1426	May 16, 2019

Pre-RFC: Mutually-excusive, global features

Summary

Motivation

Guide-level explanation

Reference-level explanation

SemVer

Drawbacks

Rationale and alternatives

Alternatives

Native support for controlling std / alloc / core

Native cargo support for -sys crates

Automatic feature activation

Mutually exclusive features conflict via an identifier

Mutually exclusive features that instantiate distinct copies a dependency

log / logger split

Facade crates

Prior art

Gentoo USE flags

Bazel

Gradle

Unresolved questions

Future possibilities

Loosen error on conflicting globals

cfg_str!

Additional globals.*.values validation rules

Multi-valued globals

Related topics

Native support for controlling `std` / `alloc` / `core`

Native cargo support for `-sys` crates

`log` / logger split

`cfg_str!`

Additional `globals.*.values` validation rules