Pre-RFC: Mutually-exclusive, global features

This is very hand-wavy and very much targeted at brainstorming. At this stage, what I suspect will be most useful will be

  1. Describing where features are not working well or where you are using --cfg
  2. Planning for how it would look with this proposal instead
  3. Evaluating how well this proposal meets your needs. What works well and what doesn't?

I do not expect to have the time to see this through the RFC process and implementation. However, I will be willing to help shepherd this effort if someone is willing to write the RFC and drive this to implementation.


Summary

Cargo features are targeted at direct dependents while some decisions need to be made by the top-level crate (usually a [[bin]]). This is currently worked around by tunneling features up or relying on environment variables.

This RFC proposes an alternative to features called globals that are targeted at decisions that affect the entire final artifact.

# git-tool's Cargo.toml
[package]
name = "git-tool"
version = "0.0.0"
set-globals = { sys = ["static"] }

[dependencies]
git2 = "0.18"

# git2's Cargo.toml
[package]
name = "git2"
version = "0.18.5"

[dependencies]
git2-sys = "0.18.5"

# git2-sys's Cargo.toml
[package]
name = "git2-sys"
version = "0.18.5"

[globals.sys]
description = "External library link method"
values = ["static", "dynamic", "auto"]
default = "auto"

Differences from features

  • Setting them applies to transitive dependencies
  • Enumerations, rather than "present" / "not-present"
  • No unification

We suspect this will work best for modifying behavior while features will work best for extending APIs.

Motivation

Use cases

  • A crate author offers optional optimizations, like using parking_lot
    • Currently solved by using a feature like in tokio
    • This requires callers to either enable it directly, re-export the feature, or have applications directly depend on tokio and enable it (see the sketch after this list).
  • -sys crates need to decide whether to statically link a vendored version of the source or dynamically link against a system library.
    • Currently this is solved by a variety of means, usually by dynamically linking if the system library is available and then falling back to the vendored copy unless a vendored feature is enabled.
    • This requires callers to either enable it directly, re-export the feature, or have applications directly depend on the -sys crate and enable it.
    • To properly represent this, we need three states: "dynamic", "static", "auto"
    • See also Internals: Pre-RFC Cargo features for configuring sys crates
  • Enable alloc or std features across the stack (rust-lang/cargo#2593)
  • Multiple backends
  • Module-level parameters
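
As a sketch of how the first use case could look under this proposal (the global name and values here are invented for illustration; tokio defines no such global today):

# tokio's Cargo.toml (hypothetical)
[globals.sync-impl]
description = "Synchronization primitive implementation"
values = ["std", "parking_lot"]
default = "std"

# the application's Cargo.toml
[package]
name = "my-app"
version = "0.0.0"
set-globals = { sync-impl = ["parking_lot"] }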

Why are we doing this? What use cases does it support? What is the expected outcome?

See also

Guide-level explanation

TODO

Reference-level explanation

Packages may declare what globals they subscribe to:

[package]
name = "git2-sys"
version = "0.0.0"

[globals.sys]
description = "External library link method"
values = ["static", "dynamic", "auto"]
default = "auto"

[target.'cfg(any(global::sys = "dynamic", global::sys = "auto"))'.build-dependencies]
pkg_config = "0.3.27"
[target.'cfg(any(global::sys = "static", global::sys = "auto"))'.build-dependencies]
cc = "1.0.83"
  • Global names follow the same naming rules as cargo features
  • Global values may be any sequence of UnicodeXID continue characters
  • TODO: Add deprecation support

These can then be referenced under the global:: namespace in source, as in this build.rs:

// `Target` stands in for whatever type this build script uses to describe the
// located library. `static` is a keyword, so a raw identifier is used for the
// function name.

#[cfg(any(global::sys = "dynamic", global::sys = "auto"))]
fn dynamic() -> Option<Target> {
    // ...
}

#[cfg(any(global::sys = "static", global::sys = "auto"))]
fn r#static() -> Option<Target> {
    // ...
}

fn auto() -> Option<Target> {
    #[cfg(any(global::sys = "dynamic", global::sys = "auto"))]
    if let Some(target) = dynamic() {
        return Some(target);
    }

    #[cfg(any(global::sys = "static", global::sys = "auto"))]
    if let Some(target) = r#static() {
        return Some(target);
    }

    None
}
  • References by this crate to global names/values in cfgs and in the cargo:rustc-cfg build script directive will be validated based on RFC 3013 (rustc and cargo)

Workspace globals may also be specified and a package may inherit them:

# workspace Cargo.toml
[workspace.globals.sys]
description = "External library link method"
values.enum = ["static", "dynamic", "auto"]
default = "auto"

# member package's Cargo.toml
[globals.sys]
workspace = true

[globals.zlib]
description = "zlib implementation"
values = ["miniz", "zlib", "zlib-ng"]
default = "auto"
  • This initial design has no overridable fields when inheriting, unlike inheriting workspace dependencies
  • Like workspace dependencies, cargo new will not automatically inherit workspace.globals

Packages may configure globals for use when built as a root crate:

[package]
set-globals = { sys = ["static"], zlib = ["miniz"] }
# may also be inherited from the workspace
# set-globals.workspace = true
  • Globals are arrays in case two packages are subscribed to that global with disjoint valid values
  • When building multiple root crates (cargo check --workspace), it is a compilation error if they disagree about set-globals (i.e. no unification is happening like it does for features)

These may also be set:

  • On the command-line as --globals sys=dynamic
  • As a config setting
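
A sketch of what the config setting might look like (the key name and shape in .cargo/config.toml are not specified by this proposal and are an assumption, mirroring the manifest syntax):

# .cargo/config.toml
set-globals = { sys = ["dynamic"] }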

When resolving dependencies (the feature pass), fingerprinting packages, and compiling packages, workspace.set-globals will only apply to a package if the value is valid within the schema for that package; otherwise the default, if any, will be used. When multiple values are valid, the first will be selected. Any unused set-globals will produce a warning.
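
As an illustration of that selection rule (package names and values are hypothetical):

# root crate
[package]
name = "app"
version = "0.0.0"
set-globals = { zlib = ["zlib-ng", "miniz"] }

# dependency A: both requested values are valid, so the first ("zlib-ng") is selected
[globals.zlib]
values = ["miniz", "zlib", "zlib-ng"]
default = "miniz"

# dependency B: only "miniz" is valid within its schema, so "miniz" is selected
[globals.zlib]
values = ["miniz", "zlib"]
default = "miniz"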

cargo <cmd> --globals (with no argument) will report the globals available to be set within the current package, along with their current value and valid values.

SemVer

SemVer Compatible

  • Adding a global
  • Adding a global value

Context-dependent for whether compatible or not

  • Changing a global default
  • Removing a global
  • Removing a global value

Drawbacks

TODO

Rationale and alternatives

This couples mutually exclusive features with global features because

  • Mutually exclusive features provide design insight into what we should do for global features
  • Scoping mutually exclusive features to global features solves the feature-unification problem

Global features take a value, rather than relying on presence like normal cfgs (see RFC 3013)

  • Less brittle for future evolution
  • Easier to unify for a distributed schema
  • Removes the complexity in declaratively defining presence and value instances

package.set-globals is a package setting, with conflict errors, rather than a workspace setting because people will likely have binaries that serve very different roles within the same workspace (different target platforms, host vs wasm, etc).

The loose schema is used to allow it to be declared in a distributed fashion without packages failing due to definition conflicts. Alternatively, we could make these parameters on individual packages. The downside is that we'd then need a patch-like table for specifying the exact dependency in the tree on which to set the parameters. That scheme is brittle when handling upgrades.

Unused set-globals is not an error so that changing dependencies is not a breaking change.

Alternatives

Native support for controlling std / alloc / core

See Internals: Pre-Pre-RFC: making std-dependent Cargo features a first-class concept

Native cargo support for -sys crates

Instead of a convention around global features, cargo could have built-in flags for controlling -sys crates and could ship some native support that would remove some boilerplate and ensure consistency.

Downsides

  • Longer time frame: this would require more investigation and experimentation.
  • Doesn't handle all use cases

See

Automatic feature activation

Too many spooky effects

See

Mutually exclusive features conflict via an identifier

Some systems call this "provides" and others a "capability".

Downsides

  • No way to unify these; the decision needs to be made at the top-level crate.
  • Linux distributions that use this implicitly wire the packages together by dropping files in place with compatible names while Rust/Cargo need explicit wiring, requiring the choice to be bubbled up the stack. See instead "facade crates".

See

Mutually exclusive features that instantiate distinct copies of a dependency

While there might be a place for a variant of this idea, most motivating cases are dealing with wanting one version of the dependency.

See rust-lang/cargo#2980

See also Internals: module-level generics

See also Cabal backpack support

log / logger split

Some cases can be modeled like log where a trait is defined and an API is exposed for registering a specific implementation.

If the interface crate is able to also depend on the implementations, it could also initialize the global with a default on first use if it's unset.

Benefits

  • Works today
  • Flexible on what state an implementation can be initialized with

Downsides

  • Doesn't work for all use cases
  • Imperative wiring together of APIs in the top-level crate
  • Some level of overhead
  • Supporting a default requires building all implementations since the choice is at runtime

Facade crates

On internals, the idea was proposed to bake in support for the log pattern so it can be done at compile time.

Downsides

  • Doesn't work for all use cases like -sys crates

Prior art

Gentoo USE flags

USE flags are booleans (set or unset) that are defined through a form of layered config, with some defaulted to on. Users may then enable more flags or disable some of the defaulted ones. This can be done globally or on a per-package basis. Packages then check what is enabled and condition their builds on them.

Bazel

Features may "provide" a feature and it is an error to "provide" a feature more than once.

Gradle

This seems to work similarly to Bazel, but instead of "providing" a feature, you declare a capability.

See also Handling mutually exclusive dependencies

Unresolved questions

  • Target-specific set-globals
  • Profile-specific set-globals
  • Naming, especially for package.set-globals and default
  • Whether values.enum is needed vs just values
  • Can we make RFC 3013 work with the root cfg namespace being an open set of fields while global:: namespace is a closed set?

Future possibilities

Loosen error on conflicting globals

Most of the time, libs may be selected as root crates (cargo check --workspace) but effectively aren't the root crate. Can we loosen the requirement on these so they don't need the same set-globals as other packages in the workspace, making the workspace easier to build?

cfg_str!

Expose the value set for the current package as a &'static str:

let foo = cfg_str!("global::foo");
  • Error for cfgs that have a name without a value
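
A sketch of how that might be used, assuming the macro expands to the value configured for the current package (names invented for illustration):

// Pick up a backend name at compile time without enumerating every value in cfg branches.
const ZLIB_BACKEND: &str = cfg_str!("global::zlib");

fn report() {
    println!("compiled against the {ZLIB_BACKEND} zlib backend");
}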

Additional globals.*.values validation rules

Currently, we only support validation based on a set of possible values.

Potential fields for the globals.*.values include:

  • globals.foo.values.bool = true
  • globals.foo.values.range = "8-23"
  • globals.foo.values.path = "ImplForTrait"
  • globals.foo.values.expr = "some_func()"
  • globals.foo.values.any = true (makes values an open set, pedantic warning available when it falls back to this)

(not a tagged variant to make future evolution easier)

Multi-valued globals

Extend the definition of a global to include a multi field. Instead of new assignments overwriting, they append.
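
A sketch of how such a declaration might look (the multi field is the extension described above; the global name and values are invented):

[globals.tls-roots]
description = "Trusted root certificate sources"
values = ["webpki", "native"]
multi = true
default = "webpki"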


Another potential option, rather than a global namespace: we could attach global features to a specific crate, and then for common things like sys we can attach them to a dedicated crate that only provides features; then, -sys crates can depend on that crate to make use of that feature, and application crates can depend on that crate and reference that feature. This has the advantage of allowing any coordinating set of crates to have global features, and not requiring us to maintain a single global:: namespace.

Another advantage of this: you can easily notice if a -sys crate doesn't make use of the global feature, because it doesn't have the dependency.

(Something like zlib, for instance, seems like it makes more sense attached to libz-sys.)
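
A sketch of that variant (crate names and the cross-crate reference syntax are invented for illustration): a small crate exists only to declare the global, and both the -sys crates and applications depend on it.

# sys-globals' Cargo.toml: a crate that exists only to declare the global
[package]
name = "sys-globals"
version = "1.0.0"

[globals.link]
description = "External library link method"
values = ["static", "dynamic", "auto"]
default = "auto"

# libz-sys' Cargo.toml: subscribes by depending on the coordination crate
[dependencies]
sys-globals = "1"

# the application's Cargo.toml
[package]
name = "git-tool"
version = "0.0.0"
set-globals = { "sys-globals::link" = ["static"] }

[dependencies]
sys-globals = "1"
libz-sys = "1"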


When I had considered per-package parameters, I was going in the direction of preventing ambiguity, which led to needing package spec IDs, which you then need to update whenever you update your dependencies, making it more brittle as well as not very ergonomic.

What's also implicit in what you said is that you'd need to be able to query the state of a parameter of your dependency. I was trying to make these built-in cfgs only affect packages that explicitly mention them, as this narrows the scope of fingerprinting for better cache reuse. I feel like I'd need to do some deeper stepping through of how this affects the resolving of dependencies and parameters, and the validation of parameters.

For TLS there has long been a proliferation of features for configuration:

  • native-tls vs rustls
  • For rustls, using webpki-roots or rustls-native-certs (and we'll soon start recommending another option)
  • Current alpha releases of rustls allow the caller to pick a CryptoProvider impl to provide cryptographic primitives (for which ring will likely still be the default in the first release)

Given that TLS and the backend for cryptographic primitives are security-sensitive choices, Cargo features (where choices can be influenced via "spooky action at a distance" from other crates in your dependency graph) have not been a great fit. We've recently discussed the requirements for FIPS certification, where a binary must be shown to only use FIPS certified cryptographic primitives, for example.

One alternative is typically to allow the application to provide their own rustls::ClientConfig or ServerConfig to a library, but that in turn means that rustls becomes a public dependency of every library that needs TLS, which has its own downsides in terms of managing semver updates.

Another interesting point in the design space here is something like an ability to specify that a feature must not be enabled. As in, the application can specify !feature and this would require Cargo to raise an error if anything in the dependency graph tries to enable that feature.


This reminds me of another potential use for globals: asserting on a condition. This requires buy-in from all cryptographic libraries, but if you set fips = "true", then it could be a compile error to enable a feature for non-FIPS code. Similarly, no_std and no_alloc globals could assert that std parts of packages aren't unintentionally enabled.
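
A minimal sketch of how a cryptography crate might make such an assertion (the global and feature names are invented for illustration):

// Refuse to build non-FIPS code paths when the application has set the
// hypothetical `fips` global.
#[cfg(all(global::fips = "true", feature = "non-fips-algos"))]
compile_error!("the `non-fips-algos` feature cannot be enabled in a FIPS build");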

What if globals were instead set in [profile]? Advantages of this:

  • There is precedent for global build settings that only apply when the package is the build root going in [profile]
  • Packages can provide multiple profiles with different globals, either dev vs release or custom named profiles
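
A sketch of what that might look like (set-globals inside [profile] is the suggestion above, not part of the proposal as written):

[profile.release]
set-globals = { sys = ["static"] }

[profile.wasm-plugin]
inherits = "release"
set-globals = { sys = ["static"], zlib = ["miniz"] }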

FWIW I don't much like the notion of a centrally managed list of globals. Maybe another option here would be for crates to opt in to features based on whether another crate is present in the dependency graph: if the serde crate (with the derive feature) is being built anyway, enable the serde feature.

In the past, that had been effectively rejected, see Eager optional dependencies · Issue #2593 · rust-lang/cargo · GitHub. Doesn't mean we can't revisit it.

One case where I suspect profile falls flat is when the global is tied to the target, like a workspace with an application and then wasm plugins. This is why I was focusing on package and target tables.

That's true, but workspaces already fail to support that case well — they can't currently cleanly handle crates that won't compile on one of the two platforms due to platform-specific bindings libraries, because check/build will fail on one or the other. Perhaps there might be a solution that will address both. With the status quo, you have to specify some extra options to get the two builds to succeed, so one of them might as well be a --profile.

preview features on stable seems related to this since toggling those may have API implications on crates that optionally expose them.

Apologies if this is excessively long and uninteresting.

When trying to expand Rust support for seL4 (I'm not sure exactly what the current state of multi-config bindings for seL4 + Rust is within the last year or so), this is based upon some past work on the API specification, such that the kernel configuration and the resulting library APIs can then be translated into cfgs. I believe there has been subsequent work on generating libsel4 in Rust from the API specification, but I haven't had time to track it yet.

Typically, many Rust bindings just supported a single kernel config (many didn't exist prior to the addition of the mcs kernel config, so beyond architecture the only thing they had to deal with was whether to enable debug). seL4 uses a decent number of what were historically C preprocessor definitions for configuring the kernel's syscalls and the libsel4 API; these were all abstracted out into a somewhat language-neutral XML file which can now be translated into either C preprocessor definitions or Rust cfgs.

There are basically 2 layers to this:

  • the availability of syscalls in the kernel proper.
  • the subsequent availability of API in system-level libraries and the syscall stubs generated from the above.

The latter tends to "infect" downstream crates as they use APIs dependent upon specific cfgs.

Essentially there are two kernel configs, 'master' or 'mcs' (beyond arch), and a flag for enabling debug, all of which affect syscall availability. Here are some examples:

In general the goal has been, when generating code for these, to generate the appropriate #[cfg(...)] directives. The RFC hasn't sunk into my brain fully enough to feel comfortable responding to the 2nd or 3rd questions as to how this would work with global features (partially just because we don't have a lot of experience with e.g. the translation of the libsel4 API into Rust, nor any subsequent crates which depend upon it). But it seemed worth mentioning as it relies on cfg quite heavily in a cross-crate fashion.
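
A rough sketch of what such generated, cfg-gated bindings could look like (the cfg key, types, and stub bodies here are invented for illustration):

#![allow(non_camel_case_types, non_snake_case)]

// Invented placeholder types standing in for the real libsel4 types.
pub type seL4_CPtr = usize;
pub struct seL4_MessageInfo(pub u64);

// Only generated when the selected kernel configuration provides the syscall.
#[cfg(sel4_config = "master")]
pub fn seL4_Reply(_msg_info: seL4_MessageInfo) {
    // generated syscall stub body
}

#[cfg(sel4_config = "mcs")]
pub fn seL4_NBWait(_src: seL4_CPtr) -> seL4_MessageInfo {
    // generated syscall stub body
    seL4_MessageInfo(0)
}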


  1. This should perhaps change to whitelisting the configs that enable it rather than the current negative logic, but it serves as an example use of logic in the API specification that gets translated to cfg. This basically defines a type TCBConfigure only if we aren't using the CONFIG_MCS option.

Is there a reason you are using cfg rather than features for allowing dependents to control which parts of an API are available?

Yes, syscalls are mutually exclusive. E.g. that first link describes a system call, Reply, present in the api-master configuration but not in the api-mcs configuration, and in the other direction lots of syscalls, e.g. NBWait, are not available in api-master. That is to say, at the very bottom of the tower there are a few mutually exclusive things which tend to permeate upwards.

Edit: That is to say, the kernel itself is built to one configuration, api-mcs or api-master. Though we can generate a single-source syscall-stubs crate which can be compiled for any configuration of the kernel, the syscall-stubs crate must be compiled to match the selected kernel configuration to avoid sending unknown syscalls.

It might be easier to just say it is similar to std::os, where there are all of the platform-specific std::os sub-modules. If there were, we would theoretically have something like std::os::sel4_api_master and std::os::sel4_api_mcs. Then std::os::sel4_debug would be a separate module which may or may not be present on either of those. Except not in std: currently there aren't target triples for it, and I fear adding them might lead to a combinatoric explosion of triples for all the various configurations.

If you have different syscall interfaces, using a different target makes sense. Especially if you want to support libstd in the future, as a single pre-compiled libstd is shipped per (tier 1/2) target without any option to configure which syscall interface should be used.


There isn't enough support to implement libstd; it could implement core (minus include_bytes-like macros, etc.). It only gives you enough that you could go and implement a version of std on top of it by adding allocators, filesystems, and so on. But it's pretty hard to see it fitting in well with the std ecosystem.

Edit: I think the actual closest comparison here is probably the core::arch submodules with an associated target_family=sel4. From there it gives you enough to build a target_os.

(Side note: Maybe it is just me, but it still seems like an exceedingly bad idea to have two mutually exclusive syscall interfaces for the same OS, as that would result in splitting the already small ecosystem in two. Perhaps if you immediately on top had an abstraction layer (like libc) and had everything use that, then it could work. Still seems like duplicated effort though compared to focusing on one syscall interface.

Userspace really should be talking to the OS via some sort of stable API and ABI, be that syscall like on Linux, or libc like on most other OSes.)

(We're kind of getting off into the weeds, as this has little to do with the actual RFC), but yes, it "isn't ideal" to put it mildly; there are reasons for it, though. One of them has been verified and will become legacy once the other is verified, e.g. api-mcs is equivalent to something like nightly for the time being.

Edit: one thing that makes it less troublesome than on ordinary OSes is that the kernel + bin targets + libraries are all built in a single build tree, similar to a cargo workspace, which generates a single system image file. So to mix syscall ABIs you'd have to mess with the build system to manually pull targets out of the system image. It is basically assumed that if anyone is distributing binaries separate from the build-generated system image, they need to pick one or deal with the fallout. Most systems using it are completely static.

One thing people bring up from time to time (like with public/private dependencies) is the idea of having unique instances of packages.

Another spin on mutually-exclusive features (that doesn't solve feature propagation) is that we support parametrized packages. We have package parameters that are declared with an enum + default. Each unique combination of these parameters creates a unique instance of a package in the tree. We still do dependency and feature unification within that combination of parameters.

I feel like we'd want both globals and parameterized packages. Maybe there is even a way to unify these features so the control for parameters can be local or global, depending on different circumstances.
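
A sketch of how declaring and requesting such parameters might look (the syntax is entirely invented for illustration):

# in the parameterized library's Cargo.toml
[parameters.endianness]
values = ["little", "big"]
default = "little"

# in a dependent's Cargo.toml: each distinct combination of parameters
# would create a separate instance of the package in the tree
[dependencies]
codec = { version = "1", parameters = { endianness = "big" } }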


Overall this looks like a great proposal!

I have one question/suggestion about this syntax:

#[cfg(any(global::sys = "static", global::sys = "auto"))]

One thing that's a bit tricky about cfg attributes right now is that (AFAIK) there is no way to query for them being unset, to easily allow for a default. I'm not sure what that'd actually look like in the context of this approach, which somewhat resembles cfg attributes (an empty string, maybe?), but it would be nice to have a solution.
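
One hypothetical spelling of the "empty string" idea from the question above (purely illustrative, not part of the proposal):

#[cfg(global::sys = "")]
fn no_value_configured() {
    // fall back to some default behaviour when the global was never set
}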