Pre-RFC: a vision for platform/architecture/configuration-specific APIs

Recently, there have been a number of proposals for expanding the set of platform-, architecture-, or configuration-specific APIs.

We need to discuss our overall vision for accommodating these kinds of APIs, because the vision worked out pre-1.0 is not broad enough to accommodate them.

The status quo

We previously addressed the question of platform-specific APIs in std as part of the IO stabilization effort. The design was built to strongly emphasize cross-platform APIs, while providing relatively easy hooks into platform-specific ones. In more detail, we proposed:

  • A cross-platform core API (see the RFC). In normal std modules, APIs should only be exposed if they are supported on "all" platforms. They should follow Rust conventions (rather than the conventions of any particular platform) and should never directly expose the underlying OS representation. In principle, this makes the "default" experience of programming against std result in highly portable applications. In practice, even the “cross-platform” APIs exhibit some behavioral differences across platforms on edge cases; we’ve worked hard to keep that set small.

  • Platform-specific extensions (see the RFC). Another bedrock principle of std is that it should be possible to access system services directly, with no overhead. A basic way we do this in std is to provide “lowering” APIs, which are methods on std abstractions that expose the underlying platform-specific data (e.g. file descriptors). These lowering APIs mean you can build external crates that bind platform-specific functionality and provide it as a direct extension to std abstractions. For the most important extensions, we provide direct bindings in std itself.
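
For example, the Unix lowering APIs expose the raw file descriptor underneath a std::fs::File (these are real, stable std items):

use std::fs::File;
use std::os::unix::io::AsRawFd;

fn main() -> std::io::Result<()> {
    let file = File::open("/etc/hosts")?;
    // Lowering: the raw OS file descriptor behind the std abstraction,
    // ready to hand off to libc or any other platform-specific binding.
    let fd = file.as_raw_fd();
    println!("fd = {}", fd);
    Ok(())
}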

There’s one additional, very important piece: in std today, all platform-specific APIs are cordoned off into submodules of std::os, which has submodules like windows, unix, macos, and linux (note that the last three form an implicit hierarchy). The upshot is that when you’re doing something platform-specific in std, there’s a clear signal: your code will contain a use std::os::some_platform import, which acts as an explicit “opt in”.

The platform-specific APIs generally work through extension traits (e.g. MetadataExt), which means that once imported, they’re usable just as ergonomically as the cross-platform APIs. Each platform provides a prelude module to make the imports themselves very easy to do.
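
For instance, reading Unix-specific file metadata:

use std::fs;
use std::os::unix::fs::MetadataExt; // or: use std::os::unix::prelude::*;

fn main() -> std::io::Result<()> {
    let meta = fs::metadata("/etc/hosts")?;
    // mode() and ino() come from the MetadataExt extension trait, and read
    // just as naturally as the cross-platform methods on Metadata.
    println!("mode: {:o}, inode: {}", meta.mode(), meta.ino());
    Ok(())
}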

The problem

All in all, the current system works well for one particular platform distinction: the operating system. Since we can form a rough hierarchy of supported OSes (e.g., unix at the top level, linux underneath, and conceivably specific kernel versions beneath that), it’s always clear where an OS-specific API should live: as high in the hierarchy as possible.

The approach starts to fall apart, however, once you bring in other factors:

  • Architecture, which doesn’t tend to form clean hierarchies in terms of support for e.g. specific instructions. It’s very difficult to provide a clear module structure that delineates architectural capabilities.

  • Configuration, which applies both to the Rust side (e.g. abort-on-panic) and to the system side (e.g. C types can shift depending on how the OS is compiled). Configuration tends to be cross-cutting and, again, not hierarchically organized.

  • External crates, which usually don’t follow the std::os pattern and can thus be hard to gauge for platform compatibility. Moreover, there are additional divisions like no_std which impose further compatibility restrictions.

While it’s relatively easy to grep for std::os::linux or the like, it clearly doesn’t give the whole story, even for OS-specific APIs, let alone for non-hierarchical concerns.

Design goals

To ease discussion, I’ll use the term scenario to describe any collection of OS, platform, architecture and configuration options.

Let’s take a step back and re-assess what we want in our story for scenario-dependent APIs. Assume for the moment that we can identify a class of "mainstream scenarios", which encompass platform, architecture, and configuration assumptions.

Here are some potential goals for a broader design:

  • By “default”, Rust code should be compatible with “mainstream scenarios”.

    • Ideally, it’s also easy to gauge for even broader compatibility requirements, without literally having to compile for a number of platforms.
    • Ideally, compatibility constraints could be easily imposed/checked across all crates being used, not just std.
  • Rust should offer best-case ergonomics for “mainstream scenarios”.

    • This means APIs should be particularly easy to find and use in the mainstream case.
    • Ideally, of course, ergonomics are good across the board.
  • Allow for arbitrary, non-hierarchical sub- and super-setting of “mainstream scenario” APIs.

    • Ideally, with very clean/simple organization, so you always know where to look for a given API, whether mainstream or not.

A possible approach: lints

So, here’s an idea: use the lint system to ferret out unintentional dependencies on the current compilation target, instead of using module organization as we do today. In more detail:

  • Do away with the std::os hierarchy; instead, put APIs in their “natural place”. So, for example, instead of things like UnixStream appearing in std::os::unix::net, they would go directly into std::net. This is similar to the approach we ended up taking for the new atomic operations – they landed directly in std::sync::atomic rather than some architecture-specific module hierarchy. In general, we’d have a single, unified std hierarchy, and encourage other crates to do the same.

  • APIs can be marked with the “scenario requirements” they impose. So UnixStream would be marked with unix, and AtomicU8 might be marked with a direct requirement like atomic_u8. In general, an API would be marked with a set of such requirements; if the set is empty, the API is truly “cross-scenario”.

  • Each crate can specify its desired “compatibility targets”. By default, the target will be “mainstream scenarios”. This information drives the lint, which will trigger on any use of an API that happens to be available on the given compilation target, but not on the desired compatibility target (which is usually much broader).

  • Compatibility targets group together scenario requirements in arbitrary, non-hierarchical ways. That is, when you talk about the compatibility requirements for an app or library, you don’t usually want to talk about a combination like unix + atomic_u8 but rather unix + (x86_64 | aarch64). The point is that these groupings and requirements can have arbitrarily complex, non-hierarchical relationships, giving much greater flexibility than the existing os module hierarchy does. (A sketch in code follows this list.)
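
To make this concrete, here is a sketch in entirely hypothetical syntax (neither attribute exists today; the names are placeholders for whatever we’d settle on):

// In std: APIs live in their natural place, tagged with their requirements.
#[scenario_requirements(unix)]
pub struct UnixStream { /* ... */ }

#[scenario_requirements(atomic_u8)]
pub struct AtomicU8 { /* ... */ }

// In a downstream crate: declare the compatibility target the lint checks
// against, grouping requirements non-hierarchically.
#![compatibility_target(all(unix, any(x86_64, aarch64)))]

Under this target, using UnixStream is fine (the target implies unix), while using an API marked with, say, avx would trip the lint, because the declared compatibility target doesn’t guarantee it.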

The lint approach is attractive for a few key reasons:

  • It allows for the most natural/ergonomic placement of all APIs.
  • It provides a very clear way for a crate to specify its desired compatibility targets, and check for those in the course of a single compile.
  • It works across the whole ecosystem, not just std.

I want to expand briefly on the last point. I expect the lint, by default, to take the scenario requirements of a given fn definition to be exactly those of the functions it invokes. Normally this kind of inference doesn’t work well because of things like closures, but if the only way to make a given fn behave in a platform-specific way is to pass it a platform-specific closure, then the fn isn’t really platform-specific after all.

If we’re careful about how we put the lint together, it will automatically produce the right results for an arbitrary crate based on how that crate uses others (including std). We’d also need a way to say that a given fn imposes fewer constraints than it might seem, e.g.:

// tell the lint there are no scenario requirements, because we'll be using
// cfg internally to dispatch to the appropriate platform-specific code but
// will cover all platforms in doing so.
#[scenario_requirements()]
pub fn cross_platform(...) {
    platform_specific_helper(...)
}

#[cfg(unix)]
fn platform_specific_helper(...) { ... }

#[cfg(windows)]
fn platform_specific_helper(...) { ... }

Assuming we’re interested in the lint approach, there are a lot of details to figure out here, including:

  • What constitutes a “mainstream scenario”? Can that change over time?
  • What is a sensible set of scenario requirements and targets?
  • What is the right way to specify all this information?
  • How can we orient the design so that the lint gets things right by default as much as possible?

I’d love to hear people’s thoughts on the goals, the overall suggestion, and any of the details!


cc @brson @nikomatsakis @Amanieu @retep998 @sfackler @burntsushi @huon @wycats @steveklabnik

So we’re doing totality checking on various cfg combinations? This looks really good! A few scattered thoughts:

  1. I am reminded of the subset-std idea (https://github.com/rust-lang/rust/issues/27701). The only worry for me with this plan is that it could someday compete with importing various combinations of crates beneath the facade.

  2. #[scenario_requirements(..)] might not actually be needed. Is it enough to assume the cfgs on public items, and trace their dependencies, ensuring that all variations outside of the assumptions are covered (sketched after this list)? I forget if #![cfg(...)] is a thing, but that would avoid the need to repeatedly label a bunch of items too. Also, this means we can do the same totality check over platform attributes and Cargo features alike.

  3. Cargo should be made aware of scenario requirements. Specifically, #![cfg(..)] or #![scenario_requirements(..)] should be duplicated in Cargo.toml. Then Cargo can verify that dependencies respect the declared scenario requirements. Furthermore, https://github.com/rust-lang/rfcs/blob/master/text/1361-cargo-cfg-dependencies.md can be re-purposed to “unlock” non-portable items: conditional deps (and “re-added” normal deps) would have all items exposed as portable, up to the assumptions of the [target.'cfg(..)'] section. An open question is how to combine the target/feature assumptions of conditional deps listed in two places.

  4. The obvious reason for the lint is that we’re trying to reuse std lib binaries in many contexts. At first I thought that with explicit std dependencies plus rebuilding std every time, std simply wouldn’t contain insufficiently-portable items, and all would be good. But things are actually more complex: since dependencies are shared, they must contain items sufficient for their most demanding consumer, yet we should still ensure that other deps don’t overstep their own portability constraints. Note that this also applies to features. We therefore want the lint to make sure that each dependency only gets what it asks for. An example: suppose foo and bar depend on initial, and terminal depends on foo and bar (a “diamond”). Suppose also that foo requires features x and y of initial, but bar only requires x. Then bar should not be able to use any items exposed by y.

  5. With the above in place, it’s possible to build crates with a large set of enabled non-portabilities and features in order to reuse the binaries; this applies both to a hypothetical global Cargo cache and to the way we build the std lib for the sysroot today. However, there should always be a way to opt out of this for when the implementation details of features/exposed platform attributes matter. For example, without --gc-sections, unused impls in libcore would still cause linking problems, and some features (presumably including any of the ones in std) change implementations in important ways without affecting portability (the item exists whether or not the feature is used).
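
To make point 2 concrete with the OP’s own example (the checking is hypothetical, but the cfgs themselves are real syntax): the cfg annotations already carry the requirements, and the check is that the helpers’ cfgs are exhaustive over the caller’s:

#[cfg(any(unix, windows))]
pub fn cross_platform() {
    platform_specific_helper()
}

#[cfg(unix)]
fn platform_specific_helper() { /* unix impl */ }

#[cfg(windows)]
fn platform_specific_helper() { /* windows impl */ }

The totality check: for every configuration satisfying the caller’s any(unix, windows), some platform_specific_helper exists. No separate #[scenario_requirements] annotation is needed.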

This seems like a good idea, primarily because of the non-hierarchical APIs issue.

I’d prefer to see this integrated with the same target configuration tokens used in cfg, rather than using a disjoint set of tokens. If this new mechanism needs to introduce a new set of target configuration tokens or expressions, then all those same tokens and expressions should also work in cfg, both for consistency and to provide more power to cfg. That then allows the use of cfg!, cfg_attr, and similar mechanisms. See the existing set of built-in cfg tokens.
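
For reference, a quick tour of the existing machinery any new tokens should compose with:

// Target configuration tokens in attribute position:
#[cfg(target_os = "linux")]
fn linux_only() {}

// The same tokens driving another attribute via cfg_attr:
#[cfg_attr(windows, allow(dead_code))]
fn maybe_unused() {}

// And in expression position via cfg!, a compile-time constant:
fn which() -> &'static str {
    if cfg!(unix) { "unix" } else { "other" }
}

Whatever new tokens scenarios require (e.g. for specific CPU features) should work in all three positions.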

The pre-RFC doesn’t mention whether the new scenario lint should warn or error by default. I would argue that it should error by default, but the error should detect attempts to use a symbol that requires a specific scenario and tell the user how to add the necessary scenario to their code. For instance, if you call a function to get a file descriptor, from a function that doesn’t explicitly say it only supports unix, you should get a compilation error, but the error should tell you exactly what you can copy and paste onto your function to make it work.

Also, in the example with platform_specific_helper, cross_platform doesn’t actually cover all scenarios; it runs on cfg(unix) and cfg(windows). Perhaps Rust should have an additional (optional) new lint for the benefit of porters, to identify functions like this that declare more portability than their implementation supports? (The RFC doesn’t necessarily need to depend on this, but perhaps you could list it as a useful additional enhancement?) This also ties into the question about whether “mainstream scenario” can change over time.

That said, this also does need a way to define functions that do runtime detection. For instance, a function portable to all x86 platforms might use cpuid to detect features at runtime, and then call a function that specifically requires avx instruction support, with fallback to a function that works everywhere. In that case, the avx-specific function can’t necessarily use cfg(cpu(avx)) or similar, because the function needs to exist for all CPUs; however, it should somehow declare itself as only running on such CPUs, so that if you call it from another such function it works, but if you call it from a more general function then that function has to explicitly tell Rust that it expands portability.
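
In hypothetical syntax (the attribute and the cpu_has_avx detection function are both placeholders):

pub fn checksum(data: &[u8]) -> u64 {
    if cpu_has_avx() {            // runtime cpuid-based detection
        checksum_avx(data)        // OK: guarded by the check above
    } else {
        checksum_portable(data)
    }
}

// Exists on every x86 target, but declares a narrower scenario; calling it
// from unguarded, more general code should require an explicit opt-in.
#[scenario_requirements(avx)]
fn checksum_avx(data: &[u8]) -> u64 { /* AVX implementation */ }

fn checksum_portable(data: &[u8]) -> u64 { /* works everywhere */ }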

Given that you can't currently tell rustc "throw in some avx instructions when compiling this function" I don't really see the value of this proposal. You could force use of those instructions in an asm!(...) block, but if you're doing that, you might as well manually insert the feature detection there, and handle the 'feature unsupported' case. And since you're now handling the 'no avx' case, you don't need to declare your function #[cfg(cpu(avx))] and all is well.

Perhaps it would be better to investigate doing this kind of stuff automatically inside LLVM (perhaps such a thing is possible already? I wouldn't know), e.g. by teaching it to recognize an attribute that informs it to always compile with some cpu-features forced enabled, and if the actual compilation target does not have those features, then it should insert runtime feature detection. E.g. imagine a #[use_feature(avx)] attribute on a function. If the compilation target is a cpu that supports this feature, the attribute is ignored. If the target is a cpu that does not support this feature (e.g. 686), it would compile the function twice: once with the feature enabled, and once without, and it would insert runtime feature detection and pick the correct implementation.

Yes, the grammar should be exactly the same, if we don't just use cfg as I propose.

It would need to be just a warning by default for backwards compatibility.

It should have #[scenario_requirements(any(unix, windows))] (or #[cfg(any(unix, windows))] with my plan).

I'm fine with that being out of scope initially. This boils down to inline assembly we can't verify anyway, or adding some sort of "fat binary" knowledge to LLVM (which is a lot of work), or doing lots of compilation units and hoping linking works, which is optimization-inhibiting and just kinda janky. (To be clear, we plan on doing lots of LLVM compilation units anyway for cached recompilation, which is great. It's the mixing of platforms that's a hack. Also, consider: which platform would the linker be invoked with?)

I like this idea generally. I need some more time to address all of it…

One related thing that there’s been a lot of discussion about, and that should be brought up here too: the situation with libcore and OS dev, specifically around floats. I’d still like to see some kind of eventual resolution there, and I’d imagine it fits in with the “use cases” approach.

I like the goal of this idea, and I very much like the terminology around “scenarios” - however, I’m not sure lints are the correct way of evaluating this.

In particular, “scenarios” seem to me to be simply another form of dependency - specifically, dependencies on the environment, rather than on other Rust code.

The current unit of dependency management is the crate, and I think that offers an opportunity to do better than is possible with lints:

  1. Add a crate-level list of “scenarios” required by the crate
  2. Allow features (or platform-specific config) to add scenarios to the list
  3. Allow binary crates to define a bounding set of scenarios which dependencies cannot exceed

As an example, the core float case could optionally “assume a float-capable scenario”, enabled by the float feature, which is in its default features. A kernel (which is a binary crate) would omit float from its scenario bounding set, and thus if a dependency pulled in a float-enabled core, it would report an error.
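
In hypothetical attribute form (all syntax placeholder):

// In core: the float APIs add a scenario, gated on the `float` feature.
#![scenario(float(64))]        // present only when the feature is enabled

// In the kernel's binary crate: the bounding set omits float, so any
// dependency that pulls in a float-enabled core becomes an error.
#![scenario_bounds(atomic(usize), os(none))]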

In the long run, this could even be made more ergonomic - rather than immediately reporting an error when a feature pulls in a disallowed scenario, continue, and build up a description of how and why the bounding set was exceeded. It’s possible that this could even extend to proposing “fixits” - such as what dependency of the top-level binary crate could be removed (or have a feature disabled) to resolve the issue.

The possible scenarios would be defined centrally, perhaps by rustc - good ones to begin with might be float(width), atomic(type), and os(name).


I had asm! in mind when I wrote that. And no, not every asm! block should do its own autodetection; sometimes you want a function that assumes the CPU feature set, so you can call it from another function that already knows the feature set and avoid duplicate detection code.

GCC has a mechanism like this; see __attribute__((target(...))) and __attribute__((target_clones(...))), for instance. The former allows you to optimize a function specifically for the target architecture; the latter constructs several functions and an ifunc resolver to choose between them at runtime. (The latter has some overhead that you may not always want, though.)

Wouldn't it only apply if you call an appropriately tagged function, which didn't exist to call at all in a prior version? How could that break backward compatibility?

Not necessarily just inline assembly. Consider intrinsics, for instance. My motivation to bring this up came from a GCC bug, where it didn't allow calling a function optimized with #pragma GCC target("bmi2") (such as the intrinsics in <bmi2intrin.h>) from a function not compiled for that target, even though the generic function in question called a cpuid intrinsic first. No inline assembly involved, just compiler intrinsics.

I imagine privacy can take care of that without attributes. If you ensure that the pub functions in a module do the feature detection, then the non-pub functions can avoid the detection code.

A module full of functions for a specific target instruction set may want to mark its pub functions accordingly, and leave out the detection.
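
A sketch of that privacy pattern (detect_feature stands in for whatever runtime check applies):

mod fast_path {
    // The public entry point pays for detection once...
    pub fn process(input: &[u8]) -> u64 {
        if detect_feature() {
            process_specialized(input)
        } else {
            process_portable(input)
        }
    }

    // ...so the private helpers can assume the feature without re-checking.
    fn process_specialized(input: &[u8]) -> u64 { /* feature-assuming code */ }
    fn process_portable(input: &[u8]) -> u64 { /* runs everywhere */ }

    fn detect_feature() -> bool { /* cpuid or similar */ }
}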

In any case, I don’t think that needs a new mechanism; rather, I just want to make sure the mechanism proposed here doesn’t completely omit any possibility of telling the compiler “yes, I really do mean to call a more-specific function from this more-general function”.


There’s one major problem with the lints: they run so late that they never see a function whose cfg attribute causes it to disappear. We’d need a new kind of lint that runs on the pure parsed AST, i.e. before cfg-stripping happens.

Oh it’s worse than that. It’s infeasible to brute force all possible configurations individually, so I’m pretty sure we’d need to somehow do name resolution for them all at once.

Hmm. This feels related to the current "require-provide" stuff around allocators and panic runtimes.


Yeah, agreed - there’s a very similar shape to the way those crosscut the crates involved in a resolution.

EDIT: Hm, although there is one divergence worth noting - allocators and panic runtimes are “select one”, while scenarios as described are additive (much like cargo features) - in essence, scenarios as I framed them are “cargo features for the runtime environment”, plus the ability to restrict which scenarios are permissible.

The only issue I see with this is “who controls these lints”? Do we have to wait for a new version of the compiler to appear? Or can external crates add their own?

For example I would like to add to the list, next to SIMD, support for Bit Manipulation Instruction Sets, that is, support for Intel’s BMI, BMI2, and AMD’s TBM instruction sets.

The motivation is that right now it is hard to write a fast quad-tree or octree in Rust on Haswell, because any octree using BMI2’s parallel bit deposit/extract instructions to compute Morton codes is going to beat a portable implementation by 2-4x. If one uses the llvm-intrinsics crate for this, the lack of a cfg macro makes it hard to provide a fallback for other architectures.
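
Concretely, what I’d like to be able to write is something like this (assuming a target_feature cfg token and a _pdep_u64 binding for PDEP, parallel bit deposit; neither is available to me on stable today):

#[cfg(target_feature = "bmi2")]
fn morton2(x: u32, y: u32) -> u64 {
    unsafe {
        _pdep_u64(x as u64, 0x5555_5555_5555_5555)       // x into even bits
            | _pdep_u64(y as u64, 0xAAAA_AAAA_AAAA_AAAA) // y into odd bits
    }
}

#[cfg(not(target_feature = "bmi2"))]
fn morton2(x: u32, y: u32) -> u64 {
    // Portable fallback: spread each coordinate's bits apart, then interleave.
    fn spread(mut v: u64) -> u64 {
        v = (v | (v << 16)) & 0x0000_FFFF_0000_FFFF;
        v = (v | (v << 8)) & 0x00FF_00FF_00FF_00FF;
        v = (v | (v << 4)) & 0x0F0F_0F0F_0F0F_0F0F;
        v = (v | (v << 2)) & 0x3333_3333_3333_3333;
        v = (v | (v << 1)) & 0x5555_5555_5555_5555;
        v
    }
    spread(x as u64) | (spread(y as u64) << 1)
}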

The only way I could work around this right now is to modify the compiler myself, submit a pull request, and wait for it to be accepted and stabilized.

While that should happen anyway, there are a lot of scenarios in real life, and one cannot always expect the compiler to know about them all, nor expect the compiler devs to care about them all. I basically gave up on bit-manipulation algorithms for num: once the API was more or less stable, I tried to make them fast, and this turned out to be impossible without using intrinsics, and support for the bit-manipulation intrinsics was, and still is, zero.

So I guess my question is, how will external crates be able to easily add their own constraints to the mix? That is, how can my bit manipulation algorithm library add a new architecture/instruction set like BMI2 and use a cfg flag on that for providing fallbacks for the cases in which this instruction set is not available?

If we do add this sort of complex configuration lint, I’d really appreciate it if it were optional, because I can imagine there’d be a lot of edge cases that result in that lint taking a horrifying amount of time as it tries to resolve every cfg combination. I’ve been bitten enough by compile time regressions as is.

Another category of feature that I believe falls into this same design space is the inclusion of the allocator and unwind crates. These have a global effect on all downstream components: you are either in the ‘jemalloc allocator scenario’ or the ‘system allocator scenario’; the ‘panic unwind’ scenario; etc.

I don’t think it makes sense to tack this onto the lint infrastructure - they don’t seem particularly related to me. These ‘scenarios’ comprise a set of constraints that must be obeyed by all downstream crates. I like the ‘scenario’ terminology - we should consider it a fresh subsystem and not part of lints. Every time a crate introduces or depends on one of these global features or configurations it introduces a new constraint to the compilation scenario.

As I said in my earlier post, those crosscut in a similar manner, but have some important differences - for one, they're mutually exclusive. One can't be in the "panic unwind + panic abort" scenario, while "atomic + float + linux" makes perfect sense.

A possible way of reconciling these would be to allow one provider for each scenario - thus, crates can rely on being in an "allocator" scenario, and exactly one crate could be (say) "=allocator", enabling that scenario.

The top-level crate, meanwhile would define a bounding set - say, "atomic + float + linux + alloc=jemalloc". This would permit any single provider for each of atomic, float, and linux, and mandate that if anything relies on alloc, that MUST be satisfied using the jemalloc crate.
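
In sketch form (all syntax hypothetical):

// Any crate may rely on a scenario being provided:
#![scenario(alloc)]

// Exactly one crate in the graph may provide it, e.g. the jemalloc crate:
#![provides_scenario(alloc)]

// The top-level binary bounds everything, pinning the provider:
#![scenario_bounds(atomic, float, os(linux), alloc = jemalloc)]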

I think this would actually permit full user-level creation of scenarios outside the stdlib, which is quite nice.

@eternaleye: well, I think we’d want to have and/or. So yes, “panic unwind & panic abort” makes no sense, but “os(windows) & os(linux)” doesn’t either.


Has anybody read my first reply, where I suggest that scenarios can be an exhaustiveness check on cfg, so we don’t need a new annotation? Unless I’m missing something, this is a nice free simplification.