Pre-RFC: a vision for platform/architecture/configuration-specific APIs

So we're doing totality checking on various cfg combinations? This looks really good! A few scattered thoughts:

  1. I am reminded of the subset-std issue https://github.com/rust-lang/rust/issues/27701. My only worry with this plan is that it could someday compete with importing various combinations of crates beneath the facade.

  2. #[scenario_requirements(..)] might not actually be needed. Is it enough to assume the cfgs on public items, and trace their dependencies to ensure that all variations outside of those assumptions are covered? I forget whether #![cfg(...)] is a thing, but that would also avoid the need to repeatedly label a bunch of items. And this means we can do the same totality check over platform attributes and cargo features alike.
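
A sketch of that inference, assuming the checker derives requirements from ordinary cfg attributes (no new annotation anywhere):

    // Crate-level cfg: one label covers every item; on any target outside
    // the assumption, the crate is simply empty.
    #![cfg(any(unix, windows))]

    #[cfg(unix)]
    fn fd_helper() { /* unix-only body */ }

    #[cfg(windows)]
    fn handle_helper() { /* windows-only body */ }

    // The totality check verifies that this resolves under every cfg
    // valuation permitted by the crate-level assumption above.
    pub fn portable() {
        #[cfg(unix)]
        fd_helper();
        #[cfg(windows)]
        handle_helper();
    }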

  3. Cargo should be made aware of scenario requirements. Specifically, #![cfg(..)] or #![scenario_requirements(..)] should be duplicated in Cargo.toml; then Cargo can verify that dependencies respect the declared scenario requirements. Furthermore, https://github.com/rust-lang/rfcs/blob/master/text/1361-cargo-cfg-dependencies.md can be repurposed to “unlock” non-portable items: conditional deps (and “re-added” normal deps) would have all their items exposed as portable, up to the assumptions of the [target.'cfg(..)'] section. An open question is how to combine the target/feature assumptions of conditional deps listed in two places.
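
Something like this, where the [scenarios] key is hypothetical (invented here for illustration) and the target section is RFC 1361's actual syntax:

    # hypothetical: Cargo-visible mirror of #![scenario_requirements(..)]
    [scenarios]
    requirements = "any(unix, windows)"

    # RFC 1361: a conditional dependency; its items would be exposed as
    # portable up to the cfg(unix) assumption of this section
    [target.'cfg(unix)'.dependencies]
    libc = "0.2"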

  4. The obvious reason for the lint is that we're trying to reuse std lib binaries in many contexts. At first I thought that with explicit std dependencies plus rebuilding std every time, std simply wouldn't contain insufficiently portable items, and all would be good. But actually things are more complex: since dependencies are shared, they must contain items sufficient for their most demanding consumer, but we should still ensure that the other consumers don't overstep their own portability constraints. Note that this also applies to features. We therefore want the lint to make sure that each dependent only gets what it asks for. An example: say foo and bar depend on initial, and terminal depends on foo and bar (a “diamond”). Say also that foo requires features x and y of initial, but bar requires only x. Then bar should not be able to use any items exposed by y, as sketched below.
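
The diamond in code, with item names invented for illustration:

    // initial/src/lib.rs
    #[cfg(feature = "x")]
    pub fn from_x() {}

    #[cfg(feature = "y")]
    pub fn from_y() {}

    // bar/src/lib.rs (declares a dependency on `initial` with only
    // feature "x"). Because `foo` also enabled "y", the shared build of
    // `initial` physically contains `from_y`, but the lint rejects this:
    pub fn oops() {
        initial::from_y(); // error: `bar` never asked for feature "y"
    }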

  5. With the above in place, it's possible to build crates up to a large set of enabled non-portabilities and features in order to reuse the binaries; this applies both to a hypothetical global Cargo cache and to the way we build the std lib for the sysroot today. However, there should always be a way to opt out of this for when the implementation details of features/exposed platform attributes matter. For example, without --gc-sections, unused impls in libcore would still cause linking problems, and some features (though presumably not any of the ones in std) change the implementations of items in important ways without affecting their portability (the item exists whether or not the feature is enabled).

This seems like a good idea, primarily because of the non-hierarchical APIs issue.

I’d prefer to see this integrated with the same target configuration tokens used in cfg, rather than using a disjoint set of tokens. If this new mechanism needs to introduce a new set of target configuration tokens or expressions, then all those same tokens and expressions should also work in cfg, both for consistency and to provide more power to cfg. That then allows the use of cfg!, cfg_attr, and similar mechanisms. See the existing set of built-in cfg tokens.

The pre-RFC doesn’t mention whether the new scenario lint should warn or error by default. I would argue that it should error by default, but the error should detect attempts to use a symbol that requires a specific scenario and tell the user how to add the necessary scenario to their code. For instance, if you call a function to get a file descriptor, from a function that doesn’t explicitly say it only supports unix, you should get a compilation error, but the error should tell you exactly what you can copy and paste onto your function to make it work.
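
For instance (a sketch; the attribute spelling is the pre-RFC's hypothetical syntax, and the exact grammar doesn't matter):

    use std::fs::File;
    use std::os::unix::io::AsRawFd; // unix-only trait

    fn fd_of(f: &File) -> i32 {
        f.as_raw_fd() // error: requires the `unix` scenario
    }

    // ...with the error suggesting exactly what to paste:
    //
    //     #[scenario_requirements(unix)]
    //     fn fd_of(f: &File) -> i32 { ... }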

Also, in the example with platform_specific_helper, cross_platform doesn’t actually cover all scenarios; it only covers cfg(unix) and cfg(windows), as sketched below. Perhaps Rust should have an additional (optional) lint for the benefit of porters, to identify functions like this that declare more portability than their implementation supports? (The RFC doesn’t necessarily need to depend on this, but perhaps you could list it as a useful additional enhancement?) This also ties into the question of whether the “mainstream scenario” can change over time.
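
That is, the shape that should trip such a lint looks like:

    #[cfg(unix)]
    fn platform_specific_helper() { /* ... */ }

    #[cfg(windows)]
    fn platform_specific_helper() { /* ... */ }

    // Declares no requirements, i.e. claims every scenario, but on any
    // target that is neither unix nor windows the helper does not exist
    // and this fails to resolve.
    pub fn cross_platform() {
        platform_specific_helper();
    }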

That said, this also does need a way to define functions that do runtime detection. For instance, a function portable to all x86 platforms might use cpuid to detect features at runtime, and then call a function that specifically requires avx instruction support, with fallback to a function that works everywhere. In that case, the avx-specific function can’t necessarily use cfg(cpu(avx)) or similar, because the function needs to exist for all CPUs; however, it should somehow declare itself as only running on such CPUs, so that if you call it from another such function it works, but if you call it from a more general function then that function has to explicitly tell Rust that it expands portability.
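
A sketch of that shape, x86-only as written, using today's std names (#[target_feature] and is_x86_feature_detected!, which postdate this discussion) in place of whatever mechanism the RFC chooses:

    fn sum_fallback(v: &[f32]) -> f32 {
        v.iter().sum()
    }

    // Exists on every x86 CPU, but assumes AVX; under this proposal it
    // would declare "only runs with AVX" rather than being cfg'd away.
    #[target_feature(enable = "avx")]
    unsafe fn sum_avx(v: &[f32]) -> f32 {
        v.iter().sum() // the compiler may vectorize this with AVX
    }

    pub fn sum(v: &[f32]) -> f32 {
        if is_x86_feature_detected!("avx") {
            // Safe: we just proved at runtime that AVX is available.
            unsafe { sum_avx(v) }
        } else {
            sum_fallback(v)
        }
    }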

Given that you can't currently tell rustc "throw in some avx instructions when compiling this function", I don't really see the value of this proposal. You could force use of those instructions in an asm!(...) block, but if you're doing that, you might as well manually insert the feature detection there, and handle the 'feature unsupported' case. And since you're now handling the 'no avx' case, you don't need to declare your function #[cfg(cpu(avx))] and all is well.

Perhaps it would be better to investigate doing this kind of thing automatically inside LLVM (perhaps such a thing is possible already? I wouldn't know), e.g. by teaching it to recognize an attribute that tells it to always compile a function with some CPU features force-enabled, and, if the actual compilation target does not have those features, to insert runtime feature detection. E.g. imagine a #[use_feature(avx)] attribute on a function. If the compilation target is a CPU that supports this feature, the attribute is ignored. If the target is a CPU that does not support it (e.g. i686), the function would be compiled twice, once with the feature enabled and once without, and runtime feature detection would pick the correct implementation.
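
Roughly (all of this syntax is hypothetical):

    // Hypothetical attribute: on targets with AVX this is a no-op;
    // elsewhere the compiler emits two clones (AVX and baseline) plus a
    // runtime dispatch stub, much like GCC's target_clones.
    #[use_feature(avx)]
    fn transform(data: &mut [f32]) {
        for x in data.iter_mut() {
            *x *= 2.0;
        }
    }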

Yes, the grammar should be exactly the same, if we don't just use cfg as I propose.

It would need to be just a warning by default for backwards compatibility.

It should have #[scenario_requirements(any(unix, windows))] (or #[cfg(any(unix, windows))] with my plan).

I'm fine with that being out of scope initially. This boils down to inline assembly we can't verify anyway, or adding some sort of "fat binary" knowledge to LLVM (which is a lot of work), or doing lots of compilation units and hoping linking works, which is optimization-inhibiting and just kinda janky. (To be clear, we plan on doing lots of LLVM compilation units anyway for cached recompilation, which is great. It's the mixing of platforms that's a hack. Also, consider: which platform would the linker be invoked with?)

I like this idea generally. I need some more time to address all of it…

One related thing that there’s been a lot of discussion about, and that should be brought up here too: the situation with libcore and OS dev, specifically around floats. I’d still like to see some kind of eventual resolution there, and with the “use cases” approach, I’d imagine it fits in.

I like the goal of this idea, and I very much like the terminology around “scenarios” - however, I’m not sure lints are the correct way of evaluating this.

In particular, “scenarios” seem to me to be simply another form of dependency - specifically, dependencies on the environment, rather than on other Rust code.

The current unit of dependency management is the crate, and I think that offers an opportunity to do better than is possible with lints:

  1. Add a crate-level list of “scenarios” required by the crate
  2. Allow features (or platform-specific config) to add scenarios to the list
  3. Allow binary crates to define a bounding set of scenarios which dependencies cannot exceed

As an example, core could optionally “assume a float-capable scenario”, enabled by its float feature, which would be in its default features. A kernel (which is a binary crate) would omit float from its scenario bounding set, and thus if a dependency pulled in a float-enabled core, an error would be reported.
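
In hypothetical Cargo.toml terms (every key name here is invented for illustration):

    # kernel/Cargo.toml
    [scenarios]
    # Bounding set: "float" is absent, so a dependency pulling in a
    # float-enabled core exceeds the bound and is reported as an error.
    allowed = ["atomic", "os(none)"]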

In the long run, this could even be made more ergonomic - rather than immediately reporting an error when a feature pulls in a disallowed scenario, continue, and build up a description of how and why the bounding set was exceeded. It’s possible that this could even extend to proposing “fixits” - such as what dependency of the top-level binary crate could be removed (or have a feature disabled) to resolve the issue.

The possible scenarios would be defined centrally, perhaps by rustc - good ones to begin with might be float(width), atomic(type), and os(name).


I had asm! in mind when I wrote that. And no, not every asm! block should do its own autodetection; sometimes you want a function that assumes the CPU feature set, so you can call it from another function that already knows the feature set and avoid duplicate detection code.

GCC has a mechanism like this; see __attribute__((target(...))) and __attribute__((target_clones(...))), for instance. The former allows you to optimize a function specifically for the target architecture; the latter constructs several functions and an ifunc resolver to choose between them at runtime. (The latter has some overhead that you may not always want, though.)

Wouldn't it only apply if you call an appropriately tagged function, which didn't exist to call at all in a prior version? How could that break backward compatibility?

Not necessarily just inline assembly. Consider intrinsics, for instance. My motivation to bring this up came from a GCC bug, where it didn't allow calling a function optimized with #pragma GCC target("bmi2") (such as the intrinsics in <bmi2intrin.h>) from a function not compiled for that target, even though the generic function in question called a cpuid intrinsic first. No inline assembly involved, just compiler intrinsics.

I imagine privacy can take care of that without attributes. If you ensure that the pub functions in a module do the feature detection, then the non-pub functions can avoid the detection code.
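
Roughly (a sketch; is_x86_feature_detected! and #[target_feature] stand in for the detection and tagging):

    mod imp {
        // Non-pub outside the crate: callers are trusted to have already
        // done the feature detection, so none happens here.
        #[target_feature(enable = "avx")]
        pub(crate) unsafe fn go_avx(v: &mut [f32]) { /* ... */ }

        pub(crate) fn go_generic(v: &mut [f32]) { /* ... */ }
    }

    // The pub entry point owns the detection.
    pub fn go(v: &mut [f32]) {
        if is_x86_feature_detected!("avx") {
            unsafe { imp::go_avx(v) }
        } else {
            imp::go_generic(v)
        }
    }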

A module full of functions for a specific target instruction set may want to mark its pub functions accordingly, and leave out the detection.

In any case, I don’t think that needs a new mechanism; rather, I just want to make sure the mechanism proposed here doesn’t completely omit any possibility of telling the compiler “yes, I really do mean to call a more-specific function from this more-general function”.


There’s one major problem with the lints: they run so late that they never see a function whose cfg attribute caused it to disappear. We’d need a new kind of lint that runs on the pure parsed AST (so before this line).

Oh, it’s worse than that. It’s infeasible to brute-force all possible configurations individually, so I’m pretty sure we’d need to somehow do name resolution for all of them at once.

Hmm. This feels related to the current "require-provide" stuff around allocators and panic runtimes.


Yeah, agreed - there’s a very similar shape to the way those crosscut the crates involved in a resolution.

EDIT: Hm, although there is one divergence worth noting - allocators and panic runtimes are “select one”, while scenarios as described are additive (much like cargo features) - in essence, scenarios as I framed them are “cargo features for the runtime environment”, plus the ability to restrict which scenarios are permissible.

The only issue I see with this is “who controls these lints”? Do we have to wait for a new version of the compiler to appear? Or can external crates add their own?

For example, I would like to add to the list, next to SIMD, support for bit manipulation instruction sets, that is, Intel’s BMI and BMI2 and AMD’s TBM.

The motivation is that right now it is hard to write a fast quad-tree or octree in Rust on Haswell, because any octree using BMI2’s parallel bit deposit/extract instructions (pdep/pext) is going to beat it by 2-4x when computing Morton codes. If one uses the llvm-intrinsics crate for this, the lack of a cfg macro makes it hard to provide a fallback for other architectures.
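
For concreteness, here is the shape of code I want to be able to write, rendered with today's std::arch names (these postdate this thread; at the time only the llvm-intrinsics crate exposed pdep):

    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::_pdep_u32;

    /// Interleave the low 16 bits of x and y into a 32-bit Morton code.
    pub fn morton2(x: u32, y: u32) -> u32 {
        #[cfg(target_arch = "x86_64")]
        {
            if is_x86_feature_detected!("bmi2") {
                // pdep scatters x's bits to even positions, y's to odd.
                return unsafe {
                    _pdep_u32(x, 0x5555_5555) | _pdep_u32(y, 0xAAAA_AAAA)
                };
            }
        }
        morton2_fallback(x, y)
    }

    fn morton2_fallback(x: u32, y: u32) -> u32 {
        let mut code = 0;
        for i in 0..16 {
            code |= ((x >> i) & 1) << (2 * i);
            code |= ((y >> i) & 1) << (2 * i + 1);
        }
        code
    }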

The only way I could work around this right now is to modify the compiler myself, submit a pull request, and wait for it to be accepted and stabilized.

While that should happen anyway, there are a lot of scenarios in real life, and one cannot always expect the compiler to know about them all, nor expect the compiler devs to care about them all. I basically gave up on bit manipulation algorithms for num because, once the API was more or less stable, I tried to make them fast, and this turned out to be impossible without using intrinsics; support for the bit manipulation intrinsics was, and still is, zero.

So I guess my question is: how will external crates be able to easily add their own constraints to the mix? That is, how can my bit manipulation library add a new architecture/instruction set like BMI2 and use a cfg flag on it to provide fallbacks for the cases in which the instruction set is not available?

If we do add this sort of complex configuration lint, I’d really appreciate it being optional, because I can imagine there’d be a lot of edge cases that result in the lint taking a horrifying amount of time as it tries to resolve every cfg combination. I’ve been bitten enough by compile-time regressions as is.

Another category of feature that I believe falls into this same design space is the inclusion of the allocator and unwind crates. These have a global effect on all downstream components - you are either in the ‘jemalloc allocator scenario’ or the ‘system allocator scenario’; the ‘panic unwind’ scenario, etc.

I don’t think it makes sense to tack this onto the lint infrastructure - they don’t seem particularly related to me. These ‘scenarios’ comprise a set of constraints that must be obeyed by all downstream crates. I like the ‘scenario’ terminology - we should consider it a fresh subsystem and not part of lints. Every time a crate introduces or depends on one of these global features or configurations it introduces a new constraint to the compilation scenario.

As I said in my earlier post, those crosscut in a similar manner, but have some important differences - for one, they're mutually exclusive. One can't be in the "panic unwind + panic abort" scenario, while "atomic + float + linux" makes perfect sense.

A possible way of reconciling these would be to allow one provider for each scenario - thus, crates can rely on being in an "allocator" scenario, and exactly one crate could be (say) "=allocator", enabling that scenario.

The top-level crate, meanwhile, would define a bounding set - say, "atomic + float + linux + alloc=jemalloc". This would permit any single provider for each of atomic, float, and linux, and mandate that if anything relies on alloc, it MUST be satisfied by the jemalloc crate.

I think this would actually permit full user-level creation of scenarios outside the stdlib, which is quite nice.

@eternaleye well, I think we’d want to have and/or, so yes, “panic unwind & panic abort” makes no sense, but then “os(windows) & os(linux)” doesn’t either.


Did anybody read my first reply, where I suggested that scenarios can be an exhaustiveness check on cfg, so we don’t need a new annotation? Unless I’m missing something, this is a nice free simplification.

Winelib on Linux would be "=windows", to use my terminology, for one. Cygwin gives Windows + Unix, as does midipix.

And no, such an exhaustiveness requirement would be:

  1. A huge amount of boilerplate for each crate
  2. Semantically backwards
  3. Relatively inextensible

winelib, Cygwin, midipix

Fair enough on the principle, but do any of those work like that? Last I checked, there is a huge amount of code in std and the community that assumes a unix-windows dichotomy.

  1. A huge amount of boilerplate for each crate

Huh? My proposal is that we cfg just like today, but something checks to ensure that, regardless of the (valid) combination of cfg tokens, there are no unresolved identifiers. The compiler, not the user, "proves" that the cfgs are total. If anything, the problem would be compilation-time bloat, not code bloat.
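
Concretely, the user writes only the cfgs, and the compiler checks that they cover every valid valuation:

    #[cfg(unix)]
    fn imp() { /* ... */ }

    #[cfg(not(unix))]
    fn imp() { /* ... */ }

    fn portable() {
        imp(); // resolves under every cfg valuation: the cover is total
    }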

  1. Semantically backwards

I am not sure what you mean.

  1. Relatively inextensible

Do you mean that any(os(windows), os(unix)) seems total, but then somebody adds multics to rustc, and the code turns out not to have been so portable after all? We can avoid that.