Pre-RFC: target-feature detection in libcore

It’s concerning, since it would be nice to have a similar lang item for getrandom. If that’s the case, I think the extern existential proposal should be revisited, i.e. we need a way to define items that can be overridden from other crates. It may even find uses outside of core/std.

Even then, you might be in an operating system environment that doesn’t want to save and restore the SSE registers. So you can’t quite assume that.

2 Likes

The RFC was not rejected; it was proposed to be postponed due to a lack of bandwidth to deal with it, not because of any disagreement with the proposed feature itself.

That was not the decision and the conclusion is not necessarily to instead add lang items (as you note later).


One thing that has me concerned here is the interaction with const fn. It seems to me that if you start querying for target features in libcore that will significantly postpone (or possibly render impossible) the constification of libcore.

Those are some compelling numbers! Is there an implementation of SIMD-accelerated UTF-8 validation that we could compare against as well, to see the kind of speedups that are possible there?

Here’s a bad idea which is at least an alternative to lang items. We could add a perma-unstable function to libcore whose job is to “tell libcore about CPU features”. Internally it uses atomics (or whatever is appropriate) to store this in a global, and run-time checks for SIMD features consult that global. The standard library, when it starts up, would then tell libcore about the detected CPU features. The main downside of this approach is that Rust-used-as-a-library won’t have a stable way to enable SIMD acceleration in libcore; only Rust binaries will have a way to do it. Hence the introduction of this not being a great idea.
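A minimal sketch of what that hook could look like, assuming a simple 64-bit feature bitset; every name here is invented for illustration and is not an actual libcore API:

```rust
use core::sync::atomic::{AtomicU64, Ordering};

// Global bitset of run-time-detected features; zero means "nothing
// detected", which degrades gracefully to compile-time features only.
static DETECTED_FEATURES: AtomicU64 = AtomicU64::new(0);

/// Perma-unstable hook: libstd (or a binary's own startup code) calls
/// this once with the bitset of features it detected at run time.
pub fn set_detected_features(bits: u64) {
    DETECTED_FEATURES.store(bits, Ordering::Relaxed);
}

/// Consulted by libcore's run-time feature checks.
pub fn feature_detected(bit: u64) -> bool {
    DETECTED_FEATURES.load(Ordering::Relaxed) & bit != 0
}
```

In principle a binary not linked against libstd could call the same hook itself at startup, which ties into the question raised later in the thread.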

The RFC was not rejected, but is proposed to be postponed due to a lack of bandwidth

Sorry, you are right. I might have had a different RFC in mind.


That’s a very important concern. IMO, a const fn language feature that forces a choice between compile-time execution and efficient execution at run time is flawed. It requires users either to be extremely conservative about making functions const fn, since that could prevent future run-time performance improvements, or to duplicate APIs for use in constant and “runtime” expressions.

C++ constexpr had this problem, and they fixed it by allowing const fns to query whether they were executing in a constant or run-time context, such that they can provide different implementations.

I don’t know if the C++ solution would be a good one for Rust, but this is a problem worth solving if we want to prevent an API split.

10 Likes

The other problem I see with this is that, if a binary doesn’t use run-time feature detection, it still pays for it when libstd is initialized. On the other hand, libstd already stores a static atomic for caching the features, so unless that is removed, all Rust binaries already pay a price for this even if they never use it. So this might be an acceptable trade-off.

We could move these atomics to libcore, initializing them always to false, and instead of initializing them properly on first use, do that during libstd initialization. Some of the initialization code needs to do system calls and/or access the file system, so I don’t know whether all of this is available during libstd initialization, but it could work.

Either way, we would then move the functionality to query the active features to libcore, leaving only the initialization code in libstd. This would work for binaries linked with libstd.

The question is whether we also want to allow binaries that are not linked with libstd to provide their own initialization, and how that would work. I suppose that if these atomics are public, and their API is clearly defined, then a no_std binary can just write anything to them at the beginning of main. Or maybe we could expose the “tell me about target features” API to users. That might work too.

If you’re building a cdylib then there is no place for libstd to run initialization code.

1 Like

I only did some benchmarks last year: https://github.com/killercup/simd-utf8-check

Edit: Oh, maybe I did implement this, too? I honestly don’t recall doing this :smiley:

@comex cdylibs are tricky. First, note that access to the feature-detection runtime would happen in libcore, which doesn’t really have any tools (e.g. dlsym) to handle weak symbols, so libcore cannot query whether the runtime state exists, or whether a libstd function exists that could be used to initialize these symbols.

In C++, you can initialize global variables at binary initialization and at dynamic-library load time:

static int foo = runtime_initialization();

IIRC, there is a segment in the binary that can hold an array of function pointers which the loader (dynamic linker) calls on initialization and finalization of the binary or dynamic library. We can add a function pointer there to some function (e.g. from libstd) that initializes the runtime for cdylibs, when the cdylib is compiled with libstd. This is super hacky, but I don’t know how else we could solve this in Rust.
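To make that concrete, here is a hedged Rust sketch of the mechanism, similar in spirit to what “life before main” crates like `ctor` do; the `.init_array` section is a Linux/ELF detail, and the initializer body is purely hypothetical:

```rust
// Hedged sketch, not part of any proposal: on Linux/ELF targets, a function
// pointer placed in the `.init_array` section is invoked by the loader
// before `main` (or when a dynamic library is loaded). A cdylib built with
// libstd could use such a hook to initialize its feature-detection cache.
#[cfg(target_os = "linux")]
#[used]
#[link_section = ".init_array"]
static INIT_TARGET_FEATURES: extern "C" fn() = {
    extern "C" fn init() {
        // hypothetical call into a feature-detection runtime, e.g.:
        // detect_runtime::initialize();
    }
    init
};
```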

The Rust way would be to use lazy_static! and initialize this on first access, but libcore cannot do the initialization since it doesn’t know how. Some other code needs to call into libcore to do the initialization, and that code needs to be invoked at run time somehow, either in “life before main” or by the dynamic linker when the library is loaded.


EDIT: if libstd is not linked into the cdylib but linked into the final binary, ideally, the cdylib should be able to do proper run-time feature detection. I have no idea how to allow that. While the runtime could be weakly linked to the cdylib, a libcore-only cdylib has no tools to introspect that.

Right now, AFAIK, libcore does not have any read-write data, and it would be concerning for embedded targets if it gained even a single field like that.

1 Like

AFAIK this is correct: we only ship read-only data as part of libcore.

Could you elaborate on which problems this would introduce?

Consider that if libcore only ships read-only data (and code, of course), then, unless an application adds read/write global data (which embedded applications often don’t), the entire image can be put into ROM on the target device and “execute in place” from that ROM. If libcore has a hard dependency on read-write data, it means that some code must now run before main to zero-initialize .bss and/or copy .data from a ROM section, whereas before this change, on certain targets, the Rust main can directly be the entry point, provided the CPU initializes the stack pointer to a useful value on reset, as e.g. Cortex-M cores do.

(But what if the application does include global read/write data itself, wouldn’t that be unsafe? No, since a developer of such an application would set up a linker script such that it would statically reject any attempt to define a mutable static.)

I would personally consider this a breaking change, a regression, and a loss of feature parity with C in terms of deployability on embedded devices.

A possible solution I see that would satisfy everyone is an opt-in flag in target JSON that enables this runtime mechanism.

Run-time feature detection is only possible if libstd is somehow linked into the final binary. That is, the target must support libstd, which requires a relatively featureful operating system.

If the target doesn’t support libstd, then no target-feature detection can be done at run-time, so there is no need for libcore to store anything.

So I’m not quite sure how ROM-only targets come into play here. Are there ROM-only targets with libstd support? If so, libstd already uses read-write memory (e.g. the system being discussed here is already implemented in libstd), so how does libstd currently solve this problem there?

Are you saying that libcore would include this read/write data only if libstd is also linked in? In that case forget what I said, because I misunderstood you and there is no issue.

Are you saying that libcore would include this read/write data only if libstd is also linked in?

Not exactly, more like, “if libstd could be linked”. For the targets you mention, if this is not possible, that would never happen. I suppose that we can control this via the target-specification file somehow like you proposed as well. Right now, this is all behind cfgs in libstd, so that only the targets that can benefit from it pay for it (also some targets have more features than others, so the caches need to be different as well).

FWIW, with the “lang item” approach, we could support @whitequark’s use case of doing run-time feature detection in read-only binaries. The only thing that must be available is a:

#[has_target_feature]
fn has_target_feature(feature: &'static str) -> bool;

lang item. When libstd provides this, its implementation can contain a static cache. When a binary that needs to live in read-only memory implements it, it can either always return false (automatically falling back to compile-time feature detection), or actually perform run-time feature detection each time it is queried, without caching the results anywhere, thereby avoiding a cache in read-write memory.
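As an illustration, the read-only-memory implementation could be as trivial as the following; the attribute and signature are taken from the proposal above, and none of this is current Rust:

```rust
#[has_target_feature]
fn has_target_feature(_feature: &'static str) -> bool {
    // No cache in read-write memory and no run-time queries: always report
    // "not detected", so callers fall back to compile-time-enabled features.
    false
}
```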

This also solves the issue with cdylibs, because each one has to provide this, and the cache would be initialized on first query.

This would also solve the issue for targets without atomics, where either thread-local caches could be used, or a mutex, or a plain global variable if the application is single-threaded, etc.
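For the single-threaded case, a hedged sketch (all names invented) of what such a non-atomic cache might look like:

```rust
use core::cell::Cell;

// A plain Cell wrapped in a newtype that we assert to be Sync. This is
// only sound because the application guarantees there is a single thread,
// as stated in the surrounding discussion.
struct SingleThreadFeatureCache(Cell<u64>);

// SAFETY: sound only under the single-threaded assumption above.
unsafe impl Sync for SingleThreadFeatureCache {}

static FEATURES: SingleThreadFeatureCache =
    SingleThreadFeatureCache(Cell::new(0));

fn feature_detected(bit: u64) -> bool {
    FEATURES.0.get() & bit != 0
}
```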

We don’t have to go straight there either. We could do what @alexcrichton proposed first, and later move to a lang-item-based system.

The current implementation is much closer to the lang-item approach than to the approach where libstd would initialize a cache in libcore during binary initialization. But I don’t know how much that should weigh in.

1 Like

I remember there was talk about supporting Cargo features for std, so that core and std could be the same library, just with the “std” feature on or off. CPU detection could then be a feature flag of its own.

The current RFC proposal seems stalled and lost in the weeds. There has been some recent progress on how to organize the discussion of how to plan for figuring out how to head in that direction. It may very well happen, but I wouldn’t hold my breath for it being a practical solution to this problem in the near term.

Those would be .init and .fini, but please don’t do that in any automatic fashion. Doing so would lead to some major surprises when building an ELF object to run in an environment that won’t run those sections, or when running in an environment that can’t perform the detection.

I’d suggest having a way to manually initialize the detection, along with an optional way to say “initialize me on load” that clearly documents the method and implications of doing so.
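One possible shape for such an explicit, opt-in initializer, sketched only to make the suggestion concrete; every name here is hypothetical:

```rust
use core::sync::atomic::{AtomicBool, Ordering};

static DETECTION_INITIALIZED: AtomicBool = AtomicBool::new(false);

/// Called explicitly by the embedder, once, at a point where performing
/// CPUID / OS queries is known to be safe in the target environment.
pub fn initialize_feature_detection() {
    // ... run the actual detection and populate the feature cache here ...
    DETECTION_INITIALIZED.store(true, Ordering::Release);
}

/// Feature queries fall back to compile-time-enabled features until the
/// initializer above has been called.
pub fn detection_initialized() -> bool {
    DETECTION_INITIALIZED.load(Ordering::Acquire)
}
```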

2 Likes

Thank you all for all the feedback so far. I’ve prepared an RFC that I believe satisfies all use-cases mentioned without any of the drawbacks that have been discussed here: https://github.com/gnzlbg/rfcs/blob/target_feature_runtime/text/0000-target-feature-runtime.md .

Please let me know what you think: I’d like to iterate on this design a bit before submitting it as a proper RFC.

It would be particularly helpful to know if there are any use-cases that are missing. The main use case this RFC does not support is overriding the target-feature detection runtime provided by libstd. We could allow that in a forward-compatible way if we wanted to, but it would be better to motivate that with real use cases, and I personally don’t know of any.

The RFC describes what the whole implementation would look like, but note that we don’t have to stabilize it all at once. We could deliver most of the value by stabilizing only the use of the target-feature detection macros from libcore, keeping the rest of the API unstable. The rest of the API allows users to provide their own detection run-time, which would let these macros do something meaningful in #![no_std] binaries; that part can be stabilized later.

6 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.