They'd be easy to add, but "fast" floats are pretty much in limbo right now. They're not exposed on stable, and there's no plan to do so. So there's not really a reason for the compiler team to take on the cost of maintaining intrinsics for them.
All intrinsics are unstable and implementation details. Many intrinsics have a stable wrapper function, but this doesn't imply that the wrapper will forever be implemented by calling an intrinsic. Many can be rewritten to avoid any intrinsics.
Intrinsics come and go over time. Sometimes the language itself gains a new feature that allows implementing the wrapper without intrinsics. For example, std::mem::forget used to be implemented using an intrinsic, but is now implemented using ManuallyDrop. Other intrinsics become unused for other reasons and may be removed. Yet other intrinsics are introduced as an implementation detail of a new feature.

There are also intrinsics that can only be meaningfully implemented by a single codegen backend. While LLVM is currently the only stable backend, there are several others, like cg_clif (created by me), cg_gcc (a GCC backend) and cg_spirv (part of the rust-gpu project). Because intrinsics may be added and removed at any time, they won't ever be stabilized and thus can't be used on stable.
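As a concrete illustration of the std::mem::forget point above, here is a minimal sketch of a forget-like function written purely with ManuallyDrop and no intrinsic (the real std::mem::forget is essentially this one-liner; forget_like is just an illustrative name):

```rust
use std::mem::ManuallyDrop;

// Suppresses the destructor of `value` without any intrinsic:
// wrapping it in ManuallyDrop means the inner value is never dropped.
fn forget_like<T>(value: T) {
    let _ = ManuallyDrop::new(value);
}

fn main() {
    let v = vec![1, 2, 3];
    forget_like(v); // the Vec's heap allocation is leaked, not freed
}
```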
I'm not on the compiler team (nor the compiler contributors team) and am not familiar with their thinking on such things.
But generally from a software engineering perspective, if there's not a need to have it for an approved initiative, then it's unclear that adding it is necessary.
One can always use a local fork for experimentation, or for fneg and fcmp there are reasonable proxies that should allow experimentation just fine -- fast_sub(0, x) probably works great for fast_fneg, and you could try x.partial_cmp(y).unwrap_unchecked() for fast_fcmp.
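For concreteness, here's roughly what those proxies could look like (a minimal sketch, assuming a recent nightly with the unstable core_intrinsics feature; fast_fneg and fast_fcmp are just illustrative wrapper names, not real library functions, and fsub_fast is the existing fast-subtraction intrinsic):

```rust
#![feature(core_intrinsics)]

use std::cmp::Ordering;
use std::intrinsics::fsub_fast;

// Hypothetical "fast" negate: 0.0 - x through the fast-subtract intrinsic.
// Safety: fsub_fast requires that operands and result are finite.
unsafe fn fast_fneg(x: f32) -> f32 {
    unsafe { fsub_fast(0.0, x) }
}

// Hypothetical "fast" compare: the caller promises neither operand is NaN,
// so partial_cmp never returns None.
unsafe fn fast_fcmp(x: f32, y: f32) -> Ordering {
    unsafe { x.partial_cmp(&y).unwrap_unchecked() }
}

fn main() {
    unsafe {
        assert_eq!(fast_fneg(1.5), -1.5);
        assert_eq!(fast_fcmp(1.0, 2.0), Ordering::Less);
    }
}
```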
The fast_* intrinsics are unfortunately mostly useless as they are now, since there are many different fast-math flags and these intrinsics just enable all of them. So it would be a service to Rust to push this area forward somehow. I'm not working on it myself, unfortunately, even though the most interesting direction to explore (to me) would be finding the safest subset of flags that still improves autovectorization for certain loops.
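For example, a reduction like the one below (a sketch, assuming nightly and the unstable core_intrinsics feature) typically only autovectorizes when the compiler may reassociate the additions, yet fadd_fast currently enables every fast-math flag rather than just reassociation:

```rust
#![feature(core_intrinsics)]

use std::intrinsics::fadd_fast;

// Sums a slice with fast-math addition so LLVM may reassociate
// (and therefore vectorize) the reduction.
// Safety: every operand and intermediate sum must be finite;
// a NaN or infinity here is undefined behavior with fadd_fast.
fn sum_fast(xs: &[f32]) -> f32 {
    let mut acc = 0.0f32;
    for &x in xs {
        acc = unsafe { fadd_fast(acc, x) };
    }
    acc
}

fn main() {
    let data: Vec<f32> = (0..1024).map(|i| i as f32).collect();
    println!("{}", sum_fast(&data));
}
```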
Okay, so it sounds like we need to wait for some higher-level facade on top of those intrinsics. Having those optimizations working would be really great not only for gaming but also for CPU-intensive numerical code. I know there's SIMD, but that's a bit of a different story. Thanks again - will keep track of this issue.