RFC: use `--cfg reduce_unsafe` to signal preference of safe code over performance

@kornel I understand what you're saying. That isn't what I'm trying to cover. I'm talking about several lines of code that could be written either way: there is a motivating reason for the unsafe approach, but we want it tested, and it doesn't hurt to use the safe variant. There shouldn't be any measurable difference besides time.

@josh This would be equally well tested, though it may be used less in production than the unsafe variation. The default for testing is both, followed by an assertion that they agree.

@mbrubeck I am primarily interested in the testing and auditing side. None of my projects will be production-ready for a while, at least; but I'd like the option to choose the safer variants when I don't gain anything from the unsafety in my application.

Any library author who actually wants to provide this functionality can do so already using a cargo feature.
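As a hedged sketch of that cargo-feature approach: a library declares a feature (the name `reduce_unsafe` and the function below are illustrative, not any real crate's API; in Cargo.toml you'd declare `[features] reduce_unsafe = []`) and branches on it at compile time.

```rust
/// Returns the first half of `v`.
pub fn first_half(v: &[u8]) -> &[u8] {
    if cfg!(feature = "reduce_unsafe") {
        // Safe variant: ordinary slicing, with its bounds check.
        &v[..v.len() / 2]
    } else {
        // Fast variant: skip the bounds check; `len / 2 <= len` always holds.
        unsafe { v.get_unchecked(..v.len() / 2) }
    }
}

fn main() {
    // With the feature off (the default), the unchecked path runs.
    assert_eq!(first_half(&[1, 2, 3, 4]), &[1, 2][..]);
}
```

Consumers would then enable the feature per dependency, which is exactly the limitation raised below: cargo has no global switch for it.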

This goal is not realistic. You can't perform a syscall or manipulate the heap without unsafe code.

3 Likes

That's why you have a whitelist, including core, alloc, and std, plus anything else that needs SIMD intrinsics, like in my case aesni, curve25519-dalek, etc. Start by forbidding all unsafe, then allow what you need, acknowledging the crates which need unsafe along the way. That surfaces the unexpected cases, which you can then accept, replace, or update.

Yes, a cargo feature works too, but the problem with cargo features is the inability to control them globally in cargo. I also didn't want to step on cargo's toes by declaring a global feature, in case someone else wanted to use the feature name.

I've kind of wanted something I think is the semantic opposite of cfg(reduce_unsafe). More like:

#[cfg(fast)]

...to signal that, given a set of tradeoffs, performance is the highest concern
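A sketch of how the proposed cfg might be consumed. Note `fast` is not a cfg rustc knows about today; it would be set externally, e.g. via `RUSTFLAGS="--cfg fast"`, and the function here is illustrative.

```rust
// Performance-first implementation (SIMD, unsafe, etc.) would go here.
#[cfg(fast)]
fn checksum(data: &[u8]) -> u32 {
    unimplemented!()
}

// Straightforward safe implementation, used when `fast` is not set.
#[cfg(not(fast))]
fn checksum(data: &[u8]) -> u32 {
    data.iter().map(|&b| u32::from(b)).sum()
}

fn main() {
    assert_eq!(checksum(b"abc"), 294); // 97 + 98 + 99
}
```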

2 Likes

@bascule Shorter, and it may capture the intent better. I like it.

arrayvec is probably worth getting out of your dep tree if you can

Why?

tinyvec already covers the subset of use cases that don't fundamentally require unsafe.

I'd say sweepingly recommending this over arrayvec for no particular reason is not good advice at all.

It has no benchmarks of any kind that actually give a good idea of how it compares to arrayvec, and relies on T implementing Default (which is a pretty huge restriction).

Also, just looking at some of the stuff it implements with safe code (which, generally speaking, would otherwise be implemented with unsafe code in most similar crates), I pretty sincerely doubt that it's actually competitive performance-wise in all of the areas that matter.

"if you can" and "subset of use cases that don't fundamentally require unsafe" are reasons of their own. If you need the unsafety, then sure, it may not fit your use. But if it does fit, at least in all benchmarking I've seen and done it performs no worse than arrayvec, smallvec, or staticvec. The latter three may be generally more useful, but so is Vec. They're already specializing for smaller collections.

If you need the unsafety, then sure, it may not fit your use.

I don't really know what you mean by this. No user of a crate "needs the unsafety". The crates just typically use unsafe internally to implement certain functions.

But if it does fit, at least in all benchmarking I've seen and done it performs no worse than arrayvec, smallvec, or staticvec.

Are there actually benchmarks out there that include it, or that can at least be run with it? Trying to adapt it to, for example, even the "extend" suite I have in the StaticVec repo (which I in turn adapted from ArrayVec) doesn't work, because of the non-existence of Default impls for arrays of length > 32.

2 Likes

For the arrayvec vs tinyvec example: if your type implements Default, then tinyvec may suffice; otherwise you rely on the unsafe approach used in arrayvec or staticvec, notably MaybeUninit.
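The two backing-storage strategies can be contrasted in a small sketch (illustrative code, not actual tinyvec/arrayvec internals):

```rust
use core::mem::MaybeUninit;

// tinyvec-style: the whole array is initialized up front via Default,
// so the container needs no unsafe, at the cost of a T: Default bound.
fn storage_default<T: Default + Copy>() -> [T; 8] {
    [T::default(); 8]
}

// arrayvec/staticvec-style: elements start uninitialized; only the first
// `len` slots are ever treated as live, which requires unsafe to read.
fn storage_uninit<T>() -> [MaybeUninit<T>; 8] {
    // Sound: an array of MaybeUninit is allowed to be uninitialized.
    unsafe { MaybeUninit::uninit().assume_init() }
}

fn main() {
    let a: [u8; 8] = storage_default();
    assert_eq!(a, [0u8; 8]);
    let b: [MaybeUninit<u32>; 8] = storage_uninit();
    assert_eq!(b.len(), 8);
}
```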

Default's current restriction of [T; N] where N < 33 should be lifted when const-generics land, so this isn't really a fair complaint while you're promoting staticvec. You need nightly for now, and tinyvec could bring its own default trait to hack around this, but that'd be awkward even with a feature flag.

Edit: You can also use tinyvec::ArrayVec::from_array_len([Default::default(); N], 0) to initialize a tinyvec::ArrayVec of any size while its elements are still Default.

Default's current restriction of [T; N] where N < 33 should be lifted when const-generics land so this isn't really a fair complaint while you're promoting staticvec.

The point was more that arrayvec, while not using const generics, still has no such Default-based restriction on any function.

Edit: You can also use tinyvec::ArrayVec::from_array_len([Default::default(); N], 0) to initialize a tinyvec::ArrayVec of any size while its elements are still Default.

I'll try that.

I should have mentioned that this does need const-generics. The point there is that you don't need the array to impl Default with this initialization method.

I should have mentioned that this does need const-generics.

Unless you use macros for the impls, which is what arrayvec and smallvec have done for years.

Anyways, after converting and running the extend suite (sans the write bench, as tinyvec does not implement that), we're looking at it being between 613ns and 893ns slower (on my i7-4770) than an average of staticvec, arrayvec, and smallvec for that particular set of functions.

I'm afraid I don't see the upside. It's just eschewing provably correct, time-tested "only-way-to-do-it" unsafe impls of things in exchange for objectively slower safe ones, for no reason other than to cut down on the literal number of unsafe blocks.

Macros only expand for the sizes you tell them to, a PR for each size is annoying, but works...

If the only reason is to cut away the unsafe blocks, this may still be preferred when the performance difference doesn't affect the application or its users. Thus I'd choose safety, just in case the unsafe code isn't as sound as we expect, or to defend against future changes which break it.

1 Like

Macros only expand for the sizes you tell them to, a PR for each size is annoying, but works...

I mean, I exclusively use const generics in staticvec so none of the "specific size" stuff matters at all, but I don't quite get what you mean by "a PR for each size" (versus a macro-based list of sizes such as arrayvec and smallvec use.)

If the only reason is to cut away the unsafe blocks, this may still be preferred when the performance difference doesn't affect the application or its users. Thus I'd choose safety, just in case the unsafe code isn't as sound as we expect, or to defend against future changes which break it.

Fair enough, but we're talking about really basic stuff such as implementations of extend, for which you can look at the arrayvec, smallvec, staticvec, and Vec versions (which all use unsafe, but all have pretty short implementations) and quite quickly go "yeah, that is definitely correct". There's no air of uncertainty involved.

The lists of sizes included in these macros aren't exhaustive, and expanding an excessive range has a cost. They may use feature-gated ranges to reduce that cost, but that is also annoying and does slow down compilation times. If the size you want isn't in those ranges, then you need to PR it in. I added the option for const-generics to tinyvec to sidestep the problem for my own non-production uses, where I'm always using nightly.

Auditing unsafe code isn't just about the code in the unsafe { .. } block; it depends on all inputs, and on everything that influences those inputs. Returning to the example I gave before arrayvec etc. came up: from_utf8_unchecked. Any code which changes the bytes such that they're no longer UTF-8 introduces UB.

Auditing unsafe code isn't just about the code in the unsafe { .. } block; it depends on all inputs, and on everything that influences those inputs. Returning to the example I gave before arrayvec etc. came up: from_utf8_unchecked. Any code which changes the bytes such that they're no longer UTF-8 introduces UB.

I don't think that's necessarily a great example. A "bare" use of from_utf8_unchecked directly attached to a normal let binding like that, rather than one used in the context of some larger function (safe, usually, often with a context-relevant debug_assert somewhere within it), would be pretty odd to see in the first place.

Here are two examples from just looking at two string-processing crates. csv trusts the rest of the code in the file. bstr includes a validation step, but doesn't have a debug assertion which calls the widely trusted validation of core::str::from_utf8. Both could have their invariants broken by changes made to safe code.

That's how unsafe scoping works, though. StringRecord encodes, as part of the type, that the contained byte slices are valid UTF-8. As such, it's unsafe to mutate the held byte slices.

All unsafe necessarily at some point relies on the correctness of safe code.

I'm fairly certain neither of these examples would reject a PR to add a debug_assertions check of core::str::from_utf8.

1 Like

Part of the problem here is that it is "safe" to mutate the bytes, but unsound to do so in a way that makes them no longer valid UTF-8. The code does all look sound today, but any future change to safe code may break soundness. I have already suggested using debug_assert!(), and it may be added. Though BurntSushi isn't interested in the cfg-branching, which is fine; the debug_assert matters more.
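A minimal sketch of that suggested mitigation: keep the unchecked conversion, but re-run the widely trusted validator whenever debug_assertions are on, so tests and fuzzing catch a broken invariant. The function name is illustrative, not csv's or bstr's actual code.

```rust
fn as_str_unchecked(bytes: &[u8]) -> &str {
    // Debug-only check using the trusted validator.
    debug_assert!(
        core::str::from_utf8(bytes).is_ok(),
        "invariant broken: bytes are not valid UTF-8"
    );
    // SAFETY: surrounding code maintains the UTF-8 invariant.
    unsafe { core::str::from_utf8_unchecked(bytes) }
}

fn main() {
    assert_eq!(as_str_unchecked(b"abc"), "abc");
}
```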

All of this seems like something that should happen at an individual crate level.

However, a friend provided me with this handy macro that's like a "more aggressive debug_assert", if you want to be able to toggle part of code from slow and safe to faster and unsafe:

#[allow(unused)]
macro_rules! assume {
  ($x:expr) => {{
    if cfg!(debug_assertions) {
      assert!($x);
    } else if (!$x) & cfg!(feature = "enable_unreachable_unchecked") {
      unsafe { core::hint::unreachable_unchecked() }
    }
  }};
}

This can be used to, for example, eliminate bounds checks that the compiler can't see it should eliminate. Obviously you'd only use it when there's some sort of invariant you understand that the compiler does not, and you'd use aggressive fuzz testing in debug mode to search for the assert firing, and so on and so on. All the usual "this is fire, use it carefully" cautions apply.
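For instance, a hypothetical use of the assume! macro for exactly that bounds-check case (the macro is repeated so the snippet compiles standalone; the invariant and function are made up):

```rust
#[allow(unused)]
macro_rules! assume {
  ($x:expr) => {{
    if cfg!(debug_assertions) {
      assert!($x);
    } else if (!$x) & cfg!(feature = "enable_unreachable_unchecked") {
      unsafe { core::hint::unreachable_unchecked() }
    }
  }};
}

// Caller guarantees `i < data.len()`; the compiler can't see that across
// the call boundary, so the index would normally keep its bounds check.
fn get_in_bounds(data: &[u32], i: usize) -> u32 {
    assume!(i < data.len());
    // With the feature enabled in release, the hint lets the optimizer
    // drop the bounds check; in debug builds the assert catches a bad caller.
    data[i]
}

fn main() {
    assert_eq!(get_in_bounds(&[10, 20, 30], 2), 30);
}
```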

Beyond this, I mostly agree with the general pushback of the thread. Because Rust is so precise with types and so leaky with abstractions, the best safe way to do something and the best unsafe way to do something will often not be the same. When they are identical to each other from the outside, you could probably use the above assume! macro.

@SlightlyOutOfPhase: my favorite neat feature that tinyvec::ArrayVec has, which other vecs just can't offer, is that you can take a slice of the unused portion, pass that into your read function, and then just bump the length of the vec by however many bytes you read. Which is pretty nifty, I think. Obviously not enough on its own to switch over, but still cool.
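The pattern works because the backing array is always fully initialized, so the unused tail can be handed out as a plain `&mut [u8]`. Here's a dependency-free approximation, with a raw array plus a length counter standing in for the real tinyvec API:

```rust
use std::io::Read;

fn main() -> std::io::Result<()> {
    let mut buf = [0u8; 16]; // every element already valid (Default)
    let mut len = 0usize;    // number of elements "in" the vec

    let mut src: &[u8] = b"hello world";
    // Pass the unused portion straight to the reader...
    let n = src.read(&mut buf[len..])?;
    // ...then just bump the length by however many bytes were read.
    len += n;

    assert_eq!(&buf[..len], b"hello world");
    Ok(())
}
```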

Also, you said "613ns to 893ns slower", but out of how many? Is that like 5% slower or 100% slower?

4 Likes