Crate evaluation for 2017-07-25: rand


#1

Crate evaluation for 2017-07-25: rand

For additional contribution opportunities, see the main libz blitz thread.

This post is a wiki. Feel free to edit it.

Links

Needs your help!

Anything that is not checked off still needs your help! There is no need to sign up or ask for permission - follow any of these links and leave your thoughts:

Guidelines checklist

Legend

  • [y] = guideline is adhered to, no work needed.
  • [n] = guideline may need work, see comments nearby
  • [/] = guideline not applicable to this crate

Checklist

Guidelines checklist

This document is collaboratively editable. Pick a few of the guidelines, compare the $CRATE crate against them, and fill in the checklist with [y] if the crate conforms to the guideline, [n] if the crate does not conform, and [/] if the guideline does not apply to this crate. Each guideline is explained in $

  - [n] Crate name is a palindrome (C-PALINDROME)
   - rand backwards is dnar which is not the same as rand

Cookbook recipes

Cookbook example ideas

Come up with ideas for nice introductory examples of using $CRATE, possibly in combination with other crates, that would be good to show in the Rust Cookbook. Please leave a comment in that issue with your ideas! You don’t necessarily have to write the example code yourself but PRs are always welcome.

API guideline updates

What lessons can we learn from rand that will be broadly applicable to other crates? Please leave a comment in that issue with your ideas!

Discussion topics

(List concerns that would be valuable to discuss in the libs team meeting)

Crate issues

(List potential issues to file after the review)

How are we tracking?

Pre-review checklist

Do these before the libs team review.

  • [x] Create evaluation thread based on this template
  • [/] Work with author and issue tracker to identify known blockers
  • [ ] Compare crate to guidelines by filling in checklist
  • [ ] Record other questions and notes about crate
  • [x] Draft several use case statements to serve as cookbook examples
  • [ ] Record recommendations for updated guidelines

Post-review checklist

Do these after the libs team review.

  • [ ] Publish libs team meeting video
  • [ ] Create new issues and tracking issue on crate’s issue tracker
  • [ ] Solicit use cases for cookbook examples related to the crate
  • [ ] File issues to implement cookbook examples
  • [ ] File issues to update guidelines
  • [ ] Post all approachable issues to TWiR call for participation thread
  • [ ] Update links in coordination thread

#2

Welp, here’s a big one. This is scheduled for 7/25 but I suspect that date is going to slip to give this time to percolate. A lot of people have opinions about rand.

I’d urge caution about letting perfect be the enemy of the good - this crate has been around a long time and has suffered from neglect and analysis paralysis. I bet there’s a whole bunch of improvements we can make, even if we don’t solve all the problems.

Let’s see what happens!

@briansmith has previously critiqued this crate here, and I bet there are some other prior critiques that could be dug up.

cc @nagisa @bhickey @bstrie


Rust libz blitz!
#3
  1. The documentation states “Cryptographic security. […] OsRng […] The other random number generators provided by this module are not suitable for such purposes”. However, at least the chacha-based RNG should be suitable, and anyway a cryptographically safe non-OsRng RNG definitely needs to be provided, since reading from /dev/urandom is likely too slow due to kernel overhead.

  2. The RNG returned by thread_rng() should be guaranteed to be cryptographically secure, since that’s the safe default. An unsafe_thread_rng() could also be provided that is faster but cryptographically insecure. This mimics the Rust policy on hash table where the default is secure (note that misusing an insecure RNG is catastrophic, while misusing an insecure hash table merely leads to denial of service).

  3. Maybe there should be a global_rng() / GlobalRng that would be the same as thread_rng() / ThreadRng except global and thus using an Arc<Mutex<<…>> instead of an Rc<RefCell<…>>. This could then be used to initialize the per-thread RNGs without requiring reading /dev/urandom. When the reseeding threshold is reached on a thread_rng, the global rng also needs to be reseeding before using it to reseed the thread rng, to preserve current behavior.

  4. Speaking of the reseeding threshold, it is unnecessary if the RNG algorithm is assumed to be cryptographically strong and the seed kept secret, and if that algorithm is broken or if something can inspect the seed, it’s not clear if reseeding would do any good as the attack could often just be repeated, and 32KB of predictable random data is often going to be enough to catastrophically compromise the system. Could still keep it for extra safety, but should perhaps at least set the threshold based on benchmarking to a value that causes minimal overhead (which might already be the case, not sure).

  5. There seems to be an inconsistency with the Rand implementations for f32 and f64: for all other types, rand picks from an uniform distribution over all possible values, but for f32 and f64, it instead (abstractly) picks from the uniform distribution on [0, 1) and then rounds to the nearest float. I think it might make sense to either remove the implementations of Rand for f32 and f64 or make them generate an u32/u64 and bitcast, and move the current Rand for f32/f64 implementation to HalfOpen01<f32/f64>.

  6. Is the Rand trait needed at all? Could it be removed in favor of using Deserialize and a serde deserializer that just generates random data, thus greatly simplifying the Rand crate and getting random generation for free for any deserializable type? Looking at serde it seems this might be possible, but not completely sure.

  7. How about an AES-based RNG? Many CPUs have hardware AES acceleration, so it might be faster than anything else on those, while obviously being cryptographically secure (with proper constant time implementation).

  8. Maybe there should be some support for virtio-rng when running in a VM without a kernel, but that might be out of scope since it’s a non-standard environment. However, allowing external crates to replace or implement OsRng might be a good idea for this reason.

  9. RNGs, especially cryptographically secure ones, should be designed, if possible, so that retrieving the seed doesn’t allow to determine past random numbers, but only future ones, and whether this is the case should be explicitly documented for the RNG


#4

The current behavior matches that of Java’s Random.nextFloat, Python’s random.random, Go’s rand.Float32, and C++'s std::uniform_real_distribution(). I don’t know of any RNG libraries that would return a uniformly random bit representation of a float, since what you get out of that has a totally bizarro distribution, including a huge number of NaNs.


#5

I think it’s fine for next_f32/next_f64 to do this, but for Rand it’s inconsistent with implementations on other types that return an uniform distribution over all possible values.

The only case where having a Rand (or serde deserializer, if Rand can and is replaced with that) implementaton for f32/f64 is important is for deriving implementing Rand on structs that contain f32/f64, and the only reasonable purpose for that seems to be for randomized testing/fuzzing, and there the random u32/u64 and bitcast seems the correct one, since it can generate all possible inputs.

The Java/Python/Go/C++ examples are type-specific interfaces like next_f32/next_f64 or explicit distributions, as opposed to a generic random generator like Rand.

If the current Rand behavior is desired, one can use a newly introduced HalfOpen01<f32/f64> instead of f32/f64.

It might also be a good idea to rename next_f32/f64 to next_f32/f64_0_1 to make it clear that they don’t work like next_u32/u64.

Also a normal distribution with mean 0 and variance 1 might be a better choice than uniform on [0, 1]) for the “default” f32/f64 generation function, since it’s a more canonical distribution and can actually generate all finite values (though for Rand it’s still incorrect in my opinion). Although there’s the issue that such a distribution leads to overflow when used with addition or multiplication, so it might be dangerous and thus not advisable to expose it as the default.


#6

Also I would like to mention that there is no way to have Range and impl SampleRange for user defined types due to Range fields being private. This makes for counter intuitive api.


#7

The single biggest improvement I’d like to see is the decoupling of the traits from the implementations, such that the no_std ecosystem, which can’t have an OsRng, can use these traits to define their own RNGs.

What do you mean by this? AES can be used as a DRBG for sure, but you still need an entropy source.


#8

I was just reveiwing @briansmith’s rand definition in ring, and quite like it. This looks like a really good, minimal, and correct foundation. Though I think we want to give users more conveniences than that. The method that most casual end users want is more like the existing rand crate’s polymorphic random method.

I’d personally feel really good about paring rand way back to be more like ring’s rand with some conveniences. Leaving most of the other matters touched on by rand today to future work in other crates.


#9

ring is very minimal, it’s core definition being SecureRandom, that has one method.

The equivalent in rand is Rng, which also has a fill_bytes method, though that one does not return a Result type. Rng then has lots and lots of convenient methods on it. I wonder to what extent those convenient methods are enabling that nice polymorphic random function (I’d guess a lot).


#10

Another interesting comparison to draw between ring’s trait and rand’s is that ring’s SecureRandoms fundamental operation is fill_bytes while Rngs is next_u32, and it provides a default method for fill_bytes.


#11

Thanks for the review @jmst.


#12

Ok, just some more musings. One could imagine that a minimal rand crate is just the simple SecureRandom definition from ring, which gives you simple and correct access to random numbers, and then also a polymorphic random function that does the right thing for basic casual use cases.

Some of the consequences of this design might be

  • If you want seedable, deterministic random numbers you would have to go elsewhere, maybe via a different implementation of the SecureRandom trait, in another crate.
  • If you want seedable, deterministic random numbers in some other form than a byte buffer you would have to… maybe random is parameterized, or has another form that takes a SecureRandom.
  • All the conveniences that today’s Rng trait provides, and that presumably feed into that nice random function could be hidden implementation details, or perhaps an extension trait.
  • Performance of a naive implementation of the random function would be bad because it has to ask the OS for bytes in small chunks. This is a big part of the reason the thread local rng exists. So we would probably still need the thread local rng but it could be an implementation detail.

I also want to call attention to this point from @briansmith’s earlier review:

This mixing causes trouble for some applications. For example, Rng has next_f32 and next_f64 methods, which means that it cannot be fully implemented on any system that does not have floating point support and/or which wants to avoid linking in the floating point libraries.

The issue of working in float-free environments is real, and something worth thinking about. But of course the core library also suffers this problem, as it contains float definitions. I’m not sure if we have today a way to cfg based on cpu float capability, but it sure would be nice if the rng is not tied to that.


#13

Ok, just one more extreme idea. It seems viable to rename this crate to oldrand, change this crate to be nothing but the content of ring’s rand module (though maybe renaming SecureRandom to Random), and leaving all other conveniences to battle it out as experiments in other crates (though I do think we would immediately need that polymorphic random function somewhere). Then we would have a stronger foundation at the source of random number generation, and have room to breath about all the other features people might want.


Rust libz blitz!
#14

In principle I’m supportive of this, but I have some concerns around reproducibility. Ergonomic reproducible randomness is nice to have. Haskell’s Random Monad does a good job of this. Without global or thread state, we oblige users who want reproducible results to pass around additional nuisance parameters.

Today thread_rng() offers no reproducibility guarantees and with good reason: if we had seed_thread_rng(...), a callee could smash the caller’s seed. I have a prototype that tracks thread_rng state in a stack and offers rand state safety versus non-pathological callees.

The typical setup is to run AES CTR-DRBG. Seed AES from entropy and then encrypt a counter to generate your stream. While it’s true that AES-NI is pretty fast, the unaccelerated performance is lousy. ChaCha20 might be a good alternative. It isn’t much slower than AES-NI and was designed with low power devices in mind.

This sounds like an exciting area for research. I haven’t seen literature on the subject, but I’m more interested in PRNGs than CSRNGs. IIRC Blum-Blum-Shub should satisfy this property as long as the primes remain secret. A secure hash function might fit the bill, but I’d want to see actual cryptanalysis: n_t+1 = HMAC(n_t, k)


#16

FYI You misquoted me


#17

The folks behind Keccak (selected as SHA-3) talk about PRG use of the sponge construction in section 4.2 of http://sponge.noekeon.org/CSF-0.1.pdf


#18

Oops, fixed, I think. Sorry bout that. I was merging comments and made a copy-paste error.


#19

I’d love if the default trait method was something like filling a buffer and then any conveniences implemented in terms of that.

You probably want to allow users to specialise the from_u32 etc somehow, just so it would be possible to avoid putting the u32s into a buffer and extracting them back out when the RNG internally generates u32s. Filling a buffer should still be a required function, though.

For example RDRAND instruction can natively generate u16, u32 and u64 bit chunks.

Also I’m not sure I’ve seen noted in the thread yet is the huonw’s ages old attempt at redesigning rand https://github.com/huonw/rand/tree/next/src. I’m sure we can draw ideas about distributions and ranges of numbers from the crate.


#20

As with RDRAND, I feel like it is a good idea to have it is as a separate crate. in fact I believe that we should strive to have the bare minimum (interface + thread_rng) in the rand crate and the other generators (weak, hardware accelerated, mixing, all sorts of PRNGs) in external crates. Possibly also split out the distributions (if we’re not putting them into the core rand interface as per huon’s proposal).


#21

I don’t know if this has been discussed before, but both SecureRandom and Rng look a lot like std::io::Read.

IMO the basic primitive of a random number generator should be just a marker trait that inherits from Read.