Crate evaluation for 2017-07-25: rand


#42

Do they? There’s already a ton of float code in libcore, and it doesn’t seem to cause big problems.


#43

Maybe not then; but isn’t it an issue for platforms without native floats? I’m sure I read a comment to this effect somewhere; can’t find it now.


#44

Yes, there are issues, but there are also work arounds, and in any case two more methods that involve floats won’t make a difference since there’s already dozens.


#45

I’ve been working on an experimental refactor.

There’s a couple of things left to fix, and a few questions. (E.g. many functions on distributions only support static dispatch; should they support dynamic as well?)

This keeps next_f32 / next_f64 for now at least.

Please take a look.


#46

@dhardy, thank you so much for your work pulling together constraints and giving a strawman design!

The libs team had a preliminary meeting, with guest @briansmith, to talk through the known issues with rand and also discussed a strawman design – sadly, this happened just before your initial proposal. However, there’s a lot of overlap with where we landed.

Here’s a sketch of what we discussed:

enum RngError { /* to be determined */ }

trait Rng {
    fn fill(&mut self, dest: &mut [u8]) -> Result<(), RngError>;

    // these defaulted methods allow for specialized, more
    // efficient implementations
    fn next_u8(&mut self) -> Result<(), RngError> { ... }
    fn next_u16(&mut self) -> Result<(), RngError> { ... }
    fn next_u32(&mut self) -> Result<(), RngError> { ... }
    fn next_u64(&mut self) -> Result<(), RngError> { ... }
    fn next_u128(&mut self) -> Result<(), RngError> { ... }
}

trait SecureRng: Rng {
    // At the moment this is just a marker trait, but may grow defaulted methods
    // over time.
}

/// Draws random values directly from the OS. Note that you can make as
/// many of these as you want, meaning that you can freely use the OS rng
/// on multiple threads.
struct OsRng;

// Provide some fast, seedable rng to use for e.g. simulation contexts
struct InsecureRng;

trait Rand<Dist> {
    fn rand<R: Rng>(rng: &mut R, dist: Dist) -> Result<Self, RngError>;
}

impl Rand<RangeFull> for u64 {
    fn rand<R: Rng>(rng: &mut R, dist: RangeFull) -> Result<u64, RngError> {
        /* use `fill` in the obvious way */
    }
}

impl Rand<Range<u64>> for u64 { /* ... */ }

// NB: there is *not* an impl for `RangeFull` on floats, which means you must use
// `random_range` as the convenience (and hence explicitly give the desired range)
struct HalfOpen01;
impl Rand<HalfOpen01> for f64 { /* ... */ }

struct Open01;
impl Rand<Open01> for f64 { /* ... */ }

struct Closed01;
impl Rand<Closed01> for f64 { /* ... */ }

// Convenience functions:

fn random<T: Rand<RangeFull>>() -> T {
    // TBD: what's the right "DefaultRng" here?
    T::rand(&mut DefaultRng, ..).unwrap()
}

fn random_range<R, T: Rand<R>>(r: R) -> T {
    T::rand(&mut DefaultRng, r).unwrap()
}

This design pushes for maximal simplicity in the Rng trait, but does allow for errors (important for cryptographic applications). The SecureRng trait is something that @briansmith in particular believes is desirable in a crytpographic setting; I think given that it’s intended as a subtrait, we could punt on it at the beginning, until we have a better idea for the methods it should provide.

I think the biggest difference between this design and your proposal is that it retains the Rand trait, but parameterizes it over a distribution type. The goals here are similar to yours, and I think the result is comparable to the gen module idea, but would require fewer traits overall. I’m curious to hear what you think.

On a couple of the broader questions you raised:

  • Potential breakage. We didn’t talk about this extensively in the last libs team meeting, but my sense is that putting out a new major release (0.4) with significant changes is desirable and inevitable. The crate has long needed cleaning up.

  • RFC process. While I don’t think an RFC is required for moving forward with a proposal, I think it’d be a great step to take. RFCs have much greater visibility, so we’d have a better chance of hearing from all of the stakeholders. It would also mean we’d have a clear vision on record for the crate, which is something we’ve needed for a while. Finally, if we do want to move to 1.0 and move the crate into rust-lang proper, at that point an RFC is requried. This could essentially be that RFC, but we’d start by moving to 0.4 before finalizing the API.

In any case, @dhardy, the work you’ve been doing is great, and seems well-aligned with both the team’s thinking and stakeholder needs. If you want to lead the charge on turning it into a full RFC, I’m happy to help and provide feedback along the way!


#47

I think it would be nice to add traits for seeding and feeding entropy to the RNG. Maybe something like this:

trait SeedRng: Rng {
    type Seed;

    fn new(seed: Seed) -> Self;
}

trait FeedRng: Rng {
   fn feed(entropy: &[u8]);
}

Not sure how it will be better to deal with whitening and if feed method should accept only &[u8] or have helper methods for other types. Probably both traits should not return Result, although I could be wrong.

Also I think we need rules regarding thread/fork safety. CSPRNG must stay safe in both cases, while for scientific/game PRNGs it could be useful to copy state to produce the same results in different threads/processes. Will it be enough to use SecureRng for that?


#48

Replying via email on my phone, so please forgive the terseness and lack of proper formatting.

Re: whitening. Leaving aside the question of whether we should allow seeding of secure rngs (some have argued that we shouldn’t), I think we should probably say that whitening is out of scope, and that for cryptographic security, the user must provide a uniformly-distributed, cryptographically-secure seed (probably from the kernel or some other source that is known to be secure and produce the proper output distribution). Whitening is a very complicated, poorly understood concept, and I don’t think we should take on the job of trying to do it correctly.


#49

Can’t specialized implementations be done with specialization of the proposed Rand trait instead? Also, like I mentioned in the meeting, I haven’t seen any evidence that it’s actually useful (for crypto RNGs, at least) to specialize these cases, so I’d prefer to leave them out in favor of having the Rng and SecureRng interfaces be simpler and thus easier to analyze and prove correct.

NIT: IMO, Rng isn’t the best name. First of all, the “r” part of the name is already established in the module name, rand, so the R should be dropped in the same way that std::io::Result dropped its IO prefix, and the n part is misleading since it isn’t (just) for numbers; I think rand::Generator would make the most sense. Secondly, I think the name Rng is best reserved for the mathematical concept of “rng”, which are rings without identity.


#50

The same applies to RngError. It can just be rand::Error.


#51

Actually next_f64 is not sufficient to reap all benefits, you need something like fill(&mut [f64]), because dSFMT uses SIMD.


#52

The design of basic crates shouldn’t ignore future planned features: we will have integer generics (and later hopefully well implemented ranged integral values). So you can use them to define a static interval and generate random integral values that are statically known to be only inside that interval. This should allow a match{} on such values that specifies just the values in that interval and nothing else.


#53

Those are meant to actually return a value, not (), right?


#54

I think that the random number generation is the core of it, and it’s the first thing that should get on the track for 1.0.

The Rand trait and its derive is an add-on concept that fits well as a crate of its own, unless I’m missing some coupling that they have? Rngs implementing Rand are one of the most dubious impls, and they can easily be cut if needed.


#55

@aturon thanks for reporting from the libs team.

fn next_u8, next_u128 … — ok. But with SIMD in mind (@vks), current CPUs can probably generate even more bits in one go, though maybe returning Result<[u64; 4]> isn’t so useful.

SecureRng trait seems a good idea.

InsecureRng — maybe FastRng would be better? Forget the “seedable” point — that’s pointless if the underlying implementation may change. IMO this should be a generator with good statistical properties and documented as being such — see for example the comments at the top of this article.

Rand<RangeFull> — this isn’t a replacement for the existing Rand; e.g. floats or that weird Option thing. Maybe that’s okay or maybe not? E.g. withoutboats’ comment.

Also RangeFull isn’t a distribution; e.g. you could implement this for some Option<T> or for f64; the name doesn’t say f64 couldn’t use an exponential distribution. UniformFull might be better?

My refactor deals with this a little differently:

  • Simple generic functions like fn uniform<T, R>(rng: &mut R) -> T
  • Besides being simpler, this is also useful for documentation, e.g. fn codepoint<...>(...) -> char
  • For generic applications, DefaultDist pulls these together, although it suffers from the same problem as the old Rand: the design begs implementation for just about any type without making distribution clear. [To be clear, this is supposed to support f32/f64 but hit a compiler bug.]

@newpavlov did you miss SeedableRng? This isn’t new. About FeedRng: if you want this kind of capability isn’t it better to use OS generation?

@briansmith how are non-crypto RNGs meant to compete on performance without specialisation of next_u32 etc.? rand::Generator seems an appropriate name.

Finally: don’t forget iterators. The existing Rng has gen_iter and gen_ascii_chars. My refactor has fn iter() returning a “pseudo-iterator”, which is slightly more powerful. This could be separated from the main trait: replace Rng::iter(&mut self) with RngIterator::new(&mut rng).

Edit: links now point to generated doc, not source.


#56

I was talking about sketch presented by @aturon which did not have SeedableRng in it. Regarding FeedRng some platforms simply do not have OS CSPRNG (most notably bare-metal) and some users would like to implement/use user-space CSPRNGs (lets leave aside security concerns about such setup). Also I believe Linux has support for feeding OS RNG additional entropy.


#57

Thanks @dhardy! A couple further notes:

  • I’m somewhat ambivalent about the specialized next_foo methods. Since they’re default methods anyway, we could consider not starting with any of them, and then add them over time as use cases arise.

  • Re: RangeFull, you make some very good points, and I do like the uniform function as a way of being much more clear about what’s happening.

  • That said, there are two concerns here re: ranged distributions specifically:

    • Most casual uses of rand involve ranged distributions, and we need to ensure that it’s very easy to get these from the crate itself; if people do a UniformFull and then mod out, they won’t get a uniform distribution. This may just be a matter of providing a top-level function.

    • It’d be nice to use the range syntax for uniform, ranged distributions.

  • I’m fine with most of @briansmith’s suggestions around naming etc., but would prefer to avoid a hard split between crypto and non-crypto generators if we can. Having crypto be a subtrait, in particular, seems appealing to me, if we can make it work well for all parties.

  • Are you up for ultimately putting together an RFC here? What would be most helpful to empower you to do that?


#58

@aturon I found a good solution to the Rand stuff already: implement Rand<D> for T for any distribution D: Distribution<T>, and implement Distribution<T> for Uniform etc. Bikeshed names if you want, but I like the way this turned out. Your Rand<Dist> idea tidied up the code very nicely.

Rng vs Generator etc… meh, but don’t know what to rename OsRng to.

There’s a bunch of discussion in the PR thread about error handling and crypto-vs-non-crypto traits, no real consensus. I think everyone would rather not split the crate into crypto and non-crypto versions, but there’s definitely some differences in requirements.

I could put an RFC together, sure. But (a) there’s still some design issues to try to sort out (namely error handling), and (b) I’m not really sure how to write an RFC that basically asks a hundred design questions and then tries to answer them with a 5000 line library. But I guess I could try.


#59

For a randomized data structure or algorithm, one wold prefer to have the random number generator be infallible, because the data structure itself is probably infallible (perhaps, modulo allocation failures, which don’t get reported as errors or panics anyway). Ideally we’d be able to do this without relying on unwrap() since relying on unwrap() makes it harder to prove (formally or informally) correctness. For crypto applications, people have differing opinions on whether we should abort, panic, or return an error code, but IMO it should be possible to at least recover from a PRNG failure. So, the type signature of even fill() isn’t really the same between crypto and most non-crypto applications.

More generally, I don’t see much commonality between the crypto- and non-crypto uses.

FWIW, I’m currently planning to rename ring’s SystemRandom to SystemGenerator at the same time I rename SecureRandom to SecureGenerator.


#60

I totally agree with @briansmith that the APIs apparently have little overlap, but what about the uses? Sure, games and simulators on the one hand don’t care much about cryptography. But it seems some cryptography users care about distributions. And then you have randomised algorithms which don’t properly fit in either camp.


#61

It seems exceedingly rare that one would want anything other than a uniform distribution from a CSPRNG. Even if one did need it, the implementation based on a CSPRNG would always (AFAICT) involve generating a random value using a uniform distribution that all CSPRNGs output, and then mapping it to target distribution. Thus the CSPRNG implementation doesn’t need to have any knowledge of distributions.

In particular, if I were to implement a general-purpose crypto library and NewHope, I would make a CSPRNG that only supports the uniform distribution in the crypto library, and then add the code for the non-uniform distribution to the NewHope code itself.