Crate evaluation for 2017-07-25: rand

So are you advocating use of the ring library for cryptography and rand for other applications? Or rand for crypto and another crate… but moving all the distributions there?

Edit: this seems like a good question to take to an RFC. But it’s good to get people’s thoughts first.

The main thing I advocate for is keeping the CSPRNG API and implementation as simple and crypto-oriented as we reasonably can. I don’t expect every crypto application to use ring. At the same time, I’d rather use and implement ring’s very simple ring::rand API than anything else discussed so far. (I want to extend it with support for personalization strings, additional info, and perhaps reseeding, for some some specialized, but common, crypto uses, but those features seem to be well outside the scope of the present discussion.)

Yeah, a Distribution for T is really just a function from u32 (or u64) to Option. That’s easy enough to do without knowledge of where the input came from. So we should write the distributions in terms of that and then provide helper functions to run a Generator or CryptoGenerator as necessary until a result comes back from the Distribution (or the crypto generator Result is an Err).

To be clear, I don't think such an RFC needs to comprehensively cover the design, but rather lay out major questions, proposed answers, and salient parts of the API surface. I think the proposal you gisted earlier is not far off from what I'd hope to see.

The libs team doesn't have a lot of bandwidth to take this work on right now, so if you're willing to lead the charge here (as you're already actively doing!), full steam ahead. And please reach out if there are questions I can answer from the libs team perspective.

@aturon perhaps you didn’t see this.

I’ve stopped commenting in this thread for the most part because browsing history in discourse threads is much harder than on GitHub.

(I haven’t managed to do much this week anyhow.)

Indeed, I hadn’t! I’ll keep an eye on that thread now. Thanks again for taking this up, and let me know if I can be of any help (happy to review drafts etc)

I have been meaning to chime in on this thread since it was first posted, but have been either too busy or too exhausted or both, and I’m not sure I read all of the various proposals that are floating around, so I apologize if this repeats anything.

The main thing I want to say is that I understand why people like @Lokathor want to avoid getting into the question of whether their design is sound for cryptographic applications — but, I think it’s vital that the default primitive random number generator, the one you get if you don’t do anything special, is an automatically-seeded CSPRNG. This is because people will reach for the equivalent of C’s rand(3) in situations where they need a CSPRNG but don’t realize it, such as generation of HTTP session cookies. Yes, there are better ways to generate HTTP session cookies, but if the rand-equivalent is an auto-seeded CSPRNG, the low bar goes from “catastrophically insecure” to “nontrivial to exploit,” and that’s valuable. Relatedly, there’s reasonably strong evidence that the simulation people should be using CSPRNGs, whether they realize it or not. And I have it from people who have actually implemented CSPRNGs that the state of the art can be as fast, on an amortized basis, as a linear congruential generator(!) so there’s really no reason not to go for cryptographic soundness.

So without getting into the argument over user-space versus operating system RNGs, or reseeding, or whatever (I have opinions about those too, but my parking meter’s about to expire) the fundamental split between “generate random bits” and “generate picks from a distribution” seems sound, and borrowing C++'s notion of pluggable generators also seems sound, but I would banish all non-cryptographic algorithms to a historic_rng crate or something, clearly labeled “use only to reproduce old results,” and I would also make you have to work at it to supply your own random seed.

The other thing I want to mention right now is that, according to an unpublished 2007 manuscript, everyone generates random floating-point numbers in the half-open interval [0,1) incorrectly.

5 Likes

Very interesting on the floats. Thankfully all the current designs and re-designs are set up so that generators are not producing floats directly, so we can write the correct code for that just once.

As to using only CSPRNG types because they’re fast enough, well, if they are that’s great. PCG is about 1.5ns per u32, Xoroshiro is <1ns per u32 though it’s a little weaker on the randomness tests. I don’t know how fast the CS types are, but if they can be in that zone without running out of entropy or whatever then it’s great and we should use them all the time. If not though, I’ll take the faster generator with less secure results for my video games. The article you linked did say that once they weren’t using every single number in sequence the issues cleared up, and that seems like how it would go in a lot of situations (eg: uniformily genning numbers in a range that isn’t a power of two wide will end up throwing out some numbers that land outside the uniform range).

Again, as I mentioned above, a) there’s no such thing as a securely-seeded CSPRNG “running out of entropy,” and, b) it is a very bad idea for us to provide user-land CSPRNGs.

I’m sorry. There is this topic here and also the one on Github, so I lost track a bit. In the Github thread there was a user that insisted several times that the return type of next_u32 should be Result<u32,RngError>, and that is what I meant. If CSPRNGs don’t actually throw errors, then I don’t know what they were talking about.

No worries! I just want to make sure we keep in mind the dangers of user-land CSPRNGs, especially since they’re tempting given their performance (which is much better than the performance of reading /dev/urandom, calling getrandom, etc).

A correctly-seeded CSPRNG should never error. But a hardware RNG might.

Providing a user-space CSPRNG requires care and attention to detail, but I think we should do it anyway, because it’s the only way to get both the performance that most people want, and the randomness strength that people ought to want (whether they know it or not).

And it’s honestly not that hard. For one thing, if the C library provides arc4random_buf, just call that and you’re done. This exists on most of the BSDs nowadays, and I think musl libc has it as well, and there are plans to get it into glibc (but that might take a while, sigh).

If the C library doesn’t take care of it for you, then the hard part is not the algorithm — any modern stream cipher will do just fine — but making sure it’s properly seeded and the internal state doesn’t leak and can’t be duplicated. This is just a bunch of fiddly little details and an obsessive test suite. The only thing that I know can’t be done reliably on Linux right now is protection against someone calling the raw fork or clone syscalls behind your back (you need MAP_INHERIT_ZERO for that).

1 Like

I’m mostly weary since so many people have tried and failed to get it right in the past.

However, I take your point about performance. How about this: We default to getting randomness from the OS directly (thus, the default is secure, but slow), and we provide a CSPRNG implementation that can be used optionally, documenting the usual caveats and warnings (which basically boil down to “only use this if you know what you’re doing”). Many programs will know, for example, that they will never fork, and thus don’t have to worry about fork safety - those programs can use user-space CSPRNGs if they deem it reasonable. However, whatever default we choose has to be safe for all uses, and thus avoiding user-land CSPRNGs by default is the safest choice.

I definitely agree with @zackw that we should include a user-space PRNG, because quite a few users will want something fast, and quite a few will want something deterministic (repeatable given the seed). They shouldn’t have to go elsewhere to find what they’re looking for. I also agree with both of you that the default user-space PRNG should be a CSPRNG.

I wonder if we should maintain a rand-extra crate covering other generators; this would at least allow good code review and documentation on the quality of each generator.

Regarding making the default randomness source the OS, I will certainly raise @joshlf’s point in the RFC.

I’d like to make it possible to replace the generator returned by thread_rng() at run-time (some adaptation probably needed), in which case making the default source the OS makes even more sense. This would be very useful for testing randomised algorithms since the test can override the “random” values. Unfortunately it could mean that a library needing secure random numbers and using the default source could end up being given less securely generated numbers if another library or binary reconfigures the generator.

@zackw I read your “reasonably strong evidence” paper, but it is talking about statistical problems with a few early generators. Nothing about non-crypto PRNGs in general.

@zackw again, I’m surprised you advocate using arc4random.

I am sorry I don't have time right now to do the comprehensive literature review that it would take to prove this to you, but the track record is that every non-cryptographic PRNG eventually fails when the statistical demands on its randomness get strong enough. Crypto PRNGs are not immune to this either, but because they're designed to a much higher standard in the first place, it doesn't happen as easily.

We're stuck with the name arc4random for compatibility's sake, but I had been under the impression that all of the BSDs had replaced the RC4 algorithm by now. If I'm reading FreeBSD's actual code correctly, they replaced the in-kernel RC4 (backing /dev/[u]random) but not the copy in libc, which is disappointing and probably worth a bug report if there isn't one already. Anyway, two laggards (FreeBSD and OSX) doesn't taint the entire API, just needs a short blacklist.

That seems reasonable to me at least as a first stage. We could always get fancier later

Can we avoid having thread_rng at all? The less ambient state the better.

I think a lot of people want some convenience like let x: u32 = random();. This uses thread_rng() internally.

I've been trying to read up on the PRNGs behind /dev/urandom and it appears Linux and BSD have gone different directions. A lot of it is about reseeding and irrelevant to us. This article has a few nice points, but quite a bit of it isn't really relevant here (we can do all the reseeding stuff with a user-space PRNG if we really want, but preferably users should use the OS generator).

I would advocate for at least trying to get people to not do that. We are talking about a total replacement of the crate -- let's start with an API that requires you to create a RNG instance before you do anything, and find out how loud the screams are.

1 Like

For example, I’ve been writing a small DB, and it can be convenient to have random identifiers for new elements. That randomness has to come from somewhere, but it’s not very convenient to have to store a PRNG state somewhere and reference it just for that. Pure-OS generation would also work, but it would be nice to have deterministic tests.