A random number generator for libstd

Finn · September 26, 2018, 6:31pm

Often, during development and implementation of algorithms, one is in need of a basic random number generator. For example, a sorting algorithm may be tested against shuffled arrays.

For convenience and to increase productivity, I think it is a good idea to have a basic random number generator available.

What functionality should it have?

In my opinion an RNG API should at least be able to achieve the following:

to generate random integers of type u8, u16, u32, u64
to generate bounded random integers of type u32, u64
to generate random integers from a range, i.e. to throw a die
to generate random floats in interval [0,1)
to shuffle a slice

There are some subtilities that lead to slight biasing or another form of non-equidistribution.^[1][2] Under this premise, generating bounded integers and random floats are better not achieved by a self-built procedure.

In addition, it should be easy and elegant:

to randomly pick an element from a slice
to fill a buffer with random numbers
to efficiently fill a buffer with random bytes
to create an iterator, producing random numbers

The module should not:

have a generator with state space larger than 256 bit
have any fancy stuff, i.e. ways to produce samples of custom distributions
have a cryptographically secure generator

Such tasks are better achieved by special crates, for example rand and ring.

libstd or libcore?

The floating point RNG might call a function from “libmath”. Preventing a potential dependency hell, it might be better to leave everything in libstd first.

A schedule proposition

We open a challenge. People make proposals which will be criticized.
After some time the most appropriate design will be chosen.
We hold it unstable two years or longer, in order everybody can have an eye upon it.

A first sketch

To start the challenge, this is my first sketch:

sketch1

The API is more important than the internal implementation. But note that the internals can have an effect on the API too. For example, if a generator has 128 bit state space or more than one state space, it might also be seeded by (seed0: u64, seed1: u64).

References

[1] M.E. O’Neill: “Efficiently Generating a Number in a Range”.
[2] Taylor R Campbell: “Uniform random floats: How to generate a double-precision floating-point number in [0, 1] uniformly at random given a uniform random source of bits”.

repax · September 26, 2018, 8:21pm

By not committing to any particular random-number generator, distribution, efficiency, entropy, etc in std everybody is free to pick one that is suitable to their specific needs. I think this is better than having a standard PRNG, which I think could run the risk of being viewed, in some way, as recommended or official.

Cargo makes it very easy to add dependencies in Rust.

mcy · September 26, 2018, 11:17pm

To add to this, for example, a “convenient” RNG might be a simple Mersenne twister (what basically every language uses) but it might be pretty easy to accidentally make it non-cryptographic in the name of performance; even if there’s a big warning in the docs explicitly disguaranteeing cryptographic properties, it will be viewed as a sanctioned source of random numbers.

Having RNG live in its own crate means the implementation is less likely to be seen as recommended for general use, and that crate can avoid being in the six-week-release lockstep rustc, cargo, and std are in.

ahmedcharles · September 27, 2018, 1:04am

Given how easy it is to add dependencies in Rust using Cargo, I’ve come to the conclusion that std should have way less stuff in it rather than more stuff. But that’s never going to happen, so I’ll settle for no new huge additions.

mcy · September 27, 2018, 1:20am

I think std is already about the right size, baring intrinsics that we haven’t gotten yet (e.g. std::simd). Off the top of my head, the only thing that’s a little weird is std::sync::mpsc. Otherwise std should be mostly like libc: basic conveniences, allocation, and syscall wrappers (except without all of the libc footguns, of course).

josh · September 27, 2018, 1:29am

Given that fast CSPRNGs exist, do we have any good reason not to pick a CSPRNG?

The vast majority of RNG users don't require absurdly fast random number generation, and I'd rather have the scenario in which people who need speed at the expense of security need to pull in a "non-cryptographically-secure PRNG" crate, rather than the scenario in which people who need security need to pull in a CSPRNG crate. "I need to go so ludicrously fast that I can't afford a CSPRNG" is an unusual special case.

Also, one more requirement: the RNG needs to accept an (optional) seed to reproduce results later. (Needed for things like game replays.)

mcy · September 27, 2018, 1:43am

They have a point. My opinion: std is not the place for anything with cryptographic guarantees; those should live in libraries curated by domain experts.

That said, I don't thing RNG belongs in std.

CAD97 · September 27, 2018, 1:47am

To get at the root of the issue, why is rand not sufficient, that you want random number generation in std?

Rust takes a very minimal-std approach (or at least tries to); batteries are available for free at crates.io should you need them, rather than packed in with the stdlib. There’s a general consensus that std/core should consist of a) things that can’t be implemented elsewhere* and b) things that 99% of Rust users can benefit from (directly or indirectly). This means core abstractions that the language itself interfaces with and widely applicable ways of interacting with those abstractions.

A library living outside of std is liberating in that it doesn’t have to abide by Rust’s versioning, and you can explicitly list what version of it you need. rand has only one goal: generate random numbers correctly and usefully. std has to worry about so much more.

I agree that an RNG source should default to a CSPRNG if unspecified, for the same reason that std's HashMap defaults to SipHasher instead of something more performant: secure by default. If you need better specific behavior, you know what to ask for.

I also agree that std shouldn’t have to worry about providing chryptographic guarantees, and leave that to domain specialists. Another point towards having rand outside std and as the de facto “standard batteries” for those who need them.

*well ok, core and std are less special than they could be due to the ability to ignore them and be them with #[lang] attributes, but the average user shouldn’t need to handle reimplementing the tautologies that are <u32 as Add> and <Box as Deref> and other such lang items.

ahmedcharles · September 27, 2018, 2:03am

My position is that std should be an interface to the compiler, i.e. it should deal with lang-items, intrinsics, anything that can’t be implemented in stable Rust. I’d much prefer if it didn’t include OS integration, though and ironically, I think it should provide FFI to the relevant C types, which it doesn’t today (note, C types, like c_int, not libc (which should be the C standard library but isn’t… but I won’t rant about that here)). I suppose you could say that Rust has shown me that a minimalist standard library is possible and perhaps desired.

JeffBurdges · September 27, 2018, 3:22am

Afaik, there is no “convenience” or “increased productivity” achieved by merely avoiding the rand crate, which provides all the desired functionality without excessive complexity.

Instead, we should adding links from std's documentation to select high quality crates like rand, or even rustup installing rand's documentation.

We could ask if rand itself has some core that belongs in std, but really orthogonality wins over the “convenience of not googling” here.

As an aside, you could avoid using rand by abusing RandomState. Ironically though, one could easily imagine HashMap being removed from std after an external crate does probabilistic data structures as well as rand does to prngs.

As a further aside, I do think most functionality in std belongs there: Iterators may only interact with the core language through for loops, but they go beyond this in organizing the language. I suppose generators will drag Iterators and Futures deeper into std, but previously.

hjr3 · September 27, 2018, 3:57am

The std lib does provide ffi types. Example: https://doc.rust-lang.org/std/os/raw/type.c_int.html

ciprian.craciun · September 27, 2018, 6:51am

Please do not do this!!!

(I know it is a re-hash of what others have said in earlier replies, but I think this is such an important point, that I feel then need to repeat it...)

Do not introduce a non-cryptographical random number generator as the default in the standard library, especially for a modern language that tries to be "safe" (at least in some regards)... Today any modern OS has a secure random source, and there are enough fast cryptographic secure random number generators out there, that there is no good reason to use them.

scottmcm · September 27, 2018, 8:26am

This scares me politically. I don't want to encourage people sending PRs to link to their crates from std docs, no matter how good that crate is.

kornel · September 27, 2018, 8:48am

Perhaps adding of dependencies could be made easier, more automated instead, so there wouldn’t be a desire to have something in std just because it’s convenient, rather than because it’s essential for the language or interoperability?

For example, Node.js is experimenting with including modules on the fly. The Rust equivalent would be to have:

use rand::random;

Automatically add the rand create as a dependency.

Nemo157 · September 27, 2018, 8:58am

That seems like it’s almost possible now, a custom lint that extends the existing error[E0432]: unresolved import `rand` error and checks if the crate exists on crates.io, then emits a cargo fixable suggestion to add it to the dependencies (hmm, can cargo fix touch the Cargo.toml, or is it limited to the source files currently?).

Finn · September 27, 2018, 1:04pm

Ok, your safety arguments are pretty convincing. Often people are somewhat feather-headed, they could fall into a trap.

Having a cryptographically secure generator as default, finding a canonical API starts to become problematic. Despite that, std would become bloated rather than minimalistic. Furthermore, having half of the functionality in std and the other half in rand is quite cluttered.

Then it seems better to leave RNGs out of std.

mrec · September 27, 2018, 2:46pm

I agree that's a requirement for something, but I think it's also a reason why this shouldn't be in std. You're advocating a CSPRNG. If a weakness is found in a CSPRNG, it's no longer CS, at which point the algorithm needs to be fixed or replaced, at which point you've lost reproducibility from a given seed.

zackw · September 27, 2018, 5:04pm

People who need reproducibility need to fix the algorithm anyway, and that in itself strikes me a inappropriate for std given the existence of rand and the general desire to keep the stdlib small. A stdlib RNG should be like rand::rngs::ThreadRng, IMO: based on a cryptographic generator, but without any guarantee of which cryptographic generator, and not offering any way to explicitly set the seed, thus directing people who need reproducibility to something else.

dhardy · September 28, 2018, 3:43pm

As maintainer of the Rand crate, I’ve seen most of these arguments before — though the question about integration with std remains an interesting one.

We specifically made the rand_core crate fairly minimal. OsRng adds an external source of randomness (entropy) to avoid determinism. thread_rng is a fast auto-seeded cryptographic generator. Whether you need any more functionality really depends what you want to do with your randomness.

Personally I don’t think all this should be included in std itself. The amount of code is non-trivial (OsRng is implemented for many different platforms, we have a backup entropy source and are discussing options for entropy in no_std crates, thread_rng has periodic auto-reseeding with negligible overhead, …).

The caveat is that HashMap requires random keys hence some of this is included in std anyway — as duplicate code. Possibly the best solution to this is for std to depend on rand at some point in the future (though this is a circular dependency) or for HashMap to be moved out of std. Or maybe the required code should be in std, and exposed, and rand should use that instead of duplicating functionality — but this puts quite a bit more code in std and requires more maintenance work in std; still, it is the most straight-forward to implement.

Note that the Rand lib does contain quite a lot of code, though we’re in the process of moving PRNGs out to sub-crates and may also move most of the distribution code outside of Rand. Still, if you only want random data and don’t want to simulate a dice-roll or sample from a normal distribution or shuffle a vector etc. then there is quite a lot of unused code in the lib.

Tom-Phinney · September 28, 2018, 3:53pm

Moving functionality to std just makes it less likely that IoT authors, who generally use no_std (due to their devices’ small memory footprints), will employ adequate sources of randomness. In the long run, requiring std for decent randomness is likely to increase the number of malware-laden captive IoT devices on the Internet, rather than improve the situation.

Topic		Replies	Views
Crate evaluation for 2017-07-25: rand	85	11050	March 25, 2019
What should go into the standard library? libs	13	5747	March 25, 2019
Expansion of standard library libs	46	7161	October 16, 2019
Libsystem, or, the great libstd refactor libs	37	12525	March 25, 2019
Calculating which 3rd party crates are good candidates for "std" inclusion via "left-pad index" libs	42	8017	February 4, 2020