// Truly random seed.
std::mt19937 rng(std::random_device{}());
// Avoid constructing distribution all the time
std::uniform_int_distribution<> dist(0, 100'000);
// Generate 1'000 random numbers in range 0-100'000
for (size_t _ = 0; _ < 1'000; ++_) {
std::cout << dist(rng) << '\n';
There is bit more code than there was with C, but bearably so, and most of the issues are fixed. The distribution will be uniform, all numbers in the desired interval are possible, and the code is thread-safe.
At second glance, <random>
is awesome, even if there is a bit of boilerplate for simple operations. The decomposed and pluggable design means that you can customize your random numbers by replacing only a small part of the random number generation pipeline. The standard also provides a wide range of Random Number Engines and distributions, so you should be able to do most things you want out of the box. It even provides an abstraction for getting actually random numbers for seeding the generators, std::random_device
.
At the third glance, when you've started using <random>
extensively and started digging deeper, you will find out that every single part of it is deeply flawed, and the best solution is to avoid using it completely.
Distributions are nonportable
Did you notice that the text above said
most of the issues are fixed
and then did not talk about portability? That's because both of the snippets, the C one and the C++ one, share one issue. Even if you hardcode the seed, the snippets will give you different results on different platforms. For bonus points, the results are not even guaranteed to be portable between different versions of the same standard library, as the standard library implementations are allowed to change how they implement std::uniform_int_distribution
.
What this boils down to is that if you have repeatability requirements for your generated random numbers, then you cannot use the standard-provided distributions. Luckily, generating random numbers using <random>
is properly decomposed, and you can "just" write your own distributions, and keep using the rest of <random>
, right?
Well...
std::random_device
might not be random, and there is no way to check
The C++ snippet uses std::random_device
to generate some initial randomness to seed our instance of Mersenne Twister in the form of std::mt19937
. The problem is that std::random_device
is poorly specified, and inscrutable.
In theory, it should serve as an abstraction over some external source of entropy. In practice, an implementation is allowed to use any deterministic random number engine to implement it, e.g. a Mersenne twister, and there is no way to find out. There is a member function std::random_device::entropy()
, which is in theory there to detect such case, but it does not work in practice.
The blame for this is shared between the standard and the implementations. The function's full signature is double entropy() const noexcept
, and it is the return type that breaks it. The standard provides a definition of entropy, but it does not provide any sort of guidance on how to count entropy of an external source of randomness, or expected return values for different cases.
This, in turn, caused different implementations to do their own thing. We will take a look at the big three, MS's STL, libc++ and libstdc++.
MS's implementation handles this the best. It knows its random_device
is just a thin wrapper over kernel's cryptographically secure random, so it always returns 32 and inlines the member function into the header to allow for constant propagation.
In order of sanity of implementation, libc++ is next, because it always just returns 0. This return value does not reflect reality, 4 out of 5 possible configurations of libc++'s random_device
use strong random backend, and the last one also provides strong random bytes unless the user deliberately sabotages themselves. The return value also makes libc++'s implementation of std::random_device::entropy
useless, but at least it is obviously useless, so the user is not given false hopes and expectations. There is value in this.
The worst implementation of std::random_device::entropy
can be found in libstdc++. The reason it is the worst is that it is not obviously useless, you have to think about it for a bit to figure out why the return value is useless. This is because, unlike libc++, libstdc++ can return non-zero values. In most configurations, libstdc++ always returns 0, but when it is configured to read from /dev/urandom
(or /dev/random
), it uses RNDGETENTCNT
to check how much entropy the kernel thinks it has available and returns that to the user.
The underlying problem of this approach is TOCTOU. If you first check whether there is enough randomness, and only then ask for that randomness, then by the time you ask for the randomness it could've been depleted, and you cannot get it anymore.
At this point, we know that we will likely have to implement our own distributions, and either implement our own random_device
, or detect which standard library we are compiling against, and hardcode versions that provide good random_device::operator()
implementations. But at least we can still use all the different Random Number Engines provided by the standard library, right?
Well...
There is no way to seed a Random Number Engine properly
The Random Number Engines almost work. But if something only almost works, it is broken.
Let's go back to the first line of the C++ example.
std::mt19937 rng(std::random_device{}());
It seeds a specific version of Mersenne Twister with unsigned int
worth of random data. Let's assume sizeof(unsigned int) == 4
. The internal state of mt19937
is 2496 (624 * 4) bytes. Taken together, this means that for every state we can seed the rng into, there are \(2^{4984}\) states that we cannot seed the rng into.
This has some interesting implications. For example, the program below will never print 7.
int main() {
std::mt19937 urbg(std::random_device{}());
std::cout << urbg() << '\n';
Some output values also uniquely identify their seed. If I tell you that the code program printed 3046098682, then you can quickly find the seed generated by random_device
, and thus predict all future outputs of a Mersenne twister seeded in this way.
In theory, the standard provides a way to seed the Mersenne twister properly. The tool is called SeedSequence, and there is an implementation of it in the standard library, std::seed_seq
. Once again, when you try to use it in practice, it breaks down.
std::seed_seq
is essentially a wrapper over std::vector
that you can give a bunch of randomness to, and then a random number engine can extract (stretched) randomness out. It is used like this:
auto rd_dev = std::random_device{};
std::seed_seq seq{rd_dev(), rd_dev(), rd_dev(), rd_dev()};
std::mt19937 urbg(seq);
This time we initialized our instance of mt19937
with 16 (4 * 4) bytes of randomness. Progress! There are two problems with this snippet though:
There is no way to know how much randomness you have to provide to a RandomNumberEngine T
, and thus how much randomness you have to feed into seed_seq
.
std::seed_seq
is very tightly specified by the standard. The implementation forced by the standard is not a bijection.
A fun fact about 1. is that std::mersenne_twister_engine
provides a member variable you can query to find out how much data it needs. However, this is an accident of standardization, and no other standard-provided random number engine provides a way to retrieve this information.
The second problem means that even if you hardcode seed sizes of all random number engine types your program uses, you still couldn't use std::seed_seq
for initialization, because it loses entropy... here is an example of this on Godbolt:
#include <array>
#include <iostream>
#include <random>
int main() {
std::seed_seq seq1({0xf5e5b5c0, 0xdcb8e4b1}),
seq2({0xd34295df, 0xba15c4d0});
std::array<uint32_t, 2> arr1, arr2;
seq1.generate(arr1.begin(), arr1.end());
seq2.generate(arr2.begin(), arr2.end());
// prints 1 because seed_seq::generate is not a bijection
std::cout << (arr1 == arr2) << '\n';
In other words, even if you write your own type that fulfils the SeedSequence named requirements, you have to hardcode the sizes of your Random Number Engine types somewhere.
Recap
To recapitulate, generating random numbers using C standard library has many problems, with some fixable at great programming effort, and other unfixable. If you are for some reason stuck with just the C library, you should definitely write your own.
Generating random numbers using C++ standard library fixes most of the problems of using the C library. However, the operative word here is most, and it introduces its own problems instead. In the end, whether you can successfully use <random>
depends on your requirements.
If you need cross-platform reproducibility, then you cannot use standard-provided distributions at all, and you have to write your own.
If you need actually random data for whatever reason, you either have to write your own version of random_device
, or hardcode a list of platforms + configurations where you can use std::random_device
.
if you want to properly seed a Random Number Engine, you have to write your own SeedSequence, and also hardcode the required seed sizes of all your Random Number Engines.
My use cases for <random>
usually require cross-platform reproducibility, need properly random seed values, and would prefer fully seeded RNEs. This means that I either have to write 90% of <random>
on my own, or use a different implementation, such as Boost.Random or PCG random utilities...
And I am not the only one. When I was writing a couple of standardization proposals for fixing <random>
, I made an informal Reddit poll asking people about their use of <random>
. The absolute majority of people answered either that they have their own implementation, or use Boost.Random. Few people used other open source libraries, and very, very, very few people use the standard random.
That's it for this post. The next post explores possible avenues for fixing <random>
and making it usable by more people in more domains.