David Ljung Madison Stellar

Verification Engineer Success Stories

Random Number Generators and Verification

Menu: Background | Initial Settings | The Problem | The Solution

Background

Random Number Generators (RNGs) are deterministic calculations, but they (generally) seem random because they use a very "chaotic" calculation to pick the next random number/state based on the current state.

That means that just about any RNG needs to be "seeded" with a starting state. As a simple example, the basic random number generator you find on most systems takes a number (in a fairly massive range) as a seed, and then the next number it gives you is based on that. And the next number, and so on. If you're looking for a number from 1 to 10, then you can just take the last digit of that number, and you're going to end up with a seemingly random sequence of single digit numbers. Then if you want the same "random" sequence, you can start with the same seed and you'll be able to replay those random events.

As you get into better RNGs, the state ends up often being more complex than a single number. As it calculates new numbers, it updates the state that is behind-the-scenes.

But what if it's still seeded with just a single number?

What generally happens is that only a portion of the initial state is set by the seed, and then the amount of "random information" that is kept in the random state grows over time.

And this was where our problem started.

Initial Settings

Very early in my career I was working with a test generator that used an RNG to help create a random test. If you gave it a particular seed, you'd get a particular test - that was one of the advantages of the seeds, you could recreate tests with the same test generator and the same seed.

The first thing this test generator did was setup a bunch of global settings for the test. For example, maybe the system had a bunch of clock settings or I/O settings that were determined by some system register values, and the test generator just came up with random values for these registers so our settings were completely randomized...

The Problem

..except they weren't.

I generated massive amounts of tests and then I'd look at coverage, which helped me understand what I was testing.

And certain pairs of settings would never happen.

In other words, I'd never have, for example, system setting "A" turned on when system setting "B" was turned off. This was with millions of tests where A and B were randomly set and were on and off about half the time, but I never saw specific combinations.

To make things more confusing, I did a random analysis on the random numbers it was giving us, and they were random.

Basically.

So what's going on?

It turns out the problem is in the size of the seed as compared to the size of the random state. The random library was the highest level of random generation that came in the standard library with this OS. The numbers it generated had an excellent random distribution eventually. The problem was that it took a single (32b) number as the seed, and it needed to setup the internal seed state of an array of numbers. So obviously our starting point was limited.

I ended up hacking into this library to watch its initial state evolve over time, and what I saw was that it started with only the last number in the array set to the seed and the rest of the array was zeros. Then as each random number was calculated, more and more bits in the array would be set randomly, but it wasn't until about the 17th random number that one could fully rely on "randomness" in the random state.

So the "chaotic" mathematical calculation that would turn the random state into a single random number was actually initially only use a lot of zeros for it's calculation. You can think of it as this big, complex equation to create a seemingly random number, but many parts of the equation would be turned "off" until those zeros went away.

The consequence was that while the numbers coming out of it seemed random, on the face of it, they had direct bit interdependencies between two different calls - at least until the random state was finally built up.

As an example, bit 14 of the 3rd number, was always the opposite of bit 17 of the 5th number, for every sequence of random numbers that started with any new seed. And this was part of the base RNG that came with the operating system.

And if bit 14 of the 3rd number would turn some feature on, and bit 17 of the 5th number was turning some other feature on, then I would never see both features on at the same time. And hence I had a serious problem with what we were testing.

The Solution

Once I understood what was going on, the solution was pretty simple. I considered making a more complex seed and hacking into the RNG state, but I wanted to use something a little cleaner and a little more portable. So instead I simply started up the RNG by setting the seed, and then asked for a random number a few dozen times, and threw all the results away. At that point the random state had finally become interesting, and our dependency issues were effectively gone, and it was off to the races!

Eventually I switched to better RNGs, such as the mersenne twister, and was able to sleep easily at night, knowing that my tests were going to definitely get a good collection of features tested.

Interested in having someone with devotion to understanding complex problems at your company? Consider hiring me to see how it can be done.