Distribution of random numbers and pseudo random generators / Computer Science / Forums

Forums

4hv.org :: Forums :: Computer Science

« Previous topic | Next topic »

Distribution of random numbers and pseudo random generators

1 2 next

Move Thread

LAN_403

Bjørn

Mon Jan 01 2007, 03:47PM

Registered Member #27 Joined: Fri Feb 03 2006, 02:20AM
Location: Hyperborea
Posts: 2058

I have made a program that searches for efficient (very fast) algorithms for generating pseudo random numbers. For this to work I need a fast way of detecting good generators and rejecting almost all bad ones.

The method I have used is to take the histogram of ten million random random bytes and sort the histogram. Then compare the sorted histograms resuls from millions of generators that are generated by another program. Then I sort the algorithms based on how well they fit the sorted histogram curve. The best fit should then be a usable algorithm.

What I am wondering is, what is the chance of a very bad pseudo random generator having a very good fit? My feeling is that no bad ones will end up at the top of the list but I can't explain why.

This is the curve I am using to judge the fitness against.

Steve Conner

Mon Jan 01 2007, 04:40PM

Registered Member #30 Joined: Fri Feb 03 2006, 10:52AM
Location: Glasgow, Scotland
Posts: 6706

Surely the chance of a very bad generator having a very good fit is about one in ten million?

Bjørn

Mon Jan 01 2007, 07:42PM

Registered Member #27 Joined: Fri Feb 03 2006, 02:20AM
Location: Hyperborea
Posts: 2058

I can't get that to fit. There is a lot more than 10 million sorted histograms. With just 6 values there are already 11 possible histograms if I counted correctly. So if the resulting histogram was random then there would be very little chance of getting a false match (or a real match).

The resulting histogram will often be fairly flat (simple counters) so that increases the probability a lot, The "shape" of the search space would also affect the result in some way, maybe all except a few algorithms turns out to just flip between two values depending on the properties of the algorithm generator.

...

Tue Jan 02 2007, 03:16AM

Registered Member #56 Joined: Thu Feb 09 2006, 05:02AM
Location: Southern Califorina, USA
Posts: 2445

For a random number generator, wouldn't you want the histogram to be flat? If it has a curve then all of the numbers don't have their fair share... But then again I have been awake for about 40hrs, so take that with a pile of salt

Bjørn

Tue Jan 02 2007, 06:35AM

Registered Member #27 Joined: Fri Feb 03 2006, 02:20AM
Location: Hyperborea
Posts: 2058

If the curve is perfectly flat then the numbers are not completely random, because then you know that each number occurs exactly the same number of times.

So if you have a sequence of numbers between 1 and 9 like this "6 8 2 3 5 1 7 4 x" and the histogram is perfectly flat you know that the last digit x is 9. That means the sequence is compressible and not completely random.

You might be thinking about the generator having equal probability of generating each number. In that case you are right, but since it is random the resulting diagram is on average very flat but noisy and when it is sorted you get that curve.

Steve Conner

Tue Jan 02 2007, 11:12AM

Registered Member #30 Joined: Fri Feb 03 2006, 10:52AM
Location: Glasgow, Scotland
Posts: 6706

Well, my reasoning was that if they are random number generators, then the chance of any two of them producing the same sequence is roughly equal to 1 in (however many bits there are in the sequence). So if I then assume that one of them were good, and the other were bad, that is also roughly the chance of a bad generator masquerading as a good one. (Roughly, because a bad generator might not obey the same statistics: if it continuously output the number 50, say, it wouldn't have any chance of matching any other.)

Actually I guess I got it wrong: the answer should be 1 in (2^(however many bits there are in the sequence)) which if there are 10 million 32-bit numbers in a sequence is a very small chance indeed!

About the curve not being flat: I guess that is a result of the central limit theorem. If you average enough random numbers you get a Gaussian distribution centred on the middle of the range of your generator. If you sort the bins of a Gaussian distribution, you will fold it in half and get a S-shaped curve like in Bjoern's diagram.

About the compressibility thing: The sequence you generated is exactly what you get from a simple linear feedback shift register type of generator. A 16-bit LFSR generates all the numbers between 0 and 65534 exactly once before repeating itself. And the algorithm is very compressible- it only takes a few bytes to specify what the size of the register is and where to place the taps. But if you take a sample of the sequence, instead of the sequence as a whole, the randomness of that sample is very good. Even though it is still very compressible: to recreate it you just need the specification of the LFSR, the 16-bit initial state, and the length.

There's a good explanation of this in The Art Of Electronics. In one chapter, they test these PRBS generators and show that the random numbers are good quality. Another way of explaining it is to say that LFSRs are often used in cryptography, and the compressibility I mentioned above is the reason why the code can be cracked, as opposed to a one-time pad which is truly random and hence unbreakable.

Bjørn

Tue Jan 02 2007, 12:21PM

Registered Member #27 Joined: Fri Feb 03 2006, 02:20AM
Location: Hyperborea
Posts: 2058

I agree with that.

I have tested all possible two tap LFSR's up to and including 42 bit and some of them are very nice, but the code is usually slow compared to the quality of the numbers. On a FPGA or other gate logic it is reverse and they are very cheap and efficient.

My program has now tried all possible single cycle generators and it worked far better than imagined. The next step is all two cycle generators and that will take one year unless I figure out a way to detect really bad generators very fast. Also the testing of possible good ones must be fast and automatic because there will be a million of them.

Carbon_Rod

Tue Jan 02 2007, 02:39PM

Registered Member #65 Joined: Thu Feb 09 2006, 06:43AM
Location:
Posts: 1155

Don't forget seed pools are an indication of a pseudo random generator. You will see cycles form from the % function often used to make a variable fit.

The standard C random function is far from chaotic. This is why people use those awful unassigned global variables in a seed pool to make sure whatever memory garbage left over from other processes adds an element of chaos to the pool (especially with a memory paging system.)

Some statistical software still uses a less than random system to achieve expected certainties in distribution models.

I like the Lava lamp random number generator project. (

)

Good luck,

WaveRider

Thu Jan 04 2007, 10:31PM

Registered Member #29 Joined: Fri Feb 03 2006, 09:00AM
Location: Hasselt, Belgium
Posts: 500

Hi,
I use this package often for generating good random numbers (used in Monte-Carlo and simulated annealing simulators)....if it's any help.

Also, check out the work of Donald Knuth on random number generation...especially "what makes a good random number generator."

]1167949872_29_FT19411_ranlib.c.tar.gz[/file]

Bjørn

Sat Jan 06 2007, 10:55PM

Registered Member #27 Joined: Fri Feb 03 2006, 02:20AM
Location: Hyperborea
Posts: 2058

I have made a program that shows the output of the fastest and simplest pseudo random generators possible. Since they are so simple there are not many of them, but surprisingly a handful of them turns out to be interesting.
]1168124129_27_FT19411_random.zip[/file]

1 2 next

Moderator(s): Chris Russell, Noelle, Alex, Tesladownunder, Dave Marshall, Dave Billington, Bjørn, Steve Conner, Wolfram, Kizmo, Mads Barnkob