Welcome
Username or Email:

Password:


Missing Code




[ ]
[ ]
Online
  • Guests: 18
  • Members: 0
  • Newest Member: omjtest
  • Most ever online: 396
    Guests: 396, Members: 0 on 12 Jan : 12:51
Members Birthdays:
One birthday today, congrats!
Vaxian (17)


Next birthdays
05/21 Dalus (34)
05/21 Kizmo (37)
05/22 Skynet (32)
Contact
If you need assistance, please send an email to forum at 4hv dot org. To ensure your email is not marked as spam, please include the phrase "4hv help" in the subject line. You can also find assistance via IRC, at irc.shadowworld.net, room #hvcomm.
Support 4hv.org!
Donate:
4hv.org is hosted on a dedicated server. Unfortunately, this server costs and we rely on the help of site members to keep 4hv.org running. Please consider donating. We will place your name on the thanks list and you'll be helping to keep 4hv.org alive and free for everyone. Members whose names appear in red bold have donated recently. Green bold denotes those who have recently donated to keep the server carbon neutral.


Special Thanks To:
  • Aaron Holmes
  • Aaron Wheeler
  • Adam Horden
  • Alan Scrimgeour
  • Andre
  • Andrew Haynes
  • Anonymous000
  • asabase
  • Austin Weil
  • barney
  • Barry
  • Bert Hickman
  • Bill Kukowski
  • Blitzorn
  • Brandon Paradelas
  • Bruce Bowling
  • BubeeMike
  • Byong Park
  • Cesiumsponge
  • Chris F.
  • Chris Hooper
  • Corey Worthington
  • Derek Woodroffe
  • Dalus
  • Dan Strother
  • Daniel Davis
  • Daniel Uhrenholt
  • datasheetarchive
  • Dave Billington
  • Dave Marshall
  • David F.
  • Dennis Rogers
  • drelectrix
  • Dr. John Gudenas
  • Dr. Spark
  • E.TexasTesla
  • eastvoltresearch
  • Eirik Taylor
  • Erik Dyakov
  • Erlend^SE
  • Finn Hammer
  • Firebug24k
  • GalliumMan
  • Gary Peterson
  • George Slade
  • GhostNull
  • Gordon Mcknight
  • Graham Armitage
  • Grant
  • GreySoul
  • Henry H
  • IamSmooth
  • In memory of Leo Powning
  • Jacob Cash
  • James Howells
  • James Pawson
  • Jeff Greenfield
  • Jeff Thomas
  • Jesse Frost
  • Jim Mitchell
  • jlr134
  • Joe Mastroianni
  • John Forcina
  • John Oberg
  • John Willcutt
  • Jon Newcomb
  • klugesmith
  • Leslie Wright
  • Lutz Hoffman
  • Mads Barnkob
  • Martin King
  • Mats Karlsson
  • Matt Gibson
  • Matthew Guidry
  • mbd
  • Michael D'Angelo
  • Mikkel
  • mileswaldron
  • mister_rf
  • Neil Foster
  • Nick de Smith
  • Nick Soroka
  • nicklenorp
  • Nik
  • Norman Stanley
  • Patrick Coleman
  • Paul Brodie
  • Paul Jordan
  • Paul Montgomery
  • Ped
  • Peter Krogen
  • Peter Terren
  • PhilGood
  • Richard Feldman
  • Robert Bush
  • Royce Bailey
  • Scott Fusare
  • Scott Newman
  • smiffy
  • Stella
  • Steven Busic
  • Steve Conner
  • Steve Jones
  • Steve Ward
  • Sulaiman
  • Thomas Coyle
  • Thomas A. Wallace
  • Thomas W
  • Timo
  • Torch
  • Ulf Jonsson
  • vasil
  • Vaxian
  • vladi mazzilli
  • wastehl
  • Weston
  • William Kim
  • William N.
  • William Stehl
  • Wesley Venis
The aforementioned have contributed financially to the continuing triumph of 4hv.org. They are deserving of my most heartfelt thanks.
Forums
4hv.org :: Forums :: Computer Science
« Previous topic | Next topic »   

Distribution of random numbers and pseudo random generators

1 2 
Move Thread LAN_403
Bjørn
Mon Jan 01 2007, 03:47PM Print
Bjørn Registered Member #27 Joined: Fri Feb 03 2006, 02:20AM
Location: Hyperborea
Posts: 2058
I have made a program that searches for efficient (very fast) algorithms for generating pseudo random numbers. For this to work I need a fast way of detecting good generators and rejecting almost all bad ones.

The method I have used is to take the histogram of ten million random random bytes and sort the histogram. Then compare the sorted histograms resuls from millions of generators that are generated by another program. Then I sort the algorithms based on how well they fit the sorted histogram curve. The best fit should then be a usable algorithm.

What I am wondering is, what is the chance of a very bad pseudo random generator having a very good fit? My feeling is that no bad ones will end up at the top of the list but I can't explain why.

This is the curve I am using to judge the fitness against.


1167666422 27 FT0 Histogram
Back to top
Steve Conner
Mon Jan 01 2007, 04:40PM
Steve Conner Registered Member #30 Joined: Fri Feb 03 2006, 10:52AM
Location: Glasgow, Scotland
Posts: 6706
Surely the chance of a very bad generator having a very good fit is about one in ten million? smile
Back to top
Bjørn
Mon Jan 01 2007, 07:42PM
Bjørn Registered Member #27 Joined: Fri Feb 03 2006, 02:20AM
Location: Hyperborea
Posts: 2058
I can't get that to fit. There is a lot more than 10 million sorted histograms. With just 6 values there are already 11 possible histograms if I counted correctly. So if the resulting histogram was random then there would be very little chance of getting a false match (or a real match).

The resulting histogram will often be fairly flat (simple counters) so that increases the probability a lot, The "shape" of the search space would also affect the result in some way, maybe all except a few algorithms turns out to just flip between two values depending on the properties of the algorithm generator.
Back to top
...
Tue Jan 02 2007, 03:16AM
... Registered Member #56 Joined: Thu Feb 09 2006, 05:02AM
Location: Southern Califorina, USA
Posts: 2445
For a random number generator, wouldn't you want the histogram to be flat? If it has a curve then all of the numbers don't have their fair share... But then again I have been awake for about 40hrs, so take that with a pile of salt wink
Back to top
Bjørn
Tue Jan 02 2007, 06:35AM
Bjørn Registered Member #27 Joined: Fri Feb 03 2006, 02:20AM
Location: Hyperborea
Posts: 2058
If the curve is perfectly flat then the numbers are not completely random, because then you know that each number occurs exactly the same number of times.

So if you have a sequence of numbers between 1 and 9 like this "6 8 2 3 5 1 7 4 x" and the histogram is perfectly flat you know that the last digit x is 9. That means the sequence is compressible and not completely random.

You might be thinking about the generator having equal probability of generating each number. In that case you are right, but since it is random the resulting diagram is on average very flat but noisy and when it is sorted you get that curve.
Back to top
Steve Conner
Tue Jan 02 2007, 11:12AM
Steve Conner Registered Member #30 Joined: Fri Feb 03 2006, 10:52AM
Location: Glasgow, Scotland
Posts: 6706
Well, my reasoning was that if they are random number generators, then the chance of any two of them producing the same sequence is roughly equal to 1 in (however many bits there are in the sequence). So if I then assume that one of them were good, and the other were bad, that is also roughly the chance of a bad generator masquerading as a good one. (Roughly, because a bad generator might not obey the same statistics: if it continuously output the number 50, say, it wouldn't have any chance of matching any other.)

Actually I guess I got it wrong: the answer should be 1 in (2^(however many bits there are in the sequence)) which if there are 10 million 32-bit numbers in a sequence is a very small chance indeed! suprised

About the curve not being flat: I guess that is a result of the central limit theorem. If you average enough random numbers you get a Gaussian distribution centred on the middle of the range of your generator. If you sort the bins of a Gaussian distribution, you will fold it in half and get a S-shaped curve like in Bjoern's diagram.

About the compressibility thing: The sequence you generated is exactly what you get from a simple linear feedback shift register type of generator. A 16-bit LFSR generates all the numbers between 0 and 65534 exactly once before repeating itself. And the algorithm is very compressible- it only takes a few bytes to specify what the size of the register is and where to place the taps. But if you take a sample of the sequence, instead of the sequence as a whole, the randomness of that sample is very good. Even though it is still very compressible: to recreate it you just need the specification of the LFSR, the 16-bit initial state, and the length.

There's a good explanation of this in The Art Of Electronics. In one chapter, they test these PRBS generators and show that the random numbers are good quality. Another way of explaining it is to say that LFSRs are often used in cryptography, and the compressibility I mentioned above is the reason why the code can be cracked, as opposed to a one-time pad which is truly random and hence unbreakable.
Back to top
Bjørn
Tue Jan 02 2007, 12:21PM
Bjørn Registered Member #27 Joined: Fri Feb 03 2006, 02:20AM
Location: Hyperborea
Posts: 2058
I agree with that.

I have tested all possible two tap LFSR's up to and including 42 bit and some of them are very nice, but the code is usually slow compared to the quality of the numbers. On a FPGA or other gate logic it is reverse and they are very cheap and efficient.

My program has now tried all possible single cycle generators and it worked far better than imagined. The next step is all two cycle generators and that will take one year unless I figure out a way to detect really bad generators very fast. Also the testing of possible good ones must be fast and automatic because there will be a million of them.
Back to top
Carbon_Rod
Tue Jan 02 2007, 02:39PM
Carbon_Rod Registered Member #65 Joined: Thu Feb 09 2006, 06:43AM
Location:
Posts: 1155
Don't forget seed pools are an indication of a pseudo random generator. You will see cycles form from the % function often used to make a variable fit.

The standard C random function is far from chaotic. This is why people use those awful unassigned global variables in a seed pool to make sure whatever memory garbage left over from other processes adds an element of chaos to the pool (especially with a memory paging system.)

Some statistical software still uses a less than random system to achieve expected certainties in distribution models.

I like the Lava lamp random number generator project. ( Link2 )

Good luck,
Back to top
WaveRider
Thu Jan 04 2007, 10:31PM
WaveRider Registered Member #29 Joined: Fri Feb 03 2006, 09:00AM
Location: Hasselt, Belgium
Posts: 500
Hi,
I use this package often for generating good random numbers (used in Monte-Carlo and simulated annealing simulators)....if it's any help.

Also, check out the work of Donald Knuth on random number generation...especially "what makes a good random number generator."


]1167949872_29_FT19411_ranlib.c.tar.gz[/file]
Back to top
Bjørn
Sat Jan 06 2007, 10:55PM
Bjørn Registered Member #27 Joined: Fri Feb 03 2006, 02:20AM
Location: Hyperborea
Posts: 2058
I have made a program that shows the output of the fastest and simplest pseudo random generators possible. Since they are so simple there are not many of them, but surprisingly a handful of them turns out to be interesting.
]1168124129_27_FT19411_random.zip[/file]
Back to top
1 2 

Moderator(s): Chris Russell, Noelle, Alex, Tesladownunder, Dave Marshall, Dave Billington, Bjørn, Steve Conner, Wolfram, Kizmo, Mads Barnkob

Go to:

Powered by e107 Forum System
 
Legal Information
This site is powered by e107, which is released under the GNU GPL License. All work on this site, except where otherwise noted, is licensed under a Creative Commons Attribution-ShareAlike 2.5 License. By submitting any information to this site, you agree that anything submitted will be so licensed. Please read our Disclaimer and Policies page for information on your rights and responsibilities regarding this site.