I’ve always been interested in the idea of truly random phenomena, and in particular the quantum effects, like avalanche noise, that one can take advantage of to generate truly random data. Note that I say truly random, as compared to the pseudo-random data one would get from /dev/random or /dev/urandom.
Most computers get their entropy by timing physical events available to them. Timings from Ethernet device drivers, intervals between keyboard key presses, and mouse movements are a few options there. These are all based on inputs of some kind available to a computer. If those inputs are known, or can be externally influenced, the resulting “random” numbers can potentially become predictable.
Now, take the above with a grain of salt. A lot of very smart people have spent a lot of time making the pseudo-random number generators that most computers rely on (for example, when you visit an HTTPS-enabled web site) very secure. However, they are inherently based on an algorithm fed by given inputs, and will never be TRULY random.
Recently, I burned through enough of the personal project backlog, and had been thinking about it enough, that I decided to make a device, say a USB key, that would generate truly random numbers, that one could use to feed a computer’s entropy pool and, in theory, increase the security of any cryptographic functions on that computer.
So I did.
As a side note, this is the first device I’ve made to directly support USB. It’s really cool to see a USB device show up in the info list with my details!
This device relies on quantum tunneling to create a truly random noise source, which is sampled by a microcontroller, delivered to the computer, whitened, and passed into the kernel’s entropy pool.
Specifically, a transistor’s P-N junction is reverse biased (the voltage is hooked up the wrong way), and normally, this means no current will flow. However, quantum tunneling will occasionally allow electrons to jump across the barrier, which creates a voltage on the other side. As the electrons are randomly jumping across the barrier, the voltage on the other side also randomly fluctuates. If you sample that voltage every so often, you can use that to generate random data.
However, random doesn’t mean that the data isn’t biased. In the case of my device, data is biased towards 1’s. But imagine a Geiger counter. Most of the time it’s not reading anything, so its output is heavily biased towards 0’s, until a particle of radiation comes in and you get a 1. It’s still very random data, but it’s biased. Unfortunately, computers want the data to be BOTH random and unbiased, so we need to do a little bit of processing on the random data to remove the bias.
Now, as if John von Neumann wasn’t smart enough, he came up with a very simple way to remove bias from an input stream. Take the bits two at a time. If the two bits are the same (1 & 1 or 0 & 0), throw them out. If the two bits are different, use the first bit as the output. So in my case, the preponderance of 1’s gets thrown away, and we start caring about where the 1’s and 0’s meet.
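To make the pairing rule concrete, here is a minimal Python sketch of the von Neumann trick. This is not the code running on my device or in its host software; the function name von_neumann and the sample bit list are made up purely for illustration.

```python
from typing import Iterable, Iterator


def von_neumann(bits: Iterable[int]) -> Iterator[int]:
    """Debias a stream of 0/1 values by looking at non-overlapping pairs.

    Illustrative sketch only; the name and interface are assumptions.
    """
    it = iter(bits)
    for first in it:
        second = next(it, None)
        if second is None:
            # Odd bit left over at the end of the stream: nothing to pair it with.
            break
        if first != second:
            # 1,0 -> output 1 and 0,1 -> output 0; equal pairs are discarded.
            yield first


# A made-up sample that leans towards 1's.
biased = [1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0]
print(list(von_neumann(biased)))  # pairs: 11, 10, 01, 11, 00, 10 -> [1, 0, 1]
```

Note how much data gets thrown away: twelve input bits turned into three output bits. If the input bits are independent and a 1 shows up with probability p, you get on average p(1-p) output bits per input bit, so at best one output bit for every four input bits.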
After you take care of that, you now have an unbiased AND random stream of binary data, which can be fed into the kernel entropy pool. BUT!!!!! Before you do so, you should make sure that the data really is random. The whole project is for nothing if it isn’t!
Let’s use a helpful little tool called “ent” to test the entropy of our output data! (There are other tools, like dieharder, that are excellent at testing data, but their output is less blog-post friendly. Needless to say, the data sees good results with these as well.)
Entropy = 1.000000 bits per bit.
Optimum compression would reduce the size
of this 2215723008 bit file by 0 percent.
Arithmetic mean value of data bits is 0.5000 (0.5 = random).
Monte Carlo value for Pi is 3.129546186 (error 0.38 percent).
Serial correlation coefficient is 0.017544 (totally uncorrelated = 0.0).
So, we can see from a number of tests here that the data looks to be pretty random, and the program estimates that there is one bit of entropy for every bit in the data stream! Very random!
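If you are curious where numbers like the mean and the "bits per bit" figure come from, here is a rough Python sketch that applies the standard Shannon entropy formula to the fraction of 1's and 0's. It is not ent's actual source code, and the function name bit_stats is my own invention. Keep in mind this only looks at how often each bit value occurs, so it will not catch patterns in the ordering of the bits.

```python
import math
from typing import Sequence, Tuple


def bit_stats(bits: Sequence[int]) -> Tuple[float, float]:
    """Return (mean, entropy in bits per bit) for a sequence of 0/1 values.

    Illustrative sketch only; this is an assumption about how such figures
    can be computed, not a copy of any tool's implementation.
    """
    n = len(bits)
    if n == 0:
        raise ValueError("need at least one bit")
    mean = sum(bits) / n  # fraction of 1's; 0.5 means no bias
    entropy = 0.0
    for p in (mean, 1.0 - mean):
        if p > 0:
            # Shannon entropy: H = -sum(p * log2(p)) over the two bit values.
            entropy -= p * math.log2(p)
    return mean, entropy


print(bit_stats([0, 1, 0, 1]))  # balanced sample: mean 0.5, entropy 1.0
print(bit_stats([1, 1, 1, 0]))  # biased sample: mean 0.75, entropy below 1.0
```

The second sample shows why bias matters: a stream that is 75 percent 1's carries noticeably less than one bit of entropy per bit, even if every bit is genuinely unpredictable.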
Also, here is an (enormous) plot a buddy of mine threw together with the random data. It *also* looks pretty random!
EDIT: Additionally, check out the post regarding the new revision here: New RNG Revision