Algorithmic randomness


Algorithmic Information Theory is the study of the amount of information in a sequence of symbols in terms of the smallest program (algorithm) for generating the sequence. For convenience the symbols that make up the sequence are typically limited to 1's and 0's. For example, consider the sequence of 1's and 0's that makes up the binary representation of this web page. Now consider a sequence of one billion 0's. While the second sequence is much longer, we would say that it contains less algorithmic information, because the shortest program that outputs it is much shorter than the shortest program that outputs the binary code of this web page.
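
To make the contrast concrete, here is a minimal sketch in Python: a program that prints a billion 0's needs only a few dozen characters, however long its output is.

print("0" * 1000000000)   # a few dozen characters of program, a billion digits of output
# A program for a sequence with no simple structure, by contrast, can do little
# better than quoting the whole sequence verbatim (like the print statement
# further below), so its length grows in step with the length of its output.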

Now consider a binary sequence corresponding to the tosses of a fair coin. If we toss the coin one hundred times, we would expect to see a very complicated pattern. Producing this very pattern with a computer program will typically require a program that is pretty much one hundred bits long. In the worst case, we could always have a program like:

print "01011011010010111001010100101101101110110010011101000111110"

which simply has the appropriate sequence built into it, and for most randomly produced sequences this is pretty much the best we can do. This has led to a characterization of randomness in terms of algorithmic information theory. We can say that a sequence is algorithmically random if it has an amount of algorithmic information approximately equal to its length. Note that this is related to, but not exactly the same as, our intuitive conception of randomness. Intuitively, we apply the term to processes (like coin tossing) rather than to the results of such processes (like the resulting sequence). We would naturally call the process random even if it (freakishly) ended up producing a long string of heads. However, there is clearly a strong connection between intuitive randomness and algorithmic randomness, and the name is quite appropriate.
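
The length of the shortest program is not something we can compute in general, but an ordinary compressor gives a computable upper bound on it, which is enough to see the contrast. A rough sketch in Python (the figures and the use of zlib are just my illustration):

import os, zlib

structured = bytes(10000)        # ten thousand zero bytes: hardly any information
random_data = os.urandom(10000)  # ten thousand bytes of coin-flip-like data

for name, data in [("all zeros", structured), ("random", random_data)]:
    print(name, "original:", len(data), "compressed:", len(zlib.compress(data)))

# The structured data shrinks to a tiny fraction of its length, while the random
# data does not shrink at all -- its compressed form is about as long as the
# original, mirroring the idea that an algorithmically random sequence contains
# information roughly equal to its own length.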

The above definition is even more natural when we move to a discussion of infinite bit strings. There are then several benefits. Firstly, we can drop the word 'approximately' and replace it with a more precise concept. Secondly, we can do the same for the informal notion of a 'typical sequence'. Finally, while I have avoided mentioning it, the above definitions depend quite intimately on which programming language we use, but when applied to infinite sequences the definition becomes independent of the programming language.
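
The language-dependence has a precise form, known as the invariance theorem: for any two universal programming languages U and V there is a constant c, depending on U and V but not on the sequence, such that

K_U(x) ≤ K_V(x) + c

where K_U(x) denotes the length of the shortest U-program that outputs the sequence x. For finite sequences this constant can swamp the comparison, which is why the finite definition above is only approximate; for infinite sequences it washes out.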

Greg Chaitin (one of the founders of AIT) has shown how to mathematically construct an algorithmically random sequence and has shown that it possesses a number of remarkable properties. The way to form it is roughly as follows. Take a way of representing programs in binary such that they are self-delimiting. The binary representation of a C program would work because it marks its own end with a byte consisting of eight zeros -- something that occurs nowhere else in the program, and thus lets us know that we have reached the end. Then we imagine the process of generating a random program by flipping a coin to generate each bit in turn, stopping when the program ends (i.e. when we form a byte of eight zeros). Now consider the probability that a randomly generated sequence encodes a valid program which eventually halts. This probability is called Ω, and if expressed in binary (or in decimal for that matter), its digits form an infinite algorithmically random sequence.
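
Written out, Ω is just the halting probability of this coin-flipping experiment: Ω = Σ 2^(-|p|), where the sum ranges over every self-delimiting program p that halts and |p| is the length of p in bits. The value can be approximated from below but never computed exactly. The toy sketch below shows the shape of such an approximation; the 'machine' it uses (numbers written in a prefix-free unary code and run through a step-limited Collatz loop) is entirely my own stand-in, not Chaitin's construction.

def toy_halts(n, step_limit):
    # Stand-in for a genuine halting test (which is uncomputable in general):
    # iterate the Collatz map from n and report whether it reaches 1 in time.
    for _ in range(step_limit):
        if n <= 1:
            return True
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    return False

def omega_lower_bound(max_program_bits, step_limit):
    # A 'program' here is the prefix-free unary code 1...10 for the number n,
    # which is n + 1 bits long, so coin flipping produces it with probability
    # 2**-(n + 1). Summing that weight over the programs seen to halt gives a
    # lower bound on the toy machine's halting probability.
    total = 0.0
    for n in range(max_program_bits):
        if toy_halts(n, step_limit):
            total += 2.0 ** -(n + 1)
    return total

# Raising either limit can only push the estimate upward, never downward.
# For this toy machine the limit is an uninteresting computable number close
# to 1; for a universal machine it is the noncomputable Omega.
print(omega_lower_bound(max_program_bits=20, step_limit=1000))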

One of the most startling properties of Ω is its unpredictability. Consider a program that takes a natural number n as input and either makes a guess as to the nth bit of Ω ('it is 0' or 'it is 1'), or declines to comment. Now it is of course easy enough to design such a program to guess a finite number of bits correctly and decline to answer for the rest (the program could just have the first hundred bits encoded into it). But what about programs that make infinitely many guesses? It turns out that any program which makes infinitely many guesses gets only half of them right. Even though it is able to decline to guess on 999 bits out of every 1000, this doesn't help it at all. We could do just as well by flipping a coin or always guessing '0'. As Chaitin says, Ω is 'violently noncomputable'.
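
As a picture of the setup, such a predictor is just a function from n to '0', '1', or silence; the particular strategy below is a placeholder of my own, not anything from Chaitin.

def predict_bit(n):
    # Return '0' or '1' to guess the nth bit of Omega, or None to decline.
    # This placeholder declines on the first thousand bits and then always
    # guesses '0'; the result above says that any computable strategy which
    # guesses infinitely often does no better than this in the long run.
    if n < 1000:
        return None
    return "0"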

There is also an interesting connection between Ω and Hilbert's Tenth Problem. This famous problem asks for a general method for determining whether any particular Diophantine equation (a polynomial equation whose variables range over the integers) has a solution. In 1970, Matyasevich built upon the results of Davis, Putnam and Robinson to show that it is recursively unsolvable: there is no program which can solve it. Chaitin has since shown that a closely related question is even harder to solve. There is an exponential Diophantine equation which encodes the bits of Ω: it has a parameter n such that the equation has infinitely many solutions if the nth bit of Ω is 1 and finitely many if it is 0. Thus, no program can do better than chance in determining whether the number of solutions is finite or not.

I should also mention that there is a connection between my work on hypercomputation and on algorithmic randomness. Throughout this summary, I've used the term 'program' to refer to a program in a recursive language, that is, one with the same power as the Turing machine. Many of the theoretical machines (or programming languages) that are more powerful than this can compute the bits of Ω. However, they have their own higher Ω's above them with respect to which they are powerless. To compute these, one would need an even more powerful machine and so on. Thus, we may still see a physical machine in the future that will output the bits of Ω: we don't know whether the laws of physics allow such things or not.


On the existence of a new family of Diophantine equations for Ω
  In 2002, Tien Kieu and I discovered this alternative method of finding Ω (and thus algorithmic randomness) within number theory. We were actually trying to reconstruct Greg Chaitin's proof when we stumbled upon this different, and very illuminating, technique. The paper is quite readable, presenting the details of Ω precisely and showing a simple way to construct it within number theory. I am happy to be able to say that this technique has now been included in the latest books by Chaitin and Matyasevich, the latter of whom has further extended our result.

Representations of Ω in number theory: finitude versus parity
  In this companion piece (also written in 2002), we look at things from a slightly different angle, focusing on the similarities and differences between our method and Chaitin's.