intro

Computational Techniques in Theoretical Physics Section 9 Random Variables of Special Distribution

Discrete distribution

Let's say that we want to generate a random variable i taking value 1 with probability p₁, value 2 with probability p₂, ... , value n with probability p_n. The sum of the probabilities is one,

sum _{i=1 to n} p_i = 1

Let xi be the uniformly distributed random number in the interval (0,1). The random variable i distributed according to probability p₁, p₂, ..., p_n can be generated by taking a random number xi and deciding a value for i:

if xi < p₁, then i=1;
if p₁ < xi < p₁ + p_2, then i=2;
if sum _{k=1 to m-1} p_k < xi < sum _{k=1 to m} p_k, then i = m.

The most common case is n=2.

Another special case p₁ = p₂= ... = p_n = 1/n (the random variable i taking value 1 to n with equal probability) can be obtained with a simple operation:

i =[xi] + 1,

where [ ] represents floor or truncation to integer.

Let's look at another example. We want to sample the geometric distribution:

p_i = 2^-i, i = 1,2,^...

According to the above method, we should set:

i = 1 if xi < 1/2;
i = 2 if 1/2 < xi < 1/2 + 1/4;
i = k if sum _{j=1 to k-1} 2^-j< xi < sum _{j=1 to k} 2^-j.

We can express the condition more concisely as:

i = - [ ln xi / ln 2 ] + 1

The random variable i may also be generated by the following C code:

f = 0.5;
i = 1;
r = drand48( );
while( r < f) {
f = 0.5 * f;
++i;
}

You may convence yourself that this indeed generates desired distribution. Since the probabilities decrease rapidly for large i, the chances are good that the loop terminates quickly. In fact two tests are required, on the average, before it does. The decision on which method to use to sample the geometric distribution depends on the computer used. If evaluations of logarithms are slow, it is better to use the second multiplicative method.

Continuous distribution

Consider the distribution of a single random variable x taking real values in some domain. Its probability density is p(x). Probability density is defined such that the probability for random variable x taking values between x and x+dx is p(x),dx. The distribution function F(x) is defined by the probability that the random variable x is less than or equal to a given value x0,

P(x < x0) = F(x0) = int _{-infinity
to x0} p(x) dx.

Since F(x) is a probability, we have 0 < F(x) < 1. The distribution function is a nondecreasing function of its argument.

The random variable x with probability density p(x) can be generated with the formula

x = F^-1(xi),

where F^-1(xi) is the inverse function of the distribution function F(x), and xi is a uniformly distributed random number in (0,1).

Let's prove that x is indeed distributed according to p(x). Since there is a one-to-one correspondence between x and xi, the probability for x taking values between x and x + dx is the same as for xi between xi and xi + dxi. Since xi is uniformly distributed, the probability in this interval is just dxi. And since xi = F(x), we have the probability for x between x and x+dx as

1 * dxi = dF(x) = F'(x) dx = p(x) dx.

Example:

Generate x according to exponential distribution

              e^-x,   if x > 0
p(x) = {
               0 ,     if x < 0 .
The distribution function

F(x) = int _{-infinity to x} p(x) dx
        = int _{0
to x} e^-x dx
        = 1 - e^-x, x > 0.
The inverse function is x = - ln( 1- xi). You can also equivalently generate the exponential distribution with the formula x = -ln xi. Why?

Homework