Discrete distribution
Let's say that we want to generate a random variable i
taking value 1 with probability p1, value 2 with probability
p2, ... , value n with probability pn. The sum of
the probabilities is one,
sum i=1 to n pi = 1Let xi be the uniformly distributed random number in the interval (0,1). The random variable i distributed according to probability p1, p2, ..., pn can be generated by taking a random number xi and deciding a value for i:
if xi < p1, then i=1;The most common case is n=2.if p1 < xi < p1 + p2, then i=2;
if sum k=1 to m-1 pk < xi < sum k=1 to m pk, then i = m.
Another special case p1 = p2 = ...
= pn = 1/n (the random variable i taking value 1 to n with equal
probability) can be obtained with a simple operation:
i =[xi] + 1,where [ ] represents floor or truncation to integer.
Let's look at another example. We want to sample the geometric
distribution:
pi = 2-i, i = 1,2,...According to the above method, we should set:
i = 1 if xi < 1/2;We can express the condition more concisely as:i = 2 if 1/2 < xi < 1/2 + 1/4;
i = k if sum j=1 to k-1 2-j < xi < sum j=1 to k 2-j.
i = - [ ln xi / ln 2 ] + 1The random variable i may also be generated by the following C code:
f = 0.5;You may convence yourself that this indeed generates desired distribution. Since the probabilities decrease rapidly for large i, the chances are good that the loop terminates quickly. In fact two tests are required, on the average, before it does. The decision on which method to use to sample the geometric distribution depends on the computer used. If evaluations of logarithms are slow, it is better to use the second multiplicative method.
i = 1;
r = drand48( );
while( r < f) {
f = 0.5 * f;
++i;
}
Continuous distribution
Consider the distribution of a single random variable
x taking real values in some domain. Its probability density is p(x). Probability
density is defined such that the probability for random variable x taking
values between x and x+dx is p(x),dx. The distribution function F(x) is
defined by the probability that the random variable x is less than or equal
to a given value x0,
P(x < x0) = F(x0) = int -infinity to x0 p(x) dx.Since F(x) is a probability, we have 0 < F(x) < 1. The distribution function is a nondecreasing function of its argument.
The random variable x with probability density p(x) can
be generated with the formula
where F-1(xi) is the inverse function of the distribution function F(x), and xi is a uniformly distributed random number in (0,1).x = F-1(xi),
Let's prove that x is indeed distributed according to
p(x). Since there is a one-to-one correspondence between x and xi, the
probability for x taking values between x and x + dx is the same as for
xi between xi and xi + dxi. Since xi is uniformly distributed, the probability
in this interval is just dxi. And since xi = F(x), we have the probability
for x between x and x+dx as
1 * dxi = dF(x) = F'(x) dx = p(x) dx.Example:
Generate x according to exponential distribution
e-x, if x > 0The distribution function
p(x) = {
0 , if x < 0 .
F(x) = int -infinity to x p(x) dxThe inverse function is x = - ln( 1- xi). You can also equivalently generate the exponential distribution with the formula x = -ln xi. Why?= int 0 to x e-x dx
= 1 - e-x, x > 0.