intro

Computational Techniques in Theoretical Physics Section 7 Computer simulation of site percolation and random resistor network

Simulation techniques:

What is the problem?
The most important part is physical idea and insight.
Mathematical model:
Physical problem to Computer language starts from a mathematical formula.
Computer algorithm:

There are many ways to send an instruction. The most efficient one saves you time and money!

Master the computer languages (C, Fortran etc.).

Efficient instruction, experience is essential.

Example: Random number generator
Mathematical formula:
Xn+1 = (a Xn + c) mod m, n > 0,
Efficient C algorithm by choosing m=2^e:
x = a * x + c;

Any mistakes in your instructions?
Debugging tools and methods.

What is the conclusion?
Analysis and interpretation of output data.

Computer Simulation of Site Percolation

Issues of algorithm development:

End effect

Labelling scheme

Formula for physical quantities

Generation of random configurations

Statistical errors

Debugging

(1). Periodic boundary condition:
Periodic boundary condition is very often used when dealing with a finite system.

Usually reduces the influence of finite system.

But it can not really eliminate the effect of finite sizes.

In one dimension:
let us label the sites as 1, 2, 3, ... L. Then the left neighbor of site 1 is site L, and the right neighbor of L is 1. You can view the system as on a ring, rather than on a finite straight line segment.

In two or higher dimensions:
Each direction is treated just like that of one dimension.
In two dimensions, the topology due to periodic boundary conditions is that of a torus.

(2). Cluster labeling algorithms

Percolation deals with clusters in a random pattern. Given a configuration which specifies the occupation of sites, we need to know which set of sites forms a single cluster.
If we count the clusters of size s, we can estimate ns. The maximum size s can be used to estimate P.
Let's first concentrate on the problem of labeling the cluster.
We are given a configuration or state stored in a one-dimensional array:

s(1), ... , s(2), ... , s(3), ... , s(N-1)
N is some fixed number, which is the total number of sites of the system.
Our task is to give a label to each site, such that each cluster has a unique label, as indicated in the figure:

This is more or less a standard problem of graph traversal.
There are at least two basic algorithms to the problem. We look at the simplest one:

Recursive function calls

Tends to be short and elegant (easy to program)

Less efficient both in computer time and memory (because of the overhead in handling the stack)

Algorithm: Cluster identification with recursive function.

#define Z 4                                              the coordination number is twice the dimension
#define L 16                                            the system linear size
#define N (L*L)                                     the total number of sites (two-dimensional)

void label(int s[], int number[])        This function goes over each site
{                                                                   and checks whether the site has been labeled.
   int i, inc = 0;                                         If it is not and is not empty,
                                                                      it calls color to label the site
   for(i = 0; i < N; ++i)                           and those sites connected to it.
   number[i] = 0;
   for(i = 0; i < N; ++i)
     if(s[i] == 1 && number[i] == 0)
       color(s, number, i, ++inc);
}

void color(int s[], int number[], int i, int inc)   The color function labels
{                                                                                        the site i if it is occupied
   int nn[Z], j;                                                                and labels its neighbors too.
                                                                                          The function is used recursively.

if (s[i] == 1 && number[i] == 0 ) { Empty sites have label zero.

number[i] = inc;
neighbor(i, nn);
for(j = 0; j < Z; ++j)
color(s, number, nn[j], inc);

}
}

void neighbor(int i, int nn[ ])                                   Find the neighbors of site i
{                                                                                         Sites are named by number
                                                                                           from 0 to N-1
int j, r, p, q;                                                      This function works for any
                                                                                           hypercubic lattice.
                                                                                           The Z neighbors of site i
                                                                                           are returned

r = i;                                                                   in the array nn
p = 1 - L;
q = 1;
for(j = 0; j < Z; j += 2) {
       nn[j] = (r + 1) % L == 0 ? i + p : i + q;
       nn[j+1] = r % L == 0 ? i - p : i - q;
       r = r/L;
       p *= L;
       q *= L;
   }
}

(3). Physical quantities to calculate and finite-size effects:

What to calculate in a typical simulation is a difficult decision you have to make.
General guidelines:

You don't want to handle too many data.

In cases you don't know which quantity is the most appropriate to the problem at hand, you have to make a compromise.

Physical quantities for the site percolation problem:

cluster number function n_s

percolation probability P

the average cluster size S

pair-correlation function g(r)

Finite size effect on these quantities:

Since we cannot simulate an infinitely large system on a finite memory computer in finite time, we must do some extrapolation:

Calculating everything on a finite system, observing the changes of quantities as the system size changes, and finally trying to predict the value at infinite-size limit.
The finite-size scaling theory deals with this problem. Fortunately, many quantities tend to their infinite-size limits very quickly.
We can choose a large system (say L=100 or even 1000) to overcome the finite-size effect.
Is this good enough?
In many cases, simply taking a large system does not work.
What is a good estimator for P from a computational point of view?
We can approximate P as:

P_L = <s_max> /L^d
where s_max is the size of the largest cluster in the system.
The percolation probability is
P = lim_{L-> infinity} P_L.

Let us check on P_L:

For p < p_c we have:
P = 0,
P_L approaches zero as L^{- d/2}.

At the percolation threshold p=p_c,
the finite-size scaling theory predicts
P_L ~ L^-beta/nu -> 0.

Above the percolation threshold p > pc,
P_L approaches the infinite size limit P exponentially:
P - PL ~ exp(- c L^d)

The average cluster size can be calculated from:

S = 1/(p - P) L^d sum_i' c_i²
where c_i is the size of i-th cluster in the system.
The summation is carried out over all clusters except the largest, indicated by the prime '. Note that P = 0 for p < pc.

(4). Generation of random configurations:
A simulation procedure goes as follows:

We first generate a random percolation configuration by assigning 1 for occupation, and 0 for empty at each site.

Compute physical quantities from this configuration and store them in some places.

Generate the next configuration and repeat the computation and storage.

After N configurations, compute the average and statistical errors etc.

How to generate a site with occupency probability p?

This is achieved by comparing p with a uniformly distributed random number r.
If r < p, we occupy the site; otherwise we leave it empty.
The rest of the process:

When all of the L^d sites on the lattice have assigned values, we start to analyze the configuration.
This is one Monte Carlo step. When the analysis of the configuration is finished, we forget or erase the old configuration and generate a new configuration and repeat the process.
In each Monte Carlo step, we obtain some value A_i.
Our method is in fact a Monte Carlo method in its simplest form, namely a naive simple sampling method (as oppose to importance sampling method, e.g. Metropolis algorithm).

The average of any quantity to be calculated is:

<A> = 1/N sum_{i=1, ...,
N} A_i,

after N Monte Carlo steps.

(5). Accuracy of simulation method and statistical errors:

Why do we bother to generate many configurations?

The reason is accuracy.
The Monte Carlo method is a lousy method in terms of accuracy.
In order that the numerical estimate <A> approaches the exact value A, you should do infinite number of steps.

N -> infinity.
Unfortunately, we don't have infinite amount of computer time, and we cannot afford to wait. Thus the result you obtained is accurate up to an error of order N^-1/2

Statistical error:
<A> - sigma/N^-1/2 < A < <A> +sigma/sqrt{N}
when N >>1
where we should have written <A> as < A >_N since it is the estimate using N samples.
The quantity sigma² is the variance of the random variable A_i:
sigma² = <A²> - <A >² .
The actual value depends on the nature of A_i.
The above formulae are the basic results of probability theory. It tells us that if you want to improve your result by one more decimal digit, you'd better increase your computer efforts by a 100-fold.
It appears that Monte Carlo method is a very inefficient method.
But this method sometimes is the only method practical.
Do we have error-free alternative?
Taking our percolation problem again as an example, we can calculate the quantity A without error by the formula (assuming no roundoff error in your calculation)
<A> = sum_{all conf.} A({n(i,j)}) p^m (1 - p)^{Ld - m}
where Ld=L^d.
Unfortunately, there are just too many terms in the summation. It is impractical to use this formula!

Since each site has two states, either empty or occupied, the total number of possible configurations is 2Ld. If we have a system of 10 X 10, the number of configurations is 2¹⁰⁰ = 10³⁰.
Suppose that your program can run at a speed of one microsecond for each site, (including generating the state and performing multiplication), we can finish one configuration in 10^-4 second. It takes 10²⁶ seconds to finish the sum over all configurations. Since a year is about 3 X 10⁷ seconds, it takes about 10¹⁸ years. The age of the Universe is believed to be 10¹⁰ years. If you had already started your calculation when the Universe was born, you would only have finished 1/10⁸-th of the whole project by now!
Thus, we cannot even try a small size like 10 X 10 by a brute force exact enumeration. The largest size you can do this way is probably 4 X 4 or 5 X 5.

Because of the finite-size effect, small size results are not very useful anyway. Now you should believe that Monte Carlo method is very powerful.

(6). Debugging
Your simulation result is meaningless if your program does not work correctly. So, it is absolutely important that your program is bug free.
Beside studying your code carefully, you can test in various ways.

Test it against known result.

If the problem under study has exact solution in certain cases, use the exact result to test your program.

Try to calculate exact result on small systems (like 4 X 4), and compare it with simulation result.

Look for special limit at which series-expansion solution is easily obtainable. Use analytical result (may be approximate) to compare with simulation result.

Write a separate, independent simulation program in a way to avoid bugs

that is, use simplest possible method, and straightforward algorithm. In a typical production program, efficiency is a major concern. But here we sacrifice efficiency for correctness of the program.

Try different random number generators.

Write your code defensively.

More concretely, add a line of statement:

assert(condition);
where condition is some logical condition you want to check.
assert() is a macro definition in <assert.h>.
The assertion code can be turned off if you add:
#define NDEBUG
at the beginning of your program.

What to check in your program?
Example:
index i should be greater than or equal to zero but smaller than L.

Homework