Computational Techniques in Theoretical Physics
Section 7
 
Computer simulation of
site percolation and random resistor network
 
 
 

Simulation techniques:
 

What is the problem?
The most important part is physical idea and insight.
Mathematical model:
Physical problem to Computer language starts from a mathematical formula.
Computer algorithm:
 
 
Example: Random number generator

Mathematical formula:

Xn+1 =  (a Xn + c) mod m,     n > 0,

Efficient C algorithm by choosing m=2e:

x = a * x + c;

 

Any mistakes in your instructions?

Debugging tools and methods.
 

What is the conclusion?

Analysis and interpretation of output data.
 
 

Computer Simulation of Site Percolation
 
 

Issues of algorithm development:
 
(1). Periodic boundary condition:
Periodic boundary condition is very often used when dealing with a finite system.
   

In one dimension:

let us label the sites as 1, 2, 3, ... L. Then the left neighbor of site 1 is site L, and the right neighbor of L is 1. You can view the system as on a ring, rather than on a finite straight line segment.

 
 
 
In two or higher dimensions:

Each direction is treated just like that of one dimension.
In two dimensions, the topology due to periodic boundary conditions is that of a torus.

 
 
 
 
(2). Cluster labeling algorithms
 
Percolation deals with clusters in a random pattern. Given a configuration which specifies the occupation of sites, we need to know which set of sites forms a single cluster.

If we count the clusters of size s, we can estimate ns. The maximum size s can be used to estimate P.

Let's first concentrate on the problem of labeling the cluster.

We are given a configuration or state stored in a one-dimensional array:
 

s(1), ... , s(2), ... , s(3), ... , s(N-1)
N is some fixed number, which is the total number of sites of the system.

Our task is to give a label to each site, such that each cluster has a unique label, as indicated in the figure:
 


 

This is more or less a standard problem of graph traversal.

There are at least two basic algorithms to the problem. We look at the simplest one:
 

Recursive function calls

 
Algorithm: Cluster identification with recursive function.

#define Z 4                                              the coordination number is twice the dimension
#define L 16                                            the system linear size
#define N (L*L)                                     the total number of sites (two-dimensional)
 

void label(int s[], int number[])        This function goes over each site
{                                                                   and checks whether the site has been labeled.
   int i, inc = 0;                                         If it is not and is not empty,
                                                                      it calls color to label the site
   for(i = 0; i < N; ++i)                           and those sites connected to it.
   number[i] = 0;
   for(i = 0; i < N; ++i)
     if(s[i] == 1 && number[i] == 0)
       color(s, number, i, ++inc);
}
 

void color(int s[], int number[], int i, int inc)   The  color function labels
{                                                                                        the site  i if it is occupied
   int nn[Z], j;                                                                and labels its neighbors too.
                                                                                          The function is used recursively.

   if (s[i] == 1 && number[i] == 0 ) {                Empty sites have label zero.

number[i] = inc;
neighbor(i, nn);
for(j = 0; j < Z; ++j)
  color(s, number, nn[j], inc);
    }
}
 
 

void neighbor(int i, int nn[ ])                                   Find the neighbors of site i
{                                                                                         Sites are named by number
                                                                                           from 0 to N-1
  int j, r, p, q;                                                      This function works for any
                                                                                           hypercubic lattice.
                                                                                           The Z neighbors of site i
                                                                                           are returned

  r = i;                                                                   in the array nn
  p = 1 - L;
  q = 1;
  for(j = 0; j < Z; j += 2) {
       nn[j] = (r + 1) % L == 0 ? i + p : i + q;
       nn[j+1] = r % L == 0 ? i - p : i - q;
       r = r/L;
       p *= L;
       q *= L;
   }
}
 
 
 

(3). Physical quantities to calculate and finite-size effects:
 
What to calculate in a typical simulation is a difficult decision you have to make.

General guidelines:

 

Physical quantities for the site percolation problem:
 

 

Finite size effect on these quantities:
 

Since we cannot simulate an infinitely large system on a finite memory computer in finite time, we must do some extrapolation:
 
Calculating everything on a finite system, observing the changes of quantities as the system size changes, and finally trying to predict the value at infinite-size limit.
The finite-size scaling theory deals with this problem. Fortunately, many quantities tend to their infinite-size limits very quickly.

We can choose a large system (say L=100 or even 1000) to overcome the finite-size effect.

Is this good enough?

In many cases, simply taking a large system does not work.

What is a good estimator for P from a computational point of view?

We can approximate P as:
 

PL =  <smax> /Ld
where smax is the size of the largest cluster in the system.

The percolation probability is

P = limL-> infinity  PL .
 

Let us check on PL:
 

For p < pc we have:

P = 0,
PL approaches zero as L- d/2.

 
At the percolation threshold p=pc,
the finite-size scaling theory predicts

PL ~  L-beta/nu ->  0.

 
Above the percolation threshold p > pc,
PL approaches the infinite size limit P exponentially:

P - PL ~ exp(- c Ld)

 
 
The average cluster size can be calculated from:
 
S = 1/(p - P) Ld  sumi'  ci2

where ci is the size of i-th cluster in the system.

The summation is carried out over all clusters except the largest, indicated by the prime '. Note that P = 0 for p < pc.

 
(4). Generation of random configurations:
A simulation procedure goes as follows:
  How to generate a site with occupency probability p?
 
This is achieved by comparing p with a uniformly distributed random number r.

If r < p, we occupy the site; otherwise we leave it empty.

The rest of the process:
 
When all of the Ld sites on the lattice have assigned values, we start to analyze the configuration.

This is one Monte Carlo step. When the analysis of the configuration is finished, we forget or erase the old configuration and generate a new configuration and repeat the process.

In each Monte Carlo step, we obtain some value Ai.

Our method is in fact a Monte Carlo method in its simplest form, namely a naive simple sampling method (as oppose to importance sampling method, e.g. Metropolis algorithm).
 

The average of any quantity to be calculated is:
 
<A> = 1/N  sumi=1, ..., N Ai,
 
after N Monte Carlo steps.
 
 
 
(5). Accuracy of simulation method and statistical errors:
 
 
Why do we bother to generate many configurations?
 
The reason is accuracy.
The Monte Carlo method is a lousy method in terms of accuracy.

In order that the numerical estimate <A> approaches the exact value A, you should do infinite number of steps.
 

N  -> infinity.
Unfortunately, we don't have infinite amount of computer time, and we cannot afford to wait. Thus the result you obtained is accurate up to an error of order N-1/2
 

Statistical error:

<A> - sigma/N-1/2   < A  <    <A> +sigma/sqrt{N}

when N >>1

where we should have written <A> as < A >N since it is the estimate using N samples.

The quantity sigma2 is the variance of the random variable Ai:

sigma2 = <A2> - <A >2 .

The actual value depends on the nature of Ai.

The above formulae are the basic results of probability theory. It tells us that if you want to improve your result by one more decimal digit, you'd better increase your computer efforts by a 100-fold.

It appears that Monte Carlo method is a very inefficient method.

But this method sometimes is the only method practical.

Do we have error-free alternative?

Taking our percolation problem again as an example, we can calculate the quantity A without error by the formula (assuming no roundoff error in your calculation)

<A> = sumall conf. A({n(i,j)}) pm (1 - p)Ld - m

where Ld=Ld.

Unfortunately, there are just too many terms in the summation. It is impractical to use this formula!

 
Since each site has two states, either empty or occupied, the total number of possible configurations is 2Ld. If we have a system of 10 X 10, the number of configurations is 2100 = 1030.

Suppose that your program can run at a speed of one microsecond for each site, (including generating the state and performing multiplication), we can finish one configuration in 10-4 second. It takes 1026 seconds to finish the sum over all configurations. Since a year is about 3 X 107 seconds, it takes about 1018 years. The age of the Universe is believed to be 1010 years. If you had already started your calculation when the Universe was born, you would only have finished 1/108-th of the whole project by now!

Thus, we cannot even try a small size like 10 X 10 by a brute force exact enumeration. The largest size you can do this way is probably 4 X 4 or 5 X 5.

Because of the finite-size effect, small size results are not very useful anyway. Now you should believe that Monte Carlo method is very powerful.
 
 
(6). Debugging

Your simulation result is meaningless if your program does not work correctly. So, it is absolutely important that your program is bug free.

Beside studying your code carefully, you can test in various ways.
 

If the problem under study has exact solution in certain cases, use the exact result to test your program.
that is, use simplest possible method, and straightforward algorithm. In a typical production program, efficiency is a major concern. But here we sacrifice efficiency for correctness of the program.
 
More concretely, add a line of statement:
 
assert(condition);
where condition is some logical condition you want to check.

assert() is a macro definition in <assert.h>.

The assertion code can be turned off if you add:

#define NDEBUG

at the beginning of your program.
 
 

What to  check in your program?

Example:

index i should be greater than or equal to zero but smaller than L.
 
Homework