Simulation techniques:
What is the problem?The most important part is physical idea and insight.Mathematical model:Physical problem to Computer language starts from a mathematical formula.Computer algorithm:
There are many ways to send an instruction. The most efficient one saves you time and money!
- Master the computer languages (C, Fortran etc.).
- Efficient instruction, experience is essential.
Example: Random number generatorMathematical formula:
Xn+1 = (a Xn + c) mod m, n > 0,
Efficient C algorithm by choosing m=2e:
x = a * x + c;
Any mistakes in your instructions?
Debugging tools and methods.What is the conclusion?
Analysis and interpretation of output data.
Computer Simulation of Site Percolation
Issues of algorithm development:
- End effect
- Labelling scheme
- Formula for physical quantities
- Generation of random configurations
- Statistical errors
- Debugging
(1). Periodic boundary condition:Periodic boundary condition is very often used when dealing with a finite system.
- Usually reduces the influence of finite system.
- But it can not really eliminate the effect of finite sizes.
In one dimension:
let us label the sites as 1, 2, 3, ... L. Then the left neighbor of site 1 is site L, and the right neighbor of L is 1. You can view the system as on a ring, rather than on a finite straight line segment.
In two or higher dimensions:Each direction is treated just like that of one dimension.
In two dimensions, the topology due to periodic boundary conditions is that of a torus.
(2). Cluster labeling algorithmsAlgorithm: Cluster identification with recursive function.
Percolation deals with clusters in a random pattern. Given a configuration which specifies the occupation of sites, we need to know which set of sites forms a single cluster.If we count the clusters of size s, we can estimate ns. The maximum size s can be used to estimate P.
Let's first concentrate on the problem of labeling the cluster.
We are given a configuration or state stored in a one-dimensional array:
s(1), ... , s(2), ... , s(3), ... , s(N-1)N is some fixed number, which is the total number of sites of the system.Our task is to give a label to each site, such that each cluster has a unique label, as indicated in the figure:
This is more or less a standard problem of graph traversal.
There are at least two basic algorithms to the problem. We look at the simplest one:
Recursive function calls
- Tends to be short and elegant (easy to program)
- Less efficient both in computer time and memory (because of the overhead in handling the stack)
#define Z 4
the coordination number is twice the dimension
#define L 16
the system linear size
#define N (L*L)
the total number of sites (two-dimensional)
void label(int s[], int number[])
This function goes over each site
{
and checks whether the site has been labeled.
int i, inc = 0;
If it is not and is not empty,
it calls color to label the site
for(i = 0; i <
N; ++i)
and those sites connected to it.
number[i] = 0;
for(i = 0; i <
N; ++i)
if(s[i]
== 1 && number[i] == 0)
color(s, number, i, ++inc);
}
void color(int s[], int number[],
int i, int inc) The
color function labels
{
the site i if it is occupied
int nn[Z], j;
and labels its neighbors too.
The function is used recursively.
if (s[i] == 1 && number[i] == 0 ) { Empty sites have label zero.
number[i] = inc;}
neighbor(i, nn);
for(j = 0; j < Z; ++j)
color(s, number, nn[j], inc);
void neighbor(int i, int nn[ ])
Find the neighbors of site i
{
Sites are named by number
from 0 to N-1
int j, r, p, q;
This function works for any
hypercubic lattice.
The Z neighbors of site i
are returned
r = i;
in the array nn
p = 1 - L;
q = 1;
for(j = 0; j < Z; j +=
2) {
nn[j] = (r + 1) % L == 0 ? i + p : i + q;
nn[j+1] = r % L == 0 ? i - p : i - q;
r = r/L;
p *= L;
q *= L;
}
}
(3). Physical quantities to calculate and finite-size effects:
What to calculate in a typical simulation is a difficult decision you have to make.General guidelines:
- You don't want to handle too many data.
- In cases you don't know which quantity is the most appropriate to the problem at hand, you have to make a compromise.
Physical quantities for the site percolation problem:
- cluster number function ns
- percolation probability P
- the average cluster size S
- pair-correlation function g(r)
Finite size effect on these quantities:
Since we cannot simulate an infinitely large system on a finite memory computer in finite time, we must do some extrapolation:
Calculating everything on a finite system, observing the changes of quantities as the system size changes, and finally trying to predict the value at infinite-size limit.The finite-size scaling theory deals with this problem. Fortunately, many quantities tend to their infinite-size limits very quickly.We can choose a large system (say L=100 or even 1000) to overcome the finite-size effect.
Is this good enough?
In many cases, simply taking a large system does not work.
What is a good estimator for P from a computational point of view?
We can approximate P as:
PL = <smax> /Ldwhere smax is the size of the largest cluster in the system.The percolation probability is
P = limL-> infinity PL .
Let us check on PL:
For p < pc we have:P = 0,
PL approaches zero as L- d/2.At the percolation threshold p=pc,
the finite-size scaling theory predictsPL ~ L-beta/nu -> 0.
Above the percolation threshold p > pc,
PL approaches the infinite size limit P exponentially:P - PL ~ exp(- c Ld)
(4). Generation of random configurations:The average cluster size can be calculated from:
S = 1/(p - P) Ld sumi' ci2where ci is the size of i-th cluster in the system.
The summation is carried out over all clusters except the largest, indicated by the prime '. Note that P = 0 for p < pc.
A simulation procedure goes as follows:
How to generate a site with occupency probability p?
- We first generate a random percolation configuration by assigning 1 for occupation, and 0 for empty at each site.
- Compute physical quantities from this configuration and store them in some places.
- Generate the next configuration and repeat the computation and storage.
- After N configurations, compute the average and statistical errors etc.
This is achieved by comparing p with a uniformly distributed random number r.The rest of the process:If r < p, we occupy the site; otherwise we leave it empty.
When all of the Ld sites on the lattice have assigned values, we start to analyze the configuration.The average of any quantity to be calculated is:This is one Monte Carlo step. When the analysis of the configuration is finished, we forget or erase the old configuration and generate a new configuration and repeat the process.
In each Monte Carlo step, we obtain some value Ai.
Our method is in fact a Monte Carlo method in its simplest form, namely a naive simple sampling method (as oppose to importance sampling method, e.g. Metropolis algorithm).
<A> = 1/N sumi=1, ..., N Ai,after N Monte Carlo steps.
(5). Accuracy of simulation method and statistical errors:
Why do we bother to generate many configurations?
The reason is accuracy.The Monte Carlo method is a lousy method in terms of accuracy.In order that the numerical estimate <A> approaches the exact value A, you should do infinite number of steps.
N -> infinity.Unfortunately, we don't have infinite amount of computer time, and we cannot afford to wait. Thus the result you obtained is accurate up to an error of order N-1/2
Statistical error:
<A> - sigma/N-1/2 < A < <A> +sigma/sqrt{N}
when N >>1
where we should have written <A> as < A >N since it is the estimate using N samples.
The quantity sigma2 is the variance of the random variable Ai:
sigma2 = <A2> - <A >2 .
The actual value depends on the nature of Ai.
The above formulae are the basic results of probability theory. It tells us that if you want to improve your result by one more decimal digit, you'd better increase your computer efforts by a 100-fold.
It appears that Monte Carlo method is a very inefficient method.
But this method sometimes is the only method practical.
Do we have error-free alternative?
Taking our percolation problem again as an example, we can calculate the quantity A without error by the formula (assuming no roundoff error in your calculation)
<A> = sumall conf. A({n(i,j)}) pm (1 - p)Ld - m
where Ld=Ld.
Unfortunately, there are just too many terms in the summation. It is impractical to use this formula!
Since each site has two states, either empty or occupied, the total number of possible configurations is 2Ld. If we have a system of 10 X 10, the number of configurations is 2100 = 1030.Suppose that your program can run at a speed of one microsecond for each site, (including generating the state and performing multiplication), we can finish one configuration in 10-4 second. It takes 1026 seconds to finish the sum over all configurations. Since a year is about 3 X 107 seconds, it takes about 1018 years. The age of the Universe is believed to be 1010 years. If you had already started your calculation when the Universe was born, you would only have finished 1/108-th of the whole project by now!
Thus, we cannot even try a small size like 10 X 10 by a brute force exact enumeration. The largest size you can do this way is probably 4 X 4 or 5 X 5.
Because of the finite-size effect, small size results are not very useful anyway. Now you should believe that Monte Carlo method is very powerful.
(6). DebuggingYour simulation result is meaningless if your program does not work correctly. So, it is absolutely important that your program is bug free.
Beside studying your code carefully, you can test in various ways.
- Test it against known result.
If the problem under study has exact solution in certain cases, use the exact result to test your program.
- Try to calculate exact result on small systems (like 4 X 4), and compare it with simulation result.
- Look for special limit at which series-expansion solution is easily obtainable. Use analytical result (may be approximate) to compare with simulation result.
- Write a separate, independent simulation program in a way to avoid bugs
that is, use simplest possible method, and straightforward algorithm. In a typical production program, efficiency is a major concern. But here we sacrifice efficiency for correctness of the program.
- Try different random number generators.
- Write your code defensively.
More concretely, add a line of statement:What to check in your program?
assert(condition);where condition is some logical condition you want to check.assert() is a macro definition in <assert.h>.
The assertion code can be turned off if you add:
#define NDEBUG
at the beginning of your program.
Example:
index i should be greater than or equal to zero but smaller than L.