* A note about mid-term exam 1
Exam question:
Write a C program with MPI for matrix-matrix multiplication A*B. Print the resulting matrix on processor 0 (the source computer), and print its first row on processor 1, its second row on processor 2, its third row on processor 3, and so on. Here A and B are N-by-N matrices.
Common Problems:
How to design a parallel computation by using MPI?
1. Load and initiate MPI:
Load MPI:
#include "mpi.h"
Initiate MPI:
MPI_Init(&argc,&argv);
Get to know your computer cluster (the number of processors):
MPI_Comm_size(MPI_COMM_WORLD,&numtasks);
Assign an ID to each processor:
MPI_Comm_rank(MPI_COMM_WORLD,&taskid);
2. Distribute the job to multiple processors:
Use of if statement:
if (taskid == MASTER)
{
   mtype = FROM_MASTER;
   for (dest=1; dest<=numworkers; dest++)
   {
      rows = (dest <= extra) ? averow+1 : averow;
      printf("sending %d rows to task %d\n", rows, dest);
      MPI_Send(&offset, 1, MPI_INT, dest, mtype, MPI_COMM_WORLD);
      MPI_Send(&rows, 1, MPI_INT, dest, mtype, MPI_COMM_WORLD);
      MPI_Send(&a[offset][0], rows*NCA, MPI_DOUBLE, dest, mtype, MPI_COMM_WORLD);
      MPI_Send(&b[0][0], NCA*NCB, MPI_DOUBLE, dest, mtype, MPI_COMM_WORLD);
      offset = offset + rows;
   }
}
if (taskid > MASTER)
{
   mtype = FROM_MASTER;
   MPI_Recv(&offset, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD, &status);
   MPI_Recv(&rows, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD, &status);
   MPI_Recv(&a[0][0], rows*NCA, MPI_DOUBLE, MASTER, mtype, MPI_COMM_WORLD, &status);
   MPI_Recv(&b[0][0], NCA*NCB, MPI_DOUBLE, MASTER, mtype, MPI_COMM_WORLD, &status);
   for (k=0; k<NCB; k++)
      for (i=0; i<rows; i++)
      {
         c[i][k] = 0.0;
         for (j=0; j<NCA; j++)
            c[i][k] = c[i][k] + a[i][j] * b[j][k];
      }
   mtype = FROM_WORKER;
   MPI_Send(&offset, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD);
   MPI_Send(&rows, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD);
   MPI_Send(&c[0][0], rows*NCB, MPI_DOUBLE, MASTER, mtype, MPI_COMM_WORLD);
}
MPI_Finalize();
}
3. Return the result to the source computer (processor 0) and post-process the result:
Use of if statement:
if (taskid == MASTER)
{
   mtype = FROM_WORKER;
   for (i=1; i<=numworkers; i++)
   {
      source = i;
      MPI_Recv(&offset, 1, MPI_INT, source, mtype, MPI_COMM_WORLD, &status);
      MPI_Recv(&rows, 1, MPI_INT, source, mtype, MPI_COMM_WORLD, &status);
      MPI_Recv(&c[offset][0], rows*NCB, MPI_DOUBLE, source, mtype, MPI_COMM_WORLD, &status);
   }
}
if (taskid > MASTER)
{
   mtype = FROM_WORKER;
   MPI_Send(&offset, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD);
   MPI_Send(&rows, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD);
   MPI_Send(&c[0][0], rows*NCB, MPI_DOUBLE, MASTER, mtype, MPI_COMM_WORLD);
}
MPI_Finalize();
}
4. Output the results:
To print the same results on every processor:
printf("Here is the result matrix\n");
for (i=0; i<NRA; i++)
{
   printf("\n");
   for (j=0; j<NCB; j++)
      printf("%6.2f ", c[i][j]);
}
printf("\n");
To print different results on different processors:
Use of if statement:
if (taskid == MASTER)
{
   printf("Here is the result matrix\n");
   for (i=0; i<NRA; i++)
   {
      printf("\n");
      for (j=0; j<NCB; j++)
         printf("%6.2f ", c[i][j]);
   }
   printf("\n");
}
if (taskid > MASTER)
{
   /* print the taskid-th row of c */
}
Sample program (as an example answer to the exam question).
Preview of Collective Communications in MPI.
Synchronization (Homework 4):
MPI_Barrier(MPI_Comm comm)
This function blocks until all processes in comm have reached this routine (i.e., have called it).
Data movement:
Schematic representation of collective data movement in MPI
Collective computations:
Schematic representation of collective computations in MPI
MPI Collective Routines
MPI_Allgather
MPI_Allgatherv
MPI_Allreduce
MPI_Alltoall
MPI_Alltoallv
MPI_Bcast
MPI_Gather
MPI_Gatherv
MPI_Reduce
MPI_Reduce_scatter
MPI_Scan
MPI_Scatter
MPI_Scatterv
"All" versions deliver results to all participating processes.
"V" versions allow the chunks to have different sizes.
Allreduce, Reduce, Reduce_scatter, and Scan take both built-in and user-defined combination functions.