Part II. USING MPI
 
3. How to design a parallel computation using MPI?
 
 

* A note about mid-term exam 1:
 

Exam question:
 

Write a C program with MPI for matrix-matrix multiplication A*B. Print the resulting matrix on processor 0 (the source computer), and print its first row on processor 1, its second row on processor 2, its third row on processor 3, etc. Here A and B are N by N matrices.
 

Common Problems:
 

 

How to design a parallel computation using MPI?
 
 

   1. Load and initialize MPI:

 
Load MPI:
 
#include "mpi.h"

Initialize MPI:
 
MPI_Init(&argc,&argv);

Determine the size of the cluster (the number of processes):
 
MPI_Comm_size(MPI_COMM_WORLD,&numtasks);

Assign an ID (rank) to each process:

MPI_Comm_rank(MPI_COMM_WORLD,&taskid);
 
 
   2. Distribute the job to multiple processors:
 
Use of if statement:
 
   if (taskid == MASTER)
   {
      averow = NRA/numworkers;     /* base number of rows per worker */
      extra = NRA%numworkers;      /* leftover rows */
      offset = 0;
      mtype = FROM_MASTER;
      for (dest=1; dest<=numworkers; dest++)
      {
         rows = (dest <= extra) ? averow+1 : averow;
         printf("   sending %d rows to task %d\n",rows,dest);
         MPI_Send(&offset, 1, MPI_INT, dest, mtype, MPI_COMM_WORLD);
         MPI_Send(&rows, 1, MPI_INT, dest, mtype, MPI_COMM_WORLD);
         MPI_Send(&a[offset][0], rows*NCA, MPI_DOUBLE, dest, mtype,
                   MPI_COMM_WORLD);
         MPI_Send(&b, NCA*NCB, MPI_DOUBLE, dest, mtype,
                           MPI_COMM_WORLD);
         offset = offset + rows;
      }
   }
 
 if (taskid > MASTER)
   {
      mtype = FROM_MASTER;
      MPI_Recv(&offset, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD,
                          &status);
      MPI_Recv(&rows, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD,
                          &status);
      MPI_Recv(&a, rows*NCA, MPI_DOUBLE, MASTER, mtype,
                          MPI_COMM_WORLD, &status);
      MPI_Recv(&b, NCA*NCB, MPI_DOUBLE, MASTER, mtype,
                          MPI_COMM_WORLD, &status);

      for (k=0; k<NCB; k++)
         for (i=0; i<rows; i++)
         {
            c[i][k] = 0.0;
            for (j=0; j<NCA; j++)
               c[i][k] = c[i][k] + a[i][j] * b[j][k];
         }
      mtype = FROM_WORKER;
      MPI_Send(&offset, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD);
      MPI_Send(&rows, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD);
      MPI_Send(&c, rows*NCB, MPI_DOUBLE, MASTER, mtype,
                          MPI_COMM_WORLD);
   }
   MPI_Finalize();
}

 

   3. Return the result to the source computer (processor 0) and post-process the result:
 
 

Use of if statement:
 
   if (taskid == MASTER)
   {
      mtype = FROM_WORKER;
      for (i=1; i<=numworkers; i++)
      {
         source = i;
         MPI_Recv(&offset, 1, MPI_INT, source, mtype, MPI_COMM_WORLD, &status);
         MPI_Recv(&rows, 1, MPI_INT, source, mtype, MPI_COMM_WORLD, &status);
         MPI_Recv(&c[offset][0], rows*NCB, MPI_DOUBLE, source, mtype,
                            MPI_COMM_WORLD, &status);
      }
   }
 
 
 if (taskid > MASTER)
   {
 
      mtype = FROM_WORKER;
      MPI_Send(&offset, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD);
      MPI_Send(&rows, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD);
      MPI_Send(&c, rows*NCB, MPI_DOUBLE, MASTER, mtype,
                          MPI_COMM_WORLD);
   }
   MPI_Finalize();
}
 

   4. Output the results:
 

    To print the same results on every processor:
 

      printf("Here is the result matrix\n");
      for (i=0; i<NRA; i++)
      {
         printf("\n");
         for (j=0; j<NCB; j++)
            printf("%6.2f   ", c[i][j]);
      }
      printf ("\n");
 

    To print different results on different processors:
 

           Use  of if statement:
 

   if (taskid == MASTER)
   {
 
printf("Here is the result matrix\n");
      for (i=0; i<NRA; i++)
      {
         printf("\n");
         for (j=0; j<NCB; j++)
            printf("%6.2f   ", c[i][j]);
      }
      printf ("\n");
    }
 

   if (taskid > MASTER)
   {

      /* task taskid prints the taskid-th row of c */
   }
 
 
 
 
 

Sample program (as an example answer to the exam question).
 
 
 
 
Preview of Collective Communications in MPI. 

 Three classes of collective operations:
   
Synchronization:
MPI_Barrier(MPI_Comm comm)
 
This function blocks until all processes (in comm) have reached this routine (i.e., have called it).
 
 

Data movement:

[Figure: schematic representation of collective data movement in MPI]


 
 
 
 

Collective computations:
 

[Figure: schematic representation of collective computations in MPI]


 
 

MPI Collective Routines
 
 

MPI_Allgather

MPI_Allgatherv

MPI_Allreduce

MPI_Alltoall

MPI_Alltoallv

MPI_Bcast

MPI_Gather

MPI_Gatherv

MPI_Reduce

MPI_Reduce_scatter

MPI_Scan

MPI_Scatter

MPI_Scatterv

The All- versions (e.g., MPI_Allgather, MPI_Allreduce) deliver results to all participating processes.

The -v versions (e.g., MPI_Gatherv, MPI_Scatterv) allow the chunks to have different sizes.

Allreduce, Reduce, Reduce_scatter, and Scan take both built-in and user-defined combination functions.
 
 

Homework 4.