[mvapich-discuss] mvapich2 and openmp

Ranjit Noronha noronha at cse.ohio-state.edu
Tue May 29 15:59:57 EDT 2007


Hi Dominique,

Thanks for sending the code. We built mvapich2-0.9.8p2 with --enable-threads=multiple. We modified your program a bit to measure the loop time. I have attached
the modified version. The program was compiled with icc version 9.1 as follows:

[bash: noronha at i2-2 /tmp/mvapich2-0.9.8p2/osu_benchmarks]$pwd
/tmp/mvapich2-0.9.8p2/osu_benchmarks

icc -O3   -openmp ./openmp.c -lmpich -lpthread -libverbs -libumad -I../src/include -L../lib -L /usr/local/ofed/lib64/
./openmp.c(42) : (col. 1) remark: OpenMP DEFINED LOOP WAS PARALLELIZED.

The program was run on an Intel Clovertown system with four dual-core CPUs (8 cores in total). We got the
following results:

Number of threads = 8 Loop time: 0.43 s
Number of threads = 4 Loop time: 0.84 s
Number of threads = 2 Loop time: 1.68 s
Number of threads = 1 Loop time: 3.37 s

This seems to indicate that no serialization is happening: top
shows the cores at 100% utilization while the loop is running.
I have attached the complete trace of the output.
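
As a side note, a quick way to rule out the OpenMP runtime quietly falling back to a single thread is to print the team size from inside a parallel region. A minimal stand-alone sketch (not part of the attached benchmark; the file name is made up) could look like this:

---------------------------------------
/* team_check.c -- hypothetical stand-alone check of the OpenMP team size.
   Build with the same OpenMP flag as above (-openmp for icc, -fopenmp for gcc). */
#include <stdio.h>
#include <omp.h>

int main(void)
{
  /* If this prints 1 while OMP_NUM_THREADS (or omp_set_num_threads)
     requests more, the runtime is not actually creating the threads. */
#pragma omp parallel
  {
#pragma omp single
    fprintf(stderr, "team size = %d (max = %d)\n",
            omp_get_num_threads(), omp_get_max_threads());
  }
  return 0;
}
---------------------------------------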

What kind of system are you using? Is it an Intel- or Opteron-based system?

thanks,
--ranjit


> >
> >>I am trying to use mvapich2-0.9.8p2, built with 
> >>--enable-threads=multiple, with a program where the various threads
> >>are created by OpenMP. When running the program, the threads
> >>are correctly created and executed, but always SEQUENTIALLY
> >>(each node has two dual-core processors and could thus accommodate
> >>4 threads). When threads are created manually (not with OpenMP),
> >>everything works fine. Any idea of what could be wrong?
> >>Each node runs a Fedora Core 5 distribution with an SMP 2.6.21.3
> >>Linux kernel (similar behaviour observed for various kernels).
> >>I tried with gcc, version 4.2, and the Intel C compiler, version 9.1,
> >>with similar results.
> >>
> >
> >Can you let us know the application you are using? If possible can we get 
> >a copy of this application. 
> >
> 
> The application is very simple and short; it is only for diagnosing
> what the problem might be.
> Here it is:
> 
> 
> ---------------------------------------
> #include "mpi.h"
> #include <stdio.h>
> #include <math.h>
> #include <omp.h>
> 
> int main( int argc, char *argv[])
> {
>   int numprocs,myid;
>   int  namelen,provided;
>   int n,nit,mythread;
>   int i,it,j,k;
>   double x,y;
>   char processor_name[MPI_MAX_PROCESSOR_NAME];
>   FILE *pipo;
>   MPI_Init_thread(&argc,&argv,MPI_THREAD_MULTIPLE,&provided);
>   fprintf(stderr,"%d %d %d %d 
> %d\n",MPI_THREAD_SINGLE,MPI_THREAD_FUNNELED,MPI_THREAD_SERIALIZED,MPI_THREAD_MULTIPLE,provided);
>   MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
>   MPI_Comm_rank(MPI_COMM_WORLD,&myid);
>   MPI_Get_processor_name(processor_name,&namelen);
>   fprintf(stderr,"Process %d on %s\n",myid, processor_name);
>   if (myid==0) {
>     pipo=fopen("pipo","r");
>     fscanf(pipo,"%d", &n);
>     fscanf(pipo,"%d", &nit);
>     fprintf(stderr,"%d %d\n",n,nit);
>   }
>   MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
>   MPI_Bcast(&nit, 1, MPI_INT, 0, MPI_COMM_WORLD);
>   MPI_Barrier(MPI_COMM_WORLD);
>   x=0.0;
>   y=1.5*(n+1.0);
>   omp_set_num_threads(4);
> #pragma omp parallel for default(shared) private(it,mythread,x,i,j,k)
>   for (it=1;it<=nit;it++) {
>     for (i=1;i<=n;i++) {
>       for (j=1;j<=n;j++) {
>         for (k=1;k<=n;k++) {
>           x+=i+j+k+it+myid-y;
>         }
>       }
>     }
>     x=x/(n*n*n);
>     mythread=omp_get_thread_num();
>     fprintf(stderr,"%d %d %d %lf\n",myid,mythread,it,x);
>   }
>   MPI_Finalize();
>   return 0;
> }
> -------------------------------------
> 
> As you can see, it is a simple loop which can be run in parallel
> over the index "it".
> I also tried setting OMP_NUM_THREADS to 4 and commenting out the
>   omp_set_num_threads(4);
> line, with the same result.
> The program is compiled with the "-openmp" option.
> The output of the first print shows that the variable
> "provided" has the value 3, i.e. MPI_THREAD_MULTIPLE, as expected.
> Configuring mvapich2 with --enable-threads=funneled, for example,
> gives "provided" equal to 1, as expected, but no additional parallelism.
> 
> 
> >To allow us to better diagnose the problem can you also give us some 
> >details
> >about the OpenMP parallel directives you are using to create the threads 
> >in your program?  Did you set the environment variable OMP_NUM_THREADS to
> >4?
> >
> >
> >Also, what kind of operations are you using in the parallel
> >sections? Do you have any critical sections or locks in the parallel 
> >section
> >that might be serializing things?
> >
> >A code snippet of the OpenMP parallel sections as well the version where 
> >threads are created manually will be helpful.
> >
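
For reference, since the manually-threaded variant was never posted, here is a minimal sketch (hypothetical, with MPI left out for brevity) of what "threads created manually" with POSIX threads could look like next to the OpenMP loop above:

---------------------------------------
/* manual_threads.c -- hypothetical pthreads counterpart to the OpenMP loop.
   Build with e.g.:  gcc -O3 -pthread manual_threads.c                        */
#include <stdio.h>
#include <pthread.h>

#define NTHREADS 4

struct work { int first, last, n; double y, x; };

static void *loop(void *arg)
{
  struct work *w = (struct work *) arg;
  int it, i, j, k;
  double x = 0.0;

  /* Same arithmetic as the OpenMP loop, restricted to this thread's "it" range. */
  for (it = w->first; it <= w->last; it++) {
    for (i = 1; i <= w->n; i++)
      for (j = 1; j <= w->n; j++)
        for (k = 1; k <= w->n; k++)
          x += i + j + k + it - w->y;
    x = x / ((double) w->n * w->n * w->n);
  }
  w->x = x;
  return NULL;
}

int main(void)
{
  pthread_t tid[NTHREADS];
  struct work w[NTHREADS];
  int t, n = 100, nit = 1000, chunk = nit / NTHREADS;

  /* Split the "it" iterations evenly over NTHREADS explicitly created threads. */
  for (t = 0; t < NTHREADS; t++) {
    w[t].first = t * chunk + 1;
    w[t].last  = (t == NTHREADS - 1) ? nit : (t + 1) * chunk;
    w[t].n = n;
    w[t].y = 1.5 * (n + 1.0);
    pthread_create(&tid[t], NULL, loop, &w[t]);
  }
  for (t = 0; t < NTHREADS; t++) {
    pthread_join(tid[t], NULL);
    fprintf(stderr, "thread %d: x = %lf\n", t, w[t].x);
  }
  return 0;
}
---------------------------------------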
-------------- next part --------------

#include "mpi.h"
#include <stdio.h>
#include <math.h>
#include <omp.h>


int main( int argc, char *argv[])
{
  int ki;
  int numprocs,myid=0;
  int  namelen,provided;
  int n,nit,mythread;
  int i,it,j,k;
  double x,y;
  double latency,t_start,t_end;
  char processor_name[MPI_MAX_PROCESSOR_NAME];
  FILE *pipo;
  MPI_Init_thread(&argc,&argv,MPI_THREAD_MULTIPLE, &provided);
  fprintf(stderr,"%d %d %d %d %d\n",MPI_THREAD_SINGLE,MPI_THREAD_FUNNELED,MPI_THREAD_SERIALIZED,MPI_THREAD_MULTIPLE,provided);
  MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
  MPI_Comm_rank(MPI_COMM_WORLD,&myid);
  MPI_Get_processor_name(processor_name,&namelen);
  fprintf(stderr,"Process %d on %s\n",myid, processor_name);
  if (myid==0) {
#if 0
    pipo=fopen("pipo","r");
    fscanf(pipo,"%d", &n);
    fscanf(pipo,"%d", &nit);
#endif
    n=100;nit=1000;   /* problem size hard-coded instead of being read from "pipo" */
    fprintf(stderr,"%d %d\n",n,nit);
  }
  MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
  MPI_Bcast(&nit, 1, MPI_INT, 0, MPI_COMM_WORLD);
  MPI_Barrier(MPI_COMM_WORLD);
  x=0.0;
  y=1.5*(n+1.0);
  /* Time the same loop with 8, 4, 2 and 1 OpenMP threads */
  for (ki=8; ki>=1; ki=ki/2) {
    omp_set_num_threads(ki);
    if (myid==0) t_start = MPI_Wtime();
#pragma omp parallel for default(shared) private(it,mythread,x,i,j,k)
    for (it=1; it<=nit; it++) {
      for (i=1; i<=n; i++) {
        for (j=1; j<=n; j++) {
          for (k=1; k<=n; k++) {
            x+=i+j+k+it+myid-y;
          }
        }
      }
      x=x/(n*n*n);
      mythread=omp_get_thread_num();
      fprintf(stderr,"%d %d %d %lf\n",myid,mythread,it,x);
    }
    if (myid==0) {
      t_end = MPI_Wtime();
      latency = (t_end - t_start);
      fprintf(stdout, "Number of threads = %d Loop time: %0.2f s\n", ki, latency);
    }
  }
  MPI_Finalize();
  return 0;
}
-------------- next part --------------
A non-text attachment was scrubbed...
Name: openmp.out.gz
Type: application/x-gzip
Size: 15748 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20070529/fd3519f2/openmp.out-0001.bin
