[mvapich-discuss] Problems running MPI jobs with large (?) numbers of processors

Webb, Michael Michael.Webb at atk.com
Tue Jan 23 10:03:44 EST 2007


Sayantan,

Sorry it took so long; I was managing other projects.

The C version does not work correctly; it shows the same error symptoms
as the C++ version. The Fortran version works fine.

Here's the C code I implemented:

#include "mpi.h"
#include <stdio.h>

int main ( int argc, char *argv[] )

{
    MPI_Init(&argc, &argv);

    printf(".");

    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Finalize();


}
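
For comparison, the C++ version is essentially the same test written
against the MPI-2 C++ bindings. This is a minimal sketch of it, not a
verbatim copy of the code from my earlier message:

#include <mpi.h>
#include <cstdio>

int main(int argc, char *argv[])
{
    MPI::Init(argc, argv);

    /* Same marker-and-flush as the C version. */
    std::printf(".");
    std::fflush(stdout);

    MPI::COMM_WORLD.Barrier();

    MPI::Finalize();

    return 0;
}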



Michael 

> -----Original Message-----
> From: Sayantan Sur [mailto:surs at cse.ohio-state.edu] 
> Sent: Friday, January 19, 2007 1:20 PM
> To: Webb, Michael
> Cc: mvapich-discuss at cse.ohio-state.edu
> Subject: Re: [mvapich-discuss] Problems running MPI jobs with 
> large (?) numbers of processors
> 
> Michael,
> 
> > The original (much more complicated) code that spurred this report 
> > does have an MPI::Finalize() at the end. I wrote the simple example 
> > code to illustrate the problem. I neglected to include MPI::Finalize() 
> > in the sample program, but that oversight doesn't matter, at least 
> > operationally speaking. I just added it to the C++ code and I get 
> > the same problem--the code quits before getting past MPI::Init(), 
> > and produces no output.
> 
> I'm just wondering -- does the C version work correctly? 
> Ideally, there shouldn't be any difference at all based on 
> which language you used to write that very simple code snippet.
> 
> If the problem continues to show up only with C++ code, could 
> you check if all the nodes have the same version of C++ 
> libraries installed?
> 
> > This leads me to believe that there is an issue with the 
> comm backbone 
> > of the cluster, but our cluster administrators assure me 
> this is not 
> > the case. I am new to cluster work and have no idea how to prove or 
> > disprove their contention.
> 
> You could try running the Intel MPI Benchmarks on this cluster 
> to see whether large runs with a lot of communication are able 
> to execute successfully. Please let us know if this works on 
> your cluster.
> 
> Thanks,
> Sayantan.
> 
> --
> http://www.cse.ohio-state.edu/~surs
> 
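
P.S. On your suggestion about the C++ libraries: to compare what the
binary actually links against on each node, I can run something like
the following (the binary name here is just a placeholder):

    ldd ./cxx_test | grep libstdc++

If the nodes resolve different libstdc++ versions or paths, that would
point to a library mismatch.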

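As for the Intel MPI Benchmarks, my understanding is that they can be
launched under MVAPICH along these lines (node names, process count,
and benchmark list are placeholders; IMB-MPI1 is the binary built from
the IMB suite):

    mpirun_rsh -np 4 node01 node02 node03 node04 ./IMB-MPI1 PingPong Barrier Alltoall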

