[mvapich-discuss] Problems running MPI jobs with large (?) numbers of processors

Sayantan Sur surs at cse.ohio-state.edu
Fri Jan 19 15:20:28 EST 2007


Michael,

> The original (much more complicated) code that spurred this report does
> have an MPI::Finalize() at the end. I wrote the simple example code to
> illustrate the problem. I neglected to include MPI::Finalize() in the
> sample program, but operationally speaking that oversight doesn't
> matter: I just added it to the C++ code and I get the same problem.
> The code quits before getting past MPI::Init() and produces no output.

I'm just wondering -- does the C version work correctly? Ideally, there
shouldn't be any difference at all based on which language you used to
write that very simple code snippet.

If the problem continues to show up only with C++ code, could you check
if all the nodes have the same version of C++ libraries installed?

> This leads me to believe that there is an issue with the comm backbone
> of the cluster, but our cluster administrators assure me this is not
> the case. I am new to cluster work and have no idea how to prove or
> disprove their contention.

You could try running the Intel MPI Benchmarks on this cluster to see
whether large runs with a lot of communication complete successfully.
Please let us know whether this works on your cluster.

Thanks,
Sayantan.

-- 
http://www.cse.ohio-state.edu/~surs

