[mvapich-discuss] Errors spawning processes with mpirun_rsh

Rafael Arco Arredondo rafaarco at ugr.es
Mon Feb 23 12:08:35 EST 2009


Hi Jaidev,

Thank you for your prompt reply.

> The message indicates that the application terminated with a non-zero 
> error code or crashed after launching. Can you check if it leaves any 
> core files? You may need to set ulimit to unlimited. For example, add 
> ulimit -c unlimited in your ~/.bashrc.

Yes, a core file is generated after adding 'ulimit -c unlimited' to
$HOME/.bashrc.
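For reference, the relevant ulimit change can be applied and verified in the shell before launching (the gdb step at the end is a sketch; the binary and core-file names there are illustrative):

```shell
# Allow core dumps of unlimited size in the current shell;
# the same line in ~/.bashrc makes it apply to remotely launched ranks too.
ulimit -c unlimited

# Verify the setting took effect; this should report "unlimited".
ulimit -c

# After a crash, a backtrace can be pulled from the core file with gdb,
# e.g. (names illustrative):
#   gdb ./mpihello core
#   (gdb) bt
```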

> Can you also give us details of the cluster and any options you've 
> enabled with MVAPICH / MVAPICH2?

It is a cluster of servers with AMD64 Opteron processors, an InfiniBand
network, and Sun Grid Engine 6.2 as the batch scheduler (the error is
reported both when SGE controls the jobs and when it doesn't, i.e. when
mpirun_rsh is executed directly from the command line).

MVAPICH was compiled with the PathScale compiler (the make.mvapich.gen2
script was edited accordingly), shared-library support was enabled, and
the -DXRC flag was removed. The rest of the options, including the
configuration files in $MVAPICH_HOME/etc, weren't modified (i.e.,
default values are used).

As for MVAPICH2, it was compiled by invoking the configure script this
way:

./configure --enable-sharedlibs=gcc CC=pathcc F77=pathf90 F90=pathf90 CXX=pathCC

And then plain 'make' and 'make install'. Again, the other options
weren't changed.
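Put together, the MVAPICH2 build amounted to the following sequence (a sketch of the steps described above; no --prefix was mentioned, so 'make install' would use the default location):

```shell
# MVAPICH2 1.2p1 built with the PathScale compilers, default options otherwise
./configure --enable-sharedlibs=gcc CC=pathcc F77=pathf90 F90=pathf90 CXX=pathCC
make
make install
```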

MVAPICH and MVAPICH2 compile with no problems, as do programs built
with mpicc. However, as you said, programs crash during the
initialization stage after launching.

Any ideas?

Thanks again,

Rafa

> On 02/23/2009 04:45 AM, Rafael Arco Arredondo wrote:
> > Hello,
> > 
> > I'm having some issues with mpirun_rsh within both MVAPICH 1.1 and
> > MVAPICH2 1.2p1. As I commented in another email to the list some time
> > ago, mpirun_rsh is the only mechanism we can use to create MPI processes
> > in our configuration.
> > 
> > The command issued is:
> > mpirun_rsh -ssh -np 2 -hostfile ./machines ./mpihello
> > 
> > And the error reported by mpirun_rsh is:
> > 
> > Exit code -5 signaled from localhost
> > MPI process terminated unexpectedly
> > Killing remote processes...DONE
> > 
> > We also got this on some of our machines:
> > 
> > Child exited abnormally!
> > Killing remote processes...DONE
> > 
> > mpihello is a simple hello world and this happens even when the
> > processes are launched on localhost only.
> > 
> > OFED 1.2 is used as the underlying InfiniBand stack, and both
> > MVAPICH and MVAPICH2 were compiled with the OpenFabrics/Gen2 single-rail
> > option, without XRC, as indicated in the user's guide for OFED libraries
> > prior to version 1.3.
> > 
> > Any help will be kindly appreciated.
> > 
> > Thank you in advance,
> > 
> > Rafa
> >
> > 
