[mvapich-discuss] MVAPICH2 issue with CH3 gen2 channel

Juan Vercellone juanjov at gmail.com
Tue May 3 07:10:46 EDT 2011


Hello, list.
I am having some trouble executing my MPI applications with MVAPICH2
v1.6 (latest stable release).

This is what I get when launching my programs:
mpirun_rsh -np 4 -hostfile ./hostso mat_prod 3
MPI process (rank: 1) terminated unexpectedly on compute-0-4.local
MPI process (rank: 0) terminated unexpectedly on compute-0-3.local
child_handler: Error in init phase...wait for cleanup! (1/2mpispawn connections)
child_handler: Error in init phase...wait for cleanup! (1/2mpispawn connections)
Failed in initilization phase, cleaned up all the mpispawn!

The application doesn't even start.

The MVAPICH2 instance was compiled using the following configuration:
./configure --enable-threads=multiple --disable-f90

Using MVAPICH2 v1.5rc1 with the same configuration options, my
application works with some communication schemes, but it aborts when
it uses one-sided communication with active target synchronization
(post/wait/start/complete or fence).
These are the errors:

(FOR ACTIVE SYNC WITH POST/WAIT/START/COMPLETE)
mpirun_rsh -np 4 -hostfile ./hostso mat_prod 4
send desc error
[0] Abort: [] Got completion with error 10, vendor code=88, dest rank=1
 at line 580 in file ibv_channel_manager.c
send desc error
[2] Abort: [] Got completion with error 10, vendor code=88, dest rank=3
 at line 580 in file ibv_channel_manager.c
[3] Abort: Got FATAL event 3
 at line 935 in file ibv_channel_manager.c
MPI process (rank: 0) terminated unexpectedly on compute-0-3.local
Exit code -5 signaled from compute-0-3
[1] Abort: Got FATAL event 3
 at line 935 in file ibv_channel_manager.c
MPI process (rank: 3) terminated unexpectedly on compute-0-4.local
make: *** [orun] Error 1


(FOR ACTIVE SYNC WITH FENCE)
mpirun_rsh -np 4 -hostfile ./hostso mat_prod 6
send desc error
[2] Abort: [] Got completion with error 10, vendor code=88, dest rank=3
 at line 580 in file ibv_channel_manager.c
send desc error
[0] Abort: [] Got completion with error 10, vendor code=88, dest rank=1
 at line 580 in file ibv_channel_manager.c
MPI process (rank: 2) terminated unexpectedly on compute-0-3.local
Exit code -5 signaled from compute-0-3
[3] Abort: Got FATAL event 3
 at line 935 in file ibv_channel_manager.c
[1] Abort: Got FATAL event 3
 at line 935 in file ibv_channel_manager.c
MPI process (rank: 3) terminated unexpectedly on compute-0-4.local
make: *** [orun] Error 1
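
As a reading aid for the two traces above, here is a minimal sketch that
decodes the numeric codes. It assumes (this is not confirmed anywhere in
this message) that MVAPICH2 prints the raw libibverbs values, so that
"error 10" is an ibv_wc_status and "FATAL event 3" is an ibv_event_type.
The function and table names are hypothetical, and only the values that
appear in these logs are listed.

```python
import re

# Assumption: the numbers in the log are raw libibverbs enum values.
# Only the values seen in the traces above are included here.
WC_STATUS = {10: "IBV_WC_REM_ACCESS_ERR"}     # enum ibv_wc_status
ASYNC_EVENT = {3: "IBV_EVENT_QP_ACCESS_ERR"}  # enum ibv_event_type

COMPLETION_RE = re.compile(
    r"\[(\d+)\] Abort: \[\] Got completion with error (\d+), "
    r"vendor code=(\w+), dest rank=(\d+)"
)
FATAL_RE = re.compile(r"\[(\d+)\] Abort: Got FATAL event (\d+)")


def decode(log_text):
    """Return (rank, description) tuples for each abort line, in log order."""
    findings = []
    for line in log_text.splitlines():
        m = COMPLETION_RE.search(line)
        if m:
            rank, status, dest = int(m.group(1)), int(m.group(2)), int(m.group(4))
            vendor = m.group(3)  # kept as text; its number base is not known
            name = WC_STATUS.get(status, "unknown status")
            findings.append(
                (rank, f"send to rank {dest} failed: {name} ({status}), "
                       f"vendor code {vendor}")
            )
            continue
        m = FATAL_RE.search(line)
        if m:
            rank, event = int(m.group(1)), int(m.group(2))
            name = ASYNC_EVENT.get(event, "unknown event")
            findings.append((rank, f"async event: {name} ({event})"))
    return findings


if __name__ == "__main__":
    sample = (
        "[0] Abort: [] Got completion with error 10, vendor code=88, dest rank=1\n"
        " at line 580 in file ibv_channel_manager.c\n"
        "[3] Abort: Got FATAL event 3\n"
    )
    for rank, text in decode(sample):
        print(f"rank {rank}: {text}")
```

If that assumption holds, ranks 0 and 2 see a remote access error on
their sends while ranks 1 and 3 see a queue pair access error, which
would be consistent with an RDMA operation that touches memory the
target side did not make accessible.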

Please let me know what additional information I should provide to
help diagnose this issue.

Thank you very much.
Regards,

P.S.: Everything works fine with (and only with) Nemesis for
InfiniBand (--with-device=ch3:nemesis:ib).

-- 
---------- .-
VERCELLONE, Juan.
(also known as 1010ad1c97efb4734854b6ffd0899401)

