[mvapich-discuss] MVAPICH2 issue with CH3 gen2 channel

sreeram potluri potluri at cse.ohio-state.edu
Tue May 3 10:12:17 EDT 2011


Dear Juan,

Thanks for reporting your problem. We will need some information about your
system, build, and application to debug this further:

1) The configuration of the system (node and network) and the organization
of the host file (one process per node?). A sample host file layout is
sketched below.

2) Regarding failures with v1.6, I see that you have built it
with --enable-threads=multiple. Does your application use multiple threads
that make MPI calls?

Were you able to run other tests (a hello world or the OSU benchmarks) with
this build? If not, there could be a more basic issue with the build itself;
a minimal test covering this is sketched below.

3) Regarding the failures with 1.5rc1, is the test you are running an
application benchmark or a full application? Would it be possible for you to
send us your code? It will be easiest to debug if we can reproduce the
issue locally.

If not, excerpts from the code around the one-sided calls that are causing
the issue (window creation, communication, and synchronization calls) would
give us some insight. The kind of pattern we mean is sketched below.
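
Regarding 1), a host file for mpirun_rsh is just a list of hostnames, one
per line, with a host repeated once for each process you want placed on it.
As a sketch only (the node names are taken from your error output, and the
two-processes-per-node layout is an assumption of mine; it may not match
what is in ./hostso):

  compute-0-3.local
  compute-0-3.local
  compute-0-4.local
  compute-0-4.local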
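
Regarding 2), if you do not have a small test at hand, something along
these lines (a sketch of mine, not your code), compiled with the mpicc from
this build and launched with the same mpirun_rsh command line, would tell
us whether the build and the thread level it provides are healthy:

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int provided, rank, size;

      /* Request full multi-threading, since the build was configured
         with --enable-threads=multiple; 'provided' reports what the
         library actually granted. */
      MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      printf("rank %d of %d, thread level provided = %d\n",
             rank, size, provided);

      MPI_Finalize();
      return 0;
  }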
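
Regarding 3), to illustrate the kind of excerpt that would help, here is a
rough sketch of window creation and the two active synchronization modes
your errors refer to (the buffer sizes and the ring-style peer choice are
placeholders of mine, not taken from your code):

  /* Sketch only: each rank Puts into its right neighbour's window,
     first with post/start/complete/wait and then with fence. */
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank, nprocs, left, right, i;
      double winbuf[64], local[64];
      MPI_Win win;
      MPI_Group world_group, origin_group, target_group;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

      right = (rank + 1) % nprocs;            /* rank we access        */
      left  = (rank + nprocs - 1) % nprocs;   /* rank that accesses us */
      for (i = 0; i < 64; i++) { local[i] = rank; winbuf[i] = 0.0; }

      /* Window creation */
      MPI_Win_create(winbuf, sizeof(winbuf), sizeof(double),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &win);

      MPI_Comm_group(MPI_COMM_WORLD, &world_group);
      MPI_Group_incl(world_group, 1, &left,  &origin_group);
      MPI_Group_incl(world_group, 1, &right, &target_group);

      /* (a) Active sync with post/start/complete/wait */
      MPI_Win_post(origin_group, 0, win);   /* expose window to 'left' */
      MPI_Win_start(target_group, 0, win);  /* access epoch to 'right' */
      MPI_Put(local, 64, MPI_DOUBLE, right, 0, 64, MPI_DOUBLE, win);
      MPI_Win_complete(win);                /* end access epoch        */
      MPI_Win_wait(win);                    /* end exposure epoch      */

      /* (b) Active sync with fence (collective over the communicator) */
      MPI_Win_fence(0, win);
      MPI_Put(local, 64, MPI_DOUBLE, right, 0, 64, MPI_DOUBLE, win);
      MPI_Win_fence(0, win);

      MPI_Group_free(&origin_group);
      MPI_Group_free(&target_group);
      MPI_Group_free(&world_group);
      MPI_Win_free(&win);
      MPI_Finalize();
      return 0;
  }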

Thank you
Sreeram Potluri

On Tue, May 3, 2011 at 7:10 AM, Juan Vercellone <juanjov at gmail.com> wrote:

> Hello, list.
> I am having some trouble executing my MPI applications with MVAPICH2
> v1.6 (latest stable release).
>
> This is what I get when launching my programs:
> mpirun_rsh -np 4 -hostfile ./hostso mat_prod 3
> MPI process (rank: 1) terminated unexpectedly on compute-0-4.local
> MPI process (rank: 0) terminated unexpectedly on compute-0-3.local
> child_handler: Error in init phase...wait for cleanup! (1/2mpispawn
> connections)
> child_handler: Error in init phase...wait for cleanup! (1/2mpispawn
> connections)
> Failed in initilization phase, cleaned up all the mpispawn!
>
> The application doesn't even start.
>
> The MVAPICH2 instance was compiled using the following configuration:
> ./configure --enable-threads=multiple --disable-f90
>
> Using MVAPICH2 v1.5rc1 with the same configuration options, I can get
> my application to work with some communication schemes, but I get error
> messages when attempting to use one-sided active synchronization
> calls.
> Here are these errors:
>
> (FOR ACTIVE SYNC WITH POST/WAIT/START/COMPLETE)
> mpirun_rsh -np 4 -hostfile ./hostso mat_prod 4
> send desc error
> [0] Abort: [] Got completion with error 10, vendor code=88, dest rank=1
>  at line 580 in file ibv_channel_manager.c
> send desc error
> [2] Abort: [] Got completion with error 10, vendor code=88, dest rank=3
>  at line 580 in file ibv_channel_manager.c
> [3] Abort: Got FATAL event 3
>  at line 935 in file ibv_channel_manager.c
> MPI process (rank: 0) terminated unexpectedly on compute-0-3.local
> Exit code -5 signaled from compute-0-3
> [1] Abort: Got FATAL event 3
>  at line 935 in file ibv_channel_manager.c
> MPI process (rank: 3) terminated unexpectedly on compute-0-4.local
> make: *** [orun] Error 1
>
>
> (FOR ACTIVE SYNC WITH FENCE)
> mpirun_rsh -np 4 -hostfile ./hostso mat_prod 6
> send desc error
> [2] Abort: [] Got completion with error 10, vendor code=88, dest rank=3
>  at line 580 in file ibv_channel_manager.c
> send desc error
> [0] Abort: [] Got completion with error 10, vendor code=88, dest rank=1
>  at line 580 in file ibv_channel_manager.c
> MPI process (rank: 2) terminated unexpectedly on compute-0-3.local
> Exit code -5 signaled from compute-0-3
> [3] Abort: Got FATAL event 3
>  at line 935 in file ibv_channel_manager.c
> [1] Abort: Got FATAL event 3
>  at line 935 in file ibv_channel_manager.c
> MPI process (rank: 3) terminated unexpectedly on compute-0-4.local
> make: *** [orun] Error 1
>
> To proceed, please let me know what information you need me to
> provide so that we can continue working through this issue.
>
> Thank you very much.
> Regards,
>
> P.S.: Everything works fine with (and only with) the Nemesis netmod for
> InfiniBand (--with-device=ch3:nemesis:ib).
>
> --
> ---------- .-
> VERCELLONE, Juan.
> (also known as 1010ad1c97efb4734854b6ffd0899401)
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>