[mvapich-discuss] Mvapich2-1.2 for OpenFabrics IB/iWARP : Job terminates with error

Dhabaleswar Panda panda at cse.ohio-state.edu
Wed Feb 18 16:24:45 EST 2009


Vivek,

Do you always see this error when you run this application? Do you see
it when you run the application on a different set of nodes? If it
happens every time (irrespective of runs and nodes), would it be possible
for you to send us a code snippet that reproduces the problem? This will
help us investigate the issue further.

Thanks,

DK

> Sir,
>     Thank you for the reply, but the cable and switch seem to be fine. Is
> there any other reason or solution for the errors? Also, the application
> program gives complete and correct output except for the errors at the
> end.
>
> Thanks.
> --
> Regards,
> Vivek Gavane
>
> Member Technical Staff
> Bioinformatics team,
> Scientific & Engineering Computing Group,
> National PARAM Supercomputing Facility,
> Centre for Development of Advanced Computing,
> Pune-411007.
>
> Phone:       +91 20 25704100 ext. 195
> Direct Line: +91 20 25704195
>
> On Tue, Feb 17, 2009, Dhabaleswar Panda <panda at cse.ohio-state.edu> said:
>
> > Code 12 is a timeout -- it could be a bad cable, HCA, or switch leaf. If
> > the system is really large, then it could also be congestion.
> >
> > Thanks,
> >
> > DK
> >
> > On Tue, 17 Feb 2009, Vivek Gavane wrote:
> >
> >> Hello,
> >>      I have mvapich2-1.2 compiled with the following options:
> >>
> >>
> >> ./configure --with-rdma=gen2 --enable-sharedlibs=gcc --enable-g=dbg
> >> --enable-debuginfo --with-ib-include=/opt/OFED/include
> >> --with-ib-libpath=/opt/OFED/lib64 --prefix=/home/apps/mvapich2-1.2
> >>
> >> After I submit a job, the job completes but the following errors are
> >> reported on the console:
> >>
> >> -------------------------------------------------------------
> >> send desc error
> >> Exit code -5 signaled from ibc0-16
> >> Killing remote processes...[14] Abort: [] Got completion with error 12,
> >> vendor code=81, dest rank=0
> >>  at line 553 in file ibv_channel_manager.c
> >> MPI process terminated unexpectedly
> >> DONE
> >> ------------------------------------------------------------
> >>
> >> And in the redirected output file, the following errors are reported at
> >> the end:
> >> -----------------------------------------
> >> cleanupSignal 15 received.
> >> Signal 15 received.
> >> Signal 15 received.
> >> Signal 15 received.
> >> -----------------------------------------
> >>
> >> Does anyone know the reason for this?
> >>
> >> Thanks in advance.
> >> --
> >> Regards,
> >> Vivek Gavane
> >> _______________________________________________
> >> mvapich-discuss mailing list
> >> mvapich-discuss at cse.ohio-state.edu
> >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >>
> >
>
>
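
For reference, the "error 12" in the quoted abort message can be checked
against the work-completion status values defined by libibverbs. The
following is only a minimal sketch: the table is a partial copy of the
first values of `enum ibv_wc_status` from <infiniband/verbs.h>, and the
helper name `decode_abort` and the regular expression are hypothetical,
written to match the message format quoted in this thread rather than any
documented MVAPICH2 log format.

```python
import re

# Partial table mirroring the first values of `enum ibv_wc_status` in
# libibverbs' <infiniband/verbs.h>. Values above 13 are omitted here.
WC_STATUS = {
    0: "IBV_WC_SUCCESS",
    1: "IBV_WC_LOC_LEN_ERR",
    2: "IBV_WC_LOC_QP_OP_ERR",
    3: "IBV_WC_LOC_EEC_OP_ERR",
    4: "IBV_WC_LOC_PROT_ERR",
    5: "IBV_WC_WR_FLUSH_ERR",
    6: "IBV_WC_MW_BIND_ERR",
    7: "IBV_WC_BAD_RESP_ERR",
    8: "IBV_WC_LOC_ACCESS_ERR",
    9: "IBV_WC_REM_INV_REQ_ERR",
    10: "IBV_WC_REM_ACCESS_ERR",
    11: "IBV_WC_REM_OP_ERR",
    12: "IBV_WC_RETRY_EXC_ERR",
    13: "IBV_WC_RNR_RETRY_EXC_ERR",
}

# Hypothetical pattern, based only on the message quoted in this thread.
# \s* lets the match survive a line wrap between the fields.
ABORT_RE = re.compile(
    r"Got completion with error (\d+),\s*vendor code=(\w+),\s*dest rank=(\d+)"
)


def decode_abort(text):
    """Return the decoded fields of an abort message, or None if absent."""
    match = ABORT_RE.search(text)
    if match is None:
        return None
    status = int(match.group(1))
    return {
        "status": status,
        "name": WC_STATUS.get(status, "unknown"),
        # Kept as a string: the thread does not say which base it is in.
        "vendor_code": match.group(2),
        "dest_rank": int(match.group(3)),
    }
```

Applied to the message quoted above, the status field decodes to
IBV_WC_RETRY_EXC_ERR (transport retry counter exceeded), which is
consistent with the "timeout" description in the earlier reply.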


