[mvapich-discuss] Mvapich2-1.2 for OpenFabrics IB/iWARP : Job terminates with error

Vivek Gavane vivekg at cdac.in
Fri Feb 20 01:02:30 EST 2009


Sir,
      I have tried different sets of nodes for various runs, and the same
error is reported each time. However, when I tried a small number of cores,
i.e. 8, the job never exited even though it had completed and the output
file had been generated. The processes also kept showing 99.9% CPU usage
after the complete output was written.
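
As a sanity check, a trivial MPI program like the sketch below (not part of
MEME; just a minimal test case) could be run on the same 8 cores to see
whether a trivial job exits cleanly there. If even this hangs after
MPI_Finalize, the problem is more likely in the MPI/fabric setup than in
MEME itself:
-----------------------------------
/* minimal MPI test (sketch) */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Barrier(MPI_COMM_WORLD);   /* force some communication */
    printf("rank %d of %d done\n", rank, size);
    MPI_Finalize();                /* the job should exit right after this */
    return 0;
}
-----------------------------------
Compiled with mpicc and launched the same way as MEME (e.g. with
mpirun_rsh or mpiexec on 8 cores), it should print one line per rank and
then return to the shell.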

The application code I am using is MEME version meme3.0.3
http://meme.nbcr.net/downloads/old_versions/

I also installed the newer MEME version, meme_4.1.0:
http://meme.nbcr.net/downloads/

It also gives the following error every time, on different sets of nodes:
-----------------------------------
Exit code -5 signaled from ibc0-27
Killing remote processes...MPI process terminated unexpectedly
DONE
-----------------------------------

The redirected output file of the application contains:
-----------------------------
cleanupSignal 15 received.
-----------------------------
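
For reference on the earlier console error quoted below ("Got completion
with error 12"): status 12 in the libibverbs ibv_wc_status enum is
IBV_WC_RETRY_EXC_ERR (transport retry counter exceeded), which matches the
timeout explanation given below. A small sketch, assuming the standard
libibverbs API is available, that prints a completion status by name:
-----------------------------
/* sketch: print a verbs work-completion status by name */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    enum ibv_wc_status status = IBV_WC_RETRY_EXC_ERR;  /* value 12 */
    printf("status %d = %s\n", status, ibv_wc_status_str(status));
    return 0;
}
-----------------------------
Built with gcc and linked against -libverbs, this prints something like
"status 12 = transport retry counter exceeded".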

Thanks.
-- 
Regards,
Vivek Gavane

Member Technical Staff
Bioinformatics team,
Scientific & Engineering Computing Group,
National PARAM Supercomputing Facility,
Centre for Development of Advanced Computing,
Pune-411007.

Phone:       +91 20 25704100 ext. 195
Direct Line: +91 20 25704195

On Thu, Feb 19, 2009, Dhabaleswar Panda <panda at cse.ohio-state.edu> said:

> Vivek,
> 
> Do you see this error always when you run this application? Do you see
> this error when you run your application on a different set of nodes? If
> this happens always (irrespective of runs and nodes), would it be possible
> for you to send us a code snippet that reproduces this problem? This would
> help us investigate the issue further.
> 
> Thanks,
> 
> DK
> 
>> Sir,
>>     Thank you for the reply, but the cable and switch seem to be fine. Is
>> there any other possible reason for, or solution to, these errors? Also, the
>> application program gives complete and correct output except for the errors
>> at the end.
>>
>> Thanks.
>> --
>> Regards,
>> Vivek Gavane
>>
>> Member Technical Staff
>> Bioinformatics team,
>> Scientific & Engineering Computing Group,
>> National PARAM Supercomputing Facility,
>> Centre for Development of Advanced Computing,
>> Pune-411007.
>>
>> Phone:       +91 20 25704100 ext. 195
>> Direct Line: +91 20 25704195
>>
>> On Tue, Feb 17, 2009, Dhabaleswar Panda <panda at cse.ohio-state.edu> said:
>>
>> > Code 12 is a timeout -- it could be a bad cable/HCA/switch leaf. If the
>> > system is really large, it could be congestion.
>> >
>> > Thanks,
>> >
>> > DK
>> >
>> > On Tue, 17 Feb 2009, Vivek Gavane wrote:
>> >
>> >> Hello,
>> >>      I have mvapich2-1.2 compiled with the following options:
>> >>
>> >>
>> >> ./configure --with-rdma=gen2 --enable-sharedlibs=gcc --enable-g=dbg
>> >> --enable-debuginfo --with-ib-include=/opt/OFED/include
>> >> --with-ib-libpath=/opt/OFED/lib64 --prefix=/home/apps/mvapich2-1.2
>> >>
>> >> After I submit a job, the job completes but the following errors are
>> >> reported on the console:
>> >>
>> >> -------------------------------------------------------------
>> >> send desc error
>> >> Exit code -5 signaled from ibc0-16
>> >> Killing remote processes...[14] Abort: [] Got completion with error 12,
>> >> vendor code=81, dest rank=0
>> >>  at line 553 in file ibv_channel_manager.c
>> >> MPI process terminated unexpectedly
>> >> DONE
>> >> ------------------------------------------------------------
>> >>
>> >> And in the redirected output file, following errors are reported at the
>> >> end:
>> >> -----------------------------------------
>> >> cleanupSignal 15 received.
>> >> Signal 15 received.
>> >> Signal 15 received.
>> >> Signal 15 received.
>> >> -----------------------------------------
>> >>
>> >> Does anyone know the reason for this?
>> >>
>> >> Thanks in advance.
>> >> --
>> >> Regards,
>> >> Vivek Gavane
>> >>
>> >
>>
>>
> 





