[mvapich-discuss] Mvapich2-1.2 for OpenFabrics IB/iWARP :
Job terminates with error
Vivek Gavane
vivekg at cdac.in
Fri Feb 20 01:02:30 EST 2009
Sir,
I have tried different sets of nodes for various runs, and the same
error is reported. But when I tried a small number of cores, i.e. 8, the
job never exited even though it had completed and the output file was
generated. The processes also showed 99.9% CPU usage even after the
complete output had been generated.
The application code I am using is MEME version 3.0.3:
http://meme.nbcr.net/downloads/old_versions/
I also installed the newer version, MEME 4.1.0:
http://meme.nbcr.net/downloads/
It also gives the following error every time, on different sets of nodes:
-----------------------------------
Exit code -5 signaled from ibc0-27
Killing remote processes...MPI process terminated unexpectedly
DONE
-----------------------------------
The redirected output file of the application contains:
-----------------------------
cleanupSignal 15 received.
-----------------------------
Thanks.
--
Regards,
Vivek Gavane
Member Technical Staff
Bioinformatics team,
Scientific & Engineering Computing Group,
National PARAM Supercomputing Facility,
Centre for Development of Advanced Computing,
Pune-411007.
Phone: +91 20 25704100 ext. 195
Direct Line: +91 20 25704195
On Thu, Feb 19, 2009, Dhabaleswar Panda <panda at cse.ohio-state.edu> said:
> Vivek,
>
> Do you see this error every time you run this application? Do you see
> this error when you run your application on a different set of nodes? If
> this happens always (irrespective of runs and nodes), would it be possible
> for you to send us a code snippet that reproduces the problem? That would
> help us investigate this issue further.
>
> Thanks,
>
> DK
>
>> Sir,
>> Thank you for the reply, but the cable and switch seem to be fine. Is
>> there any other possible reason for, or solution to, these errors? Also,
>> the application produces complete and correct output except for the
>> errors at the end.
>>
>> Thanks.
>> --
>> Regards,
>> Vivek Gavane
>>
>> Member Technical Staff
>> Bioinformatics team,
>> Scientific & Engineering Computing Group,
>> National PARAM Supercomputing Facility,
>> Centre for Development of Advanced Computing,
>> Pune-411007.
>>
>> Phone: +91 20 25704100 ext. 195
>> Direct Line: +91 20 25704195
>>
>> On Tue, Feb 17, 2009, Dhabaleswar Panda <panda at cse.ohio-state.edu> said:
>>
>> > Code 12 is a timeout -- could be a bad cable/HCA/switch leaf. If the
>> > system is really large then it could be congestion.
>> >
>> > Thanks,
>> >
>> > DK
>> >
>> > On Tue, 17 Feb 2009, Vivek Gavane wrote:
>> >
>> >> Hello,
>> >> I have mvapich2-1.2 compiled with the following options:
>> >>
>> >>
>> >> ./configure --with-rdma=gen2 --enable-sharedlibs=gcc --enable-g=dbg
>> >> --enable-debuginfo --with-ib-include=/opt/OFED/include
>> >> --with-ib-libpath=/opt/OFED/lib64 --prefix=/home/apps/mvapich2-1.2
>> >>
>> >> After I submit a job, the job completes but the following errors are
>> >> reported on the console:
>> >>
>> >> -------------------------------------------------------------
>> >> send desc error
>> >> Exit code -5 signaled from ibc0-16
>> >> Killing remote processes...[14] Abort: [] Got completion with error 12,
>> >> vendor code=81, dest rank=0
>> >> at line 553 in file ibv_channel_manager.c
>> >> MPI process terminated unexpectedly
>> >> DONE
>> >> ------------------------------------------------------------
>> >>
>> >> In the redirected output file, the following errors are reported at the
>> >> end:
>> >> -----------------------------------------
>> >> cleanupSignal 15 received.
>> >> Signal 15 received.
>> >> Signal 15 received.
>> >> Signal 15 received.
>> >> -----------------------------------------
>> >>
>> >> Does anyone know the reason for this?
>> >>
>> >> Thanks in advance.
>> >> --
>> >> Regards,
>> >> Vivek Gavane
>> >> _______________________________________________
>> >> mvapich-discuss mailing list
>> >> mvapich-discuss at cse.ohio-state.edu
>> >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>> >>
>> >
>>
>>
>