[mvapich-discuss] mpi error?

sk sdk0084 at yahoo.com
Wed May 8 07:28:45 EDT 2013


Hi Devendar,

I have tried what you suggested:
./mpichversion 
MVAPICH2 Version:         1.9rc1
MVAPICH2 Release date:    Tue Apr 16 12:35:17 EDT 2013
MVAPICH2 Device:          ch3:mrail
MVAPICH2 configure:       --prefix=/mnt/raid5/mvapich2/intel --disable-mcast --enable-g=all --enable-fast=none --enable-error-checking=all
MVAPICH2 CC:      icc    -g
MVAPICH2 CXX:     icpc   -g
MVAPICH2 F77:     ifort -L/lib -L/lib   -g
MVAPICH2 FC:      ifort   -g


mpirun -np 64 -f hosts -env MV2_DEBUG_SHOW_BACKTRACE 1 ./wrf.exe

but unfortunately the result is the same:

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 152
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:0 at sandy1.localdomain] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:913): assert (!closed) failed
[proxy:0:0 at sandy1.localdomain] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0 at sandy1.localdomain] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec at sandy1.localdomain] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec at sandy1.localdomain] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec at sandy1.localdomain] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for completion
[mpiexec at sandy1.localdomain] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
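
A note on the exit code, offered as a possible reading rather than a diagnosis: if the launcher reports it in the usual shell convention of 128 + signal number, then 152 corresponds to signal 24, which on Linux x86_64 is SIGXCPU (CPU time limit exceeded). Reading 152 as a raw wait status also gives signal 24 in the low seven bits. Whether Hydra follows either convention here is an assumption. The helper name below is illustrative only.

```python
import signal

def decode_exit_code(code):
    """Return (signal_number, signal_name) if code looks like 128 + N, else (None, None)."""
    if code <= 128:
        return None, None  # ordinary exit status, not a signal under this convention
    signum = code - 128
    try:
        name = signal.Signals(signum).name  # lookup depends on the local platform
    except ValueError:
        name = None  # this platform has no signal with that number
    return signum, name

if __name__ == "__main__":
    print(decode_exit_code(152))
```

If the signal does turn out to be SIGXCPU, the CPU time limit on the compute nodes (`ulimit -t`) would be the first thing to compare against the run time of the job.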


I have tried with mpirun_rsh but it doesn't work: 

sk at sandy1ib's password: connect [mt_checkin]: Connection refused
[sandy1.localdomain:mpirun_rsh][child_handler] Error in init phase, aborting! (0/2 mpispawn connections)
sk at sandy1ib's password: 
The following processes may have not been killed:
sandy1IB: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
cat: rsl.out.0000: No such file or directory
tail: cannot open `rsl.out.0000' for reading: No such file or directory
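
The repeated "password:" prompts suggest that ssh to sandy1ib is asking for a password interactively, whereas mpirun_rsh generally expects password-less ssh (or rsh) between the nodes. Below is a minimal sketch of a check, assuming an OpenSSH client is installed; the function names are hypothetical and the host name is taken from the output above.

```python
import subprocess

def ssh_batch_command(host):
    """Build an ssh command that fails instead of prompting for a password."""
    # BatchMode=yes disables interactive prompts; ConnectTimeout bounds the wait.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]

def passwordless_ssh_ok(host):
    """Return True if ssh to host succeeds without any interactive prompt."""
    try:
        result = subprocess.run(ssh_batch_command(host),
                                capture_output=True, timeout=15)
    except (OSError, subprocess.TimeoutExpired):
        return False  # ssh not installed, or host unreachable within the timeout
    return result.returncode == 0

if __name__ == "__main__":
    for host in ["sandy1ib"]:
        status = "ok" if passwordless_ssh_ok(host) else "needs key-based login"
        print(host, status)
```

Running this for every entry in the hosts file would show which nodes still need key-based login before mpirun_rsh can start its mpispawn processes.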


Is there a log file where I might find some clue about what is going on?
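
For reference on the WRF side: a distributed-memory WRF build normally writes one rsl.out.NNNN and one rsl.error.NNNN file per MPI rank into the run directory (the rsl.out.0000 name appears in the output above). A small sketch for printing the tail of each, assuming the files sit in the current directory; the function name and defaults are illustrative.

```python
import glob
from collections import deque

def tail_rsl_logs(pattern="rsl.error.*", lines=5):
    """Return {filename: last `lines` lines} for every file matching pattern."""
    tails = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as handle:
            # deque with maxlen keeps only the final lines while streaming the file
            tails[path] = [line.rstrip("\n") for line in deque(handle, maxlen=lines)]
    return tails

if __name__ == "__main__":
    for name, last in tail_rsl_logs().items():
        print("==>", name)
        for line in last:
            print("   ", line)
```

The rank whose file ends earliest or with an unusual last message is often the one worth looking at first.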

Regards,
SK



________________________________
 From: Devendar Bureddy <bureddy at cse.ohio-state.edu>
To: sk <sdk0084 at yahoo.com> 
Cc: "mvapich-discuss at cse.ohio-state.edu" <mvapich-discuss at cse.ohio-state.edu> 
Sent: Wednesday, May 1, 2013 5:05 PM
Subject: Re: [mvapich-discuss] mpi error?
 


Hi Sk

It is a little hard to say what is going wrong from the given error message. Can you please add "--enable-g=all --enable-fast=none --enable-error-checking=all" to the configure options and run with MV2_DEBUG_SHOW_BACKTRACE=1 to see whether this shows a more useful error message?
Can you also try the mpirun_rsh launcher?

-Devendar


On Wed, May 1, 2013 at 3:16 AM, sk <sdk0084 at yahoo.com> wrote:

>Hi there,
>My WRF simulation crashed with the following error message:
>=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>=   EXIT CODE: 152
>=   CLEANING UP REMAINING PROCESSES
>=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>
>
>
>
>some details:
>
>
>mpirun -np 64 -f hosts ./wrf.exe
>
>
>
>./mpich2version
>MVAPICH2 Version:         1.9a2
>MVAPICH2 Release date:    Thu Nov  8 11:43:52 EST 2012
>MVAPICH2 Device:          ch3:mrail
>MVAPICH2 configure:       --prefix=/mnt/raid5/mvapich2/intel --disable-mcast
>MVAPICH2 CC:      icc    -DNDEBUG -DNVALGRIND -O2
>MVAPICH2 CXX:     icpc   -DNDEBUG -DNVALGRIND -O2
>MVAPICH2 F77:     ifort -L/lib -L/lib   -O2
>MVAPICH2 FC:      ifort   -O2
>
>
>
>Scientific Linux release 6.3 (Carbon) with an InfiniBand network
>
>Linux sandy1.localdomain 2.6.32-279.el6.x86_64 #1 SMP Thu Jun 21 07:08:44 CDT 2012 x86_64 x86_64 x86_64 GNU/Linux
>
>
>The model doesn't report any error message. What could be the problem?
>
>
>Thanks!
>SK
>_______________________________________________
>mvapich-discuss mailing list
>mvapich-discuss at cse.ohio-state.edu
>http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>



-- 
Devendar 