[mvapich-discuss] Weird "BAD TERMINATION" error when running with BLCR

Arjun J Rao rectangle.king at gmail.com
Wed Feb 5 07:57:05 EST 2014


I have two 12-core machines in a small cluster. I installed MVAPICH on
both, configured with the --enable-ckpt option. Both machines can do
passwordless logins to each other. I also loaded the BLCR kernel module,
and lsmod shows that blcr is loaded.
After compiling my "Hello World from many processes" MPI program, running
it on one machine gives the expected output. Running it across both
machines gives the following error:


===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:0@abc3] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:902): assert (!closed) failed
[proxy:0:0@abc3] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:0@abc3] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@abc3] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@abc3] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@abc3] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion
[mpiexec@abc3] main (ui/mpich/mpiexec.c:331): process manager error waiting for completion


Frustrating.