[mvapich-discuss] Segmentation fault

Srikanth Gumma sri4mailing at gmail.com
Tue Jun 17 00:42:12 EDT 2014


Hi,

I have been trying to run a simple cpi.c program with mvapich2-1.9b and
am having trouble executing it across multiple nodes.
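
For context, cpi.c here is essentially the standard MPI pi-calculation
example. A minimal program along these lines (a sketch only; the exact
cpi.c shipped with the distribution differs in its timing and I/O
details) looks like:

    /* minimal pi-calculation example in the spirit of cpi.c */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        const int n = 10000;            /* number of intervals */
        int rank, size, i;
        double h, sum = 0.0, mypi, pi;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        h = 1.0 / (double)n;
        /* each rank integrates 4/(1+x^2) over its share of the intervals */
        for (i = rank + 1; i <= n; i += size) {
            double x = h * ((double)i - 0.5);
            sum += 4.0 / (1.0 + x * x);
        }
        mypi = h * sum;

        /* combine the partial sums on rank 0 */
        MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("pi is approximately %.16f\n", pi);

        MPI_Finalize();
        return 0;
    }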

I have tried all the options suggested in the FAQs and in on-line forums,
without any success.

Below is the error output I got when I executed the command with mpirun -v.
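
The run was launched roughly as follows (the host names and the eth1
interface match the launch arguments in the output below; the remaining
flags are shown only as an example of the form, not the exact command):

    mpicc -o cpi cpi.c
    mpirun -v -np 2 -hosts atlas4-c77,atlas4-c78 -iface eth1 ./cpi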

I'm sure some of the experts here can help. I have installed mvapich2 at
several other customer sites and have never faced this strange issue before.

Thanks in Advance.

[mpiexec at atlas4-c77] Launch arguments:
/app1/centos6.3/gnu/mvapich2-1.9/bin/hydra_pmi_proxy --control-port
172.18.185.212:45735 --debug --rmk user --launcher ssh --demux poll --iface
eth1 --pgid 0 --retries 10 --usize -2 --proxy-id 0
[mpiexec at atlas4-c77] Launch arguments: /usr/bin/ssh -x atlas4-c78
"/app1/centos6.3/gnu/mvapich2-1.9/bin/hydra_pmi_proxy" --control-port
172.18.185.212:45735 --debug --rmk user --launcher ssh --demux poll --iface
eth1 --pgid 0 --retries 10 --usize -2 --proxy-id 1
[proxy:0:0 at atlas4-c77] got pmi command (from 0): init
pmi_version=1 pmi_subversion=1
[proxy:0:0 at atlas4-c77] PMI response: cmd=response_to_init pmi_version=1
pmi_subversion=1 rc=0
[proxy:0:0 at atlas4-c77] got pmi command (from 0): get_maxes

[proxy:0:0 at atlas4-c77] PMI response: cmd=maxes kvsname_max=256
keylen_max=64 vallen_max=1024
[proxy:0:0 at atlas4-c77] got pmi command (from 0): get_appnum

[proxy:0:0 at atlas4-c77] PMI response: cmd=appnum appnum=0
[proxy:0:0 at atlas4-c77] got pmi command (from 0): get_my_kvsname

[proxy:0:0 at atlas4-c77] PMI response: cmd=my_kvsname kvsname=kvs_20136_0
[proxy:0:0 at atlas4-c77] got pmi command (from 0): get_my_kvsname

[proxy:0:0 at atlas4-c77] PMI response: cmd=my_kvsname kvsname=kvs_20136_0
[proxy:0:0 at atlas4-c77] got pmi command (from 0): get
kvsname=kvs_20136_0 key=PMI_process_mapping
[proxy:0:0 at atlas4-c77] PMI response: cmd=get_result rc=0 msg=success
value=(vector,(0,2,1))
[proxy:0:0 at atlas4-c77] got pmi command (from 0): put
kvsname=kvs_20136_0 key=MVAPICH2_0000 value=000000cd:002e0406:002e0407:
[proxy:0:0 at atlas4-c77] we don't understand this command put; forwarding
upstream
[mpiexec at atlas4-c77] [pgid: 0] got PMI command: cmd=put kvsname=kvs_20136_0
key=MVAPICH2_0000 value=000000cd:002e0406:002e0407:
[mpiexec at atlas4-c77] PMI response to fd 6 pid 0: cmd=put_result rc=0
msg=success
[proxy:0:0 at atlas4-c77] we don't understand the response put_result;
forwarding downstream
[proxy:0:0 at atlas4-c77] got pmi command (from 0): barrier_in

[proxy:0:0 at atlas4-c77] forwarding command (cmd=barrier_in) upstream
[mpiexec at atlas4-c77] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:1 at atlas4-c78] got pmi command (from 4): init
pmi_version=1 pmi_subversion=1
[proxy:0:1 at atlas4-c78] PMI response: cmd=response_to_init pmi_version=1
pmi_subversion=1 rc=0
[proxy:0:1 at atlas4-c78] got pmi command (from 4): get_maxes

[proxy:0:1 at atlas4-c78] PMI response: cmd=maxes kvsname_max=256
keylen_max=64 vallen_max=1024
[proxy:0:1 at atlas4-c78] got pmi command (from 4): get_appnum

[proxy:0:1 at atlas4-c78] PMI response: cmd=appnum appnum=0
[proxy:0:1 at atlas4-c78] got pmi command (from 4): get_my_kvsname

[proxy:0:1 at atlas4-c78] PMI response: cmd=my_kvsname kvsname=kvs_20136_0
[proxy:0:1 at atlas4-c78] got pmi command (from 4): get_my_kvsname

[proxy:0:1 at atlas4-c78] PMI response: cmd=my_kvsname kvsname=kvs_20136_0
[proxy:0:1 at atlas4-c78] got pmi command (from 4): get
kvsname=kvs_20136_0 key=PMI_process_mapping
[proxy:0:1 at atlas4-c78] PMI response: cmd=get_result rc=0 msg=success
value=(vector,(0,2,1))
[mpiexec at atlas4-c77] [pgid: 0] got PMI command: cmd=put kvsname=kvs_20136_0
key=MVAPICH2_0001 value=00000129:002b0405:002b0406:
[mpiexec at atlas4-c77] PMI response to fd 7 pid 4: cmd=put_result rc=0
msg=success
[proxy:0:1 at atlas4-c78] got pmi command (from 4): put
kvsname=kvs_20136_0 key=MVAPICH2_0001 value=00000129:002b0405:002b0406:
[proxy:0:1 at atlas4-c78] we don't understand this command put; forwarding
upstream
[proxy:0:1 at atlas4-c78] we don't understand the response put_result;
forwarding downstream
[mpiexec at atlas4-c77] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at atlas4-c77] PMI response to fd 6 pid 4: cmd=barrier_out
[mpiexec at atlas4-c77] PMI response to fd 7 pid 4: cmd=barrier_out
[proxy:0:1 at atlas4-c78] got pmi command (from 4): barrier_in

[proxy:0:1 at atlas4-c78] forwarding command (cmd=barrier_in) upstream
[proxy:0:0 at atlas4-c77] PMI response: cmd=barrier_out
[proxy:0:0 at atlas4-c77] got pmi command (from 0): get
kvsname=kvs_20136_0 key=MVAPICH2_0001
[mpiexec at atlas4-c77] [pgid: 0] got PMI command: cmd=get kvsname=kvs_20136_0
key=MVAPICH2_0001
[mpiexec at atlas4-c77] PMI response to fd 6 pid 0: cmd=get_result rc=0
msg=success value=00000129:002b0405:002b0406:
[proxy:0:0 at atlas4-c77] forwarding command (cmd=get kvsname=kvs_20136_0
key=MVAPICH2_0001) upstream
[proxy:0:0 at atlas4-c77] we don't understand the response get_result;
forwarding downstream
[mpiexec at atlas4-c77] [pgid: 0] got PMI command: cmd=get kvsname=kvs_20136_0
key=MVAPICH2_0000
[mpiexec at atlas4-c77] PMI response to fd 7 pid 4: cmd=get_result rc=0
msg=success value=000000cd:002e0406:002e0407:
[proxy:0:0 at atlas4-c77] got pmi command (from 0): get
kvsname=kvs_20136_0 key=MVAPICH2_0001
[proxy:0:0 at atlas4-c77] forwarding command (cmd=get kvsname=kvs_20136_0
key=MVAPICH2_0001) upstream
[proxy:0:1 at atlas4-c78] PMI response: cmd=barrier_out
[proxy:0:1 at atlas4-c78] got pmi command (from 4): get
kvsname=kvs_20136_0 key=MVAPICH2_0000
[proxy:0:1 at atlas4-c78] forwarding command (cmd=get kvsname=kvs_20136_0
key=MVAPICH2_0000) upstream
[mpiexec at atlas4-c77] [pgid: 0] got PMI command: cmd=get kvsname=kvs_20136_0
key=MVAPICH2_0001
[mpiexec at atlas4-c77] PMI response to fd 6 pid 0: cmd=get_result rc=0
msg=success value=00000129:002b0405:002b0406:
[proxy:0:0 at atlas4-c77] we don't understand the response get_result;
forwarding downstream
[proxy:0:0 at atlas4-c77] got pmi command (from 0): barrier_in

[proxy:0:0 at atlas4-c77] forwarding command (cmd=barrier_in) upstream
[mpiexec at atlas4-c77] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:1 at atlas4-c78] we don't understand the response get_result;
forwarding downstream
[mpiexec at atlas4-c77] [pgid: 0] got PMI command: cmd=get kvsname=kvs_20136_0
key=MVAPICH2_0000
[mpiexec at atlas4-c77] PMI response to fd 7 pid 4: cmd=get_result rc=0
msg=success value=000000cd:002e0406:002e0407:
[proxy:0:1 at atlas4-c78] got pmi command (from 4): get
kvsname=kvs_20136_0 key=MVAPICH2_0000
[proxy:0:1 at atlas4-c78] forwarding command (cmd=get kvsname=kvs_20136_0
key=MVAPICH2_0000) upstream
[proxy:0:1 at atlas4-c78] we don't understand the response get_result;
forwarding downstream
[mpiexec at atlas4-c77] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at atlas4-c77] PMI response to fd 6 pid 4: cmd=barrier_out
[mpiexec at atlas4-c77] PMI response to fd 7 pid 4: cmd=barrier_out
[proxy:0:0 at atlas4-c77] PMI response: cmd=barrier_out
[proxy:0:1 at atlas4-c78] got pmi command (from 4): barrier_in

[proxy:0:1 at atlas4-c78] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at atlas4-c78] PMI response: cmd=barrier_out
[proxy:0:0 at atlas4-c77] got pmi command (from 0): barrier_in

[proxy:0:0 at atlas4-c77] forwarding command (cmd=barrier_in) upstream
[mpiexec at atlas4-c77] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at atlas4-c77] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at atlas4-c77] PMI response to fd 6 pid 4: cmd=barrier_out
[mpiexec at atlas4-c77] PMI response to fd 7 pid 4: cmd=barrier_out
[proxy:0:0 at atlas4-c77] PMI response: cmd=barrier_out
[proxy:0:1 at atlas4-c78] got pmi command (from 4): barrier_in

[proxy:0:1 at atlas4-c78] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at atlas4-c78] PMI response: cmd=barrier_out
[atlas4-c77:mpi_rank_0][error_sighandler] Caught error: Segmentation fault
(signal 11)
[atlas4-c78:mpi_rank_1][error_sighandler] Caught error: Segmentation fault
(signal 11)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:0 at atlas4-c77] HYD_pmcd_pmip_control_cmd_cb
(./pm/pmiserv/pmip_cb.c:913): assert (!closed) failed
[proxy:0:0 at atlas4-c77] HYDT_dmxu_poll_wait_for_event
(./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0 at atlas4-c77] main (./pm/pmiserv/pmip.c:206): demux engine error
waiting for event
[mpiexec at atlas4-c77] HYDT_bscu_wait_for_completion
(./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated
badly; aborting
[mpiexec at atlas4-c77] HYDT_bsci_wait_for_completion
(./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for
completion
[mpiexec at atlas4-c77] HYD_pmci_wait_for_completion
(./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for
completion
[mpiexec at atlas4-c77] main (./ui/mpich/mpiexec.c:331): process manager error
waiting for completion


Regards
Srikanth