[mvapich-discuss] mpiexec.hydra error - unable to connect from compute nodes to master port 42773

bright.yang at vaisala.com
Tue Sep 13 18:56:50 EDT 2011


You are right. The port number is randomly selected each time. It was working before but is broken now. I need to google how to check the firewall...

Bright Yang
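A reasonable first step in checking the firewall is to test, from a compute node, whether a TCP connection to the head node gets through at all. The Python sketch below is only an illustration and is not part of MPICH or MVAPICH; the function name `probe_tcp` is hypothetical. It separates a timeout, which usually means packets are being silently dropped (matching the "Connection timed out" messages in the log below), from a refused connection, which means the head node is reachable but nothing is listening on that port.

```python
import socket
import sys


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt one TCP connection to host:port and classify the result.

    "open"      -- the connection was accepted
    "refused"   -- the host answered with a reset: reachable, nothing listening
    "timeout"   -- no answer at all, typical of a firewall that drops packets
    "error: ..." -- anything else (name lookup failure, no route to host, ...)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        return f"error: {exc}"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Run on a compute node, e.g.: python3 probe_tcp.py kratos.vaisala.com 42773
    print(probe_tcp(sys.argv[1], int(sys.argv[2])))
```

Because the port changes from run to run, probing a single fixed port only says something about that one port. A more telling test is to start a throwaway listener on an arbitrary high port on the head node (for example `python3 -m http.server 42773`) and probe it from a compute node: "timeout" points at the firewall, while "open" suggests the problem lies elsewhere. An "error: ... No route to host" result can also come from a firewall that rejects rather than drops.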

-----Original Message-----
From: Jonathan Perkins [mailto:perkinjo at cse.ohio-state.edu] 
Sent: Tuesday, September 13, 2011 4:48 PM
To: Yang Bright BRYA
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] mpiexec.hydra error - unable to connect from compute nodes to master port 42773

Hi, is this a new installation or one that was previously working?
The error message suggests checking for firewalls, which may prevent
the launched processes from connecting back to the head node. I
believe this port number is chosen randomly and may differ between
consecutive runs.
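To illustrate why the port can differ between runs: a program that binds a listening socket to port 0 receives whatever free port the kernel assigns from its ephemeral range. The sketch below assumes the launcher behaves this way, or picks a random port with the same effect; it is not taken from the hydra source. The helper names `open_listener` and `ephemeral_range` are hypothetical, and reading `/proc/sys/net/ipv4/ip_local_port_range` works on Linux only.

```python
import socket


def open_listener() -> socket.socket:
    """Open a TCP listener on port 0 so the kernel picks a free port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("0.0.0.0", 0))  # port 0 means "any free port"
    srv.listen()
    return srv


def ephemeral_range(path: str = "/proc/sys/net/ipv4/ip_local_port_range"):
    """Return Linux's (low, high) ephemeral port range, or None if unavailable."""
    try:
        with open(path) as fh:
            low, high = fh.read().split()
        return int(low), int(high)
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    a, b = open_listener(), open_listener()
    print("first listener :", a.getsockname()[1])
    print("second listener:", b.getsockname()[1])
    print("ephemeral range:", ephemeral_range())
    a.close()
    b.close()
```

The practical consequence is that a firewall rule opening only port 42773 would not help on the next run; the rule would have to admit the whole range the launcher may use, or trust all traffic arriving from the compute nodes.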

On Tue, Sep 13, 2011 at 5:24 PM,  <bright.yang at vaisala.com> wrote:
> Here is what I get when I try to run a parallel job:
>
> # mpiexec.hydra -f 2_12hosts -n 24 ./wrf.exe
>
> [proxy:0:1@compute-0-1.local] HYDU_sock_connect (./utils/sock/sock.c:188): unable to connect from "compute-0-1.local" to "kratos.vaisala.com" (Connection timed out)
> [proxy:0:1@compute-0-1.local] main (./pm/pmiserv/pmip.c:205): unable to connect to server kratos.vaisala.com at port 42773 (check for firewalls!)
> [proxy:0:0@compute-0-0.local] HYDU_sock_connect (./utils/sock/sock.c:188): unable to connect from "compute-0-0.local" to "kratos.vaisala.com" (Connection timed out)
> [proxy:0:0@compute-0-0.local] main (./pm/pmiserv/pmip.c:205): unable to connect to server kratos.vaisala.com at port 42773 (check for firewalls!)
>
> Where is port 42773 specified? Is it set in a configuration file? I ran
> netstat, and there is no listener on that port:
>
> netstat --tcp --udp --listening --program
>
> Active Internet connections (only servers)
>
> Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name
> tcp        0      0 *:40000                     *:*                         LISTEN      5394/mysqld
> tcp        0      0 *:nfs                       *:*                         LISTEN      -
> tcp        0      0 *:805                       *:*                         LISTEN      5290/rpc.mountd
> tcp        0      0 *:54438                     *:*                         LISTEN      6994/pgroupd
> tcp        0      0 localhost.localdomain:smux  *:*                         LISTEN      5152/snmpd
> tcp        0      0 *:8649                      *:*                         LISTEN      6224/gmond
> tcp        0      0 *:52010                     *:*                         LISTEN      -
> tcp        0      0 *:8651                      *:*                         LISTEN      4652/gmetad
> tcp        0      0 localhost.localdomain:5900  *:*                         LISTEN      5953/Xorg
> tcp        0      0 *:8652                      *:*                         LISTEN      4652/gmetad
> tcp        0      0 *:941                       *:*                         LISTEN      4575/rpc.statd
> tcp        0      0 *:sunrpc                    *:*                         LISTEN      4510/portmap
> tcp        0      0 10.10.120.21:domain         *:*                         LISTEN      4481/named
> tcp        0      0 kratos.local:domain         *:*                         LISTEN      4481/named
> tcp        0      0 localhost.localdomai:domain *:*                         LISTEN      4481/named
> tcp        0      0 *:27000                     *:*                         LISTEN      6993/lmgrd
> tcp        0      0 *:opalis-rdv                *:*                         LISTEN      4997/sge_qmaster
> tcp        0      0 *:smtp                      *:*                         LISTEN      5472/master
> tcp        0      0 localhost.localdomain:rndc  *:*                         LISTEN      4481/named
> tcp        0      0 *:734                       *:*                         LISTEN      5218/rpc.rquotad
> tcp        0      0 *:http                      *:*                         LISTEN      5484/httpd
> tcp        0      0 *:ssh                       *:*                         LISTEN      29581/sshd
> tcp        0      0 *:https                     *:*                         LISTEN      5484/httpd
> udp        0      0 *:nfs                       *:*                                     -
> udp        0      0 *:syslog                    *:*                                     4362/syslogd
> udp        0      0 *:9632                      *:*                                     5685/tracker-server
> udp        0      0 *:snmp                      *:*                                     5152/snmpd
> udp        0      0 *:802                       *:*                                     5290/rpc.mountd
> udp        0      0 *:935                       *:*                                     4575/rpc.statd
> udp        0      0 *:938                       *:*                                     4575/rpc.statd
> udp        0      0 10.10.120.21:domain         *:*                                     4481/named
> udp        0      0 kratos.local:domain         *:*                                     4481/named
> udp        0      0 localhost.locald:domain     *:*                                     4481/named
> udp        0      0 *:bootps                    *:*                                     5509/dhcpd
> udp        0      0 *:tftp                      *:*                                     5179/xinetd
> udp        0      0 *:8649                      *:*                                     6224/gmond
> udp        0      0 *:47439                     *:*                                     -
> udp        0      0 *:netviewdm3                *:*                                     5218/rpc.rquotad
> udp        0      0 *:sunrpc                    *:*                                     4510/portmap
> udp        0      0 10.10.120.21:ntp            *:*                                     20331/ntpd
> udp        0      0 kratos.local:ntp            *:*                                     20331/ntpd
> udp        0      0 localhost.localdomain:ntp   *:*                                     20331/ntpd
> udp        0      0 *:ntp                       *:*                                     20331/ntpd
> udp        0      0 fe80::225:90ff:fe19:cdf:ntp *:*                                     20331/ntpd
> udp        0      0 fe80::225:90ff:fe19:cde:ntp *:*                                     20331/ntpd
> udp        0      0 localhost:ntp               *:*                                     20331/ntpd
> udp        0      0 *:ntp                       *:*                                     20331/ntpd
>
>
>
> Thanks.
>
> Bright Yang
>
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
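The netstat listing above was presumably taken while no job was running, and the listener that the proxies try to reach is expected to exist only while `mpiexec.hydra` is alive, so it would not show up. To check during a launch, the hypothetical Linux-only helper below reads `/proc/net/tcp` and `/proc/net/tcp6` directly and returns the local ports of all sockets in the LISTEN state (state code `0A`); running `netstat -tlnp` at the same moment would serve the same purpose.

```python
import os

LISTEN = "0A"  # TCP state code for LISTEN in /proc/net/tcp


def listening_ports(paths=("/proc/net/tcp", "/proc/net/tcp6")) -> set:
    """Collect the local ports of all sockets in LISTEN state (Linux only)."""
    ports = set()
    for path in paths:
        if not os.path.exists(path):
            continue
        with open(path) as fh:
            next(fh)  # skip the header line
            for line in fh:
                fields = line.split()
                # fields[1] is "HEXADDR:HEXPORT", fields[3] is the state code
                local, state = fields[1], fields[3]
                if state == LISTEN:
                    ports.add(int(local.rsplit(":", 1)[1], 16))
    return ports


if __name__ == "__main__":
    # Hypothetical usage: run on the head node while mpiexec.hydra is starting up
    print(42773 in listening_ports())
```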



-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo


