[mvapich-discuss] mpiexec.hydra error - unable to connect from compute nodes to master port 42773

Jonathan Perkins perkinjo at cse.ohio-state.edu
Tue Sep 13 18:58:48 EDT 2011


You may want to ask your System Administrator if there have been any
system changes that could have caused this as well.
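
If it helps while you look into the firewall, here is a minimal sketch (not part of MVAPICH2 or Hydra) for testing whether a compute node can open a TCP connection back to the head node on an arbitrary high port. The script name portcheck.py and the helper names can_connect and open_listener are made up for illustration. The listener takes whatever port the OS assigns, which only resembles the way the launcher picks a different port each run. The launcher's port presumably exists only while mpiexec.hydra is running, so a netstat taken afterwards would not be expected to show it.

```python
import socket
import sys


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def open_listener(bind_addr="0.0.0.0"):
    """Listen on an OS-assigned port (port 0) and return (socket, port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((bind_addr, 0))
    srv.listen(1)
    return srv, srv.getsockname()[1]


if __name__ == "__main__":
    if len(sys.argv) == 2 and sys.argv[1] == "listen":
        # Run this mode on the head node.
        srv, port = open_listener()
        print("listening on port", port, "(Ctrl-C to stop)")
        try:
            while True:
                conn, peer = srv.accept()
                print("connection from", peer[0])
                conn.close()
        except KeyboardInterrupt:
            pass
    elif len(sys.argv) == 3:
        # Run this mode on a compute node: portcheck.py HOST PORT
        host, port = sys.argv[1], int(sys.argv[2])
        print("reachable" if can_connect(host, port) else "NOT reachable")
    else:
        print("usage: portcheck.py listen | portcheck.py HOST PORT")
```

Run "python portcheck.py listen" on kratos.vaisala.com, then "python portcheck.py kratos.vaisala.com PORT" on compute-0-0 and compute-0-1 with the port it printed. If the connection times out the same way the proxies did, that points to packet filtering or routing between the nodes rather than to the MPI installation.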

On Tue, Sep 13, 2011 at 6:56 PM,  <bright.yang at vaisala.com> wrote:
> You are right. The port number is randomly selected each time. It was working before but is broken now. I need to look up how to check the firewall...
>
> Bright Yang
>
> -----Original Message-----
> From: Jonathan Perkins [mailto:perkinjo at cse.ohio-state.edu]
> Sent: Tuesday, September 13, 2011 4:48 PM
> To: Yang Bright BRYA
> Cc: mvapich-discuss at cse.ohio-state.edu
> Subject: Re: [mvapich-discuss] mpiexec.hydra error - unable to connect from compute nodes to master port 42773
>
> Hi, is this a new installation or an installation that was previously
> working?  The error message suggests checking for firewalls which may
> prevent the launched processes from connecting back to the head node.
> I believe this port number is randomly chosen and may be different in
> between consecutive runs.
>
> On Tue, Sep 13, 2011 at 5:24 PM,  <bright.yang at vaisala.com> wrote:
>> Here is what I got when trying to run a parallel job -
>>
>> # mpiexec.hydra -f 2_12hosts -n 24 ./wrf.exe
>>
>> [proxy:0:1@compute-0-1.local] HYDU_sock_connect (./utils/sock/sock.c:188): unable to connect from "compute-0-1.local" to "kratos.vaisala.com" (Connection timed out)
>> [proxy:0:1@compute-0-1.local] main (./pm/pmiserv/pmip.c:205): unable to connect to server kratos.vaisala.com at port 42773 (check for firewalls!)
>> [proxy:0:0@compute-0-0.local] HYDU_sock_connect (./utils/sock/sock.c:188): unable to connect from "compute-0-0.local" to "kratos.vaisala.com" (Connection timed out)
>> [proxy:0:0@compute-0-0.local] main (./pm/pmiserv/pmip.c:205): unable to connect to server kratos.vaisala.com at port 42773 (check for firewalls!)
>>
>>
>>
>> Where is port 42773 specified? Is it set in a configuration file? I ran
>> netstat, and there is no listener on that port number:
>>
>>
>>
>> netstat --tcp --udp --listening --program
>>
>> Active Internet connections (only servers)
>> Proto Recv-Q Send-Q Local Address               Foreign Address State   PID/Program name
>> tcp        0      0 *:40000                     *:*             LISTEN  5394/mysqld
>> tcp        0      0 *:nfs                       *:*             LISTEN  -
>> tcp        0      0 *:805                       *:*             LISTEN  5290/rpc.mountd
>> tcp        0      0 *:54438                     *:*             LISTEN  6994/pgroupd
>> tcp        0      0 localhost.localdomain:smux  *:*             LISTEN  5152/snmpd
>> tcp        0      0 *:8649                      *:*             LISTEN  6224/gmond
>> tcp        0      0 *:52010                     *:*             LISTEN  -
>> tcp        0      0 *:8651                      *:*             LISTEN  4652/gmetad
>> tcp        0      0 localhost.localdomain:5900  *:*             LISTEN  5953/Xorg
>> tcp        0      0 *:8652                      *:*             LISTEN  4652/gmetad
>> tcp        0      0 *:941                       *:*             LISTEN  4575/rpc.statd
>> tcp        0      0 *:sunrpc                    *:*             LISTEN  4510/portmap
>> tcp        0      0 10.10.120.21:domain         *:*             LISTEN  4481/named
>> tcp        0      0 kratos.local:domain         *:*             LISTEN  4481/named
>> tcp        0      0 localhost.localdomai:domain *:*             LISTEN  4481/named
>> tcp        0      0 *:27000                     *:*             LISTEN  6993/lmgrd
>> tcp        0      0 *:opalis-rdv                *:*             LISTEN  4997/sge_qmaster
>> tcp        0      0 *:smtp                      *:*             LISTEN  5472/master
>> tcp        0      0 localhost.localdomain:rndc  *:*             LISTEN  4481/named
>> tcp        0      0 *:734                       *:*             LISTEN  5218/rpc.rquotad
>> tcp        0      0 *:http                      *:*             LISTEN  5484/httpd
>> tcp        0      0 *:ssh                       *:*             LISTEN  29581/sshd
>> tcp        0      0 *:https                     *:*             LISTEN  5484/httpd
>> udp        0      0 *:nfs                       *:*                     -
>> udp        0      0 *:syslog                    *:*                     4362/syslogd
>> udp        0      0 *:9632                      *:*                     5685/tracker-server
>> udp        0      0 *:snmp                      *:*                     5152/snmpd
>> udp        0      0 *:802                       *:*                     5290/rpc.mountd
>> udp        0      0 *:935                       *:*                     4575/rpc.statd
>> udp        0      0 *:938                       *:*                     4575/rpc.statd
>> udp        0      0 10.10.120.21:domain         *:*                     4481/named
>> udp        0      0 kratos.local:domain         *:*                     4481/named
>> udp        0      0 localhost.locald:domain     *:*                     4481/named
>> udp        0      0 *:bootps                    *:*                     5509/dhcpd
>> udp        0      0 *:tftp                      *:*                     5179/xinetd
>> udp        0      0 *:8649                      *:*                     6224/gmond
>> udp        0      0 *:47439                     *:*                     -
>> udp        0      0 *:netviewdm3                *:*                     5218/rpc.rquotad
>> udp        0      0 *:sunrpc                    *:*                     4510/portmap
>> udp        0      0 10.10.120.21:ntp            *:*                     20331/ntpd
>> udp        0      0 kratos.local:ntp            *:*                     20331/ntpd
>> udp        0      0 localhost.localdomain:ntp   *:*                     20331/ntpd
>> udp        0      0 *:ntp                       *:*                     20331/ntpd
>> udp        0      0 fe80::225:90ff:fe19:cdf:ntp *:*                     20331/ntpd
>> udp        0      0 fe80::225:90ff:fe19:cde:ntp *:*                     20331/ntpd
>> udp        0      0 localhost:ntp               *:*                     20331/ntpd
>> udp        0      0 *:ntp                       *:*                     20331/ntpd
>>
>>
>>
>> Thanks.
>>
>> Bright Yang
>>
>>
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>>
>
>
>
> --
> Jonathan Perkins
> http://www.cse.ohio-state.edu/~perkinjo
>
>



-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo


