[mvapich-discuss] connect [mt_checkin]: Connection refused

Hari Subramoni subramoni.1 at osu.edu
Fri Feb 10 08:02:17 EST 2017


Can you please try using the hydra job launcher (mpiexec.hydra) to see if
that works? You should be able to find it in the same place as mpirun_rsh.
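
As a rough sketch of what that could look like (untested on your system):
Hydra takes the host file with -f and the process count with -n. The
hydra_cmd helper and the RUN_HYDRA switch are made up for illustration
only; the host file mf, the process count and the benchmark arguments are
copied from your mpirun_rsh command below.

```shell
#!/bin/sh
# Sketch only: build the Hydra equivalent of the failing mpirun_rsh command.
# hydra_cmd is a hypothetical helper, not part of MVAPICH2.
hydra_cmd() {
    hostfile=$1
    nprocs=$2
    shift 2
    # -f names the host file, -n the number of processes to start.
    printf '%s\n' "mpiexec.hydra -f $hostfile -n $nprocs $*"
}

# Same host file, process count and benchmark as in the report below.
cmd=$(hydra_cmd mf 10 ../get_local_rank collective/osu_allreduce D D)
echo "$cmd"

# RUN_HYDRA is a hypothetical switch: launch only when explicitly asked to.
if [ "${RUN_HYDRA:-0}" = 1 ]; then
    $cmd
fi
```

Run it once as is to check the printed command line, then again with
RUN_HYDRA=1 from the directory where you ran mpirun_rsh.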

Thx,
Hari.

On Feb 10, 2017 3:19 AM, "201621070526 at std.uestc.edu.cn" <
201621070526 at std.uestc.edu.cn> wrote:

> Hi Hari,
>
> I have already set up passwordless ssh login for both nodes, including
> localhost, and I am not the root user. You mentioned that it might be
> caused by a firewall, but I do not think that is the case, because I
> have tested Open MPI and it works well.
> So I wonder: is there anything wrong with my */etc/hosts* settings?
>
> As XIAOYI mentioned in the following discussion, he solved the problem by
> setting up */etc/hosts* properly, but I have no idea about the details.
>
> http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2017-January/006289.html
>
>
> sincerely, HD
>
> ------------------------------
> 201621070526 at std.uestc.edu.cn
>
>
> *From:* Hari Subramoni <subramoni.1 at osu.edu>
> *Date:* 2017-02-10 08:46
> *To:* 201621070526 at std.uestc.edu.cn
> *CC:* mvapich-discuss <mvapich-discuss at cse.ohio-state.edu>
> *Subject:* Re: [mvapich-discuss] connect [mt_checkin]: Connection refused
> It looks like a system issue. It could be that passwordless ssh is not
> set up. This is very likely if the user is root. There could also be
> firewalls blocking access to the nodes in the host file. Can you please
> check on these?
>
> Regards,
> Hari.
>
>
> On Feb 9, 2017 6:43 PM, "201621070526 at std.uestc.edu.cn" <
> 201621070526 at std.uestc.edu.cn> wrote:
>
> Hi, I am using MVAPICH2-GDR 2.2 and ran into the same problem.
>
> mpirun_rsh -ssh -export -np 10 -hostfile mf ../get_local_rank collective/osu_allreduce D D
> connect [mt_checkin]: Connection refused
> [root0-SCW4350-220:mpirun_rsh][child_handler] Error in init phase, aborting! (1/2 mpispawn connections)
> huang@root0-SCW4350-220:~/program/mvapich2-gdr/libexec/osu-micro-benchmarks/mpi$ [root0-SCW4350-220:mpispawn_0][report_error] connect() failed: Connection refused (111)
>
>
> *Here are the contents of my /etc/hosts:*
> 127.0.0.1 localhost
> 127.0.1.1 root0-SCW4350-220
> 172.16.18.220 node1
> 172.16.18.158 node2
>
> # The following lines are desirable for IPv6 capable hosts
> ::1     ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
>
>
> *Here is the information about my MVAPICH and IB setup:*
>
> mpiname -a
> MVAPICH2-GDR 2.2 Tue Oct 25 22:00:00 EST 2016 ch3:mrail
>
> Compilation
> CC: gcc -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -DNDEBUG -DNVALGRIND -O2
> CXX: g++ -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -DNDEBUG -DNVALGRIND -O2
> F77: gfortran -L/lib -L/lib -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -I/opt/mvapich2/gdr/2.2/cuda8.0/gnu/lib64/gfortran/modules -O2
> FC: gfortran -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -I/opt/mvapich2/gdr/2.2/cuda8.0/gnu/lib64/gfortran/modules -O2
> Configuration
> --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu
> --program-prefix= --disable-dependency-tracking
> --prefix=/opt/mvapich2/gdr/2.2/cuda8.0/gnu
> --exec-prefix=/opt/mvapich2/gdr/2.2/cuda8.0/gnu
> --bindir=/opt/mvapich2/gdr/2.2/cuda8.0/gnu/bin
> --sbindir=/opt/mvapich2/gdr/2.2/cuda8.0/gnu/sbin
> --sysconfdir=/opt/mvapich2/gdr/2.2/cuda8.0/gnu/etc
> --datadir=/opt/mvapich2/gdr/2.2/cuda8.0/gnu/share
> --includedir=/opt/mvapich2/gdr/2.2/cuda8.0/gnu/include
> --libdir=/opt/mvapich2/gdr/2.2/cuda8.0/gnu/lib64
> --libexecdir=/opt/mvapich2/gdr/2.2/cuda8.0/gnu/libexec
> --localstatedir=/var --sharedstatedir=/var/lib
> --mandir=/opt/mvapich2/gdr/2.2/cuda8.0/gnu/share/man
> --infodir=/opt/mvapich2/gdr/2.2/cuda8.0/gnu/share/info
> --disable-rpath --disable-static --enable-shared --disable-rdma-cm
> --disable-mcast --without-hydra-ckpointlib --with-core-direct --enable-cuda
> CPPFLAGS=-I/usr/local/cuda-8.0/include
> LDFLAGS=-L/usr/local/cuda-8.0/lib64 -Wl,-rpath,/usr/local/cuda-8.0/lib64 -Wl,-rpath,XORIGIN/placeholder -Wl,--build-id
>
> ibstat
> CA 'mlx4_0'
>     CA type: MT4099
>     Number of ports: 1
>     Firmware version: 2.34.5000
>     Hardware version: 1
>     Node GUID: 0xe41d2d0300bf45c0
>     System image GUID: 0xe41d2d0300bf45c3
>     Port 1:
>         State: Active
>         Physical state: LinkUp
>         Rate: 40
>         Base lid: 13
>         LMC: 0
>         SM lid: 7
>         Capability mask: 0x02514868
>         Port GUID: 0xe41d2d0300bf45c1
>         Link layer: InfiniBand
>
> Any help would be greatly appreciated.
>
>
> ------------------------------
> 201621070526 at std.uestc.edu.cn
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>