[mvapich-discuss] connect [mt_checkin]: Connection refused

201621070526 at std.uestc.edu.cn 201621070526 at std.uestc.edu.cn
Thu Feb 9 21:53:29 EST 2017


Also, MVAPICH2-GDR works well on a single node.



201621070526 at std.uestc.edu.cn
 
From: 201621070526 at std.uestc.edu.cn
Date: 2017-02-10 09:54
To: Hari Subramoni
CC: mvapich-discuss
Subject: Re: Re: [mvapich-discuss] connect [mt_checkin]: Connection refused
Hi Hari,

I have already set up passwordless SSH login for both nodes, and for localhost as well, and I am not the root user. You mentioned it might be caused by a firewall, but I don't think that is the case, because I have tested Open MPI and it works well.
So I suspect there may be something wrong with my /etc/hosts settings?

As XIAOYI mentioned in the following discussion, he solved the problem by setting /etc/hosts properly, but I have no idea about the details.

http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2017-January/006289.html 


Sincerely, HD



201621070526 at std.uestc.edu.cn
 
From: Hari Subramoni
Date: 2017-02-10 08:46
To: 201621070526 at std.uestc.edu.cn
CC: mvapich-discuss
Subject: Re: [mvapich-discuss] connect [mt_checkin]: Connection refused
It looks like a system issue. It could be that passwordless SSH is not set up. This is very likely if the user is root. There could also be a firewall blocking access to the nodes in the host file. Can you please check on these?

Regards, 
Hari. 


On Feb 9, 2017 6:43 PM, "201621070526 at std.uestc.edu.cn" <201621070526 at std.uestc.edu.cn> wrote:
Hi, I am using MVAPICH2-GDR 2.2 and I get the same problem.

mpirun_rsh -ssh -export -np 10 -hostfile mf ../get_local_rank collective/osu_allreduce D D
connect [mt_checkin]: Connection refused
[root0-SCW4350-220:mpirun_rsh][child_handler] Error in init phase, aborting! (1/2 mpispawn connections)
huang at root0-SCW4350-220:~/program/mvapich2-gdr/libexec/osu-micro-benchmarks/mpi$ [root0-SCW4350-220:mpispawn_0][report_error] connect() failed: Connection refused (111)
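
For reference, error 111 in the output above is ECONNREFUSED on Linux: the address answered, but nothing was listening on that port. A firewall that silently drops packets would more typically show up as a timeout instead. The following is a minimal Python sketch of how the two cases can be told apart with a plain TCP connect; the helper names `probe` and `unused_local_port` are hypothetical and are not part of MVAPICH2 or mpirun_rsh.

```python
import errno
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Try a TCP connect and classify the outcome (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # ECONNREFUSED: host reachable, but no listener on this port.
        return "refused"
    except socket.timeout:
        # No answer at all, e.g. packets silently dropped on the way.
        return "timeout"
    except OSError as exc:
        return "error: " + errno.errorcode.get(exc.errno, str(exc.errno))


def unused_local_port() -> int:
    """Ask the kernel for a free port, then release it so nothing listens there."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    # Connecting to a loopback port with no listener is refused immediately.
    print(probe("127.0.0.1", unused_local_port()))
```

Running the same `probe` from one node against the other node's address and SSH port (22 by default) is one way to check the firewall question, assuming Python is available on both machines.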


Here are the contents of my /etc/hosts:
127.0.0.1 localhost
127.0.1.1 root0-SCW4350-220
172.16.18.220 node1
172.16.18.158 node2

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
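
One thing that may be worth checking in a hosts file like this one (a guess, not something confirmed in this thread): the machine's own hostname, root0-SCW4350-220, is mapped to 127.0.1.1, which is a loopback address. If a launcher passes that name to a process on the other node, the remote process would resolve it to itself and the connection could be refused. The Python sketch below only lists hostnames that a hosts file maps to loopback addresses; the function name `loopback_names` and the ignore list are hypothetical choices made for illustration.

```python
import ipaddress

# The IPv4 part of the hosts file quoted above.
HOSTS_TEXT = """\
127.0.0.1 localhost
127.0.1.1 root0-SCW4350-220
172.16.18.220 node1
172.16.18.158 node2
"""

# Names that are expected to be loopback and are not suspicious.
EXPECTED_LOOPBACK = ("localhost", "ip6-localhost", "ip6-loopback")


def loopback_names(hosts_text: str, ignore=EXPECTED_LOOPBACK) -> dict:
    """Return {hostname: address} for names mapped to a loopback address."""
    found = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # not an address line
        if not addr.is_loopback:
            continue
        for name in fields[1:]:
            if name not in ignore:
                found[name] = str(addr)
    return found


if __name__ == "__main__":
    for name, addr in loopback_names(HOSTS_TEXT).items():
        print(f"{name} resolves to loopback address {addr}")
```

If the hostname does turn up here, one option people commonly try is to map it to the node's real interface address (172.16.18.220 in this listing) on every node and use consistent names in the hostfile; whether that resolves this particular failure is an assumption.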


Here is the information about my MVAPICH2 installation and InfiniBand adapter:

mpiname -a
MVAPICH2-GDR 2.2 Tue Oct 25 22:00:00 EST 2016 ch3:mrail

Compilation
CC: gcc -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic   -DNDEBUG -DNVALGRIND -O2
CXX: g++ -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -DNDEBUG -DNVALGRIND -O2
F77: gfortran -L/lib -L/lib -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic -I/opt/mvapich2/gdr/2.2/cuda8.0/gnu/lib64/gfortran/modules  -O2
FC: gfortran -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic -I/opt/mvapich2/gdr/2.2/cuda8.0/gnu/lib64/gfortran/modules  -O2
Configuration
--build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/opt/mvapich2/gdr/2.2/cuda8.0/gnu --exec-prefix=/opt/mvapich2/gdr/2.2/cuda8.0/gnu --bindir=/opt/mvapich2/gdr/2.2/cuda8.0/gnu/bin --sbindir=/opt/mvapich2/gdr/2.2/cuda8.0/gnu/sbin --sysconfdir=/opt/mvapich2/gdr/2.2/cuda8.0/gnu/etc --datadir=/opt/mvapich2/gdr/2.2/cuda8.0/gnu/share --includedir=/opt/mvapich2/gdr/2.2/cuda8.0/gnu/include --libdir=/opt/mvapich2/gdr/2.2/cuda8.0/gnu/lib64 --libexecdir=/opt/mvapich2/gdr/2.2/cuda8.0/gnu/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/opt/mvapich2/gdr/2.2/cuda8.0/gnu/share/man --infodir=/opt/mvapich2/gdr/2.2/cuda8.0/gnu/share/info --disable-rpath --disable-static --enable-shared --disable-rdma-cm --disable-mcast --without-hydra-ckpointlib --with-core-direct --enable-cuda CPPFLAGS=-I/usr/local/cuda-8.0/include LDFLAGS=-L/usr/local/cuda-8.0/lib64 -Wl,-rpath,/usr/local/cuda-8.0/lib64 -Wl,-rpath,XORIGIN/placeholder -Wl,--build-id

ibstat
CA 'mlx4_0'
CA type: MT4099
Number of ports: 1
Firmware version: 2.34.5000
Hardware version: 1
Node GUID: 0xe41d2d0300bf45c0
System image GUID: 0xe41d2d0300bf45c3
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 13
LMC: 0
SM lid: 7
Capability mask: 0x02514868
Port GUID: 0xe41d2d0300bf45c1
Link layer: InfiniBand

Any help would be greatly appreciated.




201621070526 at std.uestc.edu.cn


