[mvapich-discuss] connect [mt_checkin]: Connection refused

201621070526 at std.uestc.edu.cn 201621070526 at std.uestc.edu.cn
Tue Feb 28 03:26:05 EST 2017


Hi, I have run into a problem: when I run an MVAPICH program across multiple nodes, it fails with “connect [mt_checkin]: Connection refused”. The same program works fine on a single node.
To clarify, I have already set up passwordless login properly, and the same piece of code runs on multiple nodes with Open MPI. Below are more details about my settings and hardware.
Any help will be greatly appreciated.
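
The passwordless login mentioned above can be checked host by host. What follows is only a minimal sketch, not part of MVAPICH: the helper names (`read_hosts`, `ssh_check_command`, `check_hosts`) are hypothetical, and it assumes a hostfile with one host per line, where an optional `:count` suffix and `#` comments are ignored. It only confirms that `ssh` succeeds without a password prompt from the launching node to each host; it does not diagnose the `mt_checkin` error itself.

```python
import subprocess


def read_hosts(text):
    """Return the unique host names found in hostfile text, in order.

    Assumption: one host per line; blank lines and '#' comments are
    skipped, and anything after a ':' (e.g. a process count) is dropped.
    """
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        host = line.split(":", 1)[0].strip()
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def ssh_check_command(host, timeout=5):
    """Build an ssh command that fails instead of asking for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",               # never prompt interactively
        "-o", f"ConnectTimeout={timeout}",   # give up quickly on dead hosts
        host,
        "true",                              # trivial remote command
    ]


def check_hosts(hosts):
    """Run the ssh check for each host and return {host: True or False}."""
    results = {}
    for host in hosts:
        try:
            proc = subprocess.run(
                ssh_check_command(host), capture_output=True, timeout=30
            )
            results[host] = proc.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            results[host] = False
    return results
```

With the `hosts` file shown in this post, `check_hosts(read_hosts(open("hosts").read()))` would report one entry per address. A `False` entry would point at the ssh setup, while all `True` would suggest looking elsewhere.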


prince at root-220:~$ cat single_hosts
172.16.18.220
prince at root-200:~$ mpirun_rsh -n 2 -hostfile single_hosts MV2_SMP_USE_CMA=0 ./cpi
Process 1 on root-200
Process 0 on root-200
pi is approximately 3.1416009869231241, Error is 0.0000083333333309
wall clock time = 0.000165
prince at root-200:~$ cat hosts
172.16.18.220
172.16.18.158
prince at root-200:~$ mpirun_rsh -n 2 -hostfile hosts MV2_SMP_USE_CMA=0 ./cpi
connect [mt_checkin]: Connection refused
[root-200:mpirun_rsh][child_handler] Error in init phase, aborting! (1/2 mpispawn connections)
prince at root-200:~$ mpiname -a
MVAPICH2 2.2 Thu Sep 08 22:00:00 EST 2016 ch3:mrail

Compilation
CC: gcc    -DNDEBUG -DNVALGRIND -O2
CXX: g++   -DNDEBUG -DNVALGRIND -O2
F77: gfortran -L/lib -L/lib   -O2
FC: gfortran   -O2

Configuration
--prefix=/usr/local/mvapich2 --with-cuda --with-device=ch3:mrail --with-rdma=gen2

prince at root-200:~$ ibstat
CA 'mlx4_0'
    CA type: MT4099
    Number of ports: 1
    Firmware version: 2.36.5000
    Hardware version: 1
    Node GUID: 0xe41d2d0300bf45c0
    System image GUID: 0xe41d2d0300bf45c3
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 40
        Base lid: 13
        LMC: 0
        SM lid: 7
        Capability mask: 0x02514868
        Port GUID: 0xe41d2d0300bf45c1
        Link layer: InfiniBand

Regards,
Prince



201621070526 at std.uestc.edu.cn