[mvapich-discuss] Running MPI jobs on multiple nodes

Ji Wan wanjime at gmail.com
Sat Jun 21 05:36:47 EDT 2014


Hello,

I am currently trying to run MPI jobs on multiple nodes, but I encountered the
following errors:

[cli_0]: [cli_1]: aborting job:
Fatal error in PMPI_Init_thread:
Other MPI error, error stack:
MPIR_Init_thread(483).......:
MPID_Init(367)..............: channel initialization failed
MPIDI_CH3_Init(362).........:
MPIDI_CH3I_RDMA_init(170)...:
rdma_setup_startup_ring(389): cannot open hca device

aborting job:
Fatal error in PMPI_Init_thread:
Other MPI error, error stack:
MPIR_Init_thread(483).......:
MPID_Init(367)..............: channel initialization failed
MPIDI_CH3_Init(362).........:
MPIDI_CH3I_RDMA_init(170)...:
rdma_setup_startup_ring(389): cannot open hca device

[cli_2]: aborting job:
Fatal error in PMPI_Init_thread:
Other MPI error, error stack:
MPIR_Init_thread(483).......:
MPID_Init(367)..............: channel initialization failed
MPIDI_CH3_Init(362).........:
MPIDI_CH3I_RDMA_init(170)...:
rdma_setup_startup_ring(389): cannot open hca device

[cli_3]: aborting job:
Fatal error in PMPI_Init_thread:
Other MPI error, error stack:
MPIR_Init_thread(483).......:
MPID_Init(367)..............: channel initialization failed
MPIDI_CH3_Init(362).........:
MPIDI_CH3I_RDMA_init(170)...:
rdma_setup_startup_ring(389): cannot open hca device
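Every rank fails at the same point: opening the HCA (the InfiniBand host channel adapter) during RDMA channel setup. A quick first check is whether each node exposes an HCA to the kernel at all. `ibv_devinfo` from libibverbs reports this. The Python sketch below does a similar check by listing `/sys/class/infiniband`, and it assumes a Linux node with sysfs mounted. The helper name `list_hca_devices` is hypothetical, and the script would need to be run on both 192.168.1.1 and 192.168.1.2.

```python
import os


def list_hca_devices(sysfs_root="/sys/class/infiniband"):
    """Return the sorted names of InfiniBand/RDMA devices registered with the kernel.

    Assumption: a Linux node where sysfs exposes RDMA devices under sysfs_root.
    An empty result means no HCA is visible on this node.
    """
    try:
        return sorted(os.listdir(sysfs_root))
    except FileNotFoundError:
        # The directory is missing when no InfiniBand/RDMA driver has registered a device.
        return []


if __name__ == "__main__":
    devices = list_hca_devices()
    if devices:
        print("HCA devices found:", ", ".join(devices))
    else:
        print("No HCA devices found under /sys/class/infiniband")
```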

This is the command I used to start the MPI job:

MV2_ENABLE_AFFINITY=0 MV2_USE_CUDA=1 GLOG_logtostderr=1 mpirun_rsh -ssh \
    -hostfile hosts -n 4 ./a.out xxx

and this is the hosts file:

192.168.1.1:2
192.168.1.2:2
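Each `host:count` line in a hostfile like this one gives a number of process slots on that node. The Python sketch below expands such entries into a per-rank host list. The helper `expand_hostfile` is hypothetical and not part of MVAPICH2. It assumes block placement, where consecutive ranks fill one host before moving to the next. That placement is an assumption of the sketch, not a documented mpirun_rsh guarantee.

```python
def expand_hostfile(lines, nprocs):
    """Map MPI ranks 0..nprocs-1 to hosts from 'host:count' hostfile lines.

    Assumption: block placement (fill each host's slots in order).
    Any further ':'-separated fields on a line are ignored in this sketch.
    """
    slots = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(":")
        host = parts[0]
        count = int(parts[1]) if len(parts) > 1 and parts[1] else 1
        slots.extend([host] * count)
    if nprocs > len(slots):
        raise ValueError(f"{nprocs} ranks requested but only {len(slots)} slots listed")
    return slots[:nprocs]


if __name__ == "__main__":
    hosts = ["192.168.1.1:2", "192.168.1.2:2"]
    for rank, host in enumerate(expand_hostfile(hosts, 4)):
        print(f"rank {rank} -> {host}")
```

With the hosts file above and `-n 4`, this sketch places ranks 0 and 1 on 192.168.1.1 and ranks 2 and 3 on 192.168.1.2.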

The job was started on node 192.168.1.1, and I can connect to 192.168.1.2
via ssh without a password.

Can anyone help me? Thanks!



--
Best regards,
Wan Ji