[mvapich-discuss] Running MPI jobs on multiple nodes
Ji Wan
wanjime at gmail.com
Sat Jun 21 05:36:47 EDT 2014
Hello,
I am trying to run MPI jobs on multiple nodes, but I get the following
errors:
[cli_0]: [cli_1]: aborting job:
Fatal error in PMPI_Init_thread:
Other MPI error, error stack:
MPIR_Init_thread(483).......:
MPID_Init(367)..............: channel initialization failed
MPIDI_CH3_Init(362).........:
MPIDI_CH3I_RDMA_init(170)...:
rdma_setup_startup_ring(389): cannot open hca device
aborting job:
Fatal error in PMPI_Init_thread:
Other MPI error, error stack:
MPIR_Init_thread(483).......:
MPID_Init(367)..............: channel initialization failed
MPIDI_CH3_Init(362).........:
MPIDI_CH3I_RDMA_init(170)...:
rdma_setup_startup_ring(389): cannot open hca device
[cli_2]: aborting job:
Fatal error in PMPI_Init_thread:
Other MPI error, error stack:
MPIR_Init_thread(483).......:
MPID_Init(367)..............: channel initialization failed
MPIDI_CH3_Init(362).........:
MPIDI_CH3I_RDMA_init(170)...:
rdma_setup_startup_ring(389): cannot open hca device
[cli_3]: aborting job:
Fatal error in PMPI_Init_thread:
Other MPI error, error stack:
MPIR_Init_thread(483).......:
MPID_Init(367)..............: channel initialization failed
MPIDI_CH3_Init(362).........:
MPIDI_CH3I_RDMA_init(170)...:
rdma_setup_startup_ring(389): cannot open hca device
This is the command I used to start the MPI job:
MV2_ENABLE_AFFINITY=0 MV2_USE_CUDA=1 GLOG_logtostderr=1 mpirun_rsh -ssh -hostfile hosts -n 4 ./a.out xxx
and this is the *hosts* file:
192.168.1.1:2
192.168.1.2:2
The job was started on node 192.168.1.1, and I can connect to 192.168.1.2
via ssh without a password.
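The "cannot open hca device" message suggests that the processes could not find a usable InfiniBand adapter. Below is a minimal sketch of a check that may help narrow this down. It assumes Linux nodes where the kernel lists InfiniBand devices under /sys/class/infiniband; the post does not say what hardware the nodes have, so that path is an assumption. The helper names hosts_from_file and list_hca_devices are hypothetical and are not part of MVAPICH2.

```shell
#!/bin/sh
# Print the host names from a "host:count" style hostfile,
# skipping blank lines and '#' comment lines and dropping the ":count" suffix.
hosts_from_file() {
    sed -e '/^[[:space:]]*$/d' -e '/^#/d' -e 's/:.*$//' "$1"
}

# List the device names found under a sysfs directory
# (default: /sys/class/infiniband, an assumed location).
# Returns non-zero if the directory is missing or empty.
list_hca_devices() {
    dir="${1:-/sys/class/infiniband}"
    [ -d "$dir" ] || return 1
    devices=$(ls -A "$dir") || return 1
    [ -n "$devices" ] || return 1
    printf '%s\n' "$devices"
}

# Usage (hypothetical): run the same listing on every host in the hostfile.
# for h in $(hosts_from_file hosts); do
#     echo "== $h"
#     ssh "$h" 'ls -A /sys/class/infiniband 2>/dev/null' || echo "no HCA device visible"
# done
```

If a node prints no device names, the adapter or its driver may be missing on that node, which would be consistent with the error above.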
Can anyone help me? Thanks!
--
Best regards,
Wan Ji