[mvapich-discuss] problems with MVAPICH2 over 10GbE
Konz, Jeffrey (SSA Solution Centers)
jeffrey.konz at hp.com
Wed Aug 17 11:17:29 EDT 2011
I am running on a cluster with the Mellanox LOM that supports both IB and 10 GbE.
Both ports on the adapter are active: one is on the IB network, the other on the 10 GbE network.
I built mvapich2-1.7rc1 with these options: "--with-device=ch3:mrail --with-rdma=gen2"
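For reference, the build described above would look roughly like this (the install prefix and source directory are placeholders, not from the original post):

```shell
# Configure MVAPICH2 1.7rc1 with the ch3:mrail device over the Gen2 (OFED)
# RDMA stack, as stated above; this device also provides the RDMAoE path.
cd mvapich2-1.7rc1
./configure --with-device=ch3:mrail --with-rdma=gen2 \
            --prefix=/opt/mvapich2-1.7rc1   # placeholder prefix
make -j4 && make install
```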
Running over IB works fine.
When I try to run over the 10 GbE network with the "MV2_USE_RDMAOE=1" option, I get this error:
Fatal error in MPI_Init:
Internal MPI error!
[atl3-13:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 5. MPI process died?
[atl3-13:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
[atl3-13:mpispawn_0][child_handler] MPI process (rank: 0, pid: 23500) exited with status 1
[atl3-13:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node 10.10.0.149 aborted: Error while reading a PMI socket (4)
In the hostfile I specified the IP addresses of the 10 GbE ports.
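For context, the setup described above would look something like this sketch (the second IP, process count, and binary name are placeholders; 10.10.0.149 is taken from the error output):

```shell
# Hostfile listing the 10 GbE-side IP addresses, one per line
cat > hosts <<'EOF'
10.10.0.149
10.10.0.150
EOF

# MV2_USE_RDMAOE=1 asks MVAPICH2 to run RDMA over the Ethernet port (RoCE);
# with mpirun_rsh, environment variables are passed before the executable.
mpirun_rsh -np 2 -hostfile hosts MV2_USE_RDMAOE=1 ./a.out
```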
Am I running incorrectly, or have I not built MVAPICH2 with the correct options?
Thanks,
-Jeff