[mvapich-discuss] problems with MVAPICH2 over 10GbE

Konz, Jeffrey (SSA Solution Centers) jeffrey.konz at hp.com
Wed Aug 17 11:17:29 EDT 2011


I am running on a cluster with the Mellanox LOM that supports both IB and 10 GbE.
Both ports on the interface are active; one is on the IB network, the other is on the 10 GbE network.

I built mvapich2-1.7rc1 with these options: "--with-device=ch3:mrail --with-rdma=gen2"
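
For reference, the full build sequence was roughly the following (the install prefix and parallel job count below are just examples, not necessarily what I used):

  ./configure --with-device=ch3:mrail --with-rdma=gen2 \
              --prefix=/opt/mvapich2-1.7rc1
  make -j8
  make install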

Running over IB works fine.

When I try to run over the 10 GbE network with the "MV2_USE_RDMAOE=1" option, I get this error:

Fatal error in MPI_Init:
Internal MPI error!

[atl3-13:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 5. MPI process died?
[atl3-13:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
[atl3-13:mpispawn_0][child_handler] MPI process (rank: 0, pid: 23500) exited with status 1
[atl3-13:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node 10.10.0.149 aborted: Error while reading a PMI socket (4)

In the hostfile I specified the IP addresses of the 10 GbE ports.
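
For completeness, the hostfile and launch command were along these lines (the second address, the process count, and the benchmark binary are placeholders here, not my exact values):

  $ cat hosts.10gbe
  10.10.0.149
  10.10.0.150

  $ mpirun_rsh -np 2 -hostfile hosts.10gbe MV2_USE_RDMAOE=1 ./osu_latency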

Am I running this incorrectly, or have I not built MVAPICH2 with the correct options?

Thanks,

-Jeff



