[mvapich-discuss] Unable to run multiple coprocessors with MVAPICH2-MIC
Bryant Lam
blam at hcs.ufl.edu
Fri May 22 00:16:50 EDT 2015
I'm experimenting with MVAPICH2-MIC on a server with multiple Intel Xeon
Phi coprocessors. I can successfully execute single-device MPI runs
(e.g., only on localhost, only on mic0, or only on mic1), but when I
pair any two of them together, I run into startup issues:
> ${MV2MIC_PATH}/intel64/bin/mpirun_rsh -config config -hostfile hosts
# From the host.
Max MV2_DEFAULT_MAX_SG_LIST is 0, set to 1
Max MV2_SRQ_SIZE is 0, set to 4096
[cli_0]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(483)....:
MPID_Init(363)...........: channel initialization failed
MPIDI_CH3_Init(438)......:
MPIDI_CH3I_RDMA_init(325):
rdma_iba_hca_init(879)...: Attributes failed sanity check
[servername:mpispawn_0][readline] Unexpected End-Of-File on file
descriptor 5. MPI process died?
[servername:mpispawn_0][mtpmi_processops] Error while reading PMI
socket. MPI process died?
[servername:mpispawn_0][child_handler] MPI process (rank: 0, pid: 15209)
exited with status 1
Max MV2_DEFAULT_MAX_SG_LIST is 0, set to 1
Max MV2_SRQ_SIZE is 0, set to 4096
[cli_1]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(483)....:
MPID_Init(363)...........: channel initialization failed
MPIDI_CH3_Init(438)......:
MPIDI_CH3I_RDMA_init(325):
rdma_iba_hca_init(879)...: Attributes failed sanity check
[servername-mic0:mpispawn_1][readline] Unexpected End-Of-File on file
descriptor 5. MPI process died?
[servername-mic0:mpispawn_1][mtpmi_processops] Error while reading PMI
socket. MPI process died?
[servername-mic0:mpispawn_1][child_handler] MPI process (rank: 1, pid:
33510) exited with status 1
> cat config
-n 1 : $PWD/exe.host
-n 1 : $PWD/exe
> cat hosts
localhost:1
mic0:1
The README file included with MVAPICH2-MIC states that
MV2_IBA_HCA=mlx4_0 needs to be set in the environment (i.e., export
MV2_IBA_HCA=mlx4_0), but this server does not have an InfiniBand card. I
intend to only connect via Intel SCIF.
1. Does MVAPICH2-MIC work without an InfiniBand card if I intend to
only communicate within a node? (e.g., export MV2_IBA_HCA=scif0)
2. Is my startup error "Attributes failed sanity check" related to #1?
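For reference, the device names that could be given to MV2_IBA_HCA can
be listed from sysfs. This is only a sketch: it assumes a Linux system
where RDMA devices are registered under /sys/class/infiniband, and that
a SCIF-backed device would show up there only if the corresponding
driver is loaded. The function name list_hcas and the IB_SYSFS override
variable are hypothetical helpers for illustration, not part of
MVAPICH2-MIC.

```shell
# List RDMA device names visible on this machine, one per line.
# IB_SYSFS is a hypothetical override for the sysfs directory to read;
# it defaults to the standard Linux location.
list_hcas() {
    dir="${IB_SYSFS:-/sys/class/infiniband}"
    if [ -d "$dir" ]; then
        ls -1 "$dir"
    fi
}

list_hcas
```

If this prints nothing on the host or on a coprocessor, no verbs device
is registered there, which would be consistent with the HCA attribute
queries returning 0 in the output above.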
Thanks,
Bryant