[mvapich-discuss] Unable to run multiple coprocessors with MVAPICH2-MIC

khaled hamidouche hamidouc at cse.ohio-state.edu
Sun May 24 21:30:48 EDT 2015


Hi Bryant,

MVAPICH2-MIC requires an InfiniBand (IB) HCA to be available, even for
intranode jobs.
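
A quick way to see whether the host exposes any HCA at all is sketched
below. It is only an illustration, not something taken from the
MVAPICH2-MIC README. It assumes a Linux host where the verbs drivers
register devices under /sys/class/infiniband, and the function name
list_hcas is made up for this example. A virtual device may also be
listed there, so an entry appearing is not by itself proof that a
physical IB card is installed.

```shell
# Print the names of the devices registered with the Linux verbs stack.
# Assumption: they appear as entries under /sys/class/infiniband.
# The optional argument overrides that directory (useful for testing).
# Returns 0 if at least one device was found, non-zero otherwise.
list_hcas() {
    sysfs_dir="${1:-/sys/class/infiniband}"
    [ -d "$sysfs_dir" ] || return 1
    found=1
    for dev in "$sysfs_dir"/*; do
        # An empty directory leaves the glob unexpanded; skip that case.
        [ -e "$dev" ] || continue
        basename "$dev"
        found=0
    done
    return "$found"
}

if hcas=$(list_hcas); then
    echo "Verbs device(s) found: $hcas"
else
    echo "No verbs device found under /sys/class/infiniband"
fi
```

If the libibverbs utilities are installed, running ibv_devinfo gives
the same information in more detail.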

Thanks

On Fri, May 22, 2015 at 12:16 AM, Bryant Lam <blam at hcs.ufl.edu> wrote:

> I'm experimenting with MVAPICH2-MIC on a server with multiple Intel Xeon
> Phi coprocessors in it. I can successfully execute single-device MPI
> runs (e.g., only on the localhost, mic0, mic1, etc.), but when I pair
> any two of them together, I run into startup issues:
>
>  > ${MV2MIC_PATH}/intel64/bin/mpirun_rsh -config config -hostfile hosts
> # From the host.
>
> Max MV2_DEFAULT_MAX_SG_LIST is 0, set to 1
> Max MV2_SRQ_SIZE is 0, set to 4096
> [cli_0]: aborting job:
> Fatal error in MPI_Init:
> Other MPI error, error stack:
> MPIR_Init_thread(483)....:
> MPID_Init(363)...........: channel initialization failed
> MPIDI_CH3_Init(438)......:
> MPIDI_CH3I_RDMA_init(325):
> rdma_iba_hca_init(879)...: Attributes failed sanity check
>
> [servername:mpispawn_0][readline] Unexpected End-Of-File on file
> descriptor 5. MPI process died?
> [servername:mpispawn_0][mtpmi_processops] Error while reading PMI
> socket. MPI process died?
> [servername:mpispawn_0][child_handler] MPI process (rank: 0, pid: 15209)
> exited with status 1
>
> Max MV2_DEFAULT_MAX_SG_LIST is 0, set to 1
> Max MV2_SRQ_SIZE is 0, set to 4096
> [cli_1]: aborting job:
> Fatal error in MPI_Init:
> Other MPI error, error stack:
> MPIR_Init_thread(483)....:
> MPID_Init(363)...........: channel initialization failed
> MPIDI_CH3_Init(438)......:
> MPIDI_CH3I_RDMA_init(325):
> rdma_iba_hca_init(879)...: Attributes failed sanity check
>
> [servername-mic0:mpispawn_1][readline] Unexpected End-Of-File on file
> descriptor 5. MPI process died?
> [servername-mic0:mpispawn_1][mtpmi_processops] Error while reading PMI
> socket. MPI process died?
> [servername-mic0:mpispawn_1][child_handler] MPI process (rank: 1, pid:
> 33510) exited with status 1
>
>  > cat config
> -n 1 : $PWD/exe.host
> -n 1 : $PWD/exe
>
>  > cat hosts
> localhost:1
> mic0:1
>
> The README file included with MVAPICH2-MIC states that
> MV2_IBA_HCA=mlx4_0 needs to be set in the environment (i.e., export
> MV2_IBA_HCA=mlx4_0), but this server does not have an InfiniBand card. I
> intend to only connect via Intel SCIF.
>
> 1.  Does MVAPICH2-MIC work without an InfiniBand card if I intend to
> only communicate within a node? (e.g., export MV2_IBA_HCA=scif0)
>
> 2.  Is my startup error "Attributes failed sanity check" related to #1?
>
> Thanks,
>
> Bryant