[mvapich-discuss] Unable to run multiple coprocessors with MVAPICH2-MIC

Bryant Lam blam at hcs.ufl.edu
Sun May 24 23:28:13 EDT 2015


Thanks for the heads up. I appreciate the response.

Bryant

On 05/24/2015 09:30 PM, khaled hamidouche wrote:
> Hi Bryant,
>
> MVAPICH2-MIC requires an IB HCA to be available, even for intranode
> jobs.
>
> Thanks
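Given that requirement, one quick way to see whether a node has any RDMA device before launching is to look in sysfs. The sketch below assumes a Linux host, where the kernel registers each verbs-capable device as an entry under /sys/class/infiniband. The function names list_rdma_devices and has_rdma_device are hypothetical and are not part of MVAPICH2-MIC. If the libibverbs utilities are installed, ibv_devices reports similar information.

```shell
# Hypothetical pre-launch check (not part of MVAPICH2-MIC).
# Assumption: on Linux, each RDMA/verbs device appears as an entry
# under /sys/class/infiniband.

# list_rdma_devices [sysfs_dir]
# Prints one device name per line. Prints nothing if the directory is
# missing or empty. The subshell body keeps "dir" from leaking out.
list_rdma_devices() (
    dir=${1:-/sys/class/infiniband}
    if [ -d "$dir" ]; then
        for dev in "$dir"/*; do
            # An unmatched glob stays literal, so skip it.
            [ -e "$dev" ] || continue
            basename "$dev"
        done
    fi
)

# has_rdma_device [sysfs_dir]
# Exit status 0 if at least one device is registered, 1 otherwise.
has_rdma_device() {
    [ -n "$(list_rdma_devices "$1")" ]
}
```

For example, running `has_rdma_device || echo "no RDMA device registered" >&2` before mpirun_rsh flags the situation described in the post below. An empty listing is consistent with the "Attributes failed sanity check" error there, although this check does not rule out other causes.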
>
> On Fri, May 22, 2015 at 12:16 AM, Bryant Lam <blam at hcs.ufl.edu> wrote:
>
>     I'm experimenting with MVAPICH2-MIC on a server with multiple Intel
>     Xeon Phi coprocessors. I can successfully execute single-device MPI
>     runs (e.g., only on localhost, mic0, mic1, etc.), but when I pair
>     any two of them, I run into startup issues:
>
>      > ${MV2MIC_PATH}/intel64/bin/mpirun_rsh -config config -hostfile hosts
>     # From the host.
>
>     Max MV2_DEFAULT_MAX_SG_LIST is 0, set to 1
>     Max MV2_SRQ_SIZE is 0, set to 4096
>     [cli_0]: aborting job:
>     Fatal error in MPI_Init:
>     Other MPI error, error stack:
>     MPIR_Init_thread(483)....:
>     MPID_Init(363)...........: channel initialization failed
>     MPIDI_CH3_Init(438)......:
>     MPIDI_CH3I_RDMA_init(325):
>     rdma_iba_hca_init(879)...: Attributes failed sanity check
>
>     [servername:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 5. MPI process died?
>     [servername:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
>     [servername:mpispawn_0][child_handler] MPI process (rank: 0, pid: 15209) exited with status 1
>
>     Max MV2_DEFAULT_MAX_SG_LIST is 0, set to 1
>     Max MV2_SRQ_SIZE is 0, set to 4096
>     [cli_1]: aborting job:
>     Fatal error in MPI_Init:
>     Other MPI error, error stack:
>     MPIR_Init_thread(483)....:
>     MPID_Init(363)...........: channel initialization failed
>     MPIDI_CH3_Init(438)......:
>     MPIDI_CH3I_RDMA_init(325):
>     rdma_iba_hca_init(879)...: Attributes failed sanity check
>
>     [servername-mic0:mpispawn_1][readline] Unexpected End-Of-File on file descriptor 5. MPI process died?
>     [servername-mic0:mpispawn_1][mtpmi_processops] Error while reading PMI socket. MPI process died?
>     [servername-mic0:mpispawn_1][child_handler] MPI process (rank: 1, pid: 33510) exited with status 1
>
>      > cat config
>     -n 1 : $PWD/exe.host
>     -n 1 : $PWD/exe
>
>      > cat hosts
>     localhost:1
>     mic0:1
>
>     The README file included with MVAPICH2-MIC states that
>     MV2_IBA_HCA=mlx4_0 needs to be set in the environment (i.e., export
>     MV2_IBA_HCA=mlx4_0), but this server does not have an InfiniBand
>     card. I intend to connect only via Intel SCIF.
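The launch in the post can be sketched as a pair of shell functions. The config and hostfile contents and the mpirun_rsh path are copied from the post, and MV2_IBA_HCA is exported as the README instructs. The helper names write_launch_files and launch are hypothetical. Per the reply above, MVAPICH2-MIC still needs an IB HCA, so on a server without one this launch would be expected to fail the same way.

```shell
# Hypothetical helpers; the file formats and the mpirun_rsh invocation
# are taken from the post.

# write_launch_files DIR HOST_EXE MIC_EXE
# Writes DIR/config (one rank per binary) and DIR/hosts (localhost + mic0).
write_launch_files() {
    printf '%s\n' "-n 1 : $2" "-n 1 : $3" > "$1/config"
    printf '%s\n' "localhost:1" "mic0:1" > "$1/hosts"
}

# launch DIR HCA
# Assumption: following the README, MV2_IBA_HCA is set via export; the
# thread does not say how mpirun_rsh forwards it to the mic0 rank.
# Requires MV2MIC_PATH to point at the MVAPICH2-MIC install.
launch() {
    MV2_IBA_HCA=$2
    export MV2_IBA_HCA
    "${MV2MIC_PATH:?set MV2MIC_PATH}/intel64/bin/mpirun_rsh" \
        -config "$1/config" -hostfile "$1/hosts"
}
```

For example, `write_launch_files "$PWD" "$PWD/exe.host" "$PWD/exe"` followed by `launch "$PWD" mlx4_0` reproduces the setup in the post.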
>
>     1.  Does MVAPICH2-MIC work without an InfiniBand card if I only
>     intend to communicate within a node (e.g., with
>     export MV2_IBA_HCA=scif0)?
>
>     2.  Is my startup error "Attributes failed sanity check" related
>     to #1?
>
>     Thanks,
>
>     Bryant
>     _______________________________________________
>     mvapich-discuss mailing list
>     mvapich-discuss at cse.ohio-state.edu
>     <mailto:mvapich-discuss at cse.ohio-state.edu>
>     http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
