[Mvapich-discuss] Can MPI_Bcast use RoCE v2 hardware UD multicast?

王龙钢 lgwang at cug.edu.cn
Fri May 26 01:53:42 EDT 2023


Hello,
I want to run the MPI_Bcast collective with mvapich2-2.3.7 over RoCE v2, using hardware UD multicast. Does mvapich2-2.3.7 support this? If so, which environment variables should be set, and to what values?
The NIC is mlx5_0 and its link_layer is Ethernet.
The relevant environment variables were set as follows:
MV2_USE_RDMA_CM_MCAST=1 
MV2_USE_MCAST=1
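
A minimal sketch of how these two variables could be passed at launch time, assuming mpirun_rsh is the launcher (it accepts KEY=VALUE pairs placed before the executable). The process count, the hostfile name `hosts`, and the binary `./bcast_test` are hypothetical placeholders, not values from the actual run. The script only assembles and prints the command line; it does not start a job.

```shell
#!/bin/sh
# Multicast settings quoted in this message.
MCAST_ENV="MV2_USE_MCAST=1 MV2_USE_RDMA_CM_MCAST=1"

# Hypothetical placeholders (assumptions, not from the original run).
NP=4
HOSTFILE=hosts
APP=./bcast_test

# mpirun_rsh takes environment settings as KEY=VALUE pairs
# between its own options and the executable name.
CMD="mpirun_rsh -np $NP -hostfile $HOSTFILE $MCAST_ENV $APP"

# Print the assembled command line for inspection.
echo "$CMD"
```

Printing the command first makes it easy to confirm that both variables actually reach the launcher before a real run is attempted.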
However, this does not work. The error messages are as follows:


""
[work1:mpi_rank_0][mv2_mcast_process_comm_init_req] MCAST process Comm init failed comm_id:1383 retries :129
[work1:mpi_rank_0][mv2_mcast_remove_comm_init_req] End MCAST Comm init comm_id:1383


Can the environment variables MV2_USE_RDMA_CM=1, MV2_USE_MCAST=1, and MV2_USE_RDMA_CM_MCAST=1 be used together? When all three are enabled at the same time, the following error occurs:
[work4:mpi_rank_0][mv2_mcast_prepare_ud_ctx] MCAST UD QP creation failed
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(493):
MPID_Init(419).......: channel initialization failed
MPIDI_CH3_Init(804)..: Error in create multicast UD context for multicast


[cli_0]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(493):
MPID_Init(419).......: channel initialization failed
MPIDI_CH3_Init(804)..: Error in create multicast UD context for multicast


[work4:mpispawn_0][mtpmi_processops] parse_str() failed with error code -105
""
Best regards 



