[mvapich-discuss] MPI_Alltoallv hangs in MVAPICH-GDR

Subramoni, Hari subramoni.1 at osu.edu
Mon Mar 18 10:16:28 EDT 2019


Hi, Augustin.

I just tested this on our local cluster, and both intra-node and inter-node runs of osu_alltoallv work fine with several process counts.

I have a very dumb question. In the set of commands pasted below, I see the following export.

export LD_LIBRARY_PATH=INSTALLPATH/opt/mvapich2/gdr/2.3.1/mcast/no-openacc/cuda10.0/mofed4.5/mpirun/gnu4.8.5/lib64/:$LD_LIBRARY_PATH

Under the assumption that these were the exact commands you used, could the issue be that the variable “INSTALLPATH” has not been defined or that a “$” was missed before “INSTALLPATH”?
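If it helps, a quick check along the lines below should confirm that the path actually resolves to the extracted tree and that the benchmark picks up the GDR libraries from it (just a sketch; it assumes the rpm was extracted into the current working directory, so INSTALLPATH would point there).

# Assumption: the rpm was extracted into the current directory
export INSTALLPATH=$PWD
export LD_LIBRARY_PATH=$INSTALLPATH/opt/mvapich2/gdr/2.3.1/mcast/no-openacc/cuda10.0/mofed4.5/mpirun/gnu4.8.5/lib64/:$LD_LIBRARY_PATH
# Verify the lib64 directory really exists at that path
ls "$INSTALLPATH"/opt/mvapich2/gdr/2.3.1/mcast/no-openacc/cuda10.0/mofed4.5/mpirun/gnu4.8.5/lib64/
# Verify osu_alltoallv resolves its MPI library from that lib64 directory
ldd "$INSTALLPATH"/opt/mvapich2/gdr/2.3.1/mcast/no-openacc/cuda10.0/mofed4.5/mpirun/gnu4.8.5/libexec/osu-micro-benchmarks/mpi/collectives/osu_alltoallv | grep -i mpi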
Thx,
Hari.

From: mvapich-discuss <mvapich-discuss-bounces at cse.ohio-state.edu> On Behalf Of AUGUSTIN DEGOMME
Sent: Monday, March 18, 2019 5:04 AM
To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: [mvapich-discuss] MPI_Alltoallv hangs in MVAPICH-GDR

Hello,

With CUDA disabled (untested with the GPU path enabled), MVAPICH2-GDR hangs forever on MPI_Alltoallv calls. This was reproduced with our code, and also with the OSU benchmarks shipped in the MVAPICH installation folder. Alltoall and all the other collectives work fine; only alltoallv hangs.

This was tested with both the released 2.3 and 2.3.1 versions of MVAPICH2-GDR, both on a Debian testing system and in an Ubuntu 18.04 Docker image. I was not able to reproduce the hang when building MVAPICH2 from source; it only occurs with the GDR packages.

Steps to reproduce:

I installed http://mvapich.cse.ohio-state.edu/download/mvapich/gdr/2.3.1/mofed4.5/mvapich2-gdr-mcast.cuda10.0.mofed4.5.gnu4.8.5-2.3.1-1.el7.x86_64.rpm on a Debian system with rpm2cpio:

rpm2cpio mvapich2-gdr-mcast.cuda10.0.mofed4.5.gnu4.8.5-2.3.1-1.el7.x86_64.rpm.1 | cpio -id
export LD_LIBRARY_PATH=INSTALLPATH/opt/mvapich2/gdr/2.3.1/mcast/no-openacc/cuda10.0/mofed4.5/mpirun/gnu4.8.5/lib64/:$LD_LIBRARY_PATH
export MV2_USE_CUDA=0
export MV2_USE_GDRCOPY=0

./opt/mvapich2/gdr/2.3.1/mcast/no-openacc/cuda10.0/mofed4.5/mpirun/gnu4.8.5/bin/mpirun -np 2 opt/mvapich2/gdr/2.3.1/mcast/no-openacc/cuda10.0/mofed4.5/mpirun/gnu4.8.5/libexec/osu-micro-benchmarks/mpi/collectives/osu_alltoallv

[xxx:mpi_rank_0][MPIDI_CH3_Init] MVAPICH2 has been built with support for CUDA. But, MV2_USE_CUDA not set to 1. This can lead to errors in using GPU buffers. If you are running applications that use GPU buffers, please set MV2_USE_CUDA=1 and try again.
[xxx:mpi_rank_0][MPIDI_CH3_Init] To suppress this warning, please set MV2_SUPPRESS_CUDA_USAGE_WARNING to 1

# OSU MPI All-to-Allv Personalized Exchange Latency Test v5.6.1
# Size       Avg Latency(us)

... and nothing more is printed.

Every other osu_* test in the folder works as expected (including ialltoallv). strace reports a massive number of sched_yield() = 0 calls.
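In case it helps narrow this down, a backtrace of one of the spinning ranks can be grabbed roughly like this (just a sketch; it assumes gdb is available on the node and that osu_alltoallv is the only matching process):

# Attach to one of the hung osu_alltoallv ranks and dump all thread stacks
gdb -batch -ex 'thread apply all bt' -p $(pgrep -f osu_alltoallv | head -n 1)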

Best regards,

Augustin Degomme
CEA/IRIG




