[mvapich-discuss] MPI_Alltoallv hangs in MVAPICH-GDR

AUGUSTIN DEGOMME augustin.degomme at univ-grenoble-alpes.fr
Mon Mar 18 08:03:51 EDT 2019


Hello, 

With CUDA disabled (untested on the GPU version), MVAPICH2-GDR hangs forever on MPI_Alltoallv calls. I reproduced this with our own code and with the OSU benchmark shipped in the MVAPICH2-GDR installation folder. Alltoall and all the others work fine; only alltoallv hangs. 

This was tested with both the released 2.3 and 2.3.1 versions of MVAPICH2-GDR, on a Debian testing system and in an Ubuntu 18.04 Docker image. I was not able to reproduce it with MVAPICH2 built from source, only with GDR. 

Steps to reproduce: 

I installed http://mvapich.cse.ohio-state.edu/download/mvapich/gdr/2.3.1/mofed4.5/mvapich2-gdr-mcast.cuda10.0.mofed4.5.gnu4.8.5-2.3.1-1.el7.x86_64.rpm on a Debian system by unpacking it with rpm2cpio: 

rpm2cpio mvapich2-gdr-mcast.cuda10.0.mofed4.5.gnu4.8.5-2.3.1-1.el7.x86_64.rpm.1 | cpio -id 
export LD_LIBRARY_PATH=INSTALLPATH/opt/mvapich2/gdr/2.3.1/mcast/no-openacc/cuda10.0/mofed4.5/mpirun/gnu4.8.5/lib64/:$LD_LIBRARY_PATH 
export MV2_USE_CUDA=0 
export MV2_USE_GDRCOPY=0 

./opt/mvapich2/gdr/2.3.1/mcast/no-openacc/cuda10.0/mofed4.5/mpirun/gnu4.8.5/bin/mpirun -np 2 opt/mvapich2/gdr/2.3.1/mcast/no-openacc/cuda10.0/mofed4.5/mpirun/gnu4.8.5/libexec/osu-micro-benchmarks/mpi/collectives/osu_alltoallv 

[xxx:mpi_rank_0][MPIDI_CH3_Init] MVAPICH2 has been built with support for CUDA. But, MV2_USE_CUDA not set to 1. This can lead to errors in using GPU buffers. If you are running applications that use GPU buffers, please set MV2_USE_CUDA=1 and try again. 
[xxx:mpi_rank_0][MPIDI_CH3_Init] To suppress this warning, please set MV2_SUPPRESS_CUDA_USAGE_WARNING to 1 

# OSU MPI All-to-Allv Personalized Exchange Latency Test v5.6.1 
# Size Avg Latency(us) 

... and nothing 

Every other osu_* test in the folder works as expected (including ialltoallv). strace reports a massive number of sched_yield() = 0 calls. 
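
To script the comparison between osu_alltoallv and the other tests, something along these lines might work. It is only a sketch: it assumes GNU coreutils `timeout` is available, `check_hang` is a hypothetical helper name (not part of MVAPICH2 or the OSU benchmarks), and the 60-second limit in the usage comment is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: run a command under `timeout` and classify the result.
# GNU `timeout` exits with status 124 when the time limit is reached.
check_hang() {
    limit="$1"; shift
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "HANG: $*"            # still running when the limit expired
    elif [ "$status" -eq 0 ]; then
        echo "OK: $*"
    else
        echo "FAIL($status): $*"   # exited on its own with an error
    fi
}

# Usage (paths as in the steps above, run from the extraction directory):
#   PREFIX=opt/mvapich2/gdr/2.3.1/mcast/no-openacc/cuda10.0/mofed4.5/mpirun/gnu4.8.5
#   for t in "$PREFIX"/libexec/osu-micro-benchmarks/mpi/collectives/osu_*; do
#       check_hang 60 "./$PREFIX/bin/mpirun" -np 2 "$t"
#   done
```

With the behaviour described above, one would expect a HANG line for osu_alltoallv only. Note that `timeout` only signals the process it started, so leftover MPI ranks may need to be killed by hand.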

Best regards, 

Augustin Degomme 
CEA/IRIG 




