[mvapich-discuss] multi-threaded CUDA MPI_Allgatherv crash

Justin Luitjens jluitjens at nvidia.com
Wed Oct 16 16:32:48 EDT 2013


The attached reproducer crashes in MVAPICH2-2.0a.  It appears that the GPU-direct (CUDA-aware) version of MPI_Allgatherv is not thread safe.

I compiled this as follows:

%> nvcc -c -arch=sm_20 -O3 -I/shared/devtechapps/mpi/gnu-4.7.3/mvapich2-2.0a/cuda-5.5.22/include -Xcompiler -fopenmp mpialltoall.cu -o mpialltoall.o
%> mpic++ -o alltoall mpialltoall.o -L/shared/apps/cuda/CUDA-v5.5.22/lib64 -lcuda -lcudart -fopenmp

I then set the following variables:

export MV2_USE_CUDA=1
export MV2_ENABLE_AFFINITY=0

Finally I ran with this:

%> mpirun -np 2 ./alltoall

This crashes with the following error:

[dt00:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
[dt00:mpi_rank_1][error_sighandler] Caught error: Segmentation fault (signal 11)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

If I set the number of threads to 1, this example runs fine.
If I set the number of threads to 2 and use host memory, the example also runs fine.
The crash only occurs when the data is in device memory and multiple threads are used.
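Since the attachment was scrubbed from the archive, here is a hedged sketch of what a reproducer along these lines might look like. This is not the original mpialltoall.cu: the element count, thread count, and per-thread MPI_Comm_dup are assumptions made for illustration. Each OpenMP thread issues its own MPI_Allgatherv on device buffers, which exercises the CUDA-aware path when MV2_USE_CUDA=1 is set.

```cuda
// Hypothetical reconstruction of a multi-threaded CUDA Allgatherv test.
// Assumed sizes and structure; the original reproducer may differ.
#include <mpi.h>
#include <omp.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

#define NTHREADS 2
#define N 1024  // elements contributed per rank (assumed)

int main(int argc, char **argv) {
  int provided, rank, size;
  // Request full thread support: concurrent MPI calls from multiple
  // threads are only legal with MPI_THREAD_MULTIPLE.
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  if (provided < MPI_THREAD_MULTIPLE && rank == 0)
    fprintf(stderr, "warning: MPI_THREAD_MULTIPLE not provided\n");

  int *counts = (int *)malloc(size * sizeof(int));
  int *displs = (int *)malloc(size * sizeof(int));
  for (int i = 0; i < size; i++) { counts[i] = N; displs[i] = i * N; }

  #pragma omp parallel num_threads(NTHREADS)
  {
    // Give each thread its own communicator so concurrent collectives
    // cannot collide on MPI_COMM_WORLD; the duplication itself is a
    // collective and is serialized here.
    MPI_Comm comm;
    #pragma omp critical
    MPI_Comm_dup(MPI_COMM_WORLD, &comm);

    double *d_send, *d_recv;
    cudaMalloc((void **)&d_send, N * sizeof(double));
    cudaMalloc((void **)&d_recv, (size_t)size * N * sizeof(double));

    // Device pointers passed directly to MPI rely on the CUDA-aware
    // path (MV2_USE_CUDA=1); this is where the segfault is observed.
    MPI_Allgatherv(d_send, N, MPI_DOUBLE,
                   d_recv, counts, displs, MPI_DOUBLE, comm);

    cudaFree(d_send);
    cudaFree(d_recv);
    MPI_Comm_free(&comm);
  }

  free(counts); free(displs);
  MPI_Finalize();
  return 0;
}
```

Note that with host-allocated buffers this pattern is legal MPI under MPI_THREAD_MULTIPLE, which matches the observation that the host-memory variant runs cleanly; only the device-memory path crashes.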

Thanks,
Justin

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpialltoall.cu
Type: application/octet-stream
Size: 2508 bytes
Desc: mpialltoall.cu
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20131016/7501e4ea/attachment-0001.obj>
