[mvapich-discuss] MV2_USE_CUDA=1 gets ignored?

Christian Trott crtrott at sandia.gov
Wed Feb 6 11:44:51 EST 2013


Hi

I am trying to use GPU-to-GPU MPI communication on a new cluster of 
ours, and it always fails with segfaults. The odd thing is that I get the 
same valgrind output whether I set MV2_USE_CUDA=1 or not (output comes 
further down). I downloaded the most recent 1.9a2 version, and this is my 
current configure line:

./configure --enable-cuda --with-cuda=/home/crtrott/lib/cuda-5.0/ 
--prefix=/home/crtrott/mpi/mvapich2-1.9/gcc/cuda50a --disable-rdmacm 
--disable-mcast --enable-g=dbg --disable-fast

This is my run command:

mpirun -np 2 env MV2_USE_CUDA=1 MV2_DEBUG_SHOW_BACKTRACE=1 valgrind 
./osu_bw D D
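
For reference, this is roughly the pattern osu_bw D D exercises, sketched 
here as a hypothetical stand-alone example (not the actual osu_bw.c source, 
error checks omitted): both buffers are cudaMalloc'd and the raw device 
pointers go straight into the MPI calls. If MV2_USE_CUDA=1 is actually 
honored, the library should detect and stage the device pointers; if it is 
ignored, they presumably end up in a plain host memcpy, which would be 
consistent with the crash in MPIDI_CH3I_SMP_writev shown below.

    /* Minimal sketch of a device-to-device MPI transfer, as exercised
     * by osu_bw D D (hypothetical example, not taken from osu_bw.c). */
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv)
    {
        int rank, size = 1 << 20;      /* 1 MiB message, arbitrary */
        void *d_buf;
        MPI_Request req;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        cudaMalloc(&d_buf, size);      /* buffer resides on the GPU */

        if (rank == 0) {
            /* With MV2_USE_CUDA=1 in effect, MVAPICH2 should recognize
             * the device pointer and stage the transfer; if the flag is
             * ignored, the pointer reaches an ordinary host memcpy. */
            MPI_Isend(d_buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &req);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(d_buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }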

And this is the relevant valgrind output:

==58800== Warning: set address range perms: large range [0x3d00000000, 0x5e00000000) (noaccess)
==58801== Warning: set address range perms: large range [0x3d00000000, 0x5e00000000) (noaccess)
==58800== Warning: set address range perms: large range [0x2d00000000, 0x3100000000) (noaccess)
==58801== Warning: set address range perms: large range [0x2d00000000, 0x3100000000) (noaccess)
# OSU MPI-CUDA Bandwidth Test
# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
# Size        Bandwidth (MB/s)
==58800== Invalid read of size 1
==58800==    at 0x4A08020: memcpy (mc_replace_strmem.c:628)
==58800==    by 0x4452D6: MPIUI_Memcpy (mpiimpl.h:146)
==58800==    by 0x44D41E: MPIDI_CH3I_SMP_writev (ch3_smp_progress.c:2895)
==58800==    by 0x5DA884: MPIDI_CH3_SMP_iSendv (ch3_isendv.c:108)
==58800==    by 0x5DAC39: MPIDI_CH3_iSendv (ch3_isendv.c:187)
==58800==    by 0x5D1BBA: MPIDI_CH3_EagerContigIsend (ch3u_eager.c:632)
==58800==    by 0x42E09E: MPID_Isend (mpid_isend.c:220)
==58800==    by 0x40C1B3: PMPI_Isend (isend.c:122)
==58800==    by 0x406E85: main (osu_bw.c:242)
==58800==  Address 0x2d00200000 is not stack'd, malloc'd or (recently) free'd
==58800==
[k20-0001:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
[k20-0001:mpi_rank_0][print_backtrace]   0: ./osu_bw() [0x4b65a2]
[k20-0001:mpi_rank_0][print_backtrace]   1: ./osu_bw() [0x4b66de]
[k20-0001:mpi_rank_0][print_backtrace]   2: /lib64/libpthread.so.0() 
[0x38b7a0f4a0]
[k20-0001:mpi_rank_0][print_backtrace]   3: /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so(_vgrZU_libcZdsoZa_memcpy+0x160) [0x4a08020]
[k20-0001:mpi_rank_0][print_backtrace]   4: ./osu_bw() [0x4452d7]
[k20-0001:mpi_rank_0][print_backtrace]   5: ./osu_bw() [0x44d41f]
[k20-0001:mpi_rank_0][print_backtrace]   6: ./osu_bw() [0x5da885]
[k20-0001:mpi_rank_0][print_backtrace]   7: ./osu_bw() [0x5dac3a]
[k20-0001:mpi_rank_0][print_backtrace]   8: ./osu_bw() [0x5d1bbb]
[k20-0001:mpi_rank_0][print_backtrace]   9: ./osu_bw() [0x42e09f]
[k20-0001:mpi_rank_0][print_backtrace]  10: ./osu_bw() [0x40c1b4]
[k20-0001:mpi_rank_0][print_backtrace]  11: ./osu_bw() [0x406e86]
[k20-0001:mpi_rank_0][print_backtrace]  12: /lib64/libc.so.6(__libc_start_main+0xfd) [0x38b6e1ecdd]
[k20-0001:mpi_rank_0][print_backtrace]  13: ./osu_bw() [0x4066a9]

Any suggestions would be greatly appreciated.

Christian



