[mvapich-discuss] MV2_USE_CUDA=1 gets ignored?

Justin Luitjens jluitjens at nvidia.com
Wed Feb 6 12:11:41 EST 2013


Hi Christian,

Have you called cudaSetDevice() before MPI_Init()? I have noticed that the current implementation requires this. Hopefully future releases will not.
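
In case it helps, here is a rough sketch of that ordering, not a definitive recipe. It assumes each process on a node should use a different GPU and that the launcher exports MV2_COMM_WORLD_LOCAL_RANK before MPI_Init runs (please check this for your MVAPICH2 version). pick_device() and the BUILD_WITH_MPI_CUDA macro are hypothetical names made up for this example; they are not part of MVAPICH2 or CUDA.

```c
#include <stdlib.h>

/* Hypothetical helper: map a local-rank string (as read from an
 * environment variable) to a CUDA device index. Falls back to
 * device 0 when the value is missing, malformed, or negative. */
int pick_device(const char *local_rank_str, int device_count)
{
    char *end = NULL;
    long rank;

    if (local_rank_str == NULL || device_count <= 0)
        return 0;
    rank = strtol(local_rank_str, &end, 10);
    if (end == local_rank_str || *end != '\0' || rank < 0)
        return 0;
    return (int)(rank % device_count);
}

#ifdef BUILD_WITH_MPI_CUDA  /* hypothetical macro: define when building with MPI + CUDA */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    int ndev = 0;

    if (cudaGetDeviceCount(&ndev) != cudaSuccess || ndev == 0)
        return 1;

    /* Select the GPU BEFORE MPI_Init, as suggested above.
     * Assumption: the launcher has set MV2_COMM_WORLD_LOCAL_RANK. */
    if (cudaSetDevice(pick_device(getenv("MV2_COMM_WORLD_LOCAL_RANK"), ndev))
            != cudaSuccess)
        return 1;

    MPI_Init(&argc, &argv);
    /* ... MPI calls on device buffers go here ... */
    MPI_Finalize();
    return 0;
}
#endif
```

The device choice is kept in a plain function so it can be checked without a GPU. The guarded main() only shows the call order (cudaSetDevice first, then MPI_Init) and needs the MPI compiler wrapper plus the CUDA runtime library to build.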

Justin

-----Original Message-----
From: mvapich-discuss-bounces at cse.ohio-state.edu [mailto:mvapich-discuss-bounces at cse.ohio-state.edu] On Behalf Of Christian Trott
Sent: Wednesday, February 06, 2013 8:45 AM
To: mvapich-discuss at cse.ohio-state.edu
Subject: [mvapich-discuss] MV2_USE_CUDA=1 gets ignored?

Hi

I am trying to use GPU-to-GPU MPI communication on a new cluster of
ours, and it always fails with a segfault. The funny thing is that I get
the same valgrind output whether or not I set MV2_USE_CUDA=1 (the output
is further down). I downloaded the most recent version, 1.9a2, and this
is my current configure line:

./configure --enable-cuda --with-cuda=/home/crtrott/lib/cuda-5.0/ \
    --prefix=/home/crtrott/mpi/mvapich2-1.9/gcc/cuda50a \
    --disable-rdmacm --disable-mcast --enable-g=dbg --disable-fast

This is my run command:

mpirun -np 2 env MV2_USE_CUDA=1 MV2_DEBUG_SHOW_BACKTRACE=1 valgrind ./osu_bw D D

And this is the relevant valgrind output:

==58800== Warning: set address range perms: large range [0x3d00000000, 0x5e00000000) (noaccess)
==58801== Warning: set address range perms: large range [0x3d00000000, 0x5e00000000) (noaccess)
==58800== Warning: set address range perms: large range [0x2d00000000, 0x3100000000) (noaccess)
==58801== Warning: set address range perms: large range [0x2d00000000, 0x3100000000) (noaccess)
# OSU MPI-CUDA Bandwidth Test
# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
# Size        Bandwidth (MB/s)
==58800== Invalid read of size 1
==58800==    at 0x4A08020: memcpy (mc_replace_strmem.c:628)
==58800==    by 0x4452D6: MPIUI_Memcpy (mpiimpl.h:146)
==58800==    by 0x44D41E: MPIDI_CH3I_SMP_writev (ch3_smp_progress.c:2895)
==58800==    by 0x5DA884: MPIDI_CH3_SMP_iSendv (ch3_isendv.c:108)
==58800==    by 0x5DAC39: MPIDI_CH3_iSendv (ch3_isendv.c:187)
==58800==    by 0x5D1BBA: MPIDI_CH3_EagerContigIsend (ch3u_eager.c:632)
==58800==    by 0x42E09E: MPID_Isend (mpid_isend.c:220)
==58800==    by 0x40C1B3: PMPI_Isend (isend.c:122)
==58800==    by 0x406E85: main (osu_bw.c:242)
==58800==  Address 0x2d00200000 is not stack'd, malloc'd or (recently) free'd
==58800==
[k20-0001:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
[k20-0001:mpi_rank_0][print_backtrace]   0: ./osu_bw() [0x4b65a2]
[k20-0001:mpi_rank_0][print_backtrace]   1: ./osu_bw() [0x4b66de]
[k20-0001:mpi_rank_0][print_backtrace]   2: /lib64/libpthread.so.0() [0x38b7a0f4a0]
[k20-0001:mpi_rank_0][print_backtrace]   3: /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so(_vgrZU_libcZdsoZa_memcpy+0x160) [0x4a08020]
[k20-0001:mpi_rank_0][print_backtrace]   4: ./osu_bw() [0x4452d7]
[k20-0001:mpi_rank_0][print_backtrace]   5: ./osu_bw() [0x44d41f]
[k20-0001:mpi_rank_0][print_backtrace]   6: ./osu_bw() [0x5da885]
[k20-0001:mpi_rank_0][print_backtrace]   7: ./osu_bw() [0x5dac3a]
[k20-0001:mpi_rank_0][print_backtrace]   8: ./osu_bw() [0x5d1bbb]
[k20-0001:mpi_rank_0][print_backtrace]   9: ./osu_bw() [0x42e09f]
[k20-0001:mpi_rank_0][print_backtrace]  10: ./osu_bw() [0x40c1b4]
[k20-0001:mpi_rank_0][print_backtrace]  11: ./osu_bw() [0x406e86]
[k20-0001:mpi_rank_0][print_backtrace]  12: /lib64/libc.so.6(__libc_start_main+0xfd) [0x38b6e1ecdd]
[k20-0001:mpi_rank_0][print_backtrace]  13: ./osu_bw() [0x4066a9]

Any suggestions would be greatly appreciated.

Christian




