RE: [mvapich-discuss] MV2_USE_CUDA=1 gets ignored?

Christian Trott crtrott at sandia.gov
Wed Feb 6 12:13:48 EST 2013


Yes, I do that in miniMD, but right now I am just running the osu_bw test
code that ships with MVAPICH2 (and which works on my workstation).

Christian

On 02/06/2013 10:11 AM, Justin Luitjens wrote:
> Hi Christian,
>
> Have you called cudaSetDevice() prior to MPI_Init()?  I have noticed that the current implementation requires this.  Hopefully future implementations will not.
>
> Justin
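
A minimal sketch of the ordering suggested above: select the GPU with
cudaSetDevice() first, then call MPI_Init(). It assumes each rank can read
its node-local rank from the MV2_COMM_WORLD_LOCAL_RANK environment variable
before MPI is initialized; whether your launcher exports that variable is
something to verify. pick_device() and the BUILD_WITH_CUDA_MPI macro are
hypothetical names introduced for illustration and are not part of MVAPICH2
or osu_bw.

```c
#include <stdlib.h>

/* Hypothetical helper: map a node-local rank string to a device index.
 * Falls back to device 0 if the string is missing, not a number,
 * negative, or if num_devices is less than 1. */
int pick_device(const char *local_rank_str, int num_devices)
{
    if (local_rank_str == NULL || num_devices < 1)
        return 0;

    char *end;
    long rank = strtol(local_rank_str, &end, 10);
    if (end == local_rank_str || rank < 0)
        return 0;

    /* Spread ranks round-robin over the visible devices. */
    return (int)(rank % num_devices);
}

#ifdef BUILD_WITH_CUDA_MPI /* hypothetical macro; define it to build main() */
#include <stdio.h>
#include <cuda_runtime.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int num_devices = 0;
    if (cudaGetDeviceCount(&num_devices) != cudaSuccess || num_devices < 1) {
        fprintf(stderr, "no usable CUDA device found\n");
        return 1;
    }

    /* Assumption: the launcher sets MV2_COMM_WORLD_LOCAL_RANK. */
    int dev = pick_device(getenv("MV2_COMM_WORLD_LOCAL_RANK"), num_devices);

    /* Select the device BEFORE MPI_Init(), as suggested in this thread. */
    cudaError_t err = cudaSetDevice(dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice(%d): %s\n", dev, cudaGetErrorString(err));
        return 1;
    }

    MPI_Init(&argc, &argv);
    /* ... allocate device buffers and pass them to MPI calls here ... */
    MPI_Finalize();
    return 0;
}
#endif
```

Without the macro, only the helper is compiled, so it can be checked on a
machine that has neither CUDA nor MPI installed. With the macro defined, the
file has to be built with the MPI compiler wrapper and linked against the
CUDA runtime.
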
>
> -----Original Message-----
> From: mvapich-discuss-bounces at cse.ohio-state.edu [mailto:mvapich-discuss-bounces at cse.ohio-state.edu] On Behalf Of Christian Trott
> Sent: Wednesday, February 06, 2013 8:45 AM
> To: mvapich-discuss at cse.ohio-state.edu
> Subject: [mvapich-discuss] MV2_USE_CUDA=1 gets ignored?
>
> Hi
>
> I am trying to use GPU-to-GPU MPI communication on a new cluster of
> ours, and it always fails with segfaults. The odd thing is that I get the
> same valgrind output whether I set MV2_USE_CUDA=1 or not (output is
> further down). I downloaded the most recent version, 1.9a2, and this is my
> current configure line:
>
> ./configure --enable-cuda --with-cuda=/home/crtrott/lib/cuda-5.0/
> --prefix=/home/crtrott/mpi/mvapich2-1.9/gcc/cuda50a --disable-rdmacm
> --disable-mcast --enable-g=dbg --disable-fast
>
> This is my run command:
>
> mpirun -np 2 env MV2_USE_CUDA=1 MV2_DEBUG_SHOW_BACKTRACE=1 valgrind
> ./osu_bw D D
>
> And this is the relevant valgrind output:
>
> ==58800== Warning: set address range perms: large range [0x3d00000000,
> 0x5e00000000) (noaccess)
> ==58801== Warning: set address range perms: large range [0x3d00000000,
> 0x5e00000000) (noaccess)
> ==58800== Warning: set address range perms: large range [0x2d00000000,
> 0x3100000000) (noaccess)
> ==58801== Warning: set address range perms: large range [0x2d00000000,
> 0x3100000000) (noaccess)
> # OSU MPI-CUDA Bandwidth Test
> # Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
> # Size        Bandwidth (MB/s)
> ==58800== Invalid read of size 1
> ==58800==    at 0x4A08020: memcpy (mc_replace_strmem.c:628)
> ==58800==    by 0x4452D6: MPIUI_Memcpy (mpiimpl.h:146)
> ==58800==    by 0x44D41E: MPIDI_CH3I_SMP_writev (ch3_smp_progress.c:2895)
> ==58800==    by 0x5DA884: MPIDI_CH3_SMP_iSendv (ch3_isendv.c:108)
> ==58800==    by 0x5DAC39: MPIDI_CH3_iSendv (ch3_isendv.c:187)
> ==58800==    by 0x5D1BBA: MPIDI_CH3_EagerContigIsend (ch3u_eager.c:632)
> ==58800==    by 0x42E09E: MPID_Isend (mpid_isend.c:220)
> ==58800==    by 0x40C1B3: PMPI_Isend (isend.c:122)
> ==58800==    by 0x406E85: main (osu_bw.c:242)
> ==58800==  Address 0x2d00200000 is not stack'd, malloc'd or (recently)
> free'd
> ==58800==
> [k20-0001:mpi_rank_0][error_sighandler] Caught error: Segmentation fault
> (signal 11)
> [k20-0001:mpi_rank_0][print_backtrace]   0: ./osu_bw() [0x4b65a2]
> [k20-0001:mpi_rank_0][print_backtrace]   1: ./osu_bw() [0x4b66de]
> [k20-0001:mpi_rank_0][print_backtrace]   2: /lib64/libpthread.so.0()
> [0x38b7a0f4a0]
> [k20-0001:mpi_rank_0][print_backtrace]   3:
> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so(_vgrZU_libcZdsoZa_memcpy+0x160)
> [0x4a08020]
> [k20-0001:mpi_rank_0][print_backtrace]   4: ./osu_bw() [0x4452d7]
> [k20-0001:mpi_rank_0][print_backtrace]   5: ./osu_bw() [0x44d41f]
> [k20-0001:mpi_rank_0][print_backtrace]   6: ./osu_bw() [0x5da885]
> [k20-0001:mpi_rank_0][print_backtrace]   7: ./osu_bw() [0x5dac3a]
> [k20-0001:mpi_rank_0][print_backtrace]   8: ./osu_bw() [0x5d1bbb]
> [k20-0001:mpi_rank_0][print_backtrace]   9: ./osu_bw() [0x42e09f]
> [k20-0001:mpi_rank_0][print_backtrace]  10: ./osu_bw() [0x40c1b4]
> [k20-0001:mpi_rank_0][print_backtrace]  11: ./osu_bw() [0x406e86]
> [k20-0001:mpi_rank_0][print_backtrace]  12:
> /lib64/libc.so.6(__libc_start_main+0xfd) [0x38b6e1ecdd]
> [k20-0001:mpi_rank_0][print_backtrace]  13: ./osu_bw() [0x4066a9]
>
> Any suggestions would be greatly appreciated.
>
> Christian