[mvapich-discuss] Fw: run mvapich for GPU communication

li.luo at siat.ac.cn li.luo at siat.ac.cn
Mon Aug 12 22:43:58 EDT 2013


To give more details:

I am using CUDA 5.0 and the latest MVAPICH2 1.9, configured with:

./configure --prefix=/opt/mvapich2-1.9-gnu --enable-shared --enable-cuda --with-cuda=/home/liluo/lib/cuda_5.0 --disable-mcast


It runs well for osu_alltoallv with device buffers on both sides (D D):

[liluo at gpu2 osu_benchmarks]$ mpirun_rsh -np 2 gpu1-ib gpu2-ib MV2_USE_CUDA=1 get_local_rank ./osu_alltoallv D D
# OSU MPI All-to-Allv Personalized Exchange Latency Test
# Size         Avg Latency(us)
1                         4.56
2                         4.63
4                         4.59
8                         4.58
16                        4.58
32                        4.66
64                        6.36
128                       6.66
256                       7.24
512                       7.98
1024                      9.53
2048                     12.53
4096                     17.48
8192                     26.35
16384                    43.49
32768                    85.12
65536                   140.24
131072                  250.05
262144                  483.49
524288                  932.46
1048576                1866.31


A related issue can be found at http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2012-November/004119.html

It seems my cudaMemcpy failure is related to buffer detection.
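(For reference, my understanding is that "buffer detection" here means the library queries each pointer with cudaPointerGetAttributes to decide whether it is a device or a host buffer before staging the copy. The standalone sketch below is only my own illustration of that kind of check, not MVAPICH2 code; the helper name is_device_pointer is made up for this example.)

/* Minimal sketch of device-vs-host pointer detection via the CUDA runtime.
 * If a check like this misclassified a buffer, a later staging cudaMemcpy
 * would be handed the wrong kind of pointer and fail. */
#include <stdio.h>
#include <cuda_runtime.h>

static int is_device_pointer(const void *buf)
{
    struct cudaPointerAttributes attr;
    cudaError_t err = cudaPointerGetAttributes(&attr, buf);
    if (err != cudaSuccess) {
        cudaGetLastError();   /* clear the sticky error set for plain host pointers */
        return 0;             /* unknown to CUDA: treat as host memory */
    }
    return attr.memoryType == cudaMemoryTypeDevice;   /* CUDA 5.x field name */
}

int main(void)
{
    void *d_buf = NULL;
    char  h_buf[16];
    cudaMalloc(&d_buf, 16);
    printf("device buf -> %d, host buf -> %d\n",
           is_device_pointer(d_buf), is_device_pointer(h_buf));
    cudaFree(d_buf);
    return 0;
}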

Please see the description of my problem in my earlier email, quoted below.

Thanks. 

 

-----Original Message-----
From: li.luo at siat.ac.cn
Sent Time: Tuesday, August 13, 2013
To: mvapich at cse.ohio-state.edu
Cc:
Subject: run mvapich for GPU communication




Hi,

I want to use MPI_Alltoallv to communicate between 2 GPU cards by running:

mpirun_rsh -np 2 -hostfile hosts MV2_USE_CUDA=1  ./ex1 ...
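(For illustration, a minimal program of this kind looks like the sketch below. This is a simplified example with made-up message sizes, not my actual ex1 code: buffers from cudaMalloc are passed directly to MPI_Alltoallv, and the CUDA-enabled MVAPICH2 is expected to stage them when MV2_USE_CUDA=1 is set.)

/* Sketch: MPI_Alltoallv on device buffers with a CUDA-aware MPI. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int size;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int count = 1024;   /* doubles sent to each rank (made-up size) */
    int *sendcounts = (int *)malloc(size * sizeof(int));
    int *recvcounts = (int *)malloc(size * sizeof(int));
    int *sdispls    = (int *)malloc(size * sizeof(int));
    int *rdispls    = (int *)malloc(size * sizeof(int));
    for (int i = 0; i < size; i++) {
        sendcounts[i] = recvcounts[i] = count;
        sdispls[i]    = rdispls[i]    = i * count;
    }

    /* device buffers, handed straight to MPI */
    double *d_send, *d_recv;
    cudaMalloc((void **)&d_send, (size_t)size * count * sizeof(double));
    cudaMalloc((void **)&d_recv, (size_t)size * count * sizeof(double));
    cudaMemset(d_send, 0, (size_t)size * count * sizeof(double));

    MPI_Alltoallv(d_send, sendcounts, sdispls, MPI_DOUBLE,
                  d_recv, recvcounts, rdispls, MPI_DOUBLE, MPI_COMM_WORLD);

    cudaFree(d_send);
    cudaFree(d_recv);
    free(sendcounts); free(recvcounts); free(sdispls); free(rdispls);
    MPI_Finalize();
    return 0;
}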


It's weird that when I compile with nvcc using the debug options -g -G, the program runs correctly.

But if I compile with nvcc using -O, it fails and returns:


[gpu2:mpi_rank_1][cuda_stage_alloc_v] cudaMemcpy failed with 4 at 2020
[gpu1:mpi_rank_0][cuda_stage_alloc_v] cudaMemcpy failed with 4 at 2020
[gpu2:mpispawn_1][readline] Unexpected End-Of-File on file descriptor 5. MPI process died?
[gpu2:mpispawn_1][mtpmi_processops] Error while reading PMI socket. MPI process died?
[gpu2:mpispawn_1][child_handler] MPI process (rank: 1, pid: 16996) exited with status 1
[gpu1:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 5. MPI process died?
[gpu1:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
make: *** [runex1_mvapich] Error 1
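(One thing I can try on my side, sketched below: since failures from asynchronous kernel launches are only reported by a later CUDA call, checking the error state right after my own kernels should show whether the device is already in an error state before MPI touches the buffers in the -O build. CHECK_CUDA and the trivial kernel are made up for this example.)

/* Sketch of error checking around kernel launches in the application. */
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define CHECK_CUDA(call)                                            \
    do {                                                            \
        cudaError_t e_ = (call);                                    \
        if (e_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,      \
                    cudaGetErrorString(e_));                        \
            exit(1);                                                \
        }                                                           \
    } while (0)

__global__ void touch(double *p) { p[threadIdx.x] = 1.0; }

int main(void)
{
    double *d;
    CHECK_CUDA(cudaMalloc((void **)&d, 32 * sizeof(double)));
    touch<<<1, 32>>>(d);
    CHECK_CUDA(cudaGetLastError());        /* launch-time errors */
    CHECK_CUDA(cudaDeviceSynchronize());   /* execution-time errors */
    CHECK_CUDA(cudaFree(d));
    return 0;
}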



--
Li Luo
Shenzhen Institutes of Advanced Technology
Address: 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, P.R.China
Tel: +86-755-86392312, +86-15899753087
Email: li.luo at siat.ac.cn






