[mvapich-discuss] problems with MV2_USE_CUDA=1

sreeram potluri potluri at cse.ohio-state.edu
Fri Jun 29 10:02:24 EDT 2012


Hi,

Thanks for your post.

MVAPICH2 depends on UVA to detect whether a buffer used in MPI calls is on
the host or on the GPU, and acts accordingly. If UVA is not available,
there is no way for it to differentiate between host and device buffers
internally.

As long as only host buffers are used in MPI communication, you do not need
the MV2_USE_CUDA=1 flag, and MVAPICH2 should work fine with your GPU
applications.

Hope this helps. Let me know if you have further questions.

Regards
Sreeram Potluri

Summing it up: if UVA is not available, MPI calls cannot be used directly on
GPU buffers. This should be true of OpenMPI as well.

On Thu, Jun 28, 2012 at 12:54 PM, Igor Podladtchikov <
igor.podladtchikov at spectraseis.com> wrote:

>  Hi,
>
> I downloaded the latest mvapich version about two weeks ago and I'm having
> trouble using the CUDA stuff.
>
> I installed on a stand-alone node with 4 Tesla C1060s, and tried running
> the benchmarks, which error out.
> $ is the command and > the shell output:
>
> $ mpirun_rsh -np 2 guppy guppy MV2_USE_CUDA=1 ./osu_bw D D
> > [guppy:mpispawn_0][child_handler] MPI process (rank: 0, pid: 17710) exited with status 1
>
> I know C1060s don't support UVA, but I kind of expected mvapich to fall back
> to "regular" communication if the GPU doesn't support it. The final goal is
> to install mvapich on our cluster with M2070s for production, but I need a
> proof of concept first.
>
> I isolated the problem with dummy code:
>
> #include <stdio.h>
> #include "mpi.h"
>
> int main(int argc, char** argv){
>   // init mpi
>   MPI_Init(&argc, &argv);
>   int rank, size, len;
>   char name[1024];
>   MPI_Comm_size(MPI_COMM_WORLD, &size);
>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>   MPI_Get_processor_name(name, &len);
>   // say hi
>   printf("%s: rank %d size %d\n", name, rank, size);
>   // finalize
>   MPI_Finalize();
>   return 0;
> }
>
>
> I compiled the code like this:
>
> mpicc mvapich_test.c -o mvtest
>
> And ran it like this:
>
> $ mpirun_rsh -np 2 guppy guppy ./mvtest
> > guppy: rank 1 size 2
> > guppy: rank 0 size 2
>
> So far so good, right?
>
> Then I add MV2_USE_CUDA=1 to my launch command:
>
> $ mpirun_rsh -np 2 guppy guppy MV2_USE_CUDA=1 ./mvtest
> > [cli_0]: [cli_1]: aborting job:
> > Fatal error in MPI_Init:
> > Other MPI error
> >
> > aborting job:
> > Fatal error in MPI_Init:
> > Other MPI error
> >
> > [guppy:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 7. MPI process died?
> > [guppy:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
> > [guppy:mpispawn_0][child_handler] MPI process (rank: 1, pid: 17755) exited with status 1
> > [guppy:mpispawn_0][child_handler] MPI process (rank: 0, pid: 17754) exited with status 1
>
>
> So I'm not doing anything with the GPUs yet, but if I understand correctly,
> your MPI_Init implementation attempts to create a context on the GPU and
> fails for some reason? All my other CUDA apps run fine on this node,
> including MPI-based GPU solvers. I can even run them with your mpirun.
>
> The only full example I was able to find on how to use your MV2_USE_CUDA=1
> was here:
> http://cudamusing.blogspot.com/
> and his stuff just works, so that doesn't help.
>
> I really hope this is something simple and I'm just plain stupid. I read
> your user guide, including the FAQ and Troubleshooting sections, and tried
> this and that for about a week; I hope you can give me some clues.
>
> Here's some system info:
>
> $ cat /etc/*release*
> > CentOS release 5.5 (Final)
>
> $ uname -a:
> > Linux guppy 2.6.18-194.26.1.el5 #1 SMP Tue Nov 9 12:54:20 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
>
> $ mpiname -a:
> > MVAPICH2 1.8 Mon Apr 30 14:56:40 EDT 2012 ch3:mrail
> >
> > Compilation
> > CC: gcc    -DNDEBUG -DNVALGRIND -O2
> > CXX: c++   -DNDEBUG -DNVALGRIND -O2
> > F77:
> > FC:
> >
> > Configuration
> > --enable-cuda --with-cuda-include=/usr/local/cuda/include --with-cuda-libpath=/usr/local/cuda/lib64 --enable-shared --disable-f77 --disable-fc --without-hwloc
>
> $ cudaquery
> > Using cuda version 4020 (Driver API v2)
> > Using cuda runtime version 4020 (Runtime API v2)
> > Found 4 devices.
> > Tesla C1060 (id 0 cc 1.3) : 4294770688 bytes (4.000 GB)
> > Tesla C1060 (id 1 cc 1.3) : 4294770688 bytes (4.000 GB)
> > Tesla C1060 (id 2 cc 1.3) : 4294770688 bytes (4.000 GB)
> > Tesla C1060 (id 3 cc 1.3) : 4294770688 bytes (4.000 GB)
>
> $ ll /usr/lib64/libcuda*
> > lrwxrwxrwx 1 root root      12 Jun 22 14:42 /usr/lib64/libcuda.so -> libcuda.so.1
> > lrwxrwxrwx 1 root root      17 Jun 22 14:42 /usr/lib64/libcuda.so.1 -> libcuda.so.295.41
> > -rwxr-xr-x 1 root root 8612596 Jun 22 14:42 /usr/lib64/libcuda.so.295.41
>
> $ ll /usr/local/cuda/lib64
> > lrwxrwxrwx 1 root root        14 Jun 26 17:13 libcublas.so -> libcublas.so.4
> > lrwxrwxrwx 1 root root        18 Jun 26 17:13 libcublas.so.4 -> libcublas.so.4.2.9
> > -rwxr-xr-x 1 root root 109211936 Jun 26 17:13 libcublas.so.4.2.9
> > lrwxrwxrwx 1 root root        14 Jun 26 17:13 libcudart.so -> libcudart.so.4
> > lrwxrwxrwx 1 root root        18 Jun 26 17:13 libcudart.so.4 -> libcudart.so.4.2.9
> > -rwxr-xr-x 1 root root    369600 Jun 26 17:13 libcudart.so.4.2.9
> > lrwxrwxrwx 1 root root        13 Jun 26 17:13 libcufft.so -> libcufft.so.4
> > lrwxrwxrwx 1 root root        17 Jun 26 17:13 libcufft.so.4 -> libcufft.so.4.2.9
> > -rwxr-xr-x 1 root root  31161488 Jun 26 17:13 libcufft.so.4.2.9
> > lrwxrwxrwx 1 root root        13 Jun 26 17:13 libcuinj.so -> libcuinj.so.4
> > lrwxrwxrwx 1 root root        17 Jun 26 17:13 libcuinj.so.4 -> libcuinj.so.4.2.9
> > -rwxr-xr-x 1 root root    150480 Jun 26 17:13 libcuinj.so.4.2.9
> > lrwxrwxrwx 1 root root        14 Jun 26 17:13 libcurand.so -> libcurand.so.4
> > lrwxrwxrwx 1 root root        18 Jun 26 17:13 libcurand.so.4 -> libcurand.so.4.2.9
> > -rwxr-xr-x 1 root root  27315384 Jun 26 17:13 libcurand.so.4.2.9
> > lrwxrwxrwx 1 root root        16 Jun 26 17:13 libcusparse.so -> libcusparse.so.4
> > lrwxrwxrwx 1 root root        20 Jun 26 17:13 libcusparse.so.4 -> libcusparse.so.4.2.9
> > -rwxr-xr-x 1 root root 195959968 Jun 26 17:13 libcusparse.so.4.2.9
> > lrwxrwxrwx 1 root root        11 Jun 26 17:13 libnpp.so -> libnpp.so.4
> > lrwxrwxrwx 1 root root        15 Jun 26 17:13 libnpp.so.4 -> libnpp.so.4.2.9
> > -rwxr-xr-x 1 root root  55095288 Jun 26 17:13 libnpp.so.4.2.9
>
> $ nvidia-smi
> Thu Jun 28 10:37:37 2012
> +------------------------------------------------------+
> | NVIDIA-SMI 3.295.41   Driver Version: 295.41         |
> |-------------------------------+----------------------+----------------------+
> | Nb.  Name                     | Bus Id        Disp.  | Volatile ECC SB / DB |
> | Fan   Temp   Power Usage /Cap | Memory Usage         | GPU Util. Compute M. |
> |===============================+======================+======================|
> | 0.  Tesla C1060               | 0000:02:00.0  Off    |       N/A        N/A |
> |  35%   69 C  P8    N/A /  N/A |   0%    3MB / 4095MB |    0%      E. Thread |
> |-------------------------------+----------------------+----------------------|
> | 1.  Tesla C1060               | 0000:03:00.0  Off    |       N/A        N/A |
> |  35%   53 C  P8    N/A /  N/A |   0%    3MB / 4095MB |    0%      E. Thread |
> |-------------------------------+----------------------+----------------------|
> | 2.  Tesla C1060               | 0000:83:00.0  Off    |       N/A        N/A |
> |  35%   61 C  P8    N/A /  N/A |   0%    3MB / 4095MB |    0%      E. Thread |
> |-------------------------------+----------------------+----------------------|
> | 3.  Tesla C1060               | 0000:84:00.0  Off    |       N/A        N/A |
> |  35%   63 C  P8    N/A /  N/A |   0%    3MB / 4095MB |    0%      E. Thread |
> |-------------------------------+----------------------+----------------------|
> | Compute processes:                                               GPU Memory |
> |  GPU  PID     Process name                                       Usage      |
> |=============================================================================|
> |  No running compute processes found                                         |
> +-----------------------------------------------------------------------------+
>
> Looking forward to your reply!
>
> Cheers
>
> Igor Podladtchikov
> Spectraseis
> 1899 Wynkoop St, Suite 350
> Denver, CO 80202
> Tel. +1 303 658 9172 (direct)
> Tel. +1 303 330 8296 (cell)
> www.spectraseis.com
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>