[mvapich-discuss] problems with MV2_USE_CUDA=1

Igor Podladtchikov igor.podladtchikov at spectraseis.com
Thu Jun 28 12:54:16 EDT 2012


Hi,

I downloaded the latest MVAPICH2 release about two weeks ago and I'm having trouble using the CUDA support.

I installed it on a stand-alone node with 4 Tesla C1060s and tried running the OSU benchmarks, which error out.
$ is the command and > the shell output:

$ mpirun_rsh -np 2 guppy guppy MV2_USE_CUDA=1 ./osu_bw D D
> [guppy:mpispawn_0][child_handler] MPI process (rank: 0, pid: 17710) exited with status 1
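
My understanding is that the D D arguments make osu_bw place both the send and the receive buffer in device memory, so the raw device pointer is handed straight to MPI. Just to illustrate the pattern I expect MV2_USE_CUDA=1 to handle, here is a toy sketch (buffer size and names are arbitrary, this is not code I've run):

#include <stdio.h>
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char** argv){
  int rank;
  int bytes = 1 << 20;  // 1 MB, arbitrary
  void* dbuf = NULL;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  // buffer lives in device memory, no host staging in the application
  cudaMalloc(&dbuf, bytes);
  if (rank == 0)
    MPI_Send(dbuf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
  else if (rank == 1)
    MPI_Recv(dbuf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  cudaFree(dbuf);
  MPI_Finalize();
  return 0;
}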

I know C1060s don't support UVA, but I'd expect MVAPICH2 to fall back to "regular" communication if the GPU doesn't support it. The final goal is to install MVAPICH2 on our production cluster with M2070s, but I need a proof of concept first.
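
To double-check the UVA point, here is a small standalone query against the runtime API (nothing MVAPICH2-specific, just cudaGetDeviceProperties) that shows what each card reports:

#include <stdio.h>
#include <cuda_runtime.h>

// build, for example:
// gcc uva_check.c -I/usr/local/cuda/include -L/usr/local/cuda/lib64 -lcudart -o uva_check
int main(void){
  int i, n = 0;
  cudaGetDeviceCount(&n);
  for (i = 0; i < n; i++) {
    struct cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, i);
    // unifiedAddressing is 1 only if the device/platform supports UVA
    printf("device %d (%s): cc %d.%d unifiedAddressing=%d\n",
           i, prop.name, prop.major, prop.minor, prop.unifiedAddressing);
  }
  return 0;
}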

I isolated the problem with dummy code:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char** argv){
  // init mpi
  MPI_Init(&argc, &argv);
  int rank, size, len;
  char name[1024];
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Get_processor_name(name, &len);
  // say hi
  printf("%s: rank %d size %d\n", name, rank, size);
  // finalize
  MPI_Finalize();
  return 0;
}


I compiled the code like this:

mpicc mvapich_test.c -o mvtest

And ran it like this:

$ mpirun_rsh -np 2 guppy guppy ./mvtest
> guppy: rank 1 size 2
> guppy: rank 0 size 2

So far so good, right?

Then I add MV2_USE_CUDA=1 to my launch command:

$ mpirun_rsh -np 2 guppy guppy MV2_USE_CUDA=1 ./mvtest
> [cli_0]: [cli_1]: aborting job:
> Fatal error in MPI_Init:
> Other MPI error
>
> aborting job:
> Fatal error in MPI_Init:
> Other MPI error
>
> [guppy:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 7. MPI process died?
> [guppy:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
> [guppy:mpispawn_0][child_handler] MPI process (rank: 1, pid: 17755) exited with status 1
> [guppy:mpispawn_0][child_handler] MPI process (rank: 0, pid: 17754) exited with status 1


So I'm not doing anything with the GPUs yet, but if I understand correctly, your MPI_Init implementation attempts to create a context on the GPU and fails for some reason?
All my other CUDA apps run fine on this node, including MPI-based GPU solvers. I can even run them with your mpirun.
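
To rule out a plain context-creation problem outside of MPI, here is a minimal driver-API test I can use to check that on this node (it only does cuInit/cuCtxCreate on device 0):

#include <stdio.h>
#include <cuda.h>

// build, for example: gcc ctx_test.c -I/usr/local/cuda/include -lcuda -o ctx_test
int main(void){
  CUdevice dev;
  CUcontext ctx;
  CUresult rc;

  rc = cuInit(0);
  if (rc != CUDA_SUCCESS) { printf("cuInit failed: %d\n", (int)rc); return 1; }
  rc = cuDeviceGet(&dev, 0);
  if (rc != CUDA_SUCCESS) { printf("cuDeviceGet failed: %d\n", (int)rc); return 1; }
  // this is roughly the step I suspect MPI_Init is failing at
  rc = cuCtxCreate(&ctx, 0, dev);
  if (rc != CUDA_SUCCESS) { printf("cuCtxCreate failed: %d\n", (int)rc); return 1; }
  printf("context created on device 0\n");
  cuCtxDestroy(ctx);
  return 0;
}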

The only full example I was able to find of MV2_USE_CUDA=1 in use is here:
http://cudamusing.blogspot.com/
and his stuff just works, so that doesn't help much.

I really hope this is something simple and I'm just missing the obvious. I read your user guide, including the FAQ and Troubleshooting sections, and tried this and that for about a week; I hope you can give me some clues.

Here's some system info:

$ cat /etc/*release*
> CentOS release 5.5 (Final)

$ uname -a
> Linux guppy 2.6.18-194.26.1.el5 #1 SMP Tue Nov 9 12:54:20 EST 2010 x86_64 x86_64 x86_64 GNU/Linux

$ mpiname -a
> MVAPICH2 1.8 Mon Apr 30 14:56:40 EDT 2012 ch3:mrail
>
> Compilation
> CC: gcc    -DNDEBUG -DNVALGRIND -O2
> CXX: c++   -DNDEBUG -DNVALGRIND -O2
> F77:
> FC:
>
> Configuration
> --enable-cuda --with-cuda-include=/usr/local/cuda/include --with-cuda-libpath=/usr/local/cuda/lib64 --enable-shared --disable-f77 --disable-fc --without-hwloc

$ cudaquery
> Using cuda version 4020 (Driver API v2)
> Using cuda runtime version 4020 (Runtime API v2)
> Found 4 devices.
> Tesla C1060 (id 0 cc 1.3) : 4294770688 bytes (4.000 GB)
> Tesla C1060 (id 1 cc 1.3) : 4294770688 bytes (4.000 GB)
> Tesla C1060 (id 2 cc 1.3) : 4294770688 bytes (4.000 GB)
> Tesla C1060 (id 3 cc 1.3) : 4294770688 bytes (4.000 GB)

$ ll /usr/lib64/libcuda*
> lrwxrwxrwx 1 root root      12 Jun 22 14:42 /usr/lib64/libcuda.so -> libcuda.so.1
> lrwxrwxrwx 1 root root      17 Jun 22 14:42 /usr/lib64/libcuda.so.1 -> libcuda.so.295.41
> -rwxr-xr-x 1 root root 8612596 Jun 22 14:42 /usr/lib64/libcuda.so.295.41

$ ll /usr/local/cuda/lib64
> lrwxrwxrwx 1 root root        14 Jun 26 17:13 libcublas.so -> libcublas.so.4
> lrwxrwxrwx 1 root root        18 Jun 26 17:13 libcublas.so.4 -> libcublas.so.4.2.9
> -rwxr-xr-x 1 root root 109211936 Jun 26 17:13 libcublas.so.4.2.9
> lrwxrwxrwx 1 root root        14 Jun 26 17:13 libcudart.so -> libcudart.so.4
> lrwxrwxrwx 1 root root        18 Jun 26 17:13 libcudart.so.4 -> libcudart.so.4.2.9
> -rwxr-xr-x 1 root root    369600 Jun 26 17:13 libcudart.so.4.2.9
> lrwxrwxrwx 1 root root        13 Jun 26 17:13 libcufft.so -> libcufft.so.4
> lrwxrwxrwx 1 root root        17 Jun 26 17:13 libcufft.so.4 -> libcufft.so.4.2.9
> -rwxr-xr-x 1 root root  31161488 Jun 26 17:13 libcufft.so.4.2.9
> lrwxrwxrwx 1 root root        13 Jun 26 17:13 libcuinj.so -> libcuinj.so.4
> lrwxrwxrwx 1 root root        17 Jun 26 17:13 libcuinj.so.4 -> libcuinj.so.4.2.9
> -rwxr-xr-x 1 root root    150480 Jun 26 17:13 libcuinj.so.4.2.9
> lrwxrwxrwx 1 root root        14 Jun 26 17:13 libcurand.so -> libcurand.so.4
> lrwxrwxrwx 1 root root        18 Jun 26 17:13 libcurand.so.4 -> libcurand.so.4.2.9
> -rwxr-xr-x 1 root root  27315384 Jun 26 17:13 libcurand.so.4.2.9
> lrwxrwxrwx 1 root root        16 Jun 26 17:13 libcusparse.so -> libcusparse.so.4
> lrwxrwxrwx 1 root root        20 Jun 26 17:13 libcusparse.so.4 -> libcusparse.so.4.2.9
> -rwxr-xr-x 1 root root 195959968 Jun 26 17:13 libcusparse.so.4.2.9
> lrwxrwxrwx 1 root root        11 Jun 26 17:13 libnpp.so -> libnpp.so.4
> lrwxrwxrwx 1 root root        15 Jun 26 17:13 libnpp.so.4 -> libnpp.so.4.2.9
> -rwxr-xr-x 1 root root  55095288 Jun 26 17:13 libnpp.so.4.2.9

$ nvidia-smi
Thu Jun 28 10:37:37 2012
+------------------------------------------------------+
| NVIDIA-SMI 3.295.41   Driver Version: 295.41         |
|-------------------------------+----------------------+----------------------+
| Nb.  Name                     | Bus Id        Disp.  | Volatile ECC SB / DB |
| Fan   Temp   Power Usage /Cap | Memory Usage         | GPU Util. Compute M. |
|===============================+======================+======================|
| 0.  Tesla C1060               | 0000:02:00.0  Off    |       N/A        N/A |
|  35%   69 C  P8    N/A /  N/A |   0%    3MB / 4095MB |    0%     E. Thread  |
|-------------------------------+----------------------+----------------------|
| 1.  Tesla C1060               | 0000:03:00.0  Off    |       N/A        N/A |
|  35%   53 C  P8    N/A /  N/A |   0%    3MB / 4095MB |    0%     E. Thread  |
|-------------------------------+----------------------+----------------------|
| 2.  Tesla C1060               | 0000:83:00.0  Off    |       N/A        N/A |
|  35%   61 C  P8    N/A /  N/A |   0%    3MB / 4095MB |    0%     E. Thread  |
|-------------------------------+----------------------+----------------------|
| 3.  Tesla C1060               | 0000:84:00.0  Off    |       N/A        N/A |
|  35%   63 C  P8    N/A /  N/A |   0%    3MB / 4095MB |    0%     E. Thread  |
|-------------------------------+----------------------+----------------------|
| Compute processes:                                               GPU Memory |
|  GPU  PID     Process name                                       Usage      |
|=============================================================================|
|  No running compute processes found                                         |
+-----------------------------------------------------------------------------+

Looking forward to your reply!

Cheers

Igor Podladtchikov
Spectraseis
1899 Wynkoop St, Suite 350
Denver, CO 80202
Tel. +1 303 658 9172 (direct)
Tel. +1 303 330 8296 (cell)
www.spectraseis.com<http://www.spectraseis.com/>