[mvapich-discuss] problems with MPI + GPU

Dhabaleswar Panda panda at cse.ohio-state.edu
Wed Jan 9 00:11:16 EST 2008


Hi Brian,

> Hi all -
>
> Sorry for all the traffic, but I'm getting very close to being able to
> reliably run my application with mvapich2.

Good to know.

> The problem I am having now is with GPUs.  I am running an application
> that uses GPUs and the CUDA programming environment to accelerate
> computation.  It's exciting stuff, and depending on the problem, I see a
> 2x to 6x speedup (I am running a ray-tracing-type application).  Everything
> works if I run without MPI, but if I run with mvapich2, my GPU
> initialization fails about 75% of the time, making my runs quite
> unreliable.  In the 25% of runs where the device initializes, everything
> else works fine.

Unfortunately, we have not tested MVAPICH2 + IB (OFED) + GPU (with CUDA).
If anybody else on this list has experience running MVAPICH2 in this
configuration, please share it.

You can also post a note regarding this to the OFED general list.

> Now, I'm not sure what could possibly cause this, and I could see this
> problem cropping up due to any of the following factors:
>
> 1) bug in mvapich2
> 2) bug in CUDA
> 3) bug in OFED IB stuff
>
> Does anyone have any ideas how to even begin tracking this down?  Could it
> be something like InfiniBand device initialization walking into NVIDIA's
> memory space?  I'm grasping at straws here ;)

Can you run basic MPICH2 (from Argonne) over Ethernet + GPU (with CUDA)?
This will help separate IB/OFED-specific issues from everything else and
provide more insight into the problem.
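
To make that comparison concrete, it may help to measure the failure rate
under each launch configuration rather than rely on a handful of runs. The
following is only a minimal Python sketch. It assumes the application exits
with a nonzero status when GPU initialization fails. The failure_rate helper
is hypothetical, and the command lines are whatever launch lines apply on
your cluster.

```python
import shlex
import subprocess
import sys


def failure_rate(cmd, runs=20, timeout=120):
    """Run cmd `runs` times and return the fraction of runs that failed.

    A run counts as failed if it exits nonzero, times out, or cannot be
    started. ASSUMPTION: the application signals a failed GPU
    initialization through a nonzero exit status.
    """
    failures = 0
    for _ in range(runs):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            failed = result.returncode != 0
        except (subprocess.TimeoutExpired, OSError):
            failed = True
        if failed:
            failures += 1
    return failures / runs


if __name__ == "__main__":
    # Each command-line argument is one complete launch line, quoted.
    for line in sys.argv[1:]:
        rate = failure_rate(shlex.split(line))
        print(f"{rate:6.1%}  {line}")
```

Saved under a name of your choosing, it could be invoked with one quoted
launch line per configuration, for example "./gpu_app" for the run without
MPI and "mpiexec -n 2 ./gpu_app" for each MPI stack, where ./gpu_app is a
placeholder for the real binary. If the failure rate stays near zero with
MPICH2 over Ethernet and rises only with MVAPICH2 over IB, that points
toward the IB/OFED side.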

Thanks,

DK

> Thanks,
>   Brian
>


