[mvapich-discuss] problems with MPI + GPU

Choudhury, Durga Durga.Choudhury at drs-ss.com
Wed Jan 9 12:06:48 EST 2008


Brian,

I would be very interested to know what, if any, solution you found to
this issue. Please post your findings to the list, or at least send them
to me individually.

Thank you.

Durga

________________________________

From: mvapich-discuss-bounces at cse.ohio-state.edu
[mailto:mvapich-discuss-bounces at cse.ohio-state.edu] On Behalf Of Brian Budge
Sent: Wednesday, January 09, 2008 11:26 AM
To: Dhabaleswar Panda
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] problems with MPI + GPU

 

Hi DK -

I just rebuilt mvapich2 with tcp instead of ofa, and now my program
reliably executes.  I'll post something to the OFED list if I can find
it.

Thanks,
  Brian

On Jan 8, 2008 9:11 PM, Dhabaleswar Panda <panda at cse.ohio-state.edu> wrote:

Hi Brian, 


> Hi all -
>
> Sorry for all the traffic, but I'm getting very close to being able to
> reliably run my application with mvapich2.

Good to know this.


> The problem I am having now is with GPUs.  I am running an application
> which uses GPUs and the CUDA programming environment to accelerate
> computation.  It's exciting stuff, and depending on the problem, I see
> 2 to 6x speedup (I am running a ray tracing type application).
> Everything works if I run without MPI, but if I run with mvapich2, my
> GPU initialization fails about 75% of the time, making my runs quite
> unreliable.  In the 25% when the device initializes, everything else
> works fine.

Unfortunately, we have not tested MVAPICH2 + IB (OFED) + GPU (with CUDA).
If anybody else on this list has experience running MVAPICH2 in this
mode, they can share it here.

You can also post a note regarding this to the OFED general list.


> Now, I'm not sure what could possibly cause this, and I could see this
> problem cropping up due to any of the following factors:
>
> 1) bug in mvapich2
> 2) bug in CUDA
> 3) bug in OFED IB stuff
>
> Does anyone have any ideas how to even begin tracking this down?  Could
> it be something like infiniband device initialization walking into
> NVIDIA's memory space?  I'm grasping at straws here ;)

Can you run basic MPICH2 (from Argonne) with Ethernet + GPU (with CUDA)?

This would help separate IB/OFED-specific issues from everything else
and provide more insight into this problem.

Thanks,

DK

> Thanks,
>   Brian
>
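
An intermittent failure like the one described in this thread (GPU
initialization failing in roughly 75% of runs under one build and not
under another) is easier to compare across builds when the failure rate
is measured rather than estimated. The following is a minimal sketch of
one way to do that, not something taken from the thread. It assumes the
application exits with a non-zero status when GPU initialization fails;
the function name failure_rate and the example command line are
hypothetical.

```python
import subprocess
import sys


def failure_rate(cmd, runs=20, timeout=120):
    """Run `cmd` `runs` times and return the fraction of runs that failed.

    Assumption: a run has failed if the process exits with a non-zero
    status or does not finish within `timeout` seconds.
    """
    failures = 0
    for _ in range(runs):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            if result.returncode != 0:
                failures += 1
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child on timeout; count it as a failure.
            failures += 1
    return failures / runs


# Hypothetical usage (the launcher arguments and binary name are placeholders):
#   rate = failure_rate(["mpiexec", "-n", "4", "./raytracer"])
#   print(f"{rate:.0%} of runs failed")
```

Running the same command against each build (for example the ofa build
and the tcp build mentioned above) gives two numbers that can be compared
directly, which may help when reporting the problem to another list.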

 
