[mvapich-discuss] problem with mvapich2.ofa + CUDA

Brian Budge brian.budge at gmail.com
Sun Mar 16 21:10:18 EDT 2008


Hi Matt -

Thanks for the suggestion.  It works!  I wonder if CUDA also tries to
override malloc or something...

Thanks again for the help.  I'll notify the appropriate CUDA people of
this behavior.

  Brian

On Thu, Mar 13, 2008 at 10:27 AM, Matthew Koop <koop at cse.ohio-state.edu> wrote:
> Brian,
>
>  You can try compiling MVAPICH2 with -DDISABLE_PTMALLOC added to the
>  CFLAGS in the make.mvapich2.ofa script. It may be that our own malloc
>  library is interfering with the cudaMalloc() calls you are making.
>
>  Let us know whether this helps. If you have any reproducers, please
>  send them along so that we can take a look.
>
>  Thanks,
>
>  Matt
>
>
>
>  On Tue, 11 Mar 2008, Brian Budge wrote:
>
>  > Hi all -
>  >
>  > I have an application which uses MPI over InfiniBand and which
>  > also uses CUDA on NVIDIA graphics cards.  The program can be
>  > configured with or without MPI (without MPI, it is limited to a
>  > single node), and it can run with or without GPUs.
>  >
>  > The problem appears when I am using MPI over IB together with GPUs:
>  > essentially, allocation of GPU memory fails in this configuration.
>  > If I use mvapich2 over TCP, the problem doesn't show up (even though
>  > I am running IP over IB).  Likewise, if I don't use GPUs, my program
>  > works fine.
>  >
>  > A bit more detail about the failing allocation:  The function is
>  > cudaMalloc(), available through the CUDA runtime libraries.  These
>  > calls succeed up to a certain stage in my program, which happens to
>  > be after several dynamic libraries have been opened via dlopen and
>  > after a thread has been spawned.
>  >
>  > I am running mvapich2 in multithreaded mode, calling MPI_Init_thread()
>  > soon after main() is entered.
>  >
>  > I have tried some fairly minimal reproduction cases, and I can't seem
>  > to make them fail.  I may have to try something a bit more
>  > complicated.  However, in the meantime, can anyone suggest what might
>  > be broken?  Perhaps I've misconfigured mvapich with IB?
>  >
>  > I'm running Gentoo Linux with kernel version 2.6.24, and InfiniBand
>  > is built into the kernel along with the Mellanox drivers (I have
>  > InfiniHost cards).
>  >
>  > Thanks for any suggestions,
>  >   Brian
>  > _______________________________________________
>  > mvapich-discuss mailing list
>  > mvapich-discuss at cse.ohio-state.edu
>  > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>  >
>
>

