[mvapich-discuss] Deadlock with CUDA and InfiniBand

Hari Subramoni subramoni.1 at osu.edu
Thu Sep 11 09:35:21 EDT 2014


Hi Freddie,

Thanks for the details. I understand the issue now. I do not think QLogic
HCAs have proper support for the RDMA fast path feature in MVAPICH2.
This could be why you saw the hang with that feature enabled.
And yes, for QLogic HCAs you should build MVAPICH2 with ch3:psm for
the best performance and functionality.
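The sketch below illustrates the device check described above. It reads the device token from `mpiname -a` output and flags a QLogic build that does not use `ch3:psm`. It assumes the first line of that output ends in the device name, as in the `ch3:mrail` line quoted further down. The helper names and the vendor table are hypothetical. Selecting the device is a configure-time choice. The MVAPICH2 user guide describes `./configure --with-device=ch3:psm` for PSM builds, but check the guide for your release.

```python
"""Hypothetical helper: report which MVAPICH2 device a build was configured with."""
import os
import subprocess

# Hypothetical table reflecting the advice in this thread: QLogic HCAs -> ch3:psm.
RECOMMENDED_DEVICE = {
    "qlogic": "ch3:psm",
}


def run_mpiname(prefix: str) -> str:
    """Run <prefix>/bin/mpiname -a and return its standard output."""
    result = subprocess.run(
        [os.path.join(prefix, "bin", "mpiname"), "-a"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def device_from_mpiname(output: str) -> str:
    """Return the last token of the first line, assumed to be the device (e.g. 'ch3:mrail')."""
    first_line = output.strip().splitlines()[0]
    return first_line.split()[-1]


def needs_rebuild(output: str, hca_vendor: str) -> bool:
    """True if the build's device differs from the one suggested for this HCA vendor."""
    wanted = RECOMMENDED_DEVICE.get(hca_vendor.lower())
    if wanted is None:
        return False  # no recommendation recorded for this vendor
    return device_from_mpiname(output) != wanted


if __name__ == "__main__":
    sample = "MVAPICH2 2.0 Fri Jun 20 20:00:00 EDT 2014 ch3:mrail\n\nCompilation\n"
    print(device_from_mpiname(sample))      # ch3:mrail
    print(needs_rebuild(sample, "QLogic"))  # True
```

On a real installation, `run_mpiname("/home/freddie/local")` would supply the text instead of the hard-coded sample.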

Regards,
Hari.

On Thu, Sep 11, 2014 at 5:03 AM, Witherden, Freddie <freddie.witherden08 at imperial.ac.uk> wrote:

> Hi Hari,
>
> > This is a little strange. We have not encountered this issue before.
> > Could you please let us know which version of MVAPICH2 you are using
> > and with what configure / run time options?
> >
> > We recently released MVAPICH2-2.0. Could you please try with that and
> > see if the same issue exists there as well? You can download
> > MVAPICH2-2.0 from the following site.
>
> [freddie at mystery-cluster-head local]$ ./bin/mpiname -a
> MVAPICH2 2.0 Fri Jun 20 20:00:00 EDT 2014 ch3:mrail
>
> Compilation
> CC: gcc    -DNDEBUG -DNVALGRIND -O2
> CXX: g++   -DNDEBUG -DNVALGRIND
> F77: gfortran -L/usr/lib64 -L/lib -L/lib   -O2
> FC: gfortran
>
> Configuration
> --prefix=/home/freddie/local --with-ib-libpath=/usr/lib64
> --with-ib-include=/usr/include
>
> which I built from source myself on a cluster running Rocks 6.1.1. I am
> unsure which command dumps the runtime variables, although the only
> MV2_* variable I interact with is MV2_USE_RDMA_FAST_PATH=0. The node list
> comes from SGE.
>
> At runtime I get a warning:
>
>   WARNING: Error in initializing MVAPICH2 ptmalloc library.Continuing without InfiniBand registration cache support.
>
> although this is to be expected given that with Python + mpi4py we are
> loading MPI very late in the game.  I am also told that I should be using
> ch3:psm with my QLogic HCAs as it will perform better.
>
> Regards, Freddie.
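Freddie's setup combines a node list from SGE with MV2_USE_RDMA_FAST_PATH=0. Below is a minimal sketch of launching that way. It assumes SGE's `$PE_HOSTFILE` lists `host slots queue processors` on each line. It also assumes `mpirun_rsh` accepts a hostfile with one hostname per process, plus `VAR=value` pairs placed before the executable. The function names, the `hosts.mvapich` file, and `solver.py` are hypothetical.

```python
"""Sketch: build an mpirun_rsh launch line from an SGE job's host list.

Assumptions: $PE_HOSTFILE has 'host slots queue processors' per line, and
mpirun_rsh takes VAR=value pairs just before the program to run.
"""
import os
from typing import Dict, List


def hosts_from_pe_hostfile(text: str) -> List[str]:
    """Expand each 'host slots ...' line into `slots` copies of the hostname."""
    hosts: List[str] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        host, slots = fields[0], int(fields[1])
        hosts.extend([host] * slots)
    return hosts


def mpirun_rsh_command(hostfile: str, nprocs: int,
                       program: List[str], env: Dict[str, str]) -> List[str]:
    """Assemble the argument list; environment pairs go right before the program."""
    cmd = ["mpirun_rsh", "-np", str(nprocs), "-hostfile", hostfile]
    cmd += [f"{key}={value}" for key, value in env.items()]
    return cmd + program


if __name__ == "__main__" and "PE_HOSTFILE" in os.environ:
    # Only meaningful inside an SGE job, where PE_HOSTFILE is set.
    with open(os.environ["PE_HOSTFILE"]) as f:
        hosts = hosts_from_pe_hostfile(f.read())
    with open("hosts.mvapich", "w") as f:  # hypothetical output file name
        f.write("\n".join(hosts) + "\n")
    cmd = mpirun_rsh_command("hosts.mvapich", len(hosts),
                             ["python", "solver.py"],  # hypothetical mpi4py script
                             {"MV2_USE_RDMA_FAST_PATH": "0"})
    print(" ".join(cmd))
```

Passing the variable on the `mpirun_rsh` line, rather than relying on each node's shell environment, keeps the setting visible in the launch command itself.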