[mvapich-discuss] Deadlock with CUDA and InfiniBand

Witherden, Freddie freddie.witherden08 at imperial.ac.uk
Thu Sep 11 06:03:34 EDT 2014


Hi Hari,

> This is a little strange. We have not encountered this issue before. Could you please
>  let us know which version of MVAPICH2 you are using and with what configure / run time options?
>
> We recently released MVAPICH2-2.0. Could you please try with that and see if the 
> same issue exists there as well? You can download MVAPICH2-2.0 from the following site.

[freddie at mystery-cluster-head local]$ ./bin/mpiname -a
MVAPICH2 2.0 Fri Jun 20 20:00:00 EDT 2014 ch3:mrail

Compilation
CC: gcc    -DNDEBUG -DNVALGRIND -O2
CXX: g++   -DNDEBUG -DNVALGRIND
F77: gfortran -L/usr/lib64 -L/lib -L/lib   -O2
FC: gfortran  

Configuration
--prefix=/home/freddie/local --with-ib-libpath=/usr/lib64 --with-ib-include=/usr/include

which I built from source myself on a cluster running Rocks 6.1.1.  I am unsure which command dumps the runtime variables, although the only MV2_* variable I set is MV2_USE_RDMA_FAST_PATH=0.  The node list comes from SGE.
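
In case it is useful, here is a minimal sketch (illustrative only, not our actual job script) of how an MV2_* variable can be set from Python before mpi4py brings up MPI; MVAPICH2 reads these variables from the environment at MPI_Init time, so the variable only needs to be exported before the import:

  import os

  # MVAPICH2 reads MV2_* variables when MPI_Init runs; importing mpi4py
  # calls MPI_Init by default, so set the variable before the import.
  # (An externally exported value, e.g. from the SGE job script, wins.)
  os.environ.setdefault("MV2_USE_RDMA_FAST_PATH", "0")

  from mpi4py import MPI  # MPI_Init happens here

  comm = MPI.COMM_WORLD
  print("rank", comm.rank, "of", comm.size)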

At runtime I get a warning:

  WARNING: Error in initializing MVAPICH2 ptmalloc library.Continuing without InfiniBand registration cache support.

although this is to be expected, given that with Python + mpi4py we load MPI very late in the game.  I have also been told that I should be using ch3:psm with my QLogic HCAs, as it should perform better.
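
To make that timing concrete, here is a minimal sketch (assumed, not our actual code) of the kind of deferred start-up mpi4py allows, where MPI_Init only runs long after the interpreter has been loading modules and allocating memory of its own; the mpi4py names (mpi4py.rc.initialize, MPI.Init) are mpi4py's own, the surrounding script is illustrative:

  import mpi4py
  mpi4py.rc.initialize = False  # do not call MPI_Init when mpi4py.MPI is imported
  mpi4py.rc.finalize = False    # we call MPI_Finalize ourselves

  from mpi4py import MPI        # with the flags above, no MPI_Init yet

  # ... plenty of Python-level work happens here (imports, CUDA setup, ...)

  MPI.Init()                    # MPI is brought up late in the game
  comm = MPI.COMM_WORLD
  print("rank", comm.rank, "of", comm.size)
  MPI.Finalize()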

Regards, Freddie.

