[mvapich-discuss] Deadlock with CUDA and InfiniBand
Witherden, Freddie
freddie.witherden08 at imperial.ac.uk
Thu Sep 11 06:03:34 EDT 2014
Hi Hari,
> This is a little strange. We have not encountered this issue before. Could you please
> let us know which version of MVAPICH2 you are using and with what configure / run time options?
>
> We recently released MVAPICH2-2.0. Could you please try with that and see if the
> same issue exists there as well? You can download MVAPICH2-2.0 from the following site.
[freddie at mystery-cluster-head local]$ ./bin/mpiname -a
MVAPICH2 2.0 Fri Jun 20 20:00:00 EDT 2014 ch3:mrail
Compilation
CC: gcc -DNDEBUG -DNVALGRIND -O2
CXX: g++ -DNDEBUG -DNVALGRIND
F77: gfortran -L/usr/lib64 -L/lib -L/lib -O2
FC: gfortran
Configuration
--prefix=/home/freddie/local --with-ib-libpath=/usr/lib64 --with-ib-include=/usr/include
I built this from source myself on a cluster running Rocks 6.1.1. I am not sure which command dumps the runtime variables, but the only MV2_* variable I set is MV2_USE_RDMA_FAST_PATH=0. The node list comes from SGE.
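For reference, this is roughly how I launch jobs; the hostfile path and executable name here are placeholders, but mpirun_rsh accepts MV2_* variables as NAME=VALUE arguments on the command line:

```shell
# Hypothetical launch line (hostfile and binary names are illustrative).
# MV2_USE_RDMA_FAST_PATH=0 is passed through to all ranks by mpirun_rsh.
mpirun_rsh -np 8 -hostfile $TMPDIR/machines \
    MV2_USE_RDMA_FAST_PATH=0 \
    python run_solver.py
```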
At runtime I get a warning:
WARNING: Error in initializing MVAPICH2 ptmalloc library. Continuing without InfiniBand registration cache support.
although this is to be expected, since with Python + mpi4py we load MPI very late in the game. I have also been told that I should be using ch3:psm with my QLogic HCAs, as it performs better.
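In case it is relevant: the ptmalloc hooks have to be installed before the process makes its first allocation, which the Python interpreter does long before mpi4py loads libmpich. One workaround I have seen suggested (the library path below is an assumption based on my install prefix) is to preload the MVAPICH2 library so its allocator hooks are in place from process startup:

```shell
# Sketch of a possible workaround, not something I have verified:
# preload the MVAPICH2 shared library so ptmalloc initializes before
# the Python interpreter performs its first malloc.
export LD_PRELOAD=/home/freddie/local/lib/libmpich.so
mpirun_rsh -np 8 -hostfile $TMPDIR/machines python run_solver.py
```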
Regards, Freddie.