[mvapich-discuss] 1.7 + OFA-IB-CH3 = hang

Eric A. Borisch eborisch at ieee.org
Tue Oct 25 15:02:31 EDT 2011


Good afternoon, all.

I have a small Rocks 5.4 cluster running OFED 1.5.1 with dual-port SDR
cards on all nodes.

* I've been running 1.6 in this configuration (ch3:mrail) for months
with no issues
* 1.7 fails (hangs) on a number of the micro-benchmarks
 -> Setting MV2_USE_SRQ=0 makes things work again...
* Running 1.7 under ch3:nemesis:ib works, but only with one rail
 -> Setting MV2_NUM_PORTS=2 causes a crash with a floating point
exception; a more graceful 'unsupported config' would be nice, but is
not the topic of this e-mail

The outputs from 'mpiname -a' are at the bottom of this message.

Micro-benchmark results:

These run:
osu_alltoall
osu_bcast
osu_latency
osu_latency_mt
osu_multi_lat

These hang:
osu_acc_latency
osu_bibw
osu_bw
osu_get_bw
osu_get_latency
osu_mbw_mr
osu_passive_acc_latency
osu_passive_get_bw
osu_passive_get_latency
osu_passive_put_bw
osu_passive_put_latency
osu_put_bibw
osu_put_bw
osu_put_latency

All of these run just fine under 1.6, and under 1.7 if and only if
MV2_USE_SRQ=0, with either MV2_NUM_PORTS=1 or 2 (and performance
indicates both rails are in use).
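
For anyone who wants to repeat the run/hang sweep above, here is a
minimal sketch of how it could be scripted. It is not the harness used
for the results in this message. The helper names (classify, sweep),
the 60-second timeout, and the launcher line shown in the trailing
comment are all assumptions; adjust them for your site. It also
assumes the launcher forwards the MV2_* variables from its own
environment to the ranks, which may not hold for every process manager.

```python
import os
import subprocess
import sys


def classify(cmd, timeout_s=60.0, env_overrides=None):
    """Run cmd once and return 'ran', 'failed' (nonzero exit) or 'hung' (timed out)."""
    env = dict(os.environ)
    env.update(env_overrides or {})  # e.g. {"MV2_USE_SRQ": "0"}
    try:
        proc = subprocess.run(
            cmd,
            env=env,
            timeout=timeout_s,  # on expiry the child is killed and TimeoutExpired is raised
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return "hung"
    return "ran" if proc.returncode == 0 else "failed"


def sweep(benchmarks, launcher, settings, timeout_s=60.0):
    """Classify every benchmark under every environment setting.

    launcher is the argv prefix (hypothetical example: ["mpiexec", "-n", "2"]);
    settings is a list of dicts of environment overrides.
    """
    results = {}
    for overrides in settings:
        key = tuple(sorted(overrides.items()))
        for bench in benchmarks:
            results[(bench, key)] = classify(launcher + [bench], timeout_s, overrides)
    return results


# Hypothetical usage (paths and launcher arguments are assumptions):
#   settings = [{"MV2_USE_SRQ": srq, "MV2_NUM_PORTS": ports}
#               for srq in ("0", "1") for ports in ("1", "2")]
#   table = sweep(["./osu_bw", "./osu_latency"], ["mpiexec", "-n", "2"], settings)
```

Note that the timeout only kills the local launcher process; ranks left
behind on remote nodes after a hang may still need to be cleaned up by
hand.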

When it hangs, the process is stuck inside MPIDI_CH3I_Progress().

Any suggestions? Any reason I should have to turn off the shared
receive queue on 1.7?

Thanks,
 Eric


mpiname -a outputs:

1.6:
MVAPICH2 1.6 2011-03-09 ch3:mrail

Compilation
CC: gcc44 -fopenmp -march=native -pipe -O2 -DNDEBUG -O2
CXX: g++44 -fopenmp -march=native -pipe -O2 -DNDEBUG -O2
F77: gfortran  -DNDEBUG
F90: gfortran  -DNDEBUG

Configuration
--prefix=/share/apps/mpi/mvapich2/1.6 --with-rdma=gen2 --with-mpe
--disable-f77 --disable-f90 --with-pm=hydra:mpd --enable-fast
--enable-cache --without-hwloc

------------------

1.7:
MVAPICH2 1.7 Thu Oct 13 17:31:44 EDT 2011 ch3:mrail

Compilation
CC: gcc44 -fopenmp -march=native -pipe -O2   -DNDEBUG -DNVALGRIND -O2
CXX: g++44 -fopenmp -march=native -pipe -O2  -DNDEBUG -DNVALGRIND -O2
F77:
FC:

Configuration
--prefix=/share/apps/mpi/mvapich2/1.7_ch3_gcc44 --with-rdma=gen2
--with-mpe --disable-f77 --disable-fc --with-pm=hydra:mpd
--enable-fast --enable-cache --without-hwloc

