[mvapich-discuss] 1.7 + OFA-IB-CH3 = hang

Devendar Bureddy bureddy at cse.ohio-state.edu
Tue Oct 25 15:36:25 EDT 2011


Hi Eric

Thanks for reporting it. We have never encountered this issue in our
in-house testing.  We will get back to you after investigating further.

Thanks
Devendar

On Tue, Oct 25, 2011 at 3:02 PM, Eric A. Borisch <eborisch at ieee.org> wrote:

> Good afternoon, all.
>
> I have a small Rocks 5.4 cluster running OFED 1.5.1 with dual-port SDR
> cards on all nodes.
>
> * I've been running 1.6 in this configuration (ch3:mrail) for months
> with no issues
> * 1.7 fails (hangs) on a number of the micro-benchmarks
>  -> Setting MV2_USE_SRQ=0 makes things work again...
> * Running 1.7 under ch3:nemesis:ib works, but only with one rail
>  -> Setting MV2_NUM_PORTS=2 causes a crash with a floating point
> exception; a more graceful 'unsupported config' message would be nice,
> but that is not the topic of this e-mail
>
> The outputs from 'mpiname -a' are at the bottom of this message.
>
> Micro-benchmark results:
>
> These run:
> osu_alltoall
> osu_bcast
> osu_latency
> osu_latency_mt
> osu_multi_lat
>
> These hang:
> osu_acc_latency
> osu_bibw
> osu_bw
> osu_get_bw
> osu_get_latency
> osu_mbw_mr
> osu_passive_acc_latency
> osu_passive_get_bw
> osu_passive_get_latency
> osu_passive_put_bw
> osu_passive_put_latency
> osu_put_bibw
> osu_put_bw
> osu_put_latency
>
> All of these run just fine under 1.6, and under 1.7 if and only if
> MV2_USE_SRQ=0, with either MV2_NUM_PORTS=1 or 2 (and with 2, the
> performance indicates both rails are in use.)
>
> When it hangs, it is stuck inside MPIDI_CH3I_Progress().
>
> Any suggestions? Any reason I should have to turn off the shared
> receive queue on 1.7?
>
> Thanks,
>  Eric
>
>
> mpiname -a outputs:
>
> 1.6:
> MVAPICH2 1.6 2011-03-09 ch3:mrail
>
> Compilation
> CC: gcc44 -fopenmp -march=native -pipe -O2 -DNDEBUG -O2
> CXX: g++44 -fopenmp -march=native -pipe -O2 -DNDEBUG -O2
> F77: gfortran  -DNDEBUG
> F90: gfortran  -DNDEBUG
>
> Configuration
> --prefix=/share/apps/mpi/mvapich2/1.6 --with-rdma=gen2 --with-mpe
> --disable-f77 --disable-f90 --with-pm=hydra:mpd --enable-fast
> --enable-cache --without-hwloc
>
> ------------------
>
> 1.7:
> MVAPICH2 1.7 Thu Oct 13 17:31:44 EDT 2011 ch3:mrail
>
> Compilation
> CC: gcc44 -fopenmp -march=native -pipe -O2   -DNDEBUG -DNVALGRIND -O2
> CXX: g++44 -fopenmp -march=native -pipe -O2  -DNDEBUG -DNVALGRIND -O2
> F77:
> FC:
>
> Configuration
> --prefix=/share/apps/mpi/mvapich2/1.7_ch3_gcc44 --with-rdma=gen2
> --with-mpe --disable-f77 --disable-fc --with-pm=hydra:mpd
> --enable-fast --enable-cache --without-hwloc
>
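
For anyone trying to reproduce the behavior described in the quoted report,
the following is a minimal sketch of how the run/hang comparison could be
scripted. It is not taken from the thread. LAUNCHER, BENCH_DIR and LIMIT
are illustrative names, the benchmark list is a small subset of the one
above, and the script assumes a Hydra-style mpiexec (which accepts
-genv NAME VALUE) and the GNU coreutils timeout command, which may not be
present on older distributions.

```shell
#!/bin/sh
# Sketch only: LAUNCHER, BENCH_DIR and LIMIT are hypothetical names,
# not settings from the original report. Adjust hosts and paths locally.
LAUNCHER=${LAUNCHER:-"mpiexec -n 2"}   # assumed Hydra-style launcher
BENCH_DIR=${BENCH_DIR:-.}              # directory holding the osu_* binaries
LIMIT=${LIMIT:-60}                     # seconds before a run counts as hung

# Run one command under a time limit and print "ok", "hang" or "fail(rc)".
# GNU timeout exits with status 124 when the limit is reached.
classify() {
    rc=0
    timeout "$LIMIT" "$@" >/dev/null 2>&1 || rc=$?
    case $rc in
        0)   echo ok ;;
        124) echo hang ;;
        *)   echo "fail($rc)" ;;
    esac
}

# Try each benchmark with the shared receive queue on (1) and off (0).
for bench in osu_latency osu_bw osu_bibw osu_put_bw; do
    [ -x "$BENCH_DIR/$bench" ] || continue   # skip binaries that are absent
    for srq in 1 0; do
        printf '%s MV2_USE_SRQ=%s: ' "$bench" "$srq"
        # $LAUNCHER is left unquoted on purpose so it splits into words.
        classify $LAUNCHER -genv MV2_USE_SRQ "$srq" "$BENCH_DIR/$bench"
    done
done
```

A "hang" result here only means the run exceeded LIMIT, so the limit should
be set well above the normal runtime of the slowest benchmark.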