[mvapich-discuss] MVAPICH-0.9.9 & ConnectX problems
Sayantan Sur
surs at cse.ohio-state.edu
Thu Jun 21 21:51:18 EDT 2007
Hello Andrey,
Thanks for your email. A couple of months back we had seen some erratic
behavior, but lately, using the 2.0.156 firmware, I haven't noticed any
hangs across 30-40 runs. How many processes are you running?
The platform description is given in:
http://mvapich.cse.ohio-state.edu/performance/mvapich/em64t/MVAPICH-em64t-gen2-ConnectX.shtml
Thanks,
Sayantan.
Andrey Slepuhin wrote:
> Dear folks,
>
> I have a test 4-node cluster using Intel Clovertown CPUs and new
> Mellanox ConnectX cards. While running NAS NPB benchmarks I see that
> several tests silently hang occasionally. I tried to use different
> compilers (GCC and Intel) and different optimization options to avoid
> miscompiling MVAPICH itself and the benchmarks, but the situation
> didn't change. I do not see such problems with OpenMPI, however. The
> cluster node configuration is:
> Intel S5000PSL motherboard
> 2 x Intel Xeon 5345 2.33 GHz
> 8GB RAM
> Mellanox MHGH-XTC card, firmware revision 2.0.158
> Stock SLES10 without updates (but I also had the same problems with
> the 2.6.22-rc4 kernel from Roland Dreier's git tree)
> OFED-1.2.c-6 distribution from Mellanox
>
> Do you have any ideas about what could cause the program to freeze? I
> would much appreciate any help.
>
> Thanks in advance,
> Andrey
>
> P.S. BTW, what was the system configuration where 1.39usec latency for
> ConnectX was achieved? At the moment my best result with MVAPICH is
> 1.67usec using one Mellanox switch...
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
--
http://www.cse.ohio-state.edu/~surs