[mvapich-discuss] MVAPICH-0.9.9 & ConnectX problems

Sayantan Sur surs at cse.ohio-state.edu
Thu Jun 21 21:51:18 EDT 2007


Hello Andrey,

Thanks for your email. A couple of months back we saw some erratic 
behavior, but of late, using the 2.0.156 firmware, I haven't noticed any 
hangs across 30-40 runs. How many processes are you running?
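
If the hang is reproducible, a small stand-alone collective loop can 
help bracket where things stop. Below is a rough sketch (it is not part 
of MVAPICH or the NPB suite; the iteration count and message size are 
arbitrary choices) that repeatedly exercises MPI_Allreduce and 
MPI_Alltoall and prints progress from rank 0, so a silent hang can be 
narrowed down to an iteration:

/* collstress.c - illustrative collective stress loop (not an MVAPICH
 * diagnostic tool). Exercises MPI_Allreduce and MPI_Alltoall repeatedly
 * and prints progress from rank 0 so a silent hang can be bracketed. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define ITERS 1000
#define COUNT 4096          /* doubles sent to each peer per MPI_Alltoall */

int main(int argc, char **argv)
{
    int rank, size, i;
    double *sendbuf, *recvbuf, sum, gsum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    sendbuf = malloc(COUNT * size * sizeof(double));
    recvbuf = malloc(COUNT * size * sizeof(double));
    for (i = 0; i < COUNT * size; i++)
        sendbuf[i] = (double)rank;

    for (i = 0; i < ITERS; i++) {
        sum = (double)i;
        MPI_Allreduce(&sum, &gsum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        MPI_Alltoall(sendbuf, COUNT, MPI_DOUBLE,
                     recvbuf, COUNT, MPI_DOUBLE, MPI_COMM_WORLD);

        if (rank == 0 && i % 100 == 0) {
            printf("iteration %d ok\n", i);
            fflush(stdout);     /* make progress visible before a hang */
        }
    }

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

Building it with mpicc and launching it the same way as your NPB runs 
(e.g. 32 processes across the 4 nodes) should tell us whether the hangs 
are tied to the collective paths or to the benchmarks themselves.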

The platform description is given at:

http://mvapich.cse.ohio-state.edu/performance/mvapich/em64t/MVAPICH-em64t-gen2-ConnectX.shtml

Thanks,
Sayantan.

Andrey Slepuhin wrote:
> Dear folks,
>
> I have a test 4-node cluster using Intel Clovertown CPUs and new 
> Mellanox ConnectX cards. While running the NAS NPB benchmarks, I see that 
> several tests occasionally hang silently. I tried different 
> compilers (GCC and Intel) and different optimization options to rule out 
> miscompilation of MVAPICH itself and of the benchmarks, but the situation 
> did not change. I do not see such problems with OpenMPI, however. The 
> cluster node configuration is:
> Intel S5000PSL motherboard
> 2 x Intel Xeon 5345 2.33 GHz
> 8GB RAM
> Mellanox MHGH-XTC card, firmware revision 2.0.158
> Stock SLES10 without updates (but I also saw the same problems with the 
> 2.6.22-rc4 kernel from Roland Dreier's git tree)
> OFED-1.2.c-6 distribution from Mellanox
>
> Do you have any idea what could cause the programs to freeze? I would 
> much appreciate any help.
>
> Thanks in advance,
> Andrey
>
> P.S. BTW, what was the system configuration on which the 1.39 usec 
> latency for ConnectX was achieved? At the moment my best result with 
> MVAPICH is 1.67 usec through one Mellanox switch...
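
Regarding the P.S.: latency figures like 1.39 and 1.67 usec are one-way 
(half round-trip) times from a small-message ping-pong test such as 
osu_latency from the OSU micro-benchmarks. For reference, here is a 
minimal ping-pong sketch of how such a number is obtained (this is not 
the actual OSU benchmark code; the warm-up and iteration counts are 
illustrative):

/* pingpong.c - minimal 1-byte ping-pong latency sketch (illustrative;
 * not the OSU osu_latency code). Reports one-way latency, i.e. half
 * the averaged round trip. Run with exactly 2 processes, one per node. */
#include <mpi.h>
#include <stdio.h>

#define SKIP  100           /* warm-up iterations, not timed */
#define ITERS 10000         /* timed iterations */

int main(int argc, char **argv)
{
    int rank, size, i;
    char buf = 'a';
    double t_start = 0.0, t_end, latency;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size != 2) {
        if (rank == 0)
            fprintf(stderr, "run with exactly 2 processes\n");
        MPI_Finalize();
        return 1;
    }

    for (i = 0; i < SKIP + ITERS; i++) {
        if (i == SKIP)
            t_start = MPI_Wtime();  /* start timing after warm-up */

        if (rank == 0) {
            MPI_Send(&buf, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&buf, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
        } else {
            MPI_Recv(&buf, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(&buf, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t_end = MPI_Wtime();

    if (rank == 0) {
        /* one-way latency in microseconds: half the average round trip */
        latency = (t_end - t_start) * 1e6 / (2.0 * ITERS);
        printf("1-byte latency: %.2f usec\n", latency);
    }

    MPI_Finalize();
    return 0;
}

The result also depends on process placement and on whether the two 
hosts are connected back-to-back or through a switch, so the switch hop 
alone may account for part of the difference you are seeing.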

-- 
http://www.cse.ohio-state.edu/~surs


