[mvapich-discuss] MVAPICH-0.9.9 & ConnectX problems

Andrey Slepuhin andrey.slepuhin at t-platforms.ru
Thu Jun 21 14:40:29 EDT 2007


Dear folks,

I have a 4-node test cluster with Intel Clovertown CPUs and new
Mellanox ConnectX cards. While running the NAS NPB benchmarks, I see that
several tests occasionally hang silently. I tried different
compilers (GCC and Intel) and different optimization options to rule out
miscompilation of MVAPICH itself or of the benchmarks, but the situation
didn't change. However, I do not see such problems with OpenMPI. The
cluster nodes are configured as follows:
Intel S5000PSL motherboard
2 x Intel Xeon 5345 2.33 GHz
8GB RAM
Mellanox MHGH-XTC card, firmware revision 2.0.158
Stock SLES10 without updates (but I also had the same problems with the
2.6.22-rc4 kernel from Roland Dreier's git tree)
OFED-1.2.c-6 distribution from Mellanox

Do you have any idea what could cause the programs to freeze? I would
much appreciate any help.

Thanks in advance,
Andrey

P.S. BTW, what system configuration was used to achieve the 1.39 usec
latency for ConnectX? At the moment my best result with MVAPICH is
1.67 usec through a single Mellanox switch...

