[mvapich-discuss] MVAPICH-0.9.9 & ConnectX problems

Andrey Slepuhin andrey.slepuhin at t-platforms.ru
Fri Jun 22 01:48:24 EDT 2007


The problem occurs even with 4 processes. BTW, my firmware is 2.0.158,
not 2.0.156.
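
For reference, the firmware revision can be double-checked on each node
with ibv_devinfo; this is just a sketch, and the device name mlx4_0 is
an assumption that may differ depending on the driver:

```shell
# Show the firmware revision of the ConnectX HCA.
# mlx4_0 is an assumed device name; running plain `ibv_devinfo`
# lists all devices if the name differs.
ibv_devinfo -d mlx4_0 | grep fw_ver
```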

Thanks,
Andrey

Sayantan Sur wrote:
> Hello Andrey,
>
> Thanks for your email. A couple of months back we saw some erratic 
> behavior, but of late, using the 2.0.156 firmware, I haven't noticed 
> any hangs across 30-40 runs. How many processes are you running?
>
> The platform description is given in:
>
> http://mvapich.cse.ohio-state.edu/performance/mvapich/em64t/MVAPICH-em64t-gen2-ConnectX.shtml 
>
>
> Thanks,
> Sayantan.
>
> Andrey Slepuhin wrote:
>> Dear folks,
>>
>> I have a test 4-node cluster with Intel Clovertown CPUs and new 
>> Mellanox ConnectX cards. While running the NAS Parallel Benchmarks 
>> (NPB) I see that several tests occasionally hang silently. I tried 
>> different compilers (GCC and Intel) and different optimization options 
>> to rule out miscompilation of MVAPICH itself and of the benchmarks, 
>> but the situation did not change. However, I do not see such problems 
>> with Open MPI. The cluster node configuration is:
>> Intel S5000PSL motherboard
>> 2 x Intel Xeon 5345 2.33 GHz
>> 8GB RAM
>> Mellanox MHGH-XTC card, firmware revision 2.0.158
>> Stock SLES10 without updates (but I also saw the same problems with 
>> the 2.6.22-rc4 kernel from Roland Dreier's git tree)
>> OFED-1.2.c-6 distribution from Mellanox
>>
>> Do you have any ideas about what could cause the program to freeze? I 
>> would much appreciate any help.
>>
>> Thanks in advance,
>> Andrey
>>
>> P.S. BTW, what was the system configuration on which the 1.39 usec 
>> latency for ConnectX was achieved? At the moment my best result with 
>> MVAPICH is 1.67 usec through a single Mellanox switch...
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

