[mvapich-discuss] MVAPICH-0.9.9 & ConnectX problems

Sayantan Sur surs at cse.ohio-state.edu
Fri Jun 22 11:08:25 EDT 2007


Hi Andrey,

Andrey Slepuhin wrote:
> The problem was seen even with 4 processes. BTW, my firmware is 
> 2.0.158, not 2.0.156.

Which benchmark do you see hanging most often? Also, if you could let us 
know the class of the test, that would be great. Are you running with 
block or cyclic process distribution?

Thanks,
Sayantan.

>
> Thanks,
> Andrey
>
> Sayantan Sur wrote:
>> Hello Andrey,
>>
>> Thanks for your email. A couple of months back we had seen some 
>> erratic behavior, but of late using the 2.0.156 firmware I haven't 
>> noticed any hangs for 30-40 runs. How many processes are you running?
>>
>> The platform description is given in:
>>
>> http://mvapich.cse.ohio-state.edu/performance/mvapich/em64t/MVAPICH-em64t-gen2-ConnectX.shtml 
>>
>>
>> Thanks,
>> Sayantan.
>>
>> Andrey Slepuhin wrote:
>>> Dear folks,
>>>
>>> I have a 4-node test cluster using Intel Clovertown CPUs and new 
>>> Mellanox ConnectX cards. While running the NAS NPB benchmarks, I see 
>>> that several tests occasionally hang silently. I tried different 
>>> compilers (GCC and Intel) and different optimization options to rule 
>>> out miscompilation of MVAPICH itself and the benchmarks, but the 
>>> situation didn't change. However, I do not see such problems with 
>>> OpenMPI. The cluster node configuration is:
>>> Intel S5000PSL motherboard
>>> 2 x Intel Xeon 5345 2.33 GHz
>>> 8GB RAM
>>> Mellanox MHGH-XTC card, firmware revision 2.0.158
>>> Stock SLES10 without updates (but I also had the same problems with 
>>> the 2.6.22-rc4 kernel from Roland Dreier's git tree)
>>> OFED-1.2.c-6 distribution from Mellanox
>>>
>>> Do you have any idea what could cause the program to freeze? I would 
>>> greatly appreciate any help.
>>>
>>> Thanks in advance,
>>> Andrey
>>>
>>> P.S. BTW, what was the system configuration where 1.39usec latency 
>>> for ConnectX was achieved? At the moment my best result with MVAPICH 
>>> is 1.67usec using one Mellanox switch...
>>> _______________________________________________
>>> mvapich-discuss mailing list
>>> mvapich-discuss at cse.ohio-state.edu
>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>


-- 
http://www.cse.ohio-state.edu/~surs


