[mvapich-discuss] MVAPICH-0.9.8 lockup with OFED-1.1-rc3

Pavel Shamis (Pasha) pasha at mellanox.co.il
Sun Sep 10 11:18:53 EDT 2006


Andrew,
Do you see the problem with any MPI application?
I'm not sure, but it looks like a hostfile issue.


Sayantan Sur wrote:
> Hello Andrew,
> 
> Andrew Dobbie wrote:
> 
>> Hi Dr. Panda,
>>
>> I just tested using OFED-1.0 and I am getting exactly the same results as
>> I did with 1.1-rc3 using gen2.  The systems I am using have 32-bit RHEL4
>> update 3 installed.
>>
>> Is there anything else you would like me to try?
>>  
>>
> I am just wondering whether you see the same behavior with 64-bit RHEL4 
> using OFED-1.0/1.1?
> 
> Thanks,
> Sayantan.
> 
>> -Andrew
>>
>> On Fri, 2006-09-08 at 12:42 -0400, Dhabaleswar Panda wrote:
>>  
>>
>>> Andrew - Thanks for your note. Since OFED-1.1 is still being finalized
>>> and has not been released yet, we have not installed it on any of our
>>> systems yet.  We have tested MVAPICH 0.9.8 with OFED-1.0 and it works
>>> without any problem. On your system, do you see this problem with
>>> OFED-1.0 or only with OFED-1.1-rc3?
>>>
>>> Thanks,
>>> DK
>>>
>>>
>>>   
>>>> I'm not sure if this problem is caused by MVAPICH directly, but I'm
>>>> certain you will know what the cause is.
>>>>
>>>> I downloaded and compiled mvapich-0.9.8 but my application would not run
>>>> and all the benchmarks failed to run.  Every application locks up in the
>>>> same spot so I've included the backtrace.
>>>>
>>>> Here's the backtrace I got attaching gdb to osu_bw.
>>>>
>>>> (gdb) bt
>>>> #0  0xb7f457c3 in smpi_init ()
>>>> from /usr/local/mvapich/lib/shared/libmpich.so.1.0
>>>> #1  0xb7f43bc2 in MPID_Init ()
>>>> from /usr/local/mvapich/lib/shared/libmpich.so.1.0
>>>> #2  0xb7f3aea8 in MPIR_Init ()
>>>> from /usr/local/mvapich/lib/shared/libmpich.so.1.0
>>>> #3  0xb7f3acbe in PMPI_Init ()
>>>> from /usr/local/mvapich/lib/shared/libmpich.so.1.0
>>>> #4  0x08048794 in main (argc=1, argv=0xbfd2e8e4) at osu_bw.c:80
>>>>
>>>> I managed to get everything running properly by removing -D_SMP_ and
>>>> -D_SMP_RNDV_ from the build flags.  I don't recall any warnings about
>>>> -D_SMP_ in the documentation.  Am I correct in assuming this should not
>>>> happen?
>>>>
>>>> I am using Mellanox PCI-X cards from Voltaire with 3.4.0 firmware and
>>>> OFED mthca driver on a 2.6.17 kernel.  All my machines are dual Opteron
>>>> 248s.
>>>>
>>>> Thanks in advance.
>>>>
>>>>
>>>> _______________________________________________
>>>> mvapich-discuss mailing list
>>>> mvapich-discuss at mail.cse.ohio-state.edu
>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>>
>>>>     
>>
>>
>>  
>>
> 
> 


-- 
Pavel Shamis (Pasha)
Software Engineer
Mellanox Technologies LTD.
pasha at mellanox.co.il

