[mvapich-discuss] MVAPICH-0.9.8 lockup with OFED-1.1-rc3

Sayantan Sur surs at cse.ohio-state.edu
Fri Sep 8 15:34:27 EDT 2006


Hello Andrew,

Andrew Dobbie wrote:

>Hi Dr. Panda,
>
>I just tested using OFED-1.0 and I am getting the exact same results as
>I did with 1.1-rc3 using gen2.  The systems I am using have 32-bit RHEL4
>update 3 installed.
>
>Is there anything else you would like me to try?
>
I am just wondering whether you see the same behavior with 64-bit RHEL4 
using OFED-1.0/1.1?

Thanks,
Sayantan.

>-Andrew
>
>On Fri, 2006-09-08 at 12:42 -0400, Dhabaleswar Panda wrote:
>
>>Andrew - Thanks for your note. Since OFED-1.1 is still being finalized
>>and not released yet, we have not installed it on any of our systems
>>yet.  We have tested MVAPICH 0.9.8 with OFED-1.0 and it works without
>>any problem. On your system, do you see this problem with OFED-1.0 or
>>only with OFED-1.1-rc3?
>>
>>Thanks, 
>>
>>DK
>>
>>>I'm not sure if this problem is caused by MVAPICH directly but I'm
>>>certain you will know what the cause is.
>>>
>>>I downloaded and compiled mvapich-0.9.8 but my application would not run
>>>and all the benchmarks failed to run.  Every application locks up in the
>>>same spot so I've included the backtrace.
>>>
>>>Here's the backtrace I got attaching gdb to osu_bw.
>>>
>>>(gdb) bt
>>>#0  0xb7f457c3 in smpi_init ()
>>>   from /usr/local/mvapich/lib/shared/libmpich.so.1.0
>>>#1  0xb7f43bc2 in MPID_Init ()
>>>   from /usr/local/mvapich/lib/shared/libmpich.so.1.0
>>>#2  0xb7f3aea8 in MPIR_Init ()
>>>   from /usr/local/mvapich/lib/shared/libmpich.so.1.0
>>>#3  0xb7f3acbe in PMPI_Init ()
>>>   from /usr/local/mvapich/lib/shared/libmpich.so.1.0
>>>#4  0x08048794 in main (argc=1, argv=0xbfd2e8e4) at osu_bw.c:80
>>>
>>>I managed to get everything running properly by removing -D_SMP_ and
>>>-D_SMP_RNDV_ from the build flags.  I don't recall any warnings about
>>>-D_SMP_ in the documentation.  Am I correct in assuming this should not
>>>happen?
>>>
>>>I am using Mellanox PCI-X cards from Voltaire with 3.4.0 firmware and
>>>OFED mthca driver on a 2.6.17 kernel.  All my machines are dual Opteron
>>>248s.
>>>
>>>Thanks in advance.
>>>
>>>
>>>_______________________________________________
>>>mvapich-discuss mailing list
>>>mvapich-discuss at mail.cse.ohio-state.edu
>>>http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>


-- 
http://www.cse.ohio-state.edu/~surs


