[mvapich-discuss] MVAPICH-0.9.8 lockup with OFED-1.1-rc3
Pavel Shamis (Pasha)
pasha at mellanox.co.il
Sun Sep 10 11:18:53 EDT 2006
Andrew,
Do you see the problem with every MPI application?
I'm not sure, but it looks like a hostfile issue.
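
One way to rule the hostfile in or out is to check how many ranks each node would receive. The following is a minimal Python sketch, not part of MVAPICH. It assumes a hostfile with one hostname per line, with blank lines and "#" comments ignored, and the helper names ranks_per_host and shared_memory_hosts are hypothetical. Hosts listed more than once are the ones where ranks would share a node; whether that is what triggers the shared-memory path built with -D_SMP_ is an assumption based on the smpi_init frame in the backtrace quoted below, not something confirmed in this thread.

```python
from collections import Counter


def ranks_per_host(hostfile_text):
    """Count how many ranks each host would receive.

    Assumption: one hostname per line; blank lines and '#' comments
    are ignored. Hostnames are compared case-insensitively.
    """
    hosts = []
    for raw in hostfile_text.splitlines():
        # Drop anything after a '#' and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if line:
            hosts.append(line.lower())
    return Counter(hosts)


def shared_memory_hosts(hostfile_text):
    """Return the hosts listed more than once, sorted by name."""
    counts = ranks_per_host(hostfile_text)
    return sorted(host for host, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Hypothetical hostfile contents, for illustration only.
    sample = "node01\nnode01\nnode02\n"
    print(dict(ranks_per_host(sample)))
    print(shared_memory_hosts(sample))
```

If a two-process benchmark such as osu_bw hangs only when both entries name the same host, that would point at the intra-node path rather than at the interconnect.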
Sayantan Sur wrote:
> Hello Andrew,
>
> Andrew Dobbie wrote:
>
>> Hi Dr. Panda,
>>
>> I just tested using OFED-1.0 and I am getting the exact same results as
>> I did with 1.1-rc3 using gen2. The systems I am using have 32bit RHEL4
>> update 3 installed.
>>
>> Is there anything else you would like me to try?
>>
>>
> I am just wondering whether you see the same behavior with 64-bit RHEL4
> using OFED-1.0/1.1?
>
> Thanks,
> Sayantan.
>
>> -Andrew
>>
>> On Fri, 2006-09-08 at 12:42 -0400, Dhabaleswar Panda wrote:
>>
>>
>>> Andrew - Thanks for your note. Since OFED-1.1 is still being finalized
>>> and not released yet, we have not installed it on any of our systems
>>> yet. We have tested MVAPICH 0.9.8 with OFED-1.0 and it works without
>>> any problem. On your system, do you see this problem with OFED-1.0 or
>>> only with OFED-1.1-rc3?
>>>
>>> Thanks,
>>> DK
>>>
>>>
>>>
>>>> I'm not sure if this problem is caused by MVAPICH directly, but I'm
>>>> certain you will know what the cause is.
>>>>
>>>> I downloaded and compiled mvapich-0.9.8 but my application would not run
>>>> and all the benchmarks failed to run. Every application locks up in the
>>>> same spot so I've included the backtrace.
>>>>
>>>> Here's the backtrace I got attaching gdb to osu_bw.
>>>>
>>>> (gdb) bt
>>>> #0 0xb7f457c3 in smpi_init ()
>>>> from /usr/local/mvapich/lib/shared/libmpich.so.1.0
>>>> #1 0xb7f43bc2 in MPID_Init ()
>>>> from /usr/local/mvapich/lib/shared/libmpich.so.1.0
>>>> #2 0xb7f3aea8 in MPIR_Init ()
>>>> from /usr/local/mvapich/lib/shared/libmpich.so.1.0
>>>> #3 0xb7f3acbe in PMPI_Init ()
>>>> from /usr/local/mvapich/lib/shared/libmpich.so.1.0
>>>> #4 0x08048794 in main (argc=1, argv=0xbfd2e8e4) at osu_bw.c:80
>>>>
>>>> I managed to get everything running properly by removing -D_SMP_ and
>>>> -D_SMP_RNDV_ from the build flags. I don't recall any warnings about
>>>> -D_SMP_ in the documentation. Am I correct in assuming this should not
>>>> happen?
>>>>
>>>> I am using Mellanox PCI-X cards from Voltaire with 3.4.0 firmware and
>>>> OFED mthca driver on a 2.6.17 kernel. All my machines are dual Opteron
>>>> 248s.
>>>>
>>>> Thanks in advance.
>>>>
>>>>
>>>> _______________________________________________
>>>> mvapich-discuss mailing list
>>>> mvapich-discuss at mail.cse.ohio-state.edu
>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>>
>>>>
>>
>>
>
>
--
Pavel Shamis (Pasha)
Software Engineer
Mellanox Technologies LTD.
pasha at mellanox.co.il