[mvapich-discuss] MVAPICH-0.9.8 lockup with OFED-1.1-rc3

Andrew Dobbie adobbie at cims.carleton.ca
Mon Sep 11 08:55:45 EDT 2006


Yes.  The application I use runs via mpirun_rsh and has the same problem
as the benchmarks.  The hostnames I use point to the IPoIB addresses of
the machines, and all hosts have the same entries in /etc/hosts.
Specifying which hosts to use on the command line or via -hostfile
doesn't seem to matter.

Does that answer the question you were asking?
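Since Pasha suspected a hostfile issue, here is a minimal sanity check
I'd suggest running on each node, to confirm that every hostname in the
hostfile actually resolves.  This is only a sketch: "myhosts" is a
placeholder hostfile name (replace it with the real mpirun_rsh hostfile),
and the demo entry is just localhost.

```shell
#!/bin/sh
# Sanity-check that each hostname in a hostfile resolves to an address.
# "myhosts" is a placeholder; in practice it would be the mpirun_rsh
# hostfile listing the IPoIB hostnames, one per line.
printf 'localhost\n' > myhosts   # demo hostfile; replace with real node names

while read -r host; do
    # getent consults the same NSS sources (/etc/hosts, DNS) the
    # resolver uses, so it reflects what MPI startup will see.
    addr=$(getent hosts "$host" | awk '{print $1; exit}')
    if [ -z "$addr" ]; then
        echo "UNRESOLVED: $host"
    else
        echo "$host -> $addr"
    fi
done < myhosts
```

Running this on every node and comparing the output would catch the case
where two nodes disagree about which address a hostname maps to.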

On Sun, 2006-09-10 at 18:18 +0300, Pavel Shamis (Pasha) wrote:
> Andrew,
> Do you see the problem with any mpi application ?
> I'm not sure but it looks like hostfile issue.
> 
> 
> Sayantan Sur wrote:
> > Hello Andrew,
> > 
> > Andrew Dobbie wrote:
> > 
> >> Hi Dr. Panda,
> >>
> >> I just tested using OFED-1.0 and I am getting the exact same results as
> >> I did with 1.1-rc3 using gen2.  The systems I am using have 32bit RHEL4
> >> update 3 installed.
> >>
> >> Is there anything else you would like me to try?
> >>  
> >>
> > I am just wondering whether you see the same behavior with 64-bit RHEL4 
> > using OFED-1.0/1.1?
> > 
> > Thanks,
> > Sayantan.
> > 
> >> -Andrew
> >>
> >> On Fri, 2006-09-08 at 12:42 -0400, Dhabaleswar Panda wrote:
> >>  
> >>
> >>> Andrew - Thanks for your note. Since OFED-1.1 is still being finalized
> >>> and not released yet, we have not installed it on any of our systems
> >>> yet.  We have tested MVAPICH 0.9.8 with OFED-1.0 and it works without
> >>> any problem. On your system, do you see this problem with OFED-1.0 or
> >>> only with OFED-1.1-rc3?
> >>>
> >>> Thanks,
> >>> DK
> >>>
> >>>
> >>>   
> >>>> I'm not sure if this problem is caused by MVAPICH directly, but I'm
> >>>> certain you will know what the cause is.
> >>>>
> >>>> I downloaded and compiled mvapich-0.9.8, but my application would
> >>>> not run and all the benchmarks failed to run.  Every application
> >>>> locks up in the same spot, so I've included the backtrace.
> >>>>
> >>>> Here's the backtrace I got by attaching gdb to osu_bw.
> >>>>
> >>>> (gdb) bt
> >>>> #0  0xb7f457c3 in smpi_init ()
> >>>> from /usr/local/mvapich/lib/shared/libmpich.so.1.0
> >>>> #1  0xb7f43bc2 in MPID_Init ()
> >>>> from /usr/local/mvapich/lib/shared/libmpich.so.1.0
> >>>> #2  0xb7f3aea8 in MPIR_Init ()
> >>>> from /usr/local/mvapich/lib/shared/libmpich.so.1.0
> >>>> #3  0xb7f3acbe in PMPI_Init ()
> >>>> from /usr/local/mvapich/lib/shared/libmpich.so.1.0
> >>>> #4  0x08048794 in main (argc=1, argv=0xbfd2e8e4) at osu_bw.c:80
> >>>>
> >>>> I managed to get everything running properly by removing -D_SMP_
> >>>> and -D_SMP_RNDV_ from the build flags.  I don't recall any warnings
> >>>> about -D_SMP_ in the documentation.  Am I correct in assuming this
> >>>> should not happen?
> >>>>
> >>>> I am using Mellanox PCI-X cards from Voltaire with 3.4.0 firmware and
> >>>> OFED mthca driver on a 2.6.17 kernel.  All my machines are dual Opteron
> >>>> 248s.
> >>>>
> >>>> Thanks in advance.
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> mvapich-discuss mailing list
> >>>> mvapich-discuss at mail.cse.ohio-state.edu
> >>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >>>>
> >>>>     
> >>
> >>
> >>
> > 
> > 
> 
> 
