[mvapich-discuss] MVAPICH-0.9.8 lockup with OFED-1.1-rc3

Andrew Dobbie adobbie at cims.carleton.ca
Wed Sep 13 11:22:00 EDT 2006


I will change the hosts file and see if that fixes the problem.

I'm in the process of compiling and installing OFED and mvapich on the
64-bit OS right now and will have results shortly.

-Andrew

On Wed, 2006-09-13 at 10:13 -0500, Sayantan Sur wrote:
> Hello Andrew,
> 
> Andrew Dobbie wrote:
> 
> >The hosts file is identical for all machines except for the 127.0.0.1
> >entry.  Will this cause a problem?  Also, is it wise to specify IPoIB
> >addresses of machines instead of ethernet?  I am confused as to why I
> >only have problems with mvapich compiled with _SMP_ and not without.
> >  
> >
> Just for clarification, are you still on the 32-bit OS or on the 64-bit
> one? IPoIB shouldn't make a difference. Unless you are launching a very
> large number of processes, it is unlikely to matter which one you use
> (IP over Ethernet or IP over IB), since the control traffic in
> mpirun_rsh will be small.
> 
> As Pasha indicates, the problem could be with the hosts files. MVAPICH
> (the code under _SMP_) figures out which processes are on the same node
> by comparing the hostnames.
> 
> >127.0.0.1       ND01    localhost.localdomain   localhost
> >  
> >
> What happens if all the /etc/hosts files say:
> 
> 127.0.0.1      localhost.localdomain localhost
> 
> (instead of having ND01 in the line)
> 
> Thanks,
> Sayantan.
> 
> >192.168.1.80    FLSRVR
> >192.168.8.1     ND01
> >192.168.8.2     ND02
> >192.168.8.3     ND03
> >192.168.8.4     ND04
> >192.168.8.5     ND05
> >192.168.8.6     ND06
> >  
> >
> 
> >On Wed, 2006-09-13 at 15:19 +0300, Pavel Shamis (Pasha) wrote:
> >  
> >
> >>Andrew Dobbie wrote:
> >>    
> >>
> >>>Yes.  The application I use runs mpirun_rsh and has the same problem as
> >>>the benchmarks.  The hostnames I use point to IPoIB addresses of the
> >>>machines and all hosts have the same entries in /etc/hosts.  Specifying
> >>>which hosts to use on command line or from -hostfile doesn't seem to
> >>>matter.
> >>>
> >>>Does that answer the question you were asking?
> >>>
> >>>      
> >>>
> >>Can you please provide your /etc/hosts file? The file
> >>should be exactly the same on all machines; please check it.
> >>
> >>    
> >>
> 
> 
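
To check each node for the kind of entry Sayantan asks about above (a node
name such as ND01 sharing the 127.0.0.1 line), something like the following
Python sketch could be run against each node's /etc/hosts. It is only an
illustration: the function name loopback_aliases is made up here, it is not
part of MVAPICH or OFED, and the sample text is simply the hosts file quoted
in this thread. It assumes the usual "address name [alias ...]" layout with
"#" comments.

```python
import ipaddress


def loopback_aliases(hosts_text):
    """Return names mapped to both a loopback and a non-loopback address.

    Hypothetical helper for illustration only; names are compared
    case-insensitively and returned in lowercase.
    """
    loopback, routable = set(), set()
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # skip lines that do not start with an IP address
        names = {name.lower() for name in fields[1:]}
        (loopback if addr.is_loopback else routable).update(names)
    return sorted(loopback & routable)


# The hosts file quoted in this thread (as seen on ND01).
ND01_HOSTS = """\
127.0.0.1       ND01    localhost.localdomain   localhost
192.168.1.80    FLSRVR
192.168.8.1     ND01
192.168.8.2     ND02
192.168.8.3     ND03
192.168.8.4     ND04
192.168.8.5     ND05
192.168.8.6     ND06
"""

print(loopback_aliases(ND01_HOSTS))  # prints ['nd01']
```

On a node, the same check could be applied to the real file with
loopback_aliases(open("/etc/hosts").read()). An empty list means no node name
is also aliased to a loopback address, which is the layout suggested above.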



