[mvapich-discuss] MVAPICH2 VAPI and Scaling

Vishwas vvasisht at locuz.com
Fri Nov 3 13:58:55 EST 2006


Hi,

I tried with TCP/IP, and it scales.
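
For the record, the TCP/IP build was done roughly as below, following the user
guide section Wei pointed to; the install prefix is just what I used, and the
exact flags may differ on other setups, so treat this as a sketch:

    ./configure --prefix=/usr/local/mvapich2-tcp --with-device=ch3:sock
    make && make install
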
Now I have VAPI scaling as well, after making the following changes:
a. VCLUSTER=_LARGE_CLUSTER (Before it was _MEDIUM_CLUSTER)
b. HAVE_MPD_RING="-DUSE_MPD_RING" (Before it was "")
c. MULTI_THREAD="yes" (Before it was "")
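
For reference, here is roughly how those lines look in my build script now (I
believe the variables are in make.mvapich2.vapi; the comments are only my own
notes on what each setting appears to control):

    # cluster-size hint for the VAPI device
    VCLUSTER=_LARGE_CLUSTER            # was _MEDIUM_CLUSTER
    # define USE_MPD_RING so start-up goes through the MPD ring
    HAVE_MPD_RING="-DUSE_MPD_RING"     # was ""
    # build with thread support
    MULTI_THREAD="yes"                 # was ""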

Can you please tell me how this worked and what the reason is? Since it is
working fine, I have not played around with it further.

One more thing I tried was to set VCLUSTER=_LARGE_CLUSTER in the uDAPL
implementation to see if it scales, but it did not.

Regards
Vishwas

-----Original Message-----
From: wei huang [mailto:huanwei at cse.ohio-state.edu] 
Sent: Thursday, November 02, 2006 11:06 PM
To: Vishwas Vasisht
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: RE: [mvapich-discuss] MVAPICH2 VAPI and Scaling

Hi,

> I have tried with mpich2. It works fine for all (64*4) 256 processors. I
> tried the hello and cpi codes over this.

How about mvapich2-0.9.6 compiled with TCP/IP?

-- Wei

> -----Original Message-----
> From: wei huang [mailto:huanwei at cse.ohio-state.edu]
> Sent: Thu 11/2/2006 9:48 PM
> To: Vishwas Vasisht
> Cc: mvapich-discuss at cse.ohio-state.edu
> Subject: RE: [mvapich-discuss] MVAPICH2  VAPI and Scaling
>
> Hi,
>
> It looks like some setup problem here. In order to separate things out,
> would you please compile with TCP/IP and see if your job can start
> successfully on more than 32 nodes? To do so, please see:
>
>
> http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich2/mvapich2_user_guide.html#x1-130004.4.4
>
> Or if you have an mpich2 release, you may also try using a fresh mpich2
> release.
>
> This will help diagnose the problem faster.
>
> Thanks.
>
> -- Wei
>
> >
> > The whole implementation is being done in phases. The final
> > implementation will be a cluster of 64 nodes (+ master) of dual-CPU
> > (dual-core) machines. I have now updated all these nodes with the new
> > version as suggested.
> > I have configured the cluster size to come under MEDIUM.
> > mpd.hosts file contains
> > node000 ifhn=inode000
> > node001 ifhn=inode001
> > ...
> > node063 ifhn=inode063
> >
> > If you require any further information, you can mail me. Please help me
> > get through this problem.
> >
> > Regards
> > Vishwas
> >
> > -----Original Message-----
> > From: wei huang [mailto:huanwei at cse.ohio-state.edu]
> > Sent: Thu 11/2/2006 7:00 PM
> > To: Vishwas Vasisht
> > Cc: mvapich-discuss at cse.ohio-state.edu
> > Subject: Re: [mvapich-discuss] MVAPICH2  VAPI and Scaling
> >
> > Hi Vishwas,
> >
> > > To launch the mpd
> > >
> > > mpdboot -n 65 -f mpd.hosts -m /usr/local/mvapich2/bin/mpd --verbose
> > > -ifhn=ifrontend
> >
> > If I understood correctly, you have a 32-node cluster, so I am not sure
> > why you are using 65 as the input parameter here. Also, what have you put in your
> > mpd.hosts?
> >
> > We found that you are using an old release of MVAPICH2 (0.9.3); the latest
> > release version is 0.9.6. Is it possible for you to update your stack? You
> > just need to download the tarball from our website and compile; there
> > should be no additional complexity beyond re-compiling 0.9.3.
> >
> > Thanks.
> >
> > -- Wei
> >
> > >
> > >
> > >
> > > To run my job
> > >
> > > mpirun -np <num> ./a.out
> > >
> > >
> > >
> > > If this <num> is greater than 32, the job gets stuck at the above
> > > command and remains there for a very long time.
> > >
> > >
> > >
> > > Can anyone please help me resolve this?
> > >
> > >
> > >
> > > Regards
> > >
> > > Vishwas
> > >
> > >
> >
> >
> >
> >
>
>
>
>






