[mvapich-discuss] MVAPICH2 VAPI and Scaling

wei huang huanwei at cse.ohio-state.edu
Thu Nov 2 11:18:48 EST 2006


Hi,

This looks like a setup problem. To separate things out, would you
please compile with TCP/IP and see if your job can start successfully
on more than 32 nodes? To do so, please see:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich2/mvapich2_user_guide.html#x1-130004.4.4

Or, if you have an MPICH2 release available, you may also try a fresh
MPICH2 build.

This will help diagnose the problem faster.
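
Before rebuilding, it may also be worth sanity-checking the mpd.hosts file
itself. The excerpt quoted below lists node000 twice, which may simply be a
typo in the mail, but a duplicated or missing entry would be worth ruling
out. The following is only a rough sketch: count_hosts and duplicate_hosts
are hypothetical helper names, not part of MVAPICH2 or MPICH2, and it
assumes one host per line with the host name as the first field and "#"
starting a comment line.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with MVAPICH2 or MPICH2) for checking
# an mpd.hosts file before running mpdboot.
# Assumption: one host per line, host name in the first field,
# lines starting with '#' are comments.

# Print the number of non-empty, non-comment host lines.
count_hosts() {
    grep -v '^[[:space:]]*#' "$1" | grep -c '[^[:space:]]'
}

# Print every host name (first field) that appears more than once.
duplicate_hosts() {
    grep -v '^[[:space:]]*#' "$1" | awk 'NF { print $1 }' | sort | uniq -d
}

# Example run against ./mpd.hosts, if that file is present.
if [ -f mpd.hosts ]; then
    echo "hosts listed: $(count_hosts mpd.hosts)"
    echo "duplicates:   $(duplicate_hosts mpd.hosts | tr '\n' ' ')"
fi
```

For a 64-node cluster one would expect 64 distinct host names and an empty
duplicates line; anything else suggests the file should be corrected before
looking further at the MPI stack.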

Thanks.

-- Wei

>
> The whole implementation is being done in phases. The final implementation will be a cluster of 64 nodes (+ master), each with dual CPUs (dual core). I have now updated all these nodes to the new version as suggested.
> I have given the size of cluster to come under MEDIUM.
> mpd.hosts file contains
> node000 ifhn=inode000
> node000 ifhn=inode000
> ...
> node063 ifhn=inode063
>
> If you require any information you can mail me. Please help me to get through this problem.
>
> Regards
> Vishwas
>
> -----Original Message-----
> From: wei huang [mailto:huanwei at cse.ohio-state.edu]
> Sent: Thu 11/2/2006 7:00 PM
> To: Vishwas Vasisht
> Cc: mvapich-discuss at cse.ohio-state.edu
> Subject: Re: [mvapich-discuss] MVAPICH2  VAPI and Scaling
>
> Hi Vishwas,
>
> > To launch the mpd
> >
> > mpdboot -n 65 -f mpd.hosts -m /usr/local/mvapich2/bin/mpd --verbose
> > -ifhn=ifrontend
>
> If I understood correctly, you have a 32-node cluster, so I am not sure
> why you use 65 as the input parameter here. Also, what have you put in
> your mpd.hosts?
>
> We found that you are using an old release of MVAPICH2 (0.9.3); the latest
> release version is 0.9.6. Is it possible for you to update your stack? You
> just need to download the tarball from our website and compile it. This
> should be no more complex than re-compiling 0.9.3.
>
> Thanks.
>
> -- Wei
>
> >
> >
> >
> > To run my job
> >
> > mpirun -np <num> ./a.out
> >
> >
> >
> > If this <num> is greater than 32, the job gets stuck at the above command
> > and remains there for a very long time.
> >
> >
> >
> > Can anyone please help me to resolve this.
> >
> >
> >
> > Regards
> >
> > Vishwas
> >
> >
>
>
>
>