[mvapich-discuss] Help with polled desc error
Shao-Ching Huang
schuang at ats.ucla.edu
Fri Feb 1 20:24:36 EST 2008
Hi Wei,
We cleaned up a few things and re-ran the mpiGraph tests. The updated
results are posted here:
http://reynolds.turb.ucla.edu/~schuang/mpiGraph/mpiGraph-8a.out_html/index.html
http://reynolds.turb.ucla.edu/~schuang/mpiGraph/mpiGraph-9a.out_html/index.html
Please ignore the results in my previous email. Thank you.
Regards,
Shao-Ching
On Thu, Jan 31, 2008 at 08:35:41PM -0800, Shao-Ching Huang wrote:
>
> Hi Wei,
>
> We did two runs of the mpiGraph test you suggested on 48 nodes, with
> one (1) MPI process per node:
>
> mpiexec -np 48 ./mpiGraph 4096 10 10 >& mpiGraph.out
>
> The results from the two runs are posted here:
>
> http://reynolds.turb.ucla.edu/~schuang/mpiGraph/mpiGraph-1.out_html/
> http://reynolds.turb.ucla.edu/~schuang/mpiGraph/mpiGraph-2.out_html/
>
> During the tests, some other users were also running jobs on some of
> these 48 nodes.
>
> Could you please help us interpret these results, if possible?
>
> Thank you.
>
> Shao-Ching Huang
>
>
> On Thu, Jan 31, 2008 at 01:05:06PM -0500, wei huang wrote:
> > Hi Scott,
> >
> > We went up to 256 processes (32 nodes) and did not see the problem in a
> > few hundred runs (cpi). Thus, to narrow down the problem, we want to make
> > sure the fabric and system setup are OK. To diagnose this, we suggest
> > running the mpiGraph program from http://sourceforge.net/projects/mpigraph.
> > This test stresses the interconnect. If there is a problem with your
> > system setup, it should fail at a much higher frequency than the simple
> > cpi program.
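> > As a sketch of how such a diagnostic run might look (the build and
> > post-processing steps below are assumptions about the mpiGraph package,
> > not commands taken from this thread; adjust to your MPI compiler
> > wrappers and job scheduler):

```shell
# Build mpiGraph with its bundled Makefile (assumes mpicc is on PATH).
make

# One rank per node; the arguments are message size, iterations, and
# window depth. This is the invocation used later in this thread.
mpiexec -np 48 ./mpiGraph 4096 10 10 >& mpiGraph.out

# The package ships a crunch_mpiGraph script that turns the raw output
# into an HTML report (the *_html/index.html pages seen in this thread).
crunch_mpiGraph mpiGraph.out
```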
> >
> > Thanks.
> >
> > Regards,
> > Wei Huang
> >
> > 774 Dreese Lab, 2015 Neil Ave,
> > Dept. of Computer Science and Engineering
> > Ohio State University
> > OH 43210
> > Tel: (614)292-8501
> >
> >
> > On Wed, 30 Jan 2008, Scott A. Friedman wrote:
> >
> > > My co-worker passed this along...
> > >
> > > Yes, the error happens on the cpi.c program too. It happened 2 times
> > > among the 9 cases I ran.
> > >
> > > I was using 128 processes (on 32 4-core nodes).
> > >
> > > ---
> > >
> > > and another...
> > >
> > > It happens for a simple MPI program which just calls MPI_Init and
> > > MPI_Finalize and prints out the number of processes. It happened on
> > > anything from 4 nodes (16 processors) and up.
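> > > A minimal sketch of such a reproducer (illustrative only; this is
> > > not the exact program from the report):

```c
/* Smallest program that triggers the reported error: initialize MPI,
 * report the number of processes from rank 0, and finalize. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int size, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        printf("Number of processes: %d\n", size);

    MPI_Finalize();
    return 0;
}
```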
> > >
> > > What environment variables should we look for?
> > >
> > > Thanks,
> > > Scott
> > >
> > > wei huang wrote:
> > > > Hi Scott,
> > > >
> > > > On how many processes (and how many nodes) did you run your program? Do
> > > > you set any environment variables when running the program? Does the
> > > > error happen on a simple test like cpi?
> > > >
> > > > Thanks.
> > > >
> > > > Regards,
> > > > Wei Huang
> > > >
> > > >
> > > >
> > > > On Wed, 30 Jan 2008, Scott A. Friedman wrote:
> > > >
> > > >> The low level ibv tests work fine.
> > > >
> > > > _______________________________________________
> > > > mvapich-discuss mailing list
> > > > mvapich-discuss at cse.ohio-state.edu
> > > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> > >
> >
> >