[mvapich-discuss] Re: Rndv Receiver is receiving less than as expected

Aaron Knister aaronk at umbc.edu
Tue Jun 29 11:39:38 EDT 2010


Oops -- forgot to CC the list on my reply.

DK,

Using the RGET protocol, the application ran to completion and was several
orders of magnitude faster than with RPUT. I'm considering setting
MV2_RNDV_PROTOCOL to RGET as a system-wide default -- would you advise against
this?
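
For example, a system-wide default could be set in a site shell profile or an
environment module; the path below is only a hypothetical example of where
such a setting might live, not an MVAPICH2 requirement:

    # /etc/profile.d/mvapich2.sh -- hypothetical site-wide default
    export MV2_RNDV_PROTOCOL=RGET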

It's on the to-do list to move to 1.4.1 or 1.5rc2, though it's never an
easy task to get users to recompile their code and move forward to a new MPI
version.

Thanks for looking into this!

-Aaron

On Tue, Jun 29, 2010 at 10:49 AM, Dhabaleswar Panda <panda at cse.ohio-state.edu> wrote:

> Thanks for the update here. The RGET protocol (MV2_RNDV_PROTOCOL=RGET)
> allows a process A to get data from B while B is computing. This allows
> overlap of computation and communication. If you are seeing good
> performance with this option, you can use it for this application without
> any harm. MVAPICH2 supports both RPUT and RGET protocols. Based on the
> computation-communication characteristics of an application, you can use
> one of these two protocols to obtain the maximum performance.
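>
> For example, rather than changing the system default, the protocol could
> also be selected per job by exporting the variable in the job script; the
> launch line below is only a sketch of a typical SLURM run with a placeholder
> application name, not your actual command:
>
>     export MV2_RNDV_PROTOCOL=RGET   # or RPUT (the default)
>     srun -n 80 ./your_app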
>
> We are taking a look at the error you reported yesterday. Let us know
> whether you see similar error with RGET protocol or not.
>
> You also indicated in yesterday's e-mail that you are using 1.4rc2 (even
> though you also see the error with 1.5rc2). 1.4rc2 was released in August
> 2009, and many feature enhancements and bug fixes have gone in since then.
> You can move either to the 1.4.1 branch version (with all of those bug
> fixes) or to the latest 1.5rc2 version.
>
> Thanks,
>
> DK
>
> > For what it's worth, I just set MV2_RNDV_PROTOCOL=RGET and so far the
> > performance is on par with, and even a little better than, OpenMPI for
> > this application. I'll post back once I find out whether it runs to
> > completion or crashes.
> >
> > One question though-- is there any harm in setting this variable on a
> > permanent basis, particularly in terms of performance?
> >
> > On Mon, Jun 28, 2010 at 6:11 PM, Aaron Knister <aaron.knister at gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I'm running mvapich2-1.4rc2 using SLURM as the PMI and having some
> > > difficulties with gromacs-4.0.7. I can't pin down the exact threshold,
> > > but at processor counts somewhere above 40 -- definitely at 80 and
> > > higher -- the gromacs application terminates after some time (the amount
> > > of time varies slightly between runs) with this error:
> > >
> > >
> > > Warning! Rndv Receiver is receiving (13760 < 24768) less than as expected
> > > Fatal error in MPI_Alltoall:
> > > Message truncated, error stack:
> > > MPI_Alltoall(734)......................: MPI_Alltoall(sbuf=0x1672840, scount=344, MPI_FLOAT, rbuf=0x2aaaad349360, rcount=344, MPI_FLOAT, comm=0xc4000000) failed
> > > MPIR_Alltoall(193).....................:
> > > MPIDI_CH3U_Post_data_receive_found(445): Message from rank 21 and tag 9 truncated; 24768 bytes received but buffer size is 13760
> > > Warning! Rndv Receiver is receiving (22016 < 27520) less than as expected
> > > Fatal error in MPI_Alltoall:
> > > Message truncated, error stack:
> > > MPI_Alltoall(734)......................: MPI_Alltoall(sbuf=0x2aaaad3ce4e0, scount=344, MPI_FLOAT, rbuf=0x1e6af900, rcount=344, MPI_FLOAT, comm=0xc4000004) failed
> > > MPIR_Alltoall(193).....................:
> > > MPIDI_CH3U_Post_data_receive_found(445): Message from rank 17 and tag 9 truncated; 27520 bytes received but buffer size is 22016
> > >
> > > The sizes of the buffers aren't identical each time, but the rank numbers
> > > that throw the errors seem to be consistent. The error doesn't occur with
> > > OpenMPI, which, interestingly, runs the code significantly faster than
> > > mvapich2, although I don't know why. I've also tried mvapich2-1.5rc2 and
> > > the error is still present. Please let me know if you need any additional
> > > information from me.
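> > >
> > > (For reference, rcount=344 MPI_FLOAT works out to 344 x 4 = 1376 bytes
> > > per peer, and the mismatched sizes above are exact multiples of that:
> > > 13760 = 10 x 1376 vs. 24768 = 18 x 1376, and 22016 = 16 x 1376 vs.
> > > 27520 = 20 x 1376.)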
> > >
> > > Thanks in advance!
> > >
> > > -Aaron
> > >
> >
>


-- 
Aaron Knister
Systems Administrator
JCET/DoIT
University of Maryland, Baltimore County
aaronk at umbc.edu

