[mvapich-discuss] Re: Rndv Receiver is receiving less than as expected

Aaron Knister aaron.knister at gmail.com
Mon Jun 28 19:06:32 EDT 2010


For what it's worth, I just set MV2_RNDV_PROTOCOL=RGET and so far the
performance is on par with, and even a little better than, OpenMPI's with this
application. I'll post back once I know whether it runs to completion or
crashes.

One question, though: is there any harm in setting this variable permanently,
particularly in terms of performance?
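
If making it global turns out to be a bad idea, one narrower option I've been
considering (just a sketch on my end, assuming MVAPICH2 picks up MV2_* variables
at MPI_Init time) would be to pin it per application from the code itself:

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    /* Sketch only: set the variable before MPI_Init so the library sees it.
     * overwrite = 0 keeps any value already exported by the job script, and
     * other MPI implementations would simply ignore the variable. */
    setenv("MV2_RNDV_PROTOCOL", "RGET", 0);

    MPI_Init(&argc, &argv);
    /* ... application ... */
    MPI_Finalize();
    return 0;
}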

On Mon, Jun 28, 2010 at 6:11 PM, Aaron Knister <aaron.knister at gmail.com> wrote:

> Hi,
>
> I'm running mvapich2-1.4rc2 using SLURM as the PMI and having some
> difficulty with gromacs-4.0.7. I haven't pinned down the exact threshold,
> but at processor counts somewhere above 40 (definitely at 80 and higher) the
> gromacs application terminates after some time (the amount of time varies
> slightly between runs) with this error:
>
>
> Warning! Rndv Receiver is receiving (13760 < 24768) less than as expected
> Fatal error in MPI_Alltoall:
> Message truncated, error stack:
> MPI_Alltoall(734)......................: MPI_Alltoall(sbuf=0x1672840,
> scount=344, MPI_FLOAT, rbuf=0x2aaaad349360, rcount=344, MPI_FLOAT,
> comm=0xc4000000) failed
> MPIR_Alltoall(193).....................:
> MPIDI_CH3U_Post_data_receive_found(445): Message from rank 21 and tag 9
> truncated; 24768 bytes received but buffer size is 13760
> Warning! Rndv Receiver is receiving (22016 < 27520) less than as expected
> Fatal error in MPI_Alltoall:
> Message truncated, error stack:
> MPI_Alltoall(734)......................: MPI_Alltoall(sbuf=0x2aaaad3ce4e0,
> scount=344, MPI_FLOAT, rbuf=0x1e6af900, rcount=344, MPI_FLOAT,
> comm=0xc4000004) failed
> MPIR_Alltoall(193).....................:
> MPIDI_CH3U_Post_data_receive_found(445): Message from rank 17 and tag 9
> truncated; 27520 bytes received but buffer size is 22016
>
> The buffer sizes aren't identical each time, but the ranks that throw the
> errors seem to be consistent. The error doesn't occur with OpenMPI, which,
> interestingly, also runs this code significantly faster than mvapich2,
> although I don't know why. I've also tried mvapich2-1.5rc2 and the error is
> still present. Please let me know if you need any additional information
> from me.
>
> Thanks in advance!
>
> -Aaron
>
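
For anyone skimming the errors above: as I read it, the "Message truncated"
failure means a receiving rank posted less buffer space for part of the
MPI_Alltoall exchange than the number of bytes a peer actually sent it. Here is
a standalone sketch (mine, not the gromacs code) of the call and the buffer
sizing it requires, using the same per-peer count of 344 floats that appears in
the error output:

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int size, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* 344 floats (1376 bytes) go to and come from every peer, so both
     * buffers must hold count * size elements and every rank must pass the
     * same count/datatype. A receiver that posts less space than a peer
     * sends gets exactly the "Message truncated" error quoted above. */
    const int count = 344;
    float *sendbuf = malloc((size_t)count * size * sizeof(float));
    float *recvbuf = malloc((size_t)count * size * sizeof(float));
    for (int i = 0; i < count * size; i++)
        sendbuf[i] = (float)rank;

    MPI_Alltoall(sendbuf, count, MPI_FLOAT,
                 recvbuf, count, MPI_FLOAT, MPI_COMM_WORLD);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

In the failing runs the application-level counts look matched (scount and
rcount are both 344 MPI_FLOATs on the rank that aborts), so my guess is the
truncation happens inside the library's rendezvous path rather than in the
gromacs buffers, which would also fit RGET making the problem go away.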