[mvapich-discuss] Rndv Receiver is receiving less than as expected

Aaron Knister aaron.knister at gmail.com
Mon Jun 28 19:14:33 EDT 2010


Hi Krishna,

Thanks for the reply. I'm using IBM-branded ConnectX HCAs (MT26428) with
firmware 2.6.648, and the switch is a QLogic 12800-180. As for compilers, I'm
using the GNU compilers (gcc, g++, gfortran) that ship with Red Hat, version
4.1.2-48.

-Aaron

On Mon, Jun 28, 2010 at 7:06 PM, Krishna Chaitanya Kandalla <
kandalla at cse.ohio-state.edu> wrote:

> Hi Aaron,
>               Thank you for reporting this error. Can you please let us
> know about the kind of hardware and the compiler that you are using?
>
> Thanks,
> Krishna
>
>
>
> On 06/28/2010 06:11 PM, Aaron Knister wrote:
>
>> Hi,
>>
>> I'm running mvapich2-1.4rc2 using SLURM as the PMI and having some
>> difficulties with gromacs-4.0.7. I can't pin down the exact number, but at
>> processor counts somewhere above 40 (definitely at 80 and higher) the gromacs
>> application terminates after some time (the amount of time varies slightly
>> between runs) with this error:
>>
>>
>> Warning! Rndv Receiver is receiving (13760 < 24768) less than as expected
>> Fatal error in MPI_Alltoall:
>> Message truncated, error stack:
>> MPI_Alltoall(734)......................: MPI_Alltoall(sbuf=0x1672840,
>> scount=344, MPI_FLOAT, rbuf=0x2aaaad349360, rcount=344, MPI_FLOAT,
>> comm=0xc4000000) failed
>> MPIR_Alltoall(193).....................:
>> MPIDI_CH3U_Post_data_receive_found(445): Message from rank 21 and tag 9
>> truncated; 24768 bytes received but buffer size is 13760
>> Warning! Rndv Receiver is receiving (22016 < 27520) less than as expected
>> Fatal error in MPI_Alltoall:
>> Message truncated, error stack:
>> MPI_Alltoall(734)......................: MPI_Alltoall(sbuf=0x2aaaad3ce4e0,
>> scount=344, MPI_FLOAT, rbuf=0x1e6af900, rcount=344, MPI_FLOAT,
>> comm=0xc4000004) failed
>> MPIR_Alltoall(193).....................:
>> MPIDI_CH3U_Post_data_receive_found(445): Message from rank 17 and tag 9
>> truncated; 27520 bytes received but buffer size is 22016
>>
>> The sizes of the buffers aren't identical each time, but the rank numbers
>> that throw the errors seem to be consistent. The error doesn't occur with
>> OpenMPI, which, interestingly, runs the code significantly faster than
>> mvapich2, although I don't know why. I've also tried mvapich2-1.5rc2 and the
>> error is still present. Please let me know if you need any additional
>> information from me.
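>>
>> (This isn't the actual gromacs code path; it's just a minimal standalone
>> MPI_Alltoall sketch using the same per-rank count of 344 MPI_FLOATs shown in
>> the trace above, with made-up buffer contents, in case a reproducer is useful.)
>>
>>   #include <mpi.h>
>>   #include <stdio.h>
>>   #include <stdlib.h>
>>
>>   int main(int argc, char **argv)
>>   {
>>       int rank, size, i;
>>       const int count = 344;  /* floats exchanged with each rank, as in the trace */
>>       float *sbuf, *rbuf;
>>
>>       MPI_Init(&argc, &argv);
>>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>       MPI_Comm_size(MPI_COMM_WORLD, &size);
>>
>>       /* Each rank sends and receives 344 floats (1376 bytes) per peer. */
>>       sbuf = malloc((size_t)size * count * sizeof(float));
>>       rbuf = malloc((size_t)size * count * sizeof(float));
>>       for (i = 0; i < size * count; i++)
>>           sbuf[i] = (float)rank;
>>
>>       MPI_Alltoall(sbuf, count, MPI_FLOAT, rbuf, count, MPI_FLOAT,
>>                    MPI_COMM_WORLD);
>>
>>       if (rank == 0)
>>           printf("MPI_Alltoall completed on %d ranks\n", size);
>>
>>       free(sbuf);
>>       free(rbuf);
>>       MPI_Finalize();
>>       return 0;
>>   }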
>>
>> Thanks in advance!
>>
>> -Aaron
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>>
>>
>