[mvapich-discuss] Old error popping up again - message truncated

Tobias Hilbrich tobias.hilbrich at tu-dresden.de
Wed Jan 6 02:00:31 EST 2016


Hi Judith,

To investigate whether there may be an issue in your application, you may want to have a look at MUST:
https://doc.itc.rwth-aachen.de/display/CCP/Project+MUST

It's a tool that we develop to check MPI usage. It tells you whether your MPI calls conform to the MPI standard, or whether any of them are illegal. If it reports that everything is fine, that helps narrow the search to the MPI implementation. If it reports an issue in your code, fixing that issue may also resolve the problem you are seeing. The fact that things work with OpenMPI does not ensure that your application is doing everything correctly. In our experience, several kinds of MPI usage errors are tolerated by some MPI implementations but crash others. Let us know if we can help.
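
As a rough illustration of how to read the sizes in the error stack quoted below, here is a minimal sketch of the arithmetic behind a "message truncated" report. It does not call any MPI library; the helper names are made up for this example, and it assumes that MPI_INT occupies 4 bytes, which is common but not guaranteed. The receive side posted count=1 (4 bytes), while 260 bytes arrived, which would be 65 elements if the sender was also sending 4-byte integers. One possible reading, not a diagnosis, is that the ranks did not all make matching calls, which is the kind of usage error worth checking for.

```python
# Sketch only: models the size check behind a "message truncated" error.
# Assumption: MPI_INT occupies 4 bytes (common, but not guaranteed).
MPI_INT_BYTES = 4  # assumed size, not queried from any MPI library


def buffer_bytes(count, elem_bytes=MPI_INT_BYTES):
    """Bytes that a receive buffer of `count` elements can hold."""
    return count * elem_bytes


def is_truncated(received_bytes, count, elem_bytes=MPI_INT_BYTES):
    """True if the incoming message does not fit in the receive buffer."""
    return received_bytes > buffer_bytes(count, elem_bytes)


def implied_count(received_bytes, elem_bytes=MPI_INT_BYTES):
    """Element count on the sending side, or None if the byte count
    is not a whole number of elements of the assumed size."""
    whole, rest = divmod(received_bytes, elem_bytes)
    return whole if rest == 0 else None


# Values taken from the quoted error stack: count=1, 260 bytes received.
print(buffer_bytes(1))       # size of the posted receive buffer in bytes
print(is_truncated(260, 1))  # whether 260 bytes overflow that buffer
print(implied_count(260))    # elements implied by 260 bytes
```

If the implied count matches a count used elsewhere in the application (for example in another collective or a point-to-point send), that call is a natural first place to look.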

Best,
-Tobias

-- 
Dr.-Ing. Tobias Hilbrich
Research Assistant

Technische Universitaet Dresden, Germany
Tel.: +49 (351) 463-38485
E-Mail: tobias.hilbrich at tu-dresden.de
> On 05 Jan 2016, at 17:10, Gardiner, Judith <judithg at osc.edu> wrote:
> 
> We’re using mvapich2-2.1 and are encountering an error that I thought was solved a couple of versions ago.  The code runs correctly on 240 processors with OpenMPI, although with inconsistent performance, but it fails with mvapich2.  I haven’t tried to debug it to be sure it’s not an application error.  Do you have an environment variable I can set to quickly figure that out?  The code is Fortran 90, if that makes any difference.
>  
> Fatal error in MPI_Allreduce:
> Message truncated, error stack:
> MPI_Allreduce(937)........................: MPI_Allreduce(sbuf=0x7fff6a048a7c, rbuf=0x7fff6a048a78, count=1, MPI_INT, MPI_MAX, comm=0x84000004) failed
> MPI_Allreduce(919)........................:
> MPIDI_CH3I_SHMEM_COLL_Barrier_bcast(1496).:
> create_2level_comm(708)...................:
> MPIR_Allreduce_impl(777)..................:
> MPIR_Allreduce_index_tuned_intra_MV2(2486):
> FUNCNAME(357).............................:
> MPIDI_CH3U_Receive_data_found(282)........: Message from rank 1 and tag 14 truncated; 260 bytes received but buffer size is 4
>  
>  
> Thanks for your help.
>  
> Judy
>  
> --
> Judith D. Gardiner, Ph.D.
> Ohio Supercomputer Center
> 614-292-9623
> judithg at osc.edu
>  
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss