[mvapich-discuss] Old error popping up again - message truncated

Gardiner, Judith judithg at osc.edu
Wed Jan 6 09:44:05 EST 2016


Hi Tobias,

Thank you for the pointer to the MUST tool.  I wasn’t familiar with it.

I’m well aware that buggy code can work with some MPI implementations and not others, but this isn’t my code and I’m trying to avoid debugging it if possible.

Best,
Judy

From: Tobias Hilbrich [mailto:tobias.hilbrich at tu-dresden.de]
Sent: Wednesday, January 06, 2016 2:01 AM
To: Gardiner, Judith
Cc: mvapich-discuss at cse.ohio-state.edu; Protze, Joachim; Felix Münchhalfen
Subject: Re: [mvapich-discuss] Old error popping up again - message truncated

Hi Judith,

To investigate whether there is an issue in your application, you may want to have a look at MUST:
https://doc.itc.rwth-aachen.de/display/CCP/Project+MUST

It's a tool that we develop to check MPI usage. It tells you whether your MPI calls conform to the MPI standard or whether any of them are illegal. If it reports that everything is fine, that helps narrow the scope to the MPI implementation; if it reports an issue in your code, that may resolve the problem as well. The fact that things work with OpenMPI does not ensure that your application is doing everything correctly. In our experience, several MPI usage errors are tolerated by some MPI implementations but crash others. Let us know if we can help.
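
As an illustration (a hypothetical sketch, not taken from your application), this is the kind of usage error MUST reports: a collective in which the ranks disagree on the message size. Here rank 0 reduces a single integer while every other rank reduces 65; some MPI libraries tolerate the mismatch, while others abort with a "message truncated" error similar to the one in your error output.

! Hypothetical example of a mismatched collective, for illustration only
program allreduce_mismatch
  use mpi
  implicit none
  integer :: ierr, rank, nelems
  integer :: sbuf(65), rbuf(65)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  sbuf = rank
  nelems = merge(1, 65, rank == 0)   ! erroneous: count differs across ranks

  call MPI_Allreduce(sbuf, rbuf, nelems, MPI_INTEGER, MPI_MAX, &
                     MPI_COMM_WORLD, ierr)

  call MPI_Finalize(ierr)
end program allreduce_mismatch

Running such a program under MUST (e.g. mustrun -np 4 ./a.out in place of mpirun) should flag the mismatched counts in its report.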

Best,
-Tobias

--
Dr.-Ing. Tobias Hilbrich
Research Assistant

Technische Universitaet Dresden, Germany
Tel.: +49 (351) 463-38485
E-Mail: tobias.hilbrich at tu-dresden.de

On 05 Jan 2016, at 17:10, Gardiner, Judith <judithg at osc.edu> wrote:

We’re using mvapich2-2.1 and are encountering an error that I thought was solved a couple of versions ago.  The code runs correctly on 240 processors with OpenMPI (although with inconsistent performance), but it fails with mvapich2.  I haven’t tried to debug it, so I can’t be sure it’s not an application error.  Do you have an environment variable I can set to quickly figure that out?  The code is Fortran 90, if that makes any difference.

Fatal error in MPI_Allreduce:
Message truncated, error stack:
MPI_Allreduce(937)........................: MPI_Allreduce(sbuf=0x7fff6a048a7c, rbuf=0x7fff6a048a78, count=1, MPI_INT, MPI_MAX, comm=0x84000004) failed
MPI_Allreduce(919)........................:
MPIDI_CH3I_SHMEM_COLL_Barrier_bcast(1496).:
create_2level_comm(708)...................:
MPIR_Allreduce_impl(777)..................:
MPIR_Allreduce_index_tuned_intra_MV2(2486):
FUNCNAME(357).............................:
MPIDI_CH3U_Receive_data_found(282)........: Message from rank 1 and tag 14 truncated; 260 bytes received but buffer size is 4


Thanks for your help.

Judy

--
Judith D. Gardiner, Ph.D.
Ohio Supercomputer Center
614-292-9623
judithg at osc.edu

_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
