[mvapich-discuss] MPI_BCAST appears to work incorrectly in vasp code - mvapich2 2.1rc1

Hari Subramoni subramoni.1 at osu.edu
Thu Jan 15 15:19:41 EST 2015


Hello Judith,

Thanks for the report. We've not seen any data validation issues like this
in our internal testing.

Could you please try with MV2_USE_ZCOPY_BCAST=0 and see if the issue
persists?
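
One way to apply this setting, sketched under a few assumptions: MVAPICH2 reads its runtime parameters from the environment, and the mpirun_rsh launcher accepts them as NAME=VALUE pairs placed before the executable. The process count of 40 comes from the report below; the hostfile ($PBS_NODEFILE) and the executable name (./vasp) are placeholders and will likely differ on your system.

```shell
# Sketch only: the hostfile and executable name are assumptions; adjust for your site.
# mpirun_rsh takes environment settings as NAME=VALUE before the executable.
mpirun_rsh -np 40 -hostfile "$PBS_NODEFILE" MV2_USE_ZCOPY_BCAST=0 ./vasp

# With a Hydra-based mpiexec, the same variable can be passed to all ranks with -genv.
mpiexec -n 40 -genv MV2_USE_ZCOPY_BCAST 0 ./vasp
```

Only one of the two launch lines is needed; pick the one that matches the launcher used in the job script.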

Regards,
Hari.

On Thu, Jan 15, 2015 at 2:27 PM, Gardiner, Judith <judithg at osc.edu> wrote:

>  We are running VASP successfully with mvapich2 1.9, but it fails with
> 2.1a and 2.1rc1.  It only happens when we use at least 40 processes.  We
> have 20 cores per node, so I've tried it with nodes=2:ppn=20,
> nodes=4:ppn=10, and nodes=8:ppn=5.  It fails on all of them.  The problem
> is repeatable.
>
>
>
> I've narrowed it down to a particular call to MPI_BCAST.  Rank 5 is
> broadcasting a single integer value.  The correct value (128) is received
> by all ranks running on the first node.  An incorrect value (2) is received
> by all ranks running on other nodes, including the root, rank 5, if it's
> not on the first node.  The return code is 0 on all nodes.
>
>
>
> The program loops through the ranks, with each rank broadcasting a vector
> length and then a vector.  The failure occurs when rank 5 broadcasts its
> vector length.  The program hangs on the next broadcast because of the
> incorrect lengths.
>
>
>
> I was unable to reproduce the problem in a toy program.
>
>
>
> Here's our version information.
>
>
>
> [r0111]$ mpiname -a
> MVAPICH2 2.1rc1 Thu Dec 18 20:00:00 EDT 2014 ch3:mrail
>
> Compilation
> CC: icc    -DNDEBUG -DNVALGRIND -g -O2
> CXX: icpc   -DNDEBUG -DNVALGRIND -g -O2
> F77: ifort -L/lib -L/lib   -g -O2
> FC: ifort   -g -O2
>
> Configuration
> --prefix=/usr/local/mvapich2/intel/15/2.1rc1-debug --enable-shared
> --with-mpe --enable-romio --with-file-system=ufs+nfs --enable-debuginfo
> --enable-g=dbg --enable-mpit-pvars=mv2
>
>
>
> Any suggestions?
>
>
>
> Judy
>
>
>
> --
>
> Judith D. Gardiner, Ph.D.
>
> Ohio Supercomputer Center
>
> 614-292-9623
>
> judithg at osc.edu