[mvapich-discuss] mpi send/receive hanging, how to diagnose?

Devendar Bureddy bureddy at cse.ohio-state.edu
Fri Jul 5 15:59:05 EDT 2013


Hi Ben

 Can you please tell us the configure flags you built with and the run-time
flags you are using?
 Can you try the run-time parameter MV2_DEFAULT_MAX_SEND_WQE=256 and see
if that changes the behavior?
 Do you know after how many MPI_Isends (out of the large total count) the
application starts waiting on completions?
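
 For reference, a run-time parameter like this is normally passed on the
launch line. The lines below are only a sketch: the hostfile name (hosts)
and the executable name (./model.x) are placeholders, not taken from Ben's
setup, and the process count is the one from the hanging case. Which form
applies depends on the launcher in use.

```shell
# With mpirun_rsh, environment variables go on the command line
# before the executable (hosts and ./model.x are placeholder names).
mpirun_rsh -np 1536 -hostfile hosts MV2_DEFAULT_MAX_SEND_WQE=256 ./model.x

# With the Hydra launcher (mpiexec), -genv sets the variable for all ranks.
mpiexec -n 1536 -genv MV2_DEFAULT_MAX_SEND_WQE 256 ./model.x
```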

-Devendar



On Fri, Jul 5, 2013 at 3:15 PM, Ben <Benjamin.M.Auer at nasa.gov> wrote:

> Hi,
> I'm currently having what seems to be an issue with mvapich.
> I'm part of a team that maintains a global climate model mostly written in
> Fortran 90/95. At one point in the code there are a
> large number of MPI_Isend/MPI_Recv calls (anywhere from thousands to hundreds
> of thousands), when the data that is distributed across all MPI
> processes has to be collected on
> a particular processor to be transformed to a different resolution before
> being written.
> Above a certain resolution/number of mpiprocs the model simply hangs at
> the receive after the send.
> The strange thing is that at the same resolution with a lower processor
> count it works fine.
> For example at the troublesome resolution the model runs on 864 processors
> but hangs with 1536 processors.
> However, at a lower resolution the same code runs fine on 1536 processors
> and above.
> We are currently using the Intel 13 Fortran compiler and had been using
> mvapich 1.8.1, although mvapich 1.9 also exhibits this behaviour. Does
> anyone have any suggestions on how to diagnose what is going on, or some
> parameters that we could play with that might help? This was perhaps a bit
> hand-wavy, but we are rather stumped at this point as to how to proceed.
> Interestingly we have gotten the code to run with other mpi stacks at the
> resolution/processor count where mvapich hangs. I can provide more details
> if needed.
> Thanks
>
> --
> Ben Auer, PhD   SSAI, Scientific Programmer/Analyst
> NASA GSFC,  Global Modeling and Assimilation Office
> Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD  20771
> Phone: 301-286-9176               Fax: 301-614-6246



-- 
Devendar