[mvapich-discuss] ISend and IRecv not finishing in Multithread-MPI

Uday R Bondhugula uday at csa.iisc.ernet.in
Tue Nov 5 04:57:35 EST 2013


We'll be able to try it with mvapich2-2.0a and let you know shortly. Thanks.

~ Uday

On Monday 04 November 2013 12:22 AM, Roshan Dathathri wrote:
> Hi Hari,
>
> Thanks for responding. It would be difficult to upgrade the software on
> the cluster at the moment, since many of us are targeting deadlines in
> the near future and we wouldn't want to change the status quo on the
> cluster. So we would prefer a fix that works with the current version.
>
> Here is the output of mpiname -a:
> MVAPICH2 1.8.1 Thu Sep 27 18:55:23 EDT 2012 ch3:mrail
>
> Compilation
> CC: gcc    -DNDEBUG -DNVALGRIND -O2
> CXX: c++   -DNDEBUG -DNVALGRIND -O2
> F77: gfortran   -O2
> FC: gfortran   -O2
>
> Configuration
>
> Please find attached the source files.
> Note: The source files use Isend() since that is sufficient for
> correctness. Irsend() was only used to debug the issue; all Isend()
> calls can be replaced with Irsend() calls if required.
> To compile:
> mpicc -cc=icpc -D__MPI -O3 -fp-model precise -ansi-alias -ipo  -openmp
> -openmp-link=static -D__USE_BLOCK_CYCLIC
> -D__DYNSCHEDULER_DEDICATED_RECEIVER -DTIME -DPOLYBENCH_USE_SCALAR_LB
> -DPOLYBENCH_TIME polybench.c cholesky.dist_dynsched.c
> sigma_cholesky.dist_dynsched.c pi_cholesky.dist_dynsched.c polyrt.c -o
> dist_dynsched -ltbb -lm
> Optional flags (for generating debug logs): -D__DEBUG_FLUSH
> -D__DYNSCHEDULER_DEBUG_PRINT -D__DYNSCHEDULER_MORE_DEBUG_PRINT
> Example usage:
> mpirun_rsh  -np 32 -hostfile hosts MV2_ENABLE_AFFINITY=0
> OMP_NUM_THREADS=8 ./dist_dynsched 2> out_dist_dynsched
>
>
>
> On Sun, Nov 3, 2013 at 8:00 PM, Hari Subramoni <subramoni.1 at osu.edu
> <mailto:subramoni.1 at osu.edu>> wrote:
>
>     Hi Roshan,
>
>     MVAPICH2-1.8.1 is rather old, and we have moved across multiple MPICH
>     releases since then as well. Could you please try the latest
>     version of MVAPICH2 (MVAPICH2-2.0a) and let us know if the issue
>     still occurs? The latest version of the code is available for
>     download from the following site -
>     http://mvapich.cse.ohio-state.edu/download/mvapich2/.
>
>     In the meantime, we can also try to reproduce the error on our
>     side. Could you please send your code with the detailed build
>     instructions? Could you also let us know how you had built your
>     version of MVAPICH2? You can execute mpiname -a to obtain the
>     MVAPICH2 build information.
>
>     Thanks,
>     Hari.
>
>     ------------
>
>     Hi,
>
>     I am running a multi-threaded MPI program using MVAPICH2 1.8.1
>     with MV2_ENABLE_AFFINITY=0. On each MPI node there are multiple
>     threads: one of them posts Irecv() to receive data from other nodes,
>     while the rest may post Irsend() (ready mode) to send data to the
>     other nodes. Each thread periodically checks whether its posted
>     communication calls have completed using Test(). The application
>     hangs because some of the posted sends and receives never complete.
>     Here are the statistics collected from debug logs (one per node)
>     generated from an execution of the program:
>     Across all nodes, total number of :-
>     Irsend() posted: 50339
>     Irecv() posted with matched Irsend(): 50339 (since it is ready mode)
>     (more Irecv() could have been posted)
>     Irsend() completed: 48062
>     Irecv() completed: 47296
>     This behavior is consistent across multiple runs on the same number
>     of nodes; though the actual counts vary a lot, the relative
>     difference does not vary by much.
>     The behavior is similar if Irsend() is replaced with Issend() or Isend().
>     The return values of all MPI calls are checked for errors; none of
>     the calls returned an error for the execution in consideration.
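>     This pattern relies on full thread support from the MPI library;
>     a minimal initialization guard for it would look roughly like the
>     sketch below (illustrative code, not taken from the program itself):

```c
/* Minimal thread-support guard for a multithreaded MPI program.
 * Illustrative only: names and error handling are simplified. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided;
    /* Request full thread support, i.e. any thread may make MPI calls
     * at any time.  Concurrent Irsend()/Irecv()/Test() calls from
     * different threads are undefined at lower thread-support levels. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not provided (got %d)\n",
                provided);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    /* ... multithreaded sends/receives ... */
    MPI_Finalize();
    return 0;
}
```

>     Whether provided comes back as MPI_THREAD_MULTIPLE depends on how
>     the MPI library itself was built.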
>
>     What could be causing this unexpected behavior? Are there any
>     compiler or runtime flags that would help debug the issue?
>
>     Machine information:
>     32-node InfiniBand cluster of dual-SMP Xeon servers. Each node on the
>     cluster consists of two quad-core Intel Xeon E5430 2.66 GHz processors
>     with 12 MB L2 cache and 16 GB RAM. The InfiniBand host adapter is a
>     Mellanox MT25204 (InfiniHost III Lx HCA).
>     The program was run on 32 nodes with 8 OpenMP threads on each node.
>
>     Application information:
>     A single thread on each node preemptively posts multiple anonymous
>     Irecv() calls. Once it receives data, it can produce tasks that need
>     to be computed. The rest of the threads consume/compute these tasks,
>     and can produce more tasks and post multiple Irsend() calls.
>     There is no wait or sleep anywhere in the program; the threads are
>     spinning or busy-waiting.
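>     The receiver/worker structure described above can be sketched
>     roughly as follows (hypothetical names, sizes, and buffer layout;
>     the actual program is more involved):

```c
/* Rough sketch of the pattern described above: a dedicated receiver
 * thread posts wildcard MPI_Irecv()s and busy-waits on MPI_Test(),
 * while worker threads compute tasks and post sends.  All names and
 * sizes here are illustrative, not taken from the actual program. */
#include <mpi.h>
#include <omp.h>
#include <stdlib.h>

#define NUM_RECVS 64
#define BUF_SIZE  4096

int main(int argc, char **argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        MPI_Abort(MPI_COMM_WORLD, 1);

    char *bufs = malloc((size_t)NUM_RECVS * BUF_SIZE);
    MPI_Request recv_reqs[NUM_RECVS];
    volatile int done = 0;

    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0) {
            /* Dedicated receiver: post wildcard receives up front,
             * then spin, testing each request for completion. */
            for (int i = 0; i < NUM_RECVS; i++)
                MPI_Irecv(bufs + i * BUF_SIZE, BUF_SIZE, MPI_BYTE,
                          MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD,
                          &recv_reqs[i]);
            while (!done) {
                for (int i = 0; i < NUM_RECVS; i++) {
                    int flag; MPI_Status st;
                    if (recv_reqs[i] == MPI_REQUEST_NULL) continue;
                    MPI_Test(&recv_reqs[i], &flag, &st);
                    /* if (flag): produce tasks from the received data */
                }
            }
            /* Drain receives still pending before finalizing. */
            for (int i = 0; i < NUM_RECVS; i++)
                if (recv_reqs[i] != MPI_REQUEST_NULL) {
                    MPI_Cancel(&recv_reqs[i]);
                    MPI_Request_free(&recv_reqs[i]);
                }
        } else {
            /* Workers: consume/compute tasks, possibly producing more
             * tasks and posting sends.  Ready-mode Irsend() is legal
             * only if the matching Irecv() is already posted:
             *   MPI_Irsend(buf, n, MPI_BYTE, dest, tag,
             *              MPI_COMM_WORLD, &req);
             * followed by periodic MPI_Test(&req, ...). */
            #pragma omp atomic write
            done = 1;  /* placeholder; real termination detection is global */
        }
    }
    MPI_Finalize();
    free(bufs);
    return 0;
}
```

>     In the sketch the workers signal termination immediately; in the
>     real program, termination would be detected collectively once all
>     tasks are done.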
>
>     I can share the debug logs if required. Each log is a text file of around
>     6MB with detailed information of the execution on that node.
>     I can also share the source files if required. All the source files put
>     together would be a few thousand lines of code.
>
>     Please let me know if you need more information.
>
>     --
>     Thanks,
>     Roshan
>
>
>     --
>     This message has been scanned for viruses and
>     dangerous content by *MailScanner* <http://www.mailscanner.info/>,
>     and is
>     believed to be clean.
>
>
>
>
> --
> Thanks,
> Roshan
>




