[mvapich-discuss] ISend and IRecv not finishing in Multithread-MPI
Uday R Bondhugula
uday at csa.iisc.ernet.in
Tue Nov 5 04:57:35 EST 2013
We'll be able to try it with mvapich2-2.0a and let you know shortly. Thanks.
~ Uday
On Monday 04 November 2013 12:22 AM, Roshan Dathathri wrote:
> Hi Hari,
>
> Thanks for responding. It would be difficult to upgrade the software on
> the cluster at the moment: many of us are targeting deadlines in the
> near future and wouldn't want to change the status quo on the cluster.
> So we would prefer a fix that works with the current version.
>
> Here is the output of mpiname -a:
> MVAPICH2 1.8.1 Thu Sep 27 18:55:23 EDT 2012 ch3:mrail
>
> Compilation
> CC: gcc -DNDEBUG -DNVALGRIND -O2
> CXX: c++ -DNDEBUG -DNVALGRIND -O2
> F77: gfortran -O2
> FC: gfortran -O2
>
> Configuration
>
> Please find attached the source files.
> Note: The source files use Isend() since that is sufficient for
> correctness. Irsend() was only used to debug the issue; all Isend()
> calls can be replaced with Irsend() calls if required.
> To compile:
> mpicc -cc=icpc -D__MPI -O3 -fp-model precise -ansi-alias -ipo -openmp
> -openmp-link=static -D__USE_BLOCK_CYCLIC
> -D__DYNSCHEDULER_DEDICATED_RECEIVER -DTIME -DPOLYBENCH_USE_SCALAR_LB
> -DPOLYBENCH_TIME polybench.c cholesky.dist_dynsched.c
> sigma_cholesky.dist_dynsched.c pi_cholesky.dist_dynsched.c polyrt.c -o
> dist_dynsched -ltbb -lm
> Optional flags (for generating debug logs): -D__DEBUG_FLUSH
> -D__DYNSCHEDULER_DEBUG_PRINT -D__DYNSCHEDULER_MORE_DEBUG_PRINT
> Example usage:
> mpirun_rsh -np 32 -hostfile hosts MV2_ENABLE_AFFINITY=0
> OMP_NUM_THREADS=8 ./dist_dynsched 2> out_dist_dynsched
>
>
>
> On Sun, Nov 3, 2013 at 8:00 PM, Hari Subramoni <subramoni.1 at osu.edu> wrote:
>
> Hi Roshan,
>
> MVAPICH2-1.8.1 is rather old, and we have moved across multiple MPICH
> releases since then as well. Could you please try with the latest
> version of MVAPICH2 (MVAPICH2-2.0a) and let us know if the issue
> still occurs? The latest version of the code is available for
> download from the following site -
> http://mvapich.cse.ohio-state.edu/download/mvapich2/.
>
> In the meantime, we can also try to reproduce the error on our
> side. Could you please send your code with detailed build
> instructions? Could you also let us know how you built your
> version of MVAPICH2? You can execute mpiname -a to obtain the
> MVAPICH2 build information.
>
> Thanks,
> Hari.
>
> ------------
>
> Hi,
>
> I am running a multi-threaded MPI program using MVAPICH2 1.8.1
> with MV2_ENABLE_AFFINITY=0. Each MPI process runs multiple
> threads: one of them posts Irecv() to receive data from other nodes,
> while the rest may post Irsend() (ready mode) to send data to the other
> nodes. Each thread periodically checks whether its posted
> communication calls have completed using Test(). The application
> hangs because some of the posted sends and receives never complete.
> Here are statistics collected from debug logs (one per node)
> generated during an execution of the program:
> Across all nodes, total number of :-
> Irsend() posted: 50339
> Irecv() posted with matched Irsend(): 50339 (since it is ready mode)
> (more Irecv() could have been posted)
> Irsend() completed: 48062
> Irecv() completed: 47296
> This behavior is consistent across multiple runs on the same number of
> nodes; although the actual numbers vary a lot, the relative difference
> does not vary by much.
> The behavior is similar if Irsend() is replaced with Issend() or Isend().
> The return values of all MPI calls are checked for errors; none of the
> calls returned an error in the execution under consideration.
>
> What could be causing this unexpected behavior? Are there any
> compiler or runtime flags that would help in debugging the issue?
>
> Machine information:
> 32-node InfiniBand cluster of dual-SMP Xeon servers. Each node on the
> cluster consists of two quad-core Intel Xeon E5430 2.66 GHz processors
> with 12 MB L2 cache and 16 GB RAM. The InfiniBand host adapter is a
> Mellanox MT25204 (InfiniHost III Lx HCA).
> The program was run on 32 nodes with 8 OpenMP threads on each node.
>
> Application information:
> A single thread on each node posts multiple anonymous Irecv() calls
> in advance. Once it receives data, it can produce tasks that need
> to be computed. The rest of the threads consume/compute these tasks,
> and can produce more tasks and post multiple Irsend() calls.
> There is no wait or sleep anywhere in the program; the threads are
> spinning or busy-waiting.
>
> I can share the debug logs if required. Each log is a text file of around
> 6MB with detailed information of the execution on that node.
> I can also share the source files if required. All the source files put
> together would be a few thousand lines of code.
>
> Please let me know if you need more information.
>
> --
> Thanks,
> Roshan
>
>
> --
> This message has been scanned for viruses and
> dangerous content by *MailScanner* <http://www.mailscanner.info/>,
> and is
> believed to be clean.
>
>
>
>
> --
> Thanks,
> Roshan
>