[mvapich-discuss] MPI_Recv inconsistent performance

David Winslow david.winslow at serendipitynow.com
Mon Mar 7 15:28:01 EST 2016


Hari,
Thank you for the quick response. I’ve answered the questions below.


1. How many processes are involved in the test?
               A: 4 processes running on a single machine with 48 cores and 512GB RAM

2. Are the sender / receiver processes running on the same set of physical nodes for the runs when you observe the 2ms & 200ms execution times?
              A: yes, in the 4 process test

3. What is the sender doing? Is it posting the send at the same time in both cases? The MPI_Recv will take more time if the sender is late to send the data.
              A: each process: a) computes locally, b) receives data from its children, c) reduces the data, and d) sends the result to its parent
              
                    a. since we are testing a degenerate case, this step takes less than 500 microseconds
                    b. the receive step is the unpredictable one in terms of time. we receive an integer with a value of 0 from each child.
                    c. since we are testing a degenerate case (nothing to reduce), the reduce step takes less than 100 microseconds
                    d. since we are testing a degenerate case (nothing to send), we send an integer with a value of 0.

            this is the degenerate case; in most other cases, each of the compute steps will take more time, as will the communication steps (receive and send). A rough sketch of this flow follows.
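
For concreteness, here is a minimal sketch of that flow, assuming a simple binary-tree layout and a made-up TAG_RESULT tag; the real application's tree construction and message protocol are not shown in this thread, so treat this only as an illustration of the pattern being timed:

/* Hypothetical sketch of the per-process flow: compute, receive from
 * children, reduce, send result to parent. Tree layout and tag are
 * assumptions, not the application's actual code. */
#include <mpi.h>
#include <stdio.h>

#define TAG_RESULT 100  /* assumed tag for illustration */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Simple binary-tree layout, for illustration only. */
    int parent = (rank == 0) ? MPI_PROC_NULL : (rank - 1) / 2;
    int children[2] = { 2 * rank + 1, 2 * rank + 2 };

    /* a) local compute (degenerate case: nothing to do) */
    int local = 0;

    /* b) receive one integer from each child, timing each receive */
    for (int i = 0; i < 2; i++) {
        if (children[i] < size) {
            int val;
            double t0 = MPI_Wtime();
            MPI_Recv(&val, 1, MPI_INT, children[i], TAG_RESULT,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            double t1 = MPI_Wtime();
            printf("rank %d: recv from %d took %.3f ms\n",
                   rank, children[i], (t1 - t0) * 1e3);
            /* c) reduce (degenerate case: just sum the zeros) */
            local += val;
        }
    }

    /* d) send the reduced result up to the parent */
    if (parent != MPI_PROC_NULL)
        MPI_Send(&local, 1, MPI_INT, parent, TAG_RESULT, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}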


4. How many messages are being sent back to back? Is it just one message or are many messages being sent?
            A: see #3 describing the flow.

5. Is it possible to get a test code to try out locally? 
            A: we can partition the code into a standalone test program; it will take some work to do that.

On a different note, is there a reason why you do an MPI_Probe before the MPI_Recv? 
            A: probe and receive vs. receive alone: the probe is necessary because there are different sequenced message types that a given process may expect from each of its children, and a given message type provides the information needed to receive the next message type in the queue. If this scenario can be implemented with just a receive, that is preferred.
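
For illustration, here is a hedged sketch of that probe-then-receive sequencing, assuming made-up tags (TAG_HEADER / TAG_PAYLOAD) and assuming one message type announces the size of the next; the application's actual message protocol is not shown in this thread:

/* Sketch only: probe first to learn which message type (tag) arrived
 * from a child, then post the matching receive. Tags and payload layout
 * are assumptions for illustration. */
#include <mpi.h>
#include <stdlib.h>

#define TAG_HEADER  1
#define TAG_PAYLOAD 2

static void receive_from_child(int child_rank)
{
    MPI_Status st;
    MPI_Probe(child_rank, MPI_ANY_TAG, MPI_COMM_WORLD, &st);

    if (st.MPI_TAG == TAG_HEADER) {
        /* the header announces how many ints the next message carries */
        int payload_len;
        MPI_Recv(&payload_len, 1, MPI_INT, st.MPI_SOURCE, TAG_HEADER,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        int *buf = malloc(payload_len * sizeof(int));
        MPI_Recv(buf, payload_len, MPI_INT, st.MPI_SOURCE, TAG_PAYLOAD,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* ... reduce buf into the local result ... */
        free(buf);
    } else {
        /* size not known in advance: query it from the probed status */
        int count;
        MPI_Get_count(&st, MPI_INT, &count);
        int *buf = malloc(count * sizeof(int));
        MPI_Recv(buf, count, MPI_INT, st.MPI_SOURCE, st.MPI_TAG,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        free(buf);
    }
}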


> On Mar 7, 2016, at 1:48 PM, Hari Subramoni <subramoni.1 at osu.edu> wrote:
> 
> Hello David,
> 
> Thanks for your report. I have a few follow up questions for you
> 
> 1. How many processes are involved in the test?
> 
> 2. Are the sender / receiver processes running on the same set of physical nodes for the runs when you observe the 2ms & 200ms execution times?
> 
> 3. What is the sender doing? Is it posting the send at the same time in both cases? The MPI_Recv will take more time if the sender is late to send the data.
> 
> 4. How many messages are being sent back to back? Is it just one message or are many messages being sent?
> 
> 5. Is it possible to get a test code to try out locally?
> 
> On a different note, is there a reason why you do an MPI_Probe before the MPI_Recv? Is your code doing something like the following?
> 
> MPI_Probe
> <compute>
> MPI_Recv
> 
> If there is no compute in between, then you don't need the MPI_Probe. MPI_Recv, being a blocking call, should make the necessary communication progress.
> 
> Thx,
> Hari.
>  
> From: mvapich-discuss-bounces at cse.ohio-state.edu <mailto:mvapich-discuss-bounces at cse.ohio-state.edu> on behalf of David Winslow [david.winslow at serendipitynow.com <mailto:david.winslow at serendipitynow.com>]
> Sent: Monday, March 07, 2016 1:27 PM
> To: mvapich-discuss at cse.ohio-state.edu <mailto:mvapich-discuss at cse.ohio-state.edu>
> Subject: [mvapich-discuss] MPI_Recv inconsistent performance
> 
> All,
> We are experiencing dramatic differences in the performance of MPI_Recv between calls. With a cluster of 4 nodes (all on the same box), we use MPI_Recv to receive a message. We timed the receive function and see times from 2ms to greater than 200ms to receive the exact same message (a 4-byte integer). We don’t understand what could cause this significant disparity and unpredictability between calls.
> 
> Pseudo code
> 
> Get_time()
> MPI_Probe()
> MPI_Recv()
> Get_time()
> 
> Is there a way to control or configure MVAPICH2 to make the behavior of MPI_Probe/MPI_Recv more predictable/consistent?
> 
> output of mpiname -a
> 
> MVAPICH2 2.2b Mon Nov 12 20:00:00 EST 2015 ch3:psm
> 
> Compilation
> CC: gcc    -DNDEBUG -DNVALGRIND -O2
> CXX: g++   -DNDEBUG -DNVALGRIND -O2
> F77:
> FC:
> 
> Configuration
> --prefix=/opt/mvapich2-2.2b-install-psm --with-device=ch3:psm --disable-fortran
