[mvapich-discuss] Slow MPI_Put for short msgs on Infiniband
Hajime Fujita
hfujita at uchicago.edu
Thu Oct 24 15:38:17 EDT 2013
Hi Mingzhe,
Basically I just wanted to know the basic performance
characteristics. I expected RMA performance to be about the same as,
or even slightly better than, send/recv, but the reality was quite
different, so I was surprised.
FYI, we are currently developing a Global View Resilience (GVR)
framework [1], which provides a global array view to applications
and makes extensive use of MPI-3 RMA operations to implement it.
That's why I'm curious about MPI-level RMA performance at every
message size.
[1] http://gvr.cs.uchicago.edu
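
To give a concrete picture, the access pattern GVR relies on looks
roughly like the sketch below. This is illustrative only, not actual
GVR code; the point is the stream of small per-element puts into a
window.

    /* Minimal sketch: a global array backed by an MPI-3 window,
     * updated with many small MPI_Puts. Run with >= 2 ranks. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Each rank exposes its block of the global array. */
        const int n = 1024;
        double *base;
        MPI_Win win;
        MPI_Win_allocate(n * sizeof(double), sizeof(double),
                         MPI_INFO_NULL, MPI_COMM_WORLD, &base, &win);

        MPI_Win_fence(0, win);
        if (rank == 0) {
            /* Many small, independent updates to rank 1's block:
             * exactly the message sizes that perform poorly. */
            double v = 1.0;
            for (int i = 0; i < n; i++)
                MPI_Put(&v, 1, MPI_DOUBLE, 1, i, 1, MPI_DOUBLE, win);
        }
        MPI_Win_fence(0, win);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }
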
Thanks,
Hajime
On 10/24/2013 11:39 AM, Mingzhe Li wrote:
> Hi Fujita,
>
> I was able to reproduce the behavior you saw. I took a look at your
> benchmark and found that the number of short messages issued is much
> larger than the number of large messages. The overhead of sending
> 1,024 one-byte messages back to back is much higher than that of
> sending a single 1 KB message; that is why the latency is high for
> small messages in the one-sided case. We will take a look at this.
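>
> For example (a sketch of the pattern only, not your actual code; it
> assumes <mpi.h> and a window `win`, created with disp_unit 1, that
> exposes at least 1 KB on rank `target`):
>
>     /* 1 KB moved as 1,024 one-byte puts: the per-operation
>      * overhead is paid 1,024 times. */
>     void put_1k_small(char *buf, int target, MPI_Win win)
>     {
>         for (int i = 0; i < 1024; i++)
>             MPI_Put(buf + i, 1, MPI_BYTE, target,
>                     (MPI_Aint)i, 1, MPI_BYTE, win);
>     }
>
>     /* The same 1 KB as a single put: the overhead is paid once. */
>     void put_1k_large(char *buf, int target, MPI_Win win)
>     {
>         MPI_Put(buf, 1024, MPI_BYTE, target, 0, 1024, MPI_BYTE, win);
>     }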
>
> One question I have is: what is the use case for this benchmark?
> What kind of application uses this pattern? Or are you just trying
> to compare two-sided send/recv with one-sided operations?
>
> Thanks,
> Mingzhe
>
>> From: Hajime Fujita <hfujita at uchicago.edu>
>> Subject: [mvapich-discuss] Slow MPI_Put for short msgs on Infiniband
>> Date: October 23, 2013 at 2:24:51 PM EDT
>> To: mvapich-discuss at cse.ohio-state.edu
>>
>> Hi,
>>
>> I'm currently using MVAPICH2-2.0a on the Midway cluster [1] at
>> UChicago. I found that MPI_Put performance for small message sizes
>> was terribly bad over InfiniBand.
>>
>> When I ran the attached benchmark program on 2 nodes, I got the
>> following results. The first number on each line is the access
>> size (in bytes) and the second is the time (in seconds) to send
>> 1 MB of data.
>> When I launch 2 processes on a single node, MPI_Put performance is
>> almost the same as send/recv.
>>
>> I'd like to know whether this is natural (unavoidable) behavior,
>> or whether there is any way to avoid or mitigate this performance
>> penalty (e.g., by tweaking some build-time or runtime parameter).
>>
>>
>> Message-based send/recv
>> 4, 0.248301
>> 8, 0.118962
>> 16, 0.213744
>> 32, 0.378181
>> 64, 0.045802
>> 128, 0.016429
>> 256, 0.013882
>> 512, 0.006235
>> 1024, 0.002326
>> 2048, 0.001569
>> 4096, 0.000832
>> 8192, 0.000414
>> 16384, 0.001361
>> 32768, 0.000745
>> 65536, 0.000486
>> 131072, 0.000365
>> 262144, 0.000305
>> 524288, 0.000272
>> 1048576, 0.000260
>> RMA-based put
>> 16, 18.282146
>> 32, 4.329981
>> 64, 1.085714
>> 128, 0.273277
>> 256, 0.070170
>> 512, 0.017509
>> 1024, 0.004376
>> 2048, 0.001390
>> 4096, 0.000537
>> 8192, 0.000314
>> 16384, 0.000525
>> 32768, 0.000360
>> 65536, 0.000278
>> 131072, 0.000240
>> 262144, 0.000230
>> 524288, 0.000228
>> 1048576, 0.000228
>>
>>
>> MVAPICH version and configuration information are as follows:
>> [hfujita at midway-login1 mpimbench]$ mpichversion
>> MVAPICH2 Version: 2.0a
>> MVAPICH2 Release date: unreleased development copy
>> MVAPICH2 Device: ch3:mrail
>> MVAPICH2 configure: --prefix=/software/mvapich2-2.0-el6-x86_64
>> --enable-shared
>> MVAPICH2 CC: cc -DNDEBUG -DNVALGRIND -O2
>> MVAPICH2 CXX: c++ -DNDEBUG -DNVALGRIND -O2
>> MVAPICH2 F77: gfortran -L/lib -L/lib -O2
>> MVAPICH2 FC: gfortran -O2
>>
>> Please let me know if you need more information about the environment.
>>
>>
>> [1]: http://rcc.uchicago.edu/resources/midway_specs.html
>>
>>
>> Thanks,
>> Hajime
>>