[mvapich-discuss] Slow MPI_Put for short msgs on Infiniband
Hajime Fujita
hfujita at uchicago.edu
Thu Oct 24 15:38:17 EDT 2013
Hi Mingzhe,
Basically I just wanted to know the basic performance
characteristics. I expected RMA performance to be about the same as,
or even slightly better than, send/recv, but the reality was quite
different, so I was surprised.
FYI, we are currently developing a Global View Resilience (GVR)
framework [1], which provides a global array view to applications
and makes extensive use of MPI-3 RMA operations to implement it.
That's why I'm curious about MPI-level RMA performance at every
message size.
[1] http://gvr.cs.uchicago.edu
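
To give a concrete picture, the access pattern GVR relies on looks
roughly like the sketch below. This is illustrative only, not actual
GVR code; the point is the stream of small per-element puts into a
window.

    /* Minimal sketch: a global array backed by an MPI-3 window,
     * updated with many small MPI_Puts. Run with >= 2 ranks. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Each rank exposes its block of the global array. */
        const int n = 1024;
        double *base;
        MPI_Win win;
        MPI_Win_allocate(n * sizeof(double), sizeof(double),
                         MPI_INFO_NULL, MPI_COMM_WORLD, &base, &win);

        MPI_Win_fence(0, win);
        if (rank == 0) {
            /* Many small, independent updates to rank 1's block:
             * exactly the message sizes that perform poorly. */
            double v = 1.0;
            for (int i = 0; i < n; i++)
                MPI_Put(&v, 1, MPI_DOUBLE, 1, i, 1, MPI_DOUBLE, win);
        }
        MPI_Win_fence(0, win);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }
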
Thanks,
Hajime
On 10/24/2013 11:39 AM, Mingzhe Li wrote:
> Hi Fujita,
>
> I was able to reproduce the behavior you saw. I took a look at your
> benchmark and found that the number of short messages issued is much
> larger than the number of large messages. The overhead of sending
> 1,024 one-byte messages back to back is much higher than that of
> sending a single 1 KB message; that is why the latency is high for
> small messages in the one-sided case. We will take a look at this.
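>
> For example (a sketch of the pattern only, not your actual code; it
> assumes <mpi.h> and a window `win`, created with disp_unit 1, that
> exposes at least 1 KB on rank `target`):
>
>     /* 1 KB moved as 1,024 one-byte puts: the per-operation
>      * overhead is paid 1,024 times. */
>     void put_1k_small(char *buf, int target, MPI_Win win)
>     {
>         for (int i = 0; i < 1024; i++)
>             MPI_Put(buf + i, 1, MPI_BYTE, target,
>                     (MPI_Aint)i, 1, MPI_BYTE, win);
>     }
>
>     /* The same 1 KB as a single put: the overhead is paid once. */
>     void put_1k_large(char *buf, int target, MPI_Win win)
>     {
>         MPI_Put(buf, 1024, MPI_BYTE, target, 0, 1024, MPI_BYTE, win);
>     }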
>
> One question I have is: what is the use case for this benchmark?
> What kind of application uses this pattern? Or are you just trying
> to compare two-sided send/recv with one-sided operations?
>
> Thanks,
> Mingzhe
>
>> From: Hajime Fujita <hfujita at uchicago.edu>
>> Subject: [mvapich-discuss] Slow MPI_Put for short msgs on Infiniband
>> Date: October 23, 2013 at 2:24:51 PM EDT
>> To: mvapich-discuss at cse.ohio-state.edu
>>
>> Hi,
>>
>> I'm currently using MVAPICH2-2.0a on the Midway cluster [1] at
>> UChicago. I found that MPI_Put performance for small message sizes
>> was terribly bad over InfiniBand.
>>
>> When I ran the attached benchmark program on 2 nodes, I got the
>> following results. The first number on each line is the access
>> size (in bytes) and the second is the time (in seconds) to send
>> 1 MB of data.
>> When I launch 2 processes on a single node, MPI_Put performance is
>> almost the same as send/recv.
>>
>> I'd like to know whether this is natural (unavoidable) behavior,
>> or whether there is any way to avoid or mitigate this performance
>> penalty (e.g., by tweaking some build-time or runtime parameter).
>>
>>
>> Message-based send/recv
>> 4, 0.248301
>> 8, 0.118962
>> 16, 0.213744
>> 32, 0.378181
>> 64, 0.045802
>> 128, 0.016429
>> 256, 0.013882
>> 512, 0.006235
>> 1024, 0.002326
>> 2048, 0.001569
>> 4096, 0.000832
>> 8192, 0.000414
>> 16384, 0.001361
>> 32768, 0.000745
>> 65536, 0.000486
>> 131072, 0.000365
>> 262144, 0.000305
>> 524288, 0.000272
>> 1048576, 0.000260
>> RMA-based put
>> 16, 18.282146
>> 32, 4.329981
>> 64, 1.085714
>> 128, 0.273277
>> 256, 0.070170
>> 512, 0.017509
>> 1024, 0.004376
>> 2048, 0.001390
>> 4096, 0.000537
>> 8192, 0.000314
>> 16384, 0.000525
>> 32768, 0.000360
>> 65536, 0.000278
>> 131072, 0.000240
>> 262144, 0.000230
>> 524288, 0.000228
>> 1048576, 0.000228
>>
>>
>> MVAPICH version and configuration information are as follows:
>> [hfujita at midway-login1 mpimbench]$ mpichversion
>> MVAPICH2 Version: 2.0a
>> MVAPICH2 Release date: unreleased development copy
>> MVAPICH2 Device: ch3:mrail
>> MVAPICH2 configure: --prefix=/software/mvapich2-2.0-el6-x86_64
>> --enable-shared
>> MVAPICH2 CC: cc -DNDEBUG -DNVALGRIND -O2
>> MVAPICH2 CXX: c++ -DNDEBUG -DNVALGRIND -O2
>> MVAPICH2 F77: gfortran -L/lib -L/lib -O2
>> MVAPICH2 FC: gfortran -O2
>>
>> Please let me know if you need more information about the environment.
>>
>>
>> [1]: http://rcc.uchicago.edu/resources/midway_specs.html
>>
>>
>> Thanks,
>> Hajime
>>