[mvapich-discuss] Slow MPI_Put for short msgs on Infiniband

Hajime Fujita hfujita at uchicago.edu
Thu Oct 24 15:38:13 EDT 2013


Thanks, Jeff,

Here are the results from your version. They look almost the same as 
before.

RMA-based put
16, 17.240144
32, 4.292729
64, 1.077282
128, 0.270268
256, 0.068225
512, 0.016904
1024, 0.004233
2048, 0.001371
4096, 0.000821
8192, 0.000612
16384, 0.000764
32768, 0.000339
65536, 0.000263
131072, 0.000225
262144, 0.000218
524288, 0.000217
1048576, 0.000218


Thanks,
Hajime

On 10/24/2013 11:39 AM, Jeff Hammond wrote:
> It is not the source of your issue, but the code you use for Put never
> requires remote completion.  There are some other things worth adding
> for performance purposes.  I made some suggested changes below.
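>
> For reference, the two completion calls differ like this (a sketch;
> "rank" stands in for the target rank):
>
>      MPI_Win_flush_local(rank, win); /* local completion only: the
>                                         origin buffer may be reused */
>      MPI_Win_flush(rank, win);       /* remote completion: operations
>                                         are also complete at the target */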
>
> If MVAPICH allocates registered memory during Win_allocate, then this
> code should be faster.  I do not believe MPI_Alloc_mem does that right
> now.
>
> Jeff
>
> static void rma_put(int sender_rank, int receiver_rank)
> {
>      MPI_Win win;
>      int *buff;
>      int w;
>
> #ifdef OLD
>      MPI_Alloc_mem(COUNT_EACH * sizeof(int), MPI_INFO_NULL, &buff);
>      MPI_Win_create(buff, COUNT_EACH * sizeof(int), sizeof(int),
> MPI_INFO_NULL, MPI_COMM_WORLD, &win);
> #else
>      MPI_Info info;
>      MPI_Info_create(&info);
>      MPI_Info_set(info, "same_size", "true");
>      MPI_Win_allocate(COUNT_EACH * sizeof(int), sizeof(int), info,
> MPI_COMM_WORLD, &buff, &win);
> #endif
>
> #ifdef OLD
>      MPI_Win_lock_all(0, win);
> #else
>      MPI_Win_lock_all(MPI_MODE_NOCHECK, win);
> #endif
>
>      if (my_rank != sender_rank)
>          goto skip;
>
>      for (w = 4; w <= MAXWIDTH; w *= 2) {
>          double tsum = 0.0, tb, te;
>          int i;
>
>          for (i = 0; i < N_LOOP; i++) {
>              int s;
>              tb = MPI_Wtime();
>              for (s = 0; s + w <= COUNT_EACH; s += w)
>                  /* buff is an int*, so offset by elements, not bytes */
>                  MPI_Put(buff + s, w, MPI_INT,
>                      receiver_rank, s, w, MPI_INT, win);
>              MPI_Win_flush_local(receiver_rank, win);
>              te = MPI_Wtime();
>              tsum += te - tb;
>          }
> #ifndef OLD
>          MPI_Win_flush_all(win);
> #endif
>          printf("%zu, %f\n", w * sizeof(int), tsum / N_LOOP);
>      }
>
> skip:
>
>      MPI_Win_unlock_all(win);
>
>      MPI_Win_free(&win);
> #ifdef OLD
>      MPI_Free_mem(buff);
> #else
>      MPI_Info_free(&info);
> #endif
> }
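>
> For completeness, a minimal driver for this routine would look like the
> sketch below; the constants and my_rank mirror what I assume your
> attached benchmark defines (the values here are placeholders), and it
> needs at least 2 ranks:
>
>      #include <stdio.h>
>      #include <mpi.h>
>
>      #define COUNT_EACH (1048576 / sizeof(int))  /* 1 MB worth of ints */
>      #define MAXWIDTH   COUNT_EACH               /* largest put width */
>      #define N_LOOP     10                       /* timing repetitions */
>
>      static int my_rank;
>
>      /* rma_put() as above goes here */
>
>      int main(int argc, char **argv)
>      {
>          MPI_Init(&argc, &argv);
>          MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
>          rma_put(0, 1);    /* rank 0 puts into rank 1's window */
>          MPI_Finalize();
>          return 0;
>      }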
>
>
> On Wed, Oct 23, 2013 at 1:24 PM, Hajime Fujita <hfujita at uchicago.edu> wrote:
>> Hi,
>>
>> I'm currently using MVAPICH2-2.0a on the Midway cluster [1] at UChicago.
>> I found that MPI_Put performance for small message sizes is terribly bad
>> on InfiniBand.
>>
>> When I ran the attached benchmark program on 2 nodes, I got the results
>> below. The first number in each line is the access size (in bytes) and
>> the second is the time (in seconds) to transfer a 1 MB buffer in chunks
>> of that size. At 16 bytes per Put, for example, 1 MB takes 65536
>> operations, so 18.28 s works out to about 280 microseconds each.
>> When I launch 2 processes on a single node, MPI_Put performance is
>> about the same as send/recv.
>>
>> I'd like to know whether this is natural (unavoidable) behavior, or
>> whether there is any way to avoid or mitigate this performance penalty
>> (e.g. by tweaking some build-time or runtime parameter).
>>
>>
>> Message-based send/recv
>> 4, 0.248301
>> 8, 0.118962
>> 16, 0.213744
>> 32, 0.378181
>> 64, 0.045802
>> 128, 0.016429
>> 256, 0.013882
>> 512, 0.006235
>> 1024, 0.002326
>> 2048, 0.001569
>> 4096, 0.000832
>> 8192, 0.000414
>> 16384, 0.001361
>> 32768, 0.000745
>> 65536, 0.000486
>> 131072, 0.000365
>> 262144, 0.000305
>> 524288, 0.000272
>> 1048576, 0.000260
>> RMA-based put
>> 16, 18.282146
>> 32, 4.329981
>> 64, 1.085714
>> 128, 0.273277
>> 256, 0.070170
>> 512, 0.017509
>> 1024, 0.004376
>> 2048, 0.001390
>> 4096, 0.000537
>> 8192, 0.000314
>> 16384, 0.000525
>> 32768, 0.000360
>> 65536, 0.000278
>> 131072, 0.000240
>> 262144, 0.000230
>> 524288, 0.000228
>> 1048576, 0.000228
>>
>>
>> MVAPICH version and configuration information are as follows:
>> [hfujita at midway-login1 mpimbench]$ mpichversion
>> MVAPICH2 Version:       2.0a
>> MVAPICH2 Release date:  unreleased development copy
>> MVAPICH2 Device:        ch3:mrail
>> MVAPICH2 configure:     --prefix=/software/mvapich2-2.0-el6-x86_64
>> --enable-shared
>> MVAPICH2 CC:    cc    -DNDEBUG -DNVALGRIND -O2
>> MVAPICH2 CXX:   c++   -DNDEBUG -DNVALGRIND -O2
>> MVAPICH2 F77:   gfortran -L/lib -L/lib   -O2
>> MVAPICH2 FC:    gfortran   -O2
>>
>> Please let me know if you need more information about the environment.
>>
>>
>> [1]: http://rcc.uchicago.edu/resources/midway_specs.html
>>
>>
>> Thanks,
>> Hajime
>>
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>
>
>


