[mvapich-discuss] Slow MPI_Put for short msgs on Infiniband

Jeff Hammond jeff.science at gmail.com
Thu Oct 24 12:39:08 EDT 2013


It is not the source of your issue, but the code you use for Put never
requires remote completion.  There are some other things worth adding
for performance purposes.  I made some suggested changes below.
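
For concreteness, here is a minimal sketch of the completion distinction
inside a passive-target epoch (buf, count, target, and win are placeholder
names, not names from your benchmark):

    MPI_Put(buf, count, MPI_INT, target, 0, count, MPI_INT, win);

    /* Local completion only: buf may be reused, but the data is not
       guaranteed to have reached the target window yet. */
    MPI_Win_flush_local(target, win);

    /* Remote completion: once this returns, the data is visible in the
       target window. */
    MPI_Win_flush(target, win);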

If MVAPICH allocates registered memory during Win_allocate, then this
code should be faster.  I do not believe MPI_Alloc_mem does that right
now.

Jeff

static void rma_put(int sender_rank, int receiver_rank)
{
    MPI_Win win;
    int *buff;
    int w;

#ifdef OLD
    MPI_Alloc_mem(COUNT_EACH * sizeof(int), MPI_INFO_NULL, &buff);
    MPI_Win_create(buff, COUNT_EACH * sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);
#else
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "same_size", "true");
    MPI_Win_allocate(COUNT_EACH * sizeof(int), sizeof(int), info,
                     MPI_COMM_WORLD, &buff, &win);
#endif

#ifdef OLD
    MPI_Win_lock_all(0, win);
#else
    MPI_Win_lock_all(MPI_MODE_NOCHECK, win);
#endif

    if (my_rank != sender_rank)
        goto skip;

    for (w = 4; w <= MAXWIDTH; w *= 2) {
        double tsum = 0.0, tb, te;
        int i;

        for (i = 0; i < N_LOOP; i++) {
            int s;
            tb = MPI_Wtime();
            /* buff is an int *, so the origin offset is s elements; the
               target displacement s is in units of disp_unit (sizeof(int)) */
            for (s = 0; s + w <= COUNT_EACH; s += w)
                MPI_Put(buff + s, w, MPI_INT,
                        receiver_rank, s, w, MPI_INT, win);
            MPI_Win_flush_local(receiver_rank, win);
            te = MPI_Wtime();
            tsum += te - tb;
        }
#ifndef OLD
        /* force remote completion of all outstanding Puts (the timed loop
           above only waited for local completion) */
        MPI_Win_flush_all(win);
#endif
        printf("%zu, %f\n", w * sizeof(int), tsum / N_LOOP);
    }

skip:

    MPI_Win_unlock_all(win);

    MPI_Win_free(&win);
#ifdef OLD
    MPI_Free_mem(buff);
#else
    MPI_Info_free(&info);
#endif
}
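
(If you want to compile this excerpt on its own: the globals and driver
from the attached benchmark are not shown above.  A minimal scaffold would
look something like the following; the constants are placeholders chosen to
match the 1MB buffer mentioned below, not necessarily what the attached
program uses.)

    #include <mpi.h>
    #include <stdio.h>

    #define COUNT_EACH (1 << 18)  /* ints per window: 1 MB total (placeholder) */
    #define MAXWIDTH   (1 << 18)  /* largest per-Put element count (placeholder) */
    #define N_LOOP     100        /* timing iterations per size (placeholder) */

    static int my_rank;

    /* ... rma_put() as above ... */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
        rma_put(0, 1);   /* rank 0 puts into rank 1's window */
        MPI_Finalize();
        return 0;
    }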


On Wed, Oct 23, 2013 at 1:24 PM, Hajime Fujita <hfujita at uchicago.edu> wrote:
> Hi,
>
> I'm currently using MVAPICH2-2.0a on Midway cluster [1] at UChicago.
> I found that MPI_Put performance for small message sizes was terribly bad on
> InfiniBand.
>
> When I run the attached benchmark program on 2 nodes, I get the following
> results. The first number in each line is the access size (in bytes) and the
> second number is the time (in seconds) to send 1MB of buffer.
> When I launch 2 processes on a single node, MPI_Put performance is about the
> same as send/recv.
>
> I'd like to know if this is natural (unavoidable) behavior, or if there is
> any way to avoid/mitigate this performance penalty (e.g. by tweaking some
> build-time/runtime parameter).
>
>
> Message-based send/recv
> 4, 0.248301
> 8, 0.118962
> 16, 0.213744
> 32, 0.378181
> 64, 0.045802
> 128, 0.016429
> 256, 0.013882
> 512, 0.006235
> 1024, 0.002326
> 2048, 0.001569
> 4096, 0.000832
> 8192, 0.000414
> 16384, 0.001361
> 32768, 0.000745
> 65536, 0.000486
> 131072, 0.000365
> 262144, 0.000305
> 524288, 0.000272
> 1048576, 0.000260
> RMA-based put
> 16, 18.282146
> 32, 4.329981
> 64, 1.085714
> 128, 0.273277
> 256, 0.070170
> 512, 0.017509
> 1024, 0.004376
> 2048, 0.001390
> 4096, 0.000537
> 8192, 0.000314
> 16384, 0.000525
> 32768, 0.000360
> 65536, 0.000278
> 131072, 0.000240
> 262144, 0.000230
> 524288, 0.000228
> 1048576, 0.000228
>
>
> MVAPICH version and configuration information are as follows:
> [hfujita at midway-login1 mpimbench]$ mpichversion
> MVAPICH2 Version:       2.0a
> MVAPICH2 Release date:  unreleased development copy
> MVAPICH2 Device:        ch3:mrail
> MVAPICH2 configure:     --prefix=/software/mvapich2-2.0-el6-x86_64
> --enable-shared
> MVAPICH2 CC:    cc    -DNDEBUG -DNVALGRIND -O2
> MVAPICH2 CXX:   c++   -DNDEBUG -DNVALGRIND -O2
> MVAPICH2 F77:   gfortran -L/lib -L/lib   -O2
> MVAPICH2 FC:    gfortran   -O2
>
> Please let me know if you need more information about the environment.
>
>
> [1]: http://rcc.uchicago.edu/resources/midway_specs.html
>
>
> Thanks,
> Hajime
>
>



-- 
Jeff Hammond
jeff.science at gmail.com

