[mvapich-discuss] MPI_Type_vector() function uses dynamic allocation?

khaled hamidouche khaledhamidouche at gmail.com
Wed Jan 21 15:54:18 EST 2015


Hi Davide,

The design choice in MVAPICH2-GDR is to not preallocate GPU buffers, as GPU
memory is limited. Thus all the internal buffers are dynamically allocated
and freed in an on-demand manner.
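
For reference, the pattern in question is an MPI_Sendrecv on a device-resident
MPI_Type_vector through a CUDA-aware build such as MVAPICH2-GDR. Below is a
minimal sketch of that usage; the counts, stride, and variable names are only
illustrative, roughly sized to match the 16.777 MB payload in the trace:

/* Illustrative sketch only: exchange of a strided device buffer with
 * MPI_Type_vector + MPI_Sendrecv.  Extents are chosen to roughly match
 * the 16.777 MB payload in the nvprof trace quoted below. */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int peer = rank ^ 1;               /* assumes exactly two ranks, one GPU each */

    int count = 2048, blocklen = 1024, stride = 2048;   /* illustrative extents */
    double *d_send, *d_recv;
    cudaMalloc((void **)&d_send, (size_t)count * stride * sizeof(double));
    cudaMalloc((void **)&d_recv, (size_t)count * stride * sizeof(double));

    MPI_Datatype vec;
    MPI_Type_vector(count, blocklen, stride, MPI_DOUBLE, &vec);
    MPI_Type_commit(&vec);

    /* Each call like this showed two cudaMalloc and two cudaFree calls in
     * the trace (one staging buffer for pack, one for unpack). */
    MPI_Sendrecv(d_send, 1, vec, peer, 0,
                 d_recv, 1, vec, peer, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Type_free(&vec);
    cudaFree(d_send);
    cudaFree(d_recv);
    MPI_Finalize();
    return 0;
}

Because the datatype describes non-contiguous device memory, each such call
goes through the library's pack/unpack path, which is where the per-call
cudaMalloc/cudaFree pairs in the trace come from.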

However, if there are enough use cases for having the pack/unpack buffers
statically allocated, we can propose such a feature in MVAPICH2 and
MVAPICH2-GDR.

If anyone else on the list would like to have this feature in
MVAPICH2/MVAPICH2-GDR, please let us know.
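
In the meantime, one user-level way to approximate statically allocated
pack/unpack buffers is to pre-pack the strided data into an application-owned,
preallocated device buffer and exchange contiguous MPI_DOUBLE data, so the
library has nothing to stage internally. A rough sketch, assuming a CUDA-aware
build; the kernel and variable names are illustrative, not an MVAPICH2 API:

/* Sketch of a user-level alternative (illustrative, not MVAPICH2 code):
 * pack into device buffers that are allocated once and reused, then
 * send/recv contiguous MPI_DOUBLE data so no internal datatype staging
 * is required. */
#include <mpi.h>
#include <cuda_runtime.h>

__global__ void pack_vec(const double *src, double *dst,
                         int count, int blocklen, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count * blocklen)
        dst[i] = src[(i / blocklen) * stride + i % blocklen];
}

__global__ void unpack_vec(const double *src, double *dst,
                           int count, int blocklen, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count * blocklen)
        dst[(i / blocklen) * stride + i % blocklen] = src[i];
}

void exchange_strided(const double *d_src, double *d_dst,
                      int count, int blocklen, int stride,
                      int peer, MPI_Comm comm)
{
    static double *d_sendbuf = NULL, *d_recvbuf = NULL;   /* allocated once, reused */
    int n = count * blocklen;
    if (d_sendbuf == NULL) {
        cudaMalloc((void **)&d_sendbuf, (size_t)n * sizeof(double));
        cudaMalloc((void **)&d_recvbuf, (size_t)n * sizeof(double));
    }

    int threads = 256, blocks = (n + threads - 1) / threads;
    pack_vec<<<blocks, threads>>>(d_src, d_sendbuf, count, blocklen, stride);
    cudaDeviceSynchronize();

    /* Contiguous device buffers: no derived datatype, no internal pack/unpack. */
    MPI_Sendrecv(d_sendbuf, n, MPI_DOUBLE, peer, 0,
                 d_recvbuf, n, MPI_DOUBLE, peer, 0,
                 comm, MPI_STATUS_IGNORE);

    unpack_vec<<<blocks, threads>>>(d_recvbuf, d_dst, count, blocklen, stride);
    cudaDeviceSynchronize();
}

This trades the library's on-demand staging for application-managed packing;
whether it is worthwhile depends on how large the staging buffers are relative
to the available GPU memory.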

Thanks

On Wed, Jan 21, 2015 at 5:35 AM, Davide Marchi <
davide.marchi at student.unife.it> wrote:

> Hi,
>
> I'm using the MPI_Type_vector() derived datatype for the exchange of
> non-contiguous buffers between two GPUs with MPI_Sendrecv.
>
> Using the profiler (see the nvprof trace below), I see that the derived
> datatype uses dynamically allocated buffers: every time I call
> MPI_Sendrecv there are two cudaMalloc and two cudaFree calls (one pair
> for the "pack" step and one for the "unpack" step).
>
> Is there any method or flag I can set to use static buffers instead of
> dynamic buffers?
>
> Thanks for your time and your work.
>
>
> ----------------------------------------------------------------------------------------------------------------
>
> ==10586== Profiling result:
>    Start  Duration   Grid Size  Block Size  Regs*  SSMem*  DSMem*      Size  Throughput  Context  Stream  Name
> 4.61251s  540.75us           -           -      -       -       -         -           -        -       -  cudaMalloc
> 4.61305s  1.5520us           -           -      -       -       -         -           -        -       -  cudaConfigureCall
> 4.61306s  1.2080us           -           -      -       -       -         -           -        -       -  cudaSetupArgument
> 4.61306s     380ns           -           -      -       -       -         -           -        -       -  cudaSetupArgument
> 4.61306s     488ns           -           -      -       -       -         -           -        -       -  cudaSetupArgument
> 4.61306s     344ns           -           -      -       -       -         -           -        -       -  cudaSetupArgument
> 4.61306s     414ns           -           -      -       -       -         -           -        -       -  cudaSetupArgument
> 4.61306s     488ns           -           -      -       -       -         -           -        -       -  cudaSetupArgument
> 4.61306s  54.774us           -           -      -       -       -         -           -        -       -  cudaLaunch (pack_unpack_vector_double(double*, int, double*, int, int, int) [2905])
> 4.61312s  1.3654ms           -           -      -       -       -         -           -        -       -  cudaStreamSynchronize
> 4.61312s  1.3595ms  (1 2048 1)  (1 1024 1)     10      0B      0B         -           -        1      13  pack_unpack_vector_double(double*, int, double*, int, int, int) [2905]
> 4.61450s  1.8220us           -           -      -       -       -         -           -        -       -  cuPointerGetAttribute
> 4.61450s     600ns           -           -      -       -       -         -           -        -       -  cuPointerGetAttribute
> 4.61476s  371.55us           -           -      -       -       -         -           -        -       -  cudaMalloc
> 4.61514s  1.0600us           -           -      -       -       -         -           -        -       -  cuPointerGetAttribute
> 4.61514s     610ns           -           -      -       -       -         -           -        -       -  cuPointerGetAttribute
> 4.61515s  8.1800us           -           -      -       -       -         -           -        -       -  cudaStreamWaitEvent
> 4.61516s  32.070us           -           -      -       -       -         -           -        -       -  cudaMemcpyAsync
> 4.61519s     832ns           -           -      -       -       -  16.777MB   2e+04GB/s        1      13  [CUDA memcpy DtoD]
> 4.61543s  3.7889ms           -           -      -       -       -  16.777MB  4.4280GB/s        1      14  [CUDA memcpy DtoD]
> 4.61924s     772ns           -           -      -       -       -         -           -        -       -  cudaConfigureCall
> 4.61924s     668ns           -           -      -       -       -         -           -        -       -  cudaSetupArgument
> 4.61924s     372ns           -           -      -       -       -         -           -        -       -  cudaSetupArgument
> 4.61924s     344ns           -           -      -       -       -         -           -        -       -  cudaSetupArgument
> 4.61924s     334ns           -           -      -       -       -         -           -        -       -  cudaSetupArgument
> 4.61924s     370ns           -           -      -       -       -         -           -        -       -  cudaSetupArgument
> 4.61924s     348ns           -           -      -       -       -         -           -        -       -  cudaSetupArgument
> 4.61925s  24.304us           -           -      -       -       -         -           -        -       -  cudaLaunch (pack_unpack_vector_double(double*, int, double*, int, int, int) [5463])
> 4.61927s  2.5708ms           -           -      -       -       -         -           -        -       -  cudaStreamSynchronize
> 4.61927s  2.5633ms  (1 2048 1)  (1 1024 1)     10      0B      0B         -           -        1      14  pack_unpack_vector_double(double*, int, double*, int, int, int) [5463]
> 4.62185s  209.30us           -           -      -       -       -         -           -        -       -  cudaFree
> 4.62207s  164.19us           -           -      -       -       -         -           -        -       -  cudaFree
>
>
>
> --
> Davide
>
>


-- 
 K.H