[mvapich-discuss] MPI_Type_vector() function uses dynamic allocation?

Davide Marchi davide.marchi at student.unife.it
Wed Jan 21 05:35:55 EST 2015


Hi,

I'm using MPI_Type_vector() derived datatypes to exchange non-contiguous
buffers between two GPUs with MPI_Sendrecv.
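For context, the exchange is set up roughly like this (a minimal sketch, not
my real code: the matrix size N, the column layout, and the two-rank pattern
are placeholders):

    #include <mpi.h>
    #include <cuda_runtime.h>

    /* Sketch: exchange one column of an N x N row-major double matrix
     * that lives in GPU memory, using a CUDA-aware MPI build. */
    int main(int argc, char **argv)
    {
        int rank, peer;
        const int N = 2048;                  /* placeholder size */
        double *d_buf;
        MPI_Datatype col;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        peer = 1 - rank;                     /* two-rank example */

        cudaMalloc((void **)&d_buf, (size_t)N * N * sizeof(double));

        /* N blocks of 1 double with stride N: one matrix column. */
        MPI_Type_vector(N, 1, N, MPI_DOUBLE, &col);
        MPI_Type_commit(&col);

        /* Each call on the device pointers triggers the library's internal
         * pack/unpack, which is where the cudaMalloc/cudaFree pairs show
         * up in the trace below. */
        MPI_Sendrecv(d_buf,     1, col, peer, 0,
                     d_buf + 1, 1, col, peer, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        MPI_Type_free(&col);
        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }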

Using the profiler (see the nvprof trace below), I see that the derived
datatype path uses dynamically allocated buffers: every time I call
MPI_Sendrecv there are two cudaMalloc and two cudaFree calls (one pair for
the "pack" step and one for the "unpack" step).

Is there any method or flag I can set to make it use a static (preallocated)
buffer instead of a dynamic one?
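
By "static buffer" I mean something like the manual approach below, where the
staging buffers are allocated once and reused for every exchange (again only
a sketch, reusing d_buf, N, and peer from the code above; cudaMemcpy2D stands
in for a pack kernel):

    /* Persistent staging buffers, allocated once outside the exchange loop
     * instead of a cudaMalloc/cudaFree pair inside every MPI_Sendrecv. */
    double *d_send, *d_recv;
    cudaMalloc((void **)&d_send, N * sizeof(double));
    cudaMalloc((void **)&d_recv, N * sizeof(double));

    /* Pack: gather one strided column into the contiguous staging buffer. */
    cudaMemcpy2D(d_send, sizeof(double),        /* dst, dst pitch (bytes) */
                 d_buf, N * sizeof(double),     /* src, src pitch (bytes) */
                 sizeof(double), N,             /* width (bytes), height  */
                 cudaMemcpyDeviceToDevice);

    /* Contiguous exchange: no derived datatype, no internal pack buffer. */
    MPI_Sendrecv(d_send, N, MPI_DOUBLE, peer, 0,
                 d_recv, N, MPI_DOUBLE, peer, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Unpack: scatter the received data back into a column of the matrix. */
    cudaMemcpy2D(d_buf + 1, N * sizeof(double),
                 d_recv, sizeof(double),
                 sizeof(double), N,
                 cudaMemcpyDeviceToDevice);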

Thanks for your time and your work.

----------------------------------------------------------------------------------------------------------------

==10586== Profiling result:
   Start  Duration   Grid Size  Block Size  Regs*  SSMem*  DSMem*      Size  Throughput  Context  Stream  Name
4.61251s  540.75us           -           -      -       -       -         -           -        -       -  cudaMalloc
4.61305s  1.5520us           -           -      -       -       -         -           -        -       -  cudaConfigureCall
4.61306s  1.2080us           -           -      -       -       -         -           -        -       -  cudaSetupArgument
4.61306s     380ns           -           -      -       -       -         -           -        -       -  cudaSetupArgument
4.61306s     488ns           -           -      -       -       -         -           -        -       -  cudaSetupArgument
4.61306s     344ns           -           -      -       -       -         -           -        -       -  cudaSetupArgument
4.61306s     414ns           -           -      -       -       -         -           -        -       -  cudaSetupArgument
4.61306s     488ns           -           -      -       -       -         -           -        -       -  cudaSetupArgument
4.61306s  54.774us           -           -      -       -       -         -           -        -       -  cudaLaunch (pack_unpack_vector_double(double*, int, double*, int, int, int) [2905])
4.61312s  1.3654ms           -           -      -       -       -         -           -        -       -  cudaStreamSynchronize
4.61312s  1.3595ms  (1 2048 1)  (1 1024 1)     10      0B      0B         -           -        1      13  pack_unpack_vector_double(double*, int, double*, int, int, int) [2905]
4.61450s  1.8220us           -           -      -       -       -         -           -        -       -  cuPointerGetAttribute
4.61450s     600ns           -           -      -       -       -         -           -        -       -  cuPointerGetAttribute
4.61476s  371.55us           -           -      -       -       -         -           -        -       -  cudaMalloc
4.61514s  1.0600us           -           -      -       -       -         -           -        -       -  cuPointerGetAttribute
4.61514s     610ns           -           -      -       -       -         -           -        -       -  cuPointerGetAttribute
4.61515s  8.1800us           -           -      -       -       -         -           -        -       -  cudaStreamWaitEvent
4.61516s  32.070us           -           -      -       -       -         -           -        -       -  cudaMemcpyAsync
4.61519s     832ns           -           -      -       -       -  16.777MB   2e+04GB/s        1      13  [CUDA memcpy DtoD]
4.61543s  3.7889ms           -           -      -       -       -  16.777MB  4.4280GB/s        1      14  [CUDA memcpy DtoD]
4.61924s     772ns           -           -      -       -       -         -           -        -       -  cudaConfigureCall
4.61924s     668ns           -           -      -       -       -         -           -        -       -  cudaSetupArgument
4.61924s     372ns           -           -      -       -       -         -           -        -       -  cudaSetupArgument
4.61924s     344ns           -           -      -       -       -         -           -        -       -  cudaSetupArgument
4.61924s     334ns           -           -      -       -       -         -           -        -       -  cudaSetupArgument
4.61924s     370ns           -           -      -       -       -         -           -        -       -  cudaSetupArgument
4.61924s     348ns           -           -      -       -       -         -           -        -       -  cudaSetupArgument
4.61925s  24.304us           -           -      -       -       -         -           -        -       -  cudaLaunch (pack_unpack_vector_double(double*, int, double*, int, int, int) [5463])
4.61927s  2.5708ms           -           -      -       -       -         -           -        -       -  cudaStreamSynchronize
4.61927s  2.5633ms  (1 2048 1)  (1 1024 1)     10      0B      0B         -           -        1      14  pack_unpack_vector_double(double*, int, double*, int, int, int) [5463]
4.62185s  209.30us           -           -      -       -       -         -           -        -       -  cudaFree
4.62207s  164.19us           -           -      -       -       -         -           -        -       -  cudaFree



-- 
Davide