[mvapich-discuss] MPI_Type_vector() function uses dynamic allocation?
Davide Marchi
davide.marchi at student.unife.it
Wed Jan 21 05:35:55 EST 2015
Hi,
I'm using MPI_Type_vector() derived datatypes to exchange non-contiguous
buffers between two GPUs with MPI_Sendrecv.
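For reference, here is a minimal sketch of the pattern I'm describing
(buffer names and sizes are illustrative, not my actual code); it
exchanges one strided column between two ranks, passing device pointers
directly to the CUDA-aware MPI library:

#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int N = 2048;                      /* illustrative size */
    double *d_send, *d_recv;                 /* device buffers */
    cudaMalloc((void **)&d_send, (size_t)N * N * sizeof(double));
    cudaMalloc((void **)&d_recv, (size_t)N * N * sizeof(double));

    /* One double per row with a stride of N doubles: a non-contiguous
       column of an N x N row-major matrix. */
    MPI_Datatype column;
    MPI_Type_vector(N, 1, N, MPI_DOUBLE, &column);
    MPI_Type_commit(&column);

    int peer = 1 - rank;                     /* assumes exactly 2 ranks */

    /* Device pointers passed directly; the library packs/unpacks the
       strided data internally (this is where I see the cudaMalloc and
       cudaFree calls in the trace below). */
    MPI_Sendrecv(d_send, 1, column, peer, 0,
                 d_recv, 1, column, peer, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Type_free(&column);
    cudaFree(d_send);
    cudaFree(d_recv);
    MPI_Finalize();
    return 0;
}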
Using the profiler (see the nvprof trace below), I see that the derived
datatypes use dynamically allocated buffers: every time I call
MPI_Sendrecv there are two cudaMalloc and two cudaFree calls (one pair
for the "pack" step and one for the "unpack" step).
Is there any method or flag I can set so that a static, pre-allocated
buffer is used instead of a dynamically allocated one?
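If not, I could work around it by packing manually into contiguous
staging buffers allocated once at startup, roughly like this sketch
(kernel and function names are hypothetical, just to show the idea):

/* Hypothetical manual pack/unpack: copy a strided column to/from a
   contiguous staging buffer, so MPI only ever sees contiguous data. */
__global__ void pack_column(const double *src, double *dst, int n, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        dst[i] = src[(size_t)i * stride];
}

__global__ void unpack_column(const double *src, double *dst, int n, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        dst[(size_t)i * stride] = src[i];
}

/* d_pack and d_unpack are staging buffers cudaMalloc'd once at startup
   and reused for every exchange. (In a real halo exchange the unpack
   target column would differ from the packed one.) */
void exchange_column(double *d_matrix, double *d_pack, double *d_unpack,
                     int n, int peer)
{
    int threads = 256, blocks = (n + threads - 1) / threads;
    pack_column<<<blocks, threads>>>(d_matrix, d_pack, n, n);
    cudaDeviceSynchronize();

    /* Contiguous buffers, so no datatype pack/unpack inside MPI. */
    MPI_Sendrecv(d_pack,   n, MPI_DOUBLE, peer, 0,
                 d_unpack, n, MPI_DOUBLE, peer, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    unpack_column<<<blocks, threads>>>(d_unpack, d_matrix, n, n);
    cudaDeviceSynchronize();
}

But I'd prefer a library-side option to avoid re-implementing the
packing that the library already does, hence the question.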
Thanks for your time and your work.
----------------------------------------------------------------------------------------------------------------
==10586== Profiling result:
Start     Duration   Grid Size    Block Size   Regs*  SSMem*  DSMem*  Size      Throughput  Context  Stream  Name
4.61251s  540.75us   -            -            -      -       -       -         -           -        -       cudaMalloc
4.61305s  1.5520us   -            -            -      -       -       -         -           -        -       cudaConfigureCall
4.61306s  1.2080us   -            -            -      -       -       -         -           -        -       cudaSetupArgument
4.61306s  380ns      -            -            -      -       -       -         -           -        -       cudaSetupArgument
4.61306s  488ns      -            -            -      -       -       -         -           -        -       cudaSetupArgument
4.61306s  344ns      -            -            -      -       -       -         -           -        -       cudaSetupArgument
4.61306s  414ns      -            -            -      -       -       -         -           -        -       cudaSetupArgument
4.61306s  488ns      -            -            -      -       -       -         -           -        -       cudaSetupArgument
4.61306s  54.774us   -            -            -      -       -       -         -           -        -       cudaLaunch (pack_unpack_vector_double(double*, int, double*, int, int, int) [2905])
4.61312s  1.3654ms   -            -            -      -       -       -         -           -        -       cudaStreamSynchronize
4.61312s  1.3595ms   (1 2048 1)   (1 1024 1)   10     0B      0B      -         -           1        13      pack_unpack_vector_double(double*, int, double*, int, int, int) [2905]
4.61450s  1.8220us   -            -            -      -       -       -         -           -        -       cuPointerGetAttribute
4.61450s  600ns      -            -            -      -       -       -         -           -        -       cuPointerGetAttribute
4.61476s  371.55us   -            -            -      -       -       -         -           -        -       cudaMalloc
4.61514s  1.0600us   -            -            -      -       -       -         -           -        -       cuPointerGetAttribute
4.61514s  610ns      -            -            -      -       -       -         -           -        -       cuPointerGetAttribute
4.61515s  8.1800us   -            -            -      -       -       -         -           -        -       cudaStreamWaitEvent
4.61516s  32.070us   -            -            -      -       -       -         -           -        -       cudaMemcpyAsync
4.61519s  832ns      -            -            -      -       -       16.777MB  2e+04GB/s   1        13      [CUDA memcpy DtoD]
4.61543s  3.7889ms   -            -            -      -       -       16.777MB  4.4280GB/s  1        14      [CUDA memcpy DtoD]
4.61924s  772ns      -            -            -      -       -       -         -           -        -       cudaConfigureCall
4.61924s  668ns      -            -            -      -       -       -         -           -        -       cudaSetupArgument
4.61924s  372ns      -            -            -      -       -       -         -           -        -       cudaSetupArgument
4.61924s  344ns      -            -            -      -       -       -         -           -        -       cudaSetupArgument
4.61924s  334ns      -            -            -      -       -       -         -           -        -       cudaSetupArgument
4.61924s  370ns      -            -            -      -       -       -         -           -        -       cudaSetupArgument
4.61924s  348ns      -            -            -      -       -       -         -           -        -       cudaSetupArgument
4.61925s  24.304us   -            -            -      -       -       -         -           -        -       cudaLaunch (pack_unpack_vector_double(double*, int, double*, int, int, int) [5463])
4.61927s  2.5708ms   -            -            -      -       -       -         -           -        -       cudaStreamSynchronize
4.61927s  2.5633ms   (1 2048 1)   (1 1024 1)   10     0B      0B      -         -           1        14      pack_unpack_vector_double(double*, int, double*, int, int, int) [5463]
4.62185s  209.30us   -            -            -      -       -       -         -           -        -       cudaFree
4.62207s  164.19us   -            -            -      -       -       -         -           -        -       cudaFree
--
Davide