[mvapich-discuss] MPI_Type_vector() function uses dynamic allocation?
khaled hamidouche
khaledhamidouche at gmail.com
Wed Jan 21 15:54:18 EST 2015
Hi Davide,
The design choice in MVAPICH2-GDR is to not preallocate GPU buffers, since
GPU memory is limited. Thus all of the internal buffers are dynamically
allocated and freed in an on-demand manner.
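
For reference, the on-demand path behaves roughly like the sketch below: a
staging buffer is cudaMalloc'ed for the message, a pack kernel gathers the
strided data into it, the contiguous buffer is sent, and the staging buffer
is cudaFree'd. This is only an illustration of the pattern (the kernel and
function names are made up, and passing a device pointer to MPI_Send assumes
a CUDA-aware MPI); it is not the actual MVAPICH2-GDR code.

#include <mpi.h>
#include <cuda_runtime.h>

/* Illustrative pack kernel: gather an MPI_Type_vector-style layout
 * (count blocks of blocklen doubles, stride doubles apart) into a
 * contiguous staging buffer. */
__global__ void pack_vector(double *dst, const double *src,
                            int count, int blocklen, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count * blocklen)
        dst[i] = src[(i / blocklen) * stride + (i % blocklen)];
}

/* One cudaMalloc and one cudaFree per message: this is the on-demand
 * behavior that shows up in the nvprof trace quoted below. */
void send_noncontig(const double *src, int count, int blocklen,
                    int stride, int peer, MPI_Comm comm)
{
    double *packbuf;
    int n = count * blocklen;

    cudaMalloc((void **)&packbuf, n * sizeof(double));  /* per call */
    pack_vector<<<(n + 255) / 256, 256>>>(packbuf, src,
                                          count, blocklen, stride);
    cudaStreamSynchronize(0);
    MPI_Send(packbuf, n, MPI_DOUBLE, peer, 0, comm);    /* contiguous */
    cudaFree(packbuf);                                  /* per call */
}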
However, if there are enough use cases for having the pack/unpack buffers
statically allocated, we can propose such a feature in MVAPICH2 and
MVAPICH2-GDR.
If anyone else on the list would like to have this feature in
MVAPICH2/MVAPICH2-GDR, please let us know.
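
For concreteness, one way such a feature could work is a staging buffer that
is allocated once, grown only when a larger message arrives, and reused
across calls. The sketch below is hypothetical (none of these names exist in
MVAPICH2); it just illustrates the trade-off: memory stays reserved, but the
per-message cudaMalloc/cudaFree disappears.

#include <stddef.h>
#include <cuda_runtime.h>

/* Hypothetical cached staging buffer (not an existing MVAPICH2 API):
 * allocate on first use, grow when needed, reuse otherwise. */
static double *packbuf       = NULL;
static size_t  packbuf_bytes = 0;

double *get_pack_buffer(size_t bytes)
{
    if (bytes > packbuf_bytes) {
        cudaFree(packbuf);              /* no-op while packbuf is NULL */
        cudaMalloc((void **)&packbuf, bytes);
        packbuf_bytes = bytes;
    }
    return packbuf;                     /* reused across messages */
}

void release_pack_buffer(void)          /* e.g. at finalize time */
{
    cudaFree(packbuf);
    packbuf = NULL;
    packbuf_bytes = 0;
}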
Thanks
On Wed, Jan 21, 2015 at 5:35 AM, Davide Marchi <davide.marchi at student.unife.it> wrote:
> Hi,
>
> I'm using MPI_Type_vector() derived datatypes for the exchange of
> non-contiguous buffers between two GPUs with MPI_Sendrecv.
>
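
In outline, the usage described above reads like the sketch below. The
dimensions, the two-rank layout, and the 16.777 MB message size are
illustrative (taken loosely from the trace), and passing device pointers
straight to MPI assumes a CUDA-aware build such as MVAPICH2-GDR.

#include <mpi.h>
#include <cuda_runtime.h>

/* Sketch of the exchange described above: 2048 blocks of 1024 doubles
 * (16.777 MB of payload) exchanged between two GPUs via a derived
 * datatype. Sizes and ranks are illustrative. */
enum { COUNT = 2048, BLOCKLEN = 1024, STRIDE = 2 * BLOCKLEN };

int main(int argc, char **argv)
{
    int rank, peer;
    double *d_send, *d_recv;
    MPI_Datatype vec;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;                    /* assumes exactly two ranks */

    cudaMalloc((void **)&d_send, (size_t)COUNT * STRIDE * sizeof(double));
    cudaMalloc((void **)&d_recv, (size_t)COUNT * STRIDE * sizeof(double));

    MPI_Type_vector(COUNT, BLOCKLEN, STRIDE, MPI_DOUBLE, &vec);
    MPI_Type_commit(&vec);

    /* Device pointers go straight to MPI (CUDA-aware build); the
     * pack/unpack steps, and the cudaMalloc/cudaFree pairs visible in
     * the trace below, happen inside this call. */
    MPI_Sendrecv(d_send, 1, vec, peer, 0,
                 d_recv, 1, vec, peer, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Type_free(&vec);
    cudaFree(d_send);
    cudaFree(d_recv);
    MPI_Finalize();
    return 0;
}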
> Using the profiler (see the nvprof trace below), I see that the derived
> datatypes use dynamically allocated buffers: every time I call
> MPI_Sendrecv there are two cudaMalloc and two cudaFree calls (one for the
> "pack" step and one for the "unpack" step).
>
> Is there any method or flag I can set to use a static buffer instead of a
> dynamic one?
>
> Thanks for your time and your work.
>
>
> ----------------------------------------------------------------------------------------------------------------
>
> ==10586== Profiling result:
> Start     Duration  Grid Size   Block Size  Regs*  SSMem*  DSMem*  Size      Throughput  Context  Stream  Name
> 4.61251s  540.75us  -           -           -      -       -       -         -           -        -       cudaMalloc
> 4.61305s  1.5520us  -           -           -      -       -       -         -           -        -       cudaConfigureCall
> 4.61306s  1.2080us  -           -           -      -       -       -         -           -        -       cudaSetupArgument
> 4.61306s  380ns     -           -           -      -       -       -         -           -        -       cudaSetupArgument
> 4.61306s  488ns     -           -           -      -       -       -         -           -        -       cudaSetupArgument
> 4.61306s  344ns     -           -           -      -       -       -         -           -        -       cudaSetupArgument
> 4.61306s  414ns     -           -           -      -       -       -         -           -        -       cudaSetupArgument
> 4.61306s  488ns     -           -           -      -       -       -         -           -        -       cudaSetupArgument
> 4.61306s  54.774us  -           -           -      -       -       -         -           -        -       cudaLaunch (pack_unpack_vector_double(double*, int, double*, int, int, int) [2905])
> 4.61312s  1.3654ms  -           -           -      -       -       -         -           -        -       cudaStreamSynchronize
> 4.61312s  1.3595ms  (1 2048 1)  (1 1024 1)  10     0B      0B      -         -           1        13      pack_unpack_vector_double(double*, int, double*, int, int, int) [2905]
> 4.61450s  1.8220us  -           -           -      -       -       -         -           -        -       cuPointerGetAttribute
> 4.61450s  600ns     -           -           -      -       -       -         -           -        -       cuPointerGetAttribute
> 4.61476s  371.55us  -           -           -      -       -       -         -           -        -       cudaMalloc
> 4.61514s  1.0600us  -           -           -      -       -       -         -           -        -       cuPointerGetAttribute
> 4.61514s  610ns     -           -           -      -       -       -         -           -        -       cuPointerGetAttribute
> 4.61515s  8.1800us  -           -           -      -       -       -         -           -        -       cudaStreamWaitEvent
> 4.61516s  32.070us  -           -           -      -       -       -         -           -        -       cudaMemcpyAsync
> 4.61519s  832ns     -           -           -      -       -       16.777MB  2e+04GB/s   1        13      [CUDA memcpy DtoD]
> 4.61543s  3.7889ms  -           -           -      -       -       16.777MB  4.4280GB/s  1        14      [CUDA memcpy DtoD]
> 4.61924s  772ns     -           -           -      -       -       -         -           -        -       cudaConfigureCall
> 4.61924s  668ns     -           -           -      -       -       -         -           -        -       cudaSetupArgument
> 4.61924s  372ns     -           -           -      -       -       -         -           -        -       cudaSetupArgument
> 4.61924s  344ns     -           -           -      -       -       -         -           -        -       cudaSetupArgument
> 4.61924s  334ns     -           -           -      -       -       -         -           -        -       cudaSetupArgument
> 4.61924s  370ns     -           -           -      -       -       -         -           -        -       cudaSetupArgument
> 4.61924s  348ns     -           -           -      -       -       -         -           -        -       cudaSetupArgument
> 4.61925s  24.304us  -           -           -      -       -       -         -           -        -       cudaLaunch (pack_unpack_vector_double(double*, int, double*, int, int, int) [5463])
> 4.61927s  2.5708ms  -           -           -      -       -       -         -           -        -       cudaStreamSynchronize
> 4.61927s  2.5633ms  (1 2048 1)  (1 1024 1)  10     0B      0B      -         -           1        14      pack_unpack_vector_double(double*, int, double*, int, int, int) [5463]
> 4.62185s  209.30us  -           -           -      -       -       -         -           -        -       cudaFree
> 4.62207s  164.19us  -           -           -      -       -       -         -           -        -       cudaFree
>
>
>
> --
> Davide
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
--
K.H