[mvapich-discuss] PSM netmod does not release vbufs in large RMA data transfers

Mingzhe Li li.2192 at osu.edu
Thu Sep 1 19:56:27 EDT 2016


Hi Min,

Thank you for your detailed analysis and the patch. We will take the patch
and it will be available with the next release.

Thanks,
Mingzhe

On Thu, Sep 1, 2016 at 6:36 PM, Min Si <msi at anl.gov> wrote:

> Hi,
>
> I have observed heavy memory consumption in the PSM netmod when performing
> RMA communication with large messages.
>
> After looking into the code of the PSM netmod, I found that the first
> request in the *rndv* protocol is never actually released after the data
> transfer completes. For example, a rndv PUT uses two requests: the first
> carries the packet header and the second carries the rndv data (see
> function psm_1sided_putpkt). The second request is released in rma_list_gc
> in ch3u_rma_sync.c, but the first one is not exposed to CH3, and it cannot
> be released in psm_process_completion either, because its ref_count is not
> 0 at that point.
>
> Consequently, the vbuf allocated for the first request is never freed.
> Once the available vbufs in the pool are used up, new vbufs are allocated
> (64 * 16KB at a time). This explains the very heavy memory usage I
> observed in the osu_put_bw/osu_get_bw benchmarks: every message size is
> executed 64 times per loop iteration, so each subsequent message size that
> goes into the rndv protocol (>16KB) reallocates 64*loop new vbufs.
>
> I have attached a patch based on MVAPICH2-2.2rc1 that fixes this issue for
> PUT/ACC/GET/GET_ACC.
> - For Put/Acc, I think the ref_count should be decreased to 1 in the rndv
> branch, since only the PSM layer checks it. The request can then be
> released in psm_process_completion.
> - For Get/Get_Acc, I think the first request needs to be completed in
> psm_getresp_rndv_complete (ref_count--, and completion counter = 0), so
> that it can be correctly released in the CH3 function rma_list_gc.
>
> Thanks,
> Min
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
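To make the reference-counting problem in the quoted analysis concrete, here is a minimal C sketch of the rndv PUT case. It is a simplified model, not MVAPICH2 code: the type `sketch_req_t` and every function prefixed `sketch_` are hypothetical stand-ins for the real request object, psm_process_completion, and rma_list_gc. The initial ref_count of 2 (one reference for CH3, one for the PSM layer) is an assumption consistent with the analysis, not a value taken from the source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a request; not MVAPICH2 code. */
typedef struct {
    int ref_count; /* references still held on the request */
    bool freed;    /* true once the request (and its vbuf) would be released */
} sketch_req_t;

/* Drop one reference; release the request only when none remain. */
static void sketch_req_release(sketch_req_t *req)
{
    assert(req->ref_count > 0);
    if (--req->ref_count == 0)
        req->freed = true; /* here the vbuf would go back to the pool */
}

/* Stand-in for psm_process_completion: drops the PSM layer's reference. */
static void sketch_process_completion(sketch_req_t *req)
{
    sketch_req_release(req);
}

/* Stand-in for rma_list_gc: drops CH3's reference, but CH3 can only
   do this for requests it knows about. */
static void sketch_rma_list_gc(sketch_req_t *req)
{
    sketch_req_release(req);
}

/* Simulate a rndv PUT with a header request and a data request, each
   created with an assumed ref_count of 2. */
static void sketch_rndv_put(sketch_req_t *hdr, sketch_req_t *data)
{
    hdr->ref_count = 2;
    hdr->freed = false;
    data->ref_count = 2;
    data->freed = false;

    sketch_process_completion(hdr);
    sketch_process_completion(data);

    /* Only the data request is exposed to CH3, so only it is collected. */
    sketch_rma_list_gc(data);
}
```

After `sketch_rndv_put` returns, the data request is freed, while the header request is still holding one reference that nothing ever drops. In this model, that leftover reference is the leaked vbuf.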

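The memory growth described in the quoted message can be estimated with simple arithmetic. The sketch below rests on several assumptions: one leaked 16KB vbuf (taken as 16384 bytes) per rndv operation, 64 operations per benchmark loop iteration, and pool refills of 64 vbufs each with no spare vbufs carried over. `loop` is the benchmark's iteration count. The value 20 used in the check is purely illustrative and is not the benchmark's actual setting. The `sketch_` helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical estimate: one leaked vbuf per rndv operation. */
static size_t sketch_leaked_vbufs(size_t ops_per_iter, size_t loop)
{
    return ops_per_iter * loop;
}

/* Bytes leaked for one message size. */
static size_t sketch_leaked_bytes(size_t leaked_vbufs, size_t vbuf_size)
{
    return leaked_vbufs * vbuf_size;
}

/* Pool refills needed if each refill adds `refill` vbufs and no free
   vbufs are left over from earlier message sizes (assumption). */
static size_t sketch_pool_refills(size_t leaked_vbufs, size_t refill)
{
    assert(refill > 0);
    return (leaked_vbufs + refill - 1) / refill; /* ceiling division */
}
```

Under these assumptions, loop = 20 gives 64 * 20 = 1280 leaked vbufs per rndv message size. That is 20 MiB, or 20 refills of 64 * 16KB = 1 MiB each, and the total grows with every larger message size the benchmark runs.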

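The two fixes proposed in the quoted message can be modeled the same way. This is a hypothetical sketch, not the attached patch. On the Put/Acc path, the rndv branch lowers the header request's ref_count to 1, so the single release in the PSM completion handler frees it. On the Get/Get_Acc path, the rndv response handler drops a reference and zeroes the completion counter, so a CH3-style garbage collector that only releases completed requests can free it. The field `cc`, the initial values, and all `sketch_` names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an RMA request; not MVAPICH2 code. */
typedef struct {
    int ref_count; /* references still held on the request */
    int cc;        /* completion counter: 0 means the request is complete */
    bool freed;    /* true once the request (and its vbuf) would be released */
} sketch_req_t;

static void sketch_req_init(sketch_req_t *req, int ref_count, int cc)
{
    req->ref_count = ref_count;
    req->cc = cc;
    req->freed = false;
}

/* Drop one reference; release the request when none remain. */
static void sketch_req_release(sketch_req_t *req)
{
    assert(req->ref_count > 0);
    if (--req->ref_count == 0)
        req->freed = true;
}

/* Stand-in for rma_list_gc: CH3 releases only completed requests. */
static void sketch_rma_list_gc(sketch_req_t *req)
{
    if (req->cc == 0)
        sketch_req_release(req);
}

/* Put/Acc fix: only the PSM layer checks the header request, so the rndv
   branch lowers its ref_count to 1 and the completion release frees it. */
static void sketch_put_rndv_fixed(sketch_req_t *hdr)
{
    sketch_req_init(hdr, 2, 1); /* assumed initial state */
    hdr->ref_count = 1;         /* the fix: decrease to 1 in the rndv branch */
    hdr->cc = 0;                /* data transfer completes */
    sketch_req_release(hdr);    /* stand-in for psm_process_completion */
}

/* Get/Get_Acc fix: stand-in for psm_getresp_rndv_complete that completes
   the first request so the CH3 garbage collector can release it. */
static void sketch_getresp_rndv_complete_fixed(sketch_req_t *req)
{
    sketch_req_release(req); /* ref_count-- */
    req->cc = 0;             /* completion counter = 0 */
}
```

In the Get case, calling `sketch_rma_list_gc` before the response handler runs does nothing, because the request is not yet complete. Calling it afterwards drops the last reference and frees the request.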