[mvapich-discuss] PSM netmod does not release vbufs in large RMA data transfers

Min Si msi at anl.gov
Fri Sep 2 12:36:47 EDT 2016


Hi Mingzhe,

After discussing with the MPICH group, we think the fix for the 
Put/Acc part is hacky.

- It should use *MPID_Request_release* instead of 
*MPIU_Object_release_ref*, because ref_count is supposed to be a 
"private" variable of the request allocation code and should not be 
updated directly anywhere else. In fact, the latest MPICH has completely 
removed MPIU_Object_release_ref. For the Put/Acc rndv message, the 
reference being released can be regarded as the one held by 
psm_1sided_putpkt/psm_1sided_accumpkt, so we can *release* it once the 
work in those functions is done.

- The above is only the minimal code change needed to make it work, but 
it is still not the right approach: for rndv Put/Acc the request is 
always allocated with ref_count 2 and then immediately reduced to 1, 
which adds unnecessary instructions. The right way to fix this would be 
to create the request with a ref_count of 1 (see the sketch below), but 
that requires a change in psm_create_req or another request allocation 
function.
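
To make the intended pattern concrete, here is a minimal, self-contained 
sketch. It is not the MVAPICH/MPICH source: mock_request, 
mock_request_create and mock_request_release are simplified stand-ins 
for MPID_Request, psm_create_req and MPID_Request_release, and the rndv 
put path is reduced to a stub.

/* Minimal stand-in for the request ref-counting pattern discussed above.
 * Not MVAPICH/MPICH code: mock_request, mock_request_create and
 * mock_request_release are simplified stand-ins for MPID_Request,
 * psm_create_req and MPID_Request_release. */
#include <stdio.h>
#include <stdlib.h>

typedef struct mock_request {
    int ref_count;   /* private: only touched by create/release below */
    int cc;          /* completion counter */
} mock_request;

/* Allocate a request with an explicit initial reference count, so the
 * rndv Put/Acc path can ask for ref_count = 1 directly instead of
 * creating it with 2 and immediately dropping it back to 1. */
static mock_request *mock_request_create(int initial_refs)
{
    mock_request *req = malloc(sizeof(*req));
    if (!req) abort();
    req->ref_count = initial_refs;
    req->cc = 1;
    return req;
}

/* The only place the reference count is decremented; frees the request
 * (and, in the real code, would let its vbuf return to the pool) once
 * the count reaches zero. */
static void mock_request_release(mock_request *req)
{
    if (--req->ref_count == 0) {
        printf("request %p freed\n", (void *) req);
        free(req);
    }
}

/* Stub for the rndv put packet path: it owns one reference to the
 * header request and releases it once its work is done, rather than
 * poking ref_count directly. */
static void mock_rndv_putpkt(mock_request *hdr_req)
{
    /* ... post the packet header and the rndv data here ... */
    mock_request_release(hdr_req);   /* drop the putpkt reference */
}

int main(void)
{
    /* Header request created with the single reference owned by the
     * putpkt path; it is freed as soon as that path finishes. */
    mock_request *hdr_req = mock_request_create(1);
    mock_rndv_putpkt(hdr_req);
    return 0;
}

The point of the sketch is simply that the only code touching ref_count 
is the create/release pair, and the rndv path asks for exactly the 
number of references it needs.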

I cannot find the time to make the proper patch for it, but could you 
please fix this in the right way before adding it to a future release of 
MVAPICH?

Thanks,
Min

On 9/1/16 6:56 PM, Mingzhe Li wrote:
> Hi Min,
>
> Thank you for your detailed analysis and the patch. We will take the 
> patch and it will be available with the next release.
>
> Thanks,
> Mingzhe
>
> On Thu, Sep 1, 2016 at 6:36 PM, Min Si <msi at anl.gov> wrote:
>
>     Hi,
>
>     I have observed heavy memory consumption in the PSM netmod when
>     doing RMA communication with large data.
>
>     After looking into the code of the PSM netmod, I found this is
>     because the first request in the *rndv* protocol is not actually
>     released after the data transfer completes. For example, in a rndv
>     PUT, the first request is for the packet header and the second is
>     for the rndv data (see function psm_1sided_putpkt). The second
>     request can be released in rma_list_gc in ch3u_rma_sync.c, but the
>     first one is not exposed to CH3 and cannot be released in
>     psm_process_completion, because its ref_count is not 0.
>
>     Consequently, the vbuf allocated for the first request cannot be
>     freed. Once the available vbufs in the pool are used up, new vbufs
>     are allocated (64 * 16KB). That is why I observed very heavy memory
>     usage in the osu_put_bw/osu_get_bw benchmarks, where every message
>     size runs 64 iterations, so each new message size always allocates
>     another 64*loop vbufs once it enters the rndv protocol (>16KB).
>
>     I have attached a patch based on MVAPICH2-2.2rc1 to fix this issue
>     in PUT/ACC/GET/GET_ACC.
>     - For Put/Acc, I think the ref_count should be decreased to 1 in
>     the rndv branch, since only the PSM layer checks it. It can then be
>     released in function psm_process_completion.
>     - For Get/Get_Acc, I think the first request needs to be completed
>     in psm_getresp_rndv_complete (ref_count--, and completion
>     counter = 0), so that it can be correctly released in the CH3
>     function rma_list_gc.
>
>     Thanks,
>     Min
>


