[mvapich-discuss] PSM netmod does not release vbufs in large RMA data transfers
Min Si
msi at anl.gov
Fri Sep 2 12:36:47 EDT 2016
Hi Mingzhe,
After discussing with the MPICH group, we think the fix patch for the
Put/Acc part is hacky.
- It should use *MPID_Request_release* instead of
*MPIU_Object_release_ref*, because ref_count is supposed to be a
"private" variable of the request allocation code and should not be
updated directly anywhere else. In fact, the latest MPICH has removed
MPIU_Object_release_ref entirely. For the Put/Acc rndv message, the
released ref_count can be considered the reference held by
psm_1sided_putpkt/psm_1sided_accumpkt, so we can *release* it after the
work in these functions is done.
- The above is only the minimal code change that makes it work, but it
is still not the right approach: for rndv Put/Acc, the request is
always allocated with a ref_count of 2 and then immediately reduced to
1, which adds unnecessary instructions. The right fix would be to
create the request with a ref_count of 1, but that requires a change in
psm_create_req, or another request allocation function.
I cannot find the time to make the right patch for it, but could you
please fix this in the right way before adding it to a future release
of MVAPICH?
Thanks,
Min
On 9/1/16 6:56 PM, Mingzhe Li wrote:
> Hi Min,
>
> Thank you for your detailed analysis and the patch. We will take the
> patch and it will be available with the next release.
>
> Thanks,
> Mingzhe
>
> On Thu, Sep 1, 2016 at 6:36 PM, Min Si <msi at anl.gov> wrote:
>
> Hi,
>
> I have observed heavy memory consumption in the PSM netmod when
> doing RMA communication with large data.
>
> After looking into the code of the PSM netmod, I found that the
> first request in the *rndv* protocol is never actually released
> after the data transfer completes. For example, in a rndv PUT, the
> first request is for the packet header and the second is for the
> rndv data (see function psm_1sided_putpkt). The second request can
> be released in rma_list_gc in ch3u_rma_sync.c, but the first one is
> not exposed to CH3 and cannot be released in
> psm_process_completion, because its ref_count is not 0.
>
> Consequently, the vbuf allocated for the first request cannot be
> freed. Once the available vbufs in the pool are used up, new vbufs
> are allocated (64 * 16KB). That is why I observed very heavy memory
> usage in the osu_put_bw/osu_get_bw benchmarks: every message size
> runs 64 iterations, so each subsequent message size that goes
> through the rndv protocol (>16KB) always reallocates 64*loop new
> vbufs.
>
> I have attached a patch based on MVAPICH2-2.2rc1 to fix this issue
> in PUT/ACC/GET/GET_ACC.
> - For Put/Acc, I think the ref_count should be decreased to 1 in
> the rndv branch, since only the PSM layer checks it; the request
> can then be released in function psm_process_completion.
> - For Get/Get_Acc, I think the first request needs to be completed
> in psm_getresp_rndv_complete (ref_count--, and completion
> counter=0), so that it can be correctly released in the CH3
> function rma_list_gc.
>
> Thanks,
> Min
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>