[mvapich-discuss] PSM netmod does not release vbufs in large RMA data transfers

Mingzhe Li li.2192 at osu.edu
Fri Sep 2 12:52:05 EDT 2016


Hi Min,

Thank you for the detailed information. We will come up with the right fix
and make it available in the next release.

Thanks,
Mingzhe

On Fri, Sep 2, 2016 at 12:36 PM, Min Si <msi at anl.gov> wrote:

> Hi Mingzhe,
>
> After discussing this with the MPICH group, we think the Put/Acc part of
> the fix patch is hacky.
>
> - It should use *MPID_Request_release* instead of
> *MPIU_Object_release_ref*, because ref_count is supposed to be a
> "private" variable of the request allocation code and should not be
> updated directly anywhere else. In fact, the latest MPICH has removed
> MPIU_Object_release_ref completely. For the Put/Acc rndv message, the
> released reference can be considered the reference held by
> psm_1sided_putpkt/psm_1sided_accumpkt, so we can *release* it after the
> work in these functions is done.
>
> - The above is only the minimal code change needed to make it work; it is
> still not the right approach. For rndv Put/Acc, the request is always
> allocated with a ref_count of 2 but immediately reduced to 1, which adds
> unnecessary instructions. The right fix would be to create the request
> with a ref_count of 1, but that requires changing psm_create_req or
> adding another request allocation function.
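A minimal sketch of the ownership rule described in these two points, under stated assumptions: ref_count is modified only by an allocate/release pair, and the allocator takes the initial count so that a caller needing one reference is not handed two. This is not MVAPICH2 or MPICH code; `toy_req`, `toy_req_create`, `toy_req_release`, and the `toy_vbufs_outstanding` counter are hypothetical names, and `toy_req_release` only mirrors the spirit of MPID_Request_release.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a request object; not the MPID_Request layout. */
typedef struct toy_req {
    int ref_count;  /* "private": modified only by create/release below */
} toy_req;

static int toy_vbufs_outstanding = 0;  /* vbufs held by live requests */

/* The caller states how many references it needs, so a request with a
 * single owner can be created with a count of 1 instead of 2. */
static toy_req *toy_req_create(int initial_refs) {
    assert(initial_refs > 0);
    toy_req *req = malloc(sizeof *req);
    if (req == NULL)
        return NULL;
    req->ref_count = initial_refs;
    toy_vbufs_outstanding++;  /* in this model a live request pins one vbuf */
    return req;
}

/* The only way to drop a reference; the last release frees the request
 * and gives its vbuf back. */
static void toy_req_release(toy_req *req) {
    assert(req->ref_count > 0);
    if (--req->ref_count == 0) {
        toy_vbufs_outstanding--;
        free(req);
    }
}
```

Creating with `toy_req_create(1)` and releasing once frees the request immediately, which is the shape of the "ref_count of 1" fix; `toy_req_create(2)` followed by an early release reproduces the extra step described above as unnecessary.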
>
> I cannot find time to write the proper patch, but could you please fix
> this the right way before adding it to a future release of MVAPICH?
>
> Thanks,
> Min
>
>
> On 9/1/16 6:56 PM, Mingzhe Li wrote:
>
> Hi Min,
>
> Thank you for your detailed analysis and the patch. We will take the patch
> and it will be available with the next release.
>
> Thanks,
> Mingzhe
>
> On Thu, Sep 1, 2016 at 6:36 PM, Min Si <msi at anl.gov> wrote:
>
>> Hi,
>>
>> I have observed heavy memory consumption in the PSM netmod when performing
>> RMA communication with large data transfers.
>>
>> After looking into the code of the PSM netmod, I found that this is because
>> the first request in the *rndv* protocol is never actually released after
>> the data transfer completes. For example, in a rndv PUT, the first request
>> is for the packet header and the second is for the rndv data (see function
>> psm_1sided_putpkt). The second request can be released in rma_list_gc in
>> ch3u_rma_sync.c, but the first one is not exposed to CH3 and cannot
>> actually be released in psm_process_completion, because its ref_count is
>> not 0.
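The leak can be reproduced in a small model. The sketch below assumes, as a simplification, that each live request stands for one pinned vbuf and that the header request is allocated with two references while only the PSM completion path ever drops one. `model_req`, `model_req_alloc`, `model_req_drop`, `model_rndv_put`, and `model_live_reqs` are hypothetical names, not MVAPICH2 identifiers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of the two requests in one rndv PUT; not MVAPICH2 code. */
typedef struct model_req {
    int ref_count;
} model_req;

static int model_live_reqs = 0;  /* requests (and, in this model, vbufs) still held */

static model_req *model_req_alloc(int refs) {
    model_req *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->ref_count = refs;
    model_live_reqs++;
    return r;
}

/* Drop one reference; free only when none remain. */
static void model_req_drop(model_req *r) {
    assert(r->ref_count > 0);
    if (--r->ref_count == 0) {
        model_live_reqs--;
        free(r);
    }
}

/* One rndv PUT: CH3 releases the data request (as rma_list_gc would), but
 * the header request is invisible to CH3, so only the PSM completion path
 * drops a single reference on it. */
static void model_rndv_put(int header_refs) {
    model_req *header = model_req_alloc(header_refs);
    model_req *data = model_req_alloc(1);
    assert(header != NULL && data != NULL);
    model_req_drop(data);    /* CH3 side */
    model_req_drop(header);  /* PSM completion side */
}
```

With `header_refs == 2`, every call leaves one request alive; with 1, the count the fix aims for, nothing is left behind.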
>>
>> Consequently, the vbuf allocated for the first request is never freed.
>> Once the available vbufs in the pool are used up, new vbufs are allocated
>> (64 * 16KB). This is why I observed very heavy memory usage in the
>> osu_put_bw/osu_get_bw benchmarks: each message size is executed 64 times,
>> so every subsequent message size that goes into the rndv protocol (>16KB)
>> allocates another 64*loop new vbufs.
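The growth pattern follows from simple pool arithmetic. The sketch models a pool that starts empty and grows by one chunk of 64 vbufs of 16 KB (1 MiB) whenever it runs dry, taking the "(64 * 16KB)" figure above as the chunk size. `vbuf_pool_model`, `pool_get`, `pool_put`, and `run_messages` are hypothetical names; the real MVAPICH2 vbuf allocator is more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed parameters, taken from the "(64 * 16KB)" figure in the report. */
#define VBUF_CHUNK_COUNT 64
#define VBUF_SIZE        (16 * 1024)

/* Hypothetical pool model: a vbuf becomes reusable only when it is put back. */
typedef struct vbuf_pool_model {
    size_t free_vbufs;       /* vbufs available for reuse */
    size_t bytes_allocated;  /* total memory ever allocated for vbufs */
} vbuf_pool_model;

/* Take one vbuf, growing the pool by a whole chunk when it is empty. */
static void pool_get(vbuf_pool_model *p) {
    if (p->free_vbufs == 0) {
        p->free_vbufs += VBUF_CHUNK_COUNT;
        p->bytes_allocated += (size_t)VBUF_CHUNK_COUNT * VBUF_SIZE;
    }
    assert(p->free_vbufs > 0);
    p->free_vbufs--;
}

/* Return one vbuf to the pool. */
static void pool_put(vbuf_pool_model *p) {
    p->free_vbufs++;
}

/* Run n messages; each takes a vbuf and returns it only if `released`. */
static void run_messages(vbuf_pool_model *p, size_t n, int released) {
    for (size_t i = 0; i < n; i++) {
        pool_get(p);
        if (released)
            pool_put(p);
    }
}
```

With `released == 0`, 640 messages cost 10 chunks (10 MiB); with `released == 1`, the first chunk is reused and the total stays at 1 MiB.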
>>
>> I have attached a patch based on MVAPICH2-2.2rc1 that fixes this issue for
>> PUT/ACC/GET/GET_ACC.
>> - For Put/Acc, I think the ref_count should be decreased to 1 in the rndv
>> branch, since only the PSM layer checks it. It can then be released in
>> psm_process_completion.
>> - For Get/Get_Acc, I think the first request needs to be completed in
>> psm_getresp_rndv_complete (ref_count-- and completion counter = 0), so
>> that it can be correctly released in the CH3 function rma_list_gc.
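The two fixes proposed above can be sketched against a simplified request with a reference count and a completion counter (0 meaning complete). All names here (`fix_req`, `put_rndv_start`, `psm_completion_model`, `getresp_rndv_complete_model`, `rma_gc_model`) are hypothetical stand-ins for the PSM and CH3 paths named in the email. Every reference drop goes through a release helper rather than touching ref_count directly, in line with the later advice in this thread.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical request model; not the MVAPICH2 request structure. */
typedef struct fix_req {
    int ref_count;  /* dropped only through fix_req_release() */
    int cc;         /* completion counter: 0 means complete */
} fix_req;

static int fix_live_reqs = 0;  /* requests not yet freed */

static fix_req *fix_req_alloc(int refs, int cc) {
    fix_req *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->ref_count = refs;
    r->cc = cc;
    fix_live_reqs++;
    return r;
}

static void fix_req_release(fix_req *r) {
    assert(r->ref_count > 0);
    if (--r->ref_count == 0) {
        fix_live_reqs--;
        free(r);
    }
}

/* Put/Acc rndv branch: drop the extra reference right away (2 -> 1) so the
 * PSM completion path holds the only remaining one. */
static fix_req *put_rndv_start(void) {
    fix_req *header = fix_req_alloc(2, 1);
    if (header != NULL)
        fix_req_release(header);
    return header;
}

/* Stand-in for psm_process_completion: the last reference frees the request. */
static void psm_completion_model(fix_req *r) {
    r->cc = 0;
    fix_req_release(r);
}

/* Stand-in for psm_getresp_rndv_complete: mark complete and drop the PSM
 * reference, leaving one for CH3. */
static void getresp_rndv_complete_model(fix_req *r) {
    r->cc = 0;
    fix_req_release(r);
}

/* Stand-in for rma_list_gc: only completed requests are released.
 * Returns 1 if the request was released, 0 if it is still pending. */
static int rma_gc_model(fix_req *r) {
    if (r->cc != 0)
        return 0;
    fix_req_release(r);
    return 1;
}
```

In the Get path, the garbage collector skips the request until the rndv-complete handler sets the counter to 0, after which one more release frees it.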
>>
>> Thanks,
>> Min
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>>
>
>