[mvapich-discuss] PSM netmod does not release vbufs in large RMA data transfers

Min Si msi at anl.gov
Thu Sep 1 18:36:02 EDT 2016


Hi,

I have observed heavy memory consumption in the PSM netmod when doing  
RMA communication with large data.

After looking into the code of the PSM netmod, I found that this is
because the first request in the *rndv* protocol is not actually released
after the data transfer completes. For example, a rndv PUT uses two
requests: the first is for the packet header, and the second is for the
rndv data (see function psm_1sided_putpkt). The second request can be
released in rma_list_gc in ch3u_rma_sync.c, but the first one is not
exposed to CH3 and cannot be released in psm_process_completion either,
because its ref_count is not 0.

Consequently, the vbuf allocated for the first request is never freed.
Once the available vbufs in the pool are used up, new vbufs are
allocated (64 * 16KB). That is why I observed very heavy memory usage in
the osu_put_bw/osu_get_bw benchmarks: every message size executes 64
times, so the next message size always has to allocate 64*loop new vbufs
if it uses the rndv protocol (>16KB).
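
To make the growth concrete, here is a minimal sketch of a pool that grows
by 64 vbufs of 16KB whenever it runs dry. The names (vbuf_pool, pool_get,
pool_put, simulate_pool_kb) are hypothetical and the pool is assumed to
start empty; this is a toy model, not the MVAPICH2 allocator.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a vbuf pool. Growth step and vbuf size follow the
 * numbers quoted above; everything else is an assumption. */
enum { POOL_GROW_COUNT = 64, VBUF_SIZE_KB = 16 };

typedef struct {
    size_t total;   /* vbufs allocated so far */
    size_t in_use;  /* vbufs currently held by requests */
} vbuf_pool;

/* Take one vbuf, growing the pool when no free vbuf is left. */
static void pool_get(vbuf_pool *p)
{
    if (p->in_use == p->total)
        p->total += POOL_GROW_COUNT;
    p->in_use++;
}

/* Return one vbuf to the pool. */
static void pool_put(vbuf_pool *p)
{
    assert(p->in_use > 0);
    p->in_use--;
}

/* Run `sizes` message sizes with `loop` rndv operations each.
 * Each operation takes one vbuf for its header request; the vbuf is
 * returned only when release_header is nonzero.
 * Returns the total pool size in KB. */
size_t simulate_pool_kb(size_t sizes, size_t loop, int release_header)
{
    vbuf_pool p = { 0, 0 };
    for (size_t s = 0; s < sizes; s++) {
        for (size_t i = 0; i < loop; i++) {
            pool_get(&p);
            if (release_header)
                pool_put(&p);
        }
    }
    return p.total * VBUF_SIZE_KB;
}
```

In this model, simulate_pool_kb(10, 64, 0) ends with ten times the pool
size of simulate_pool_kb(10, 64, 1), because every leaked header vbuf
stays marked as in use and each new message size forces another growth
step.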

I have attached a patch based on MVAPICH2-2.2rc1 to fix this issue in  
PUT/ACC/GET/GET_ACC.
- For Put/Acc, I think the ref_count should be decreased to 1 in the rndv
branch, since only the PSM layer checks it. It can then be released in
function psm_process_completion.
- For Get/Get_Acc, I think the first request needs to be completed in
psm_getresp_rndv_complete (ref_count--, and completion counter=0), so
that it can be correctly released in CH3 function rma_list_gc.
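
The two cases can be sketched with a toy reference-count model. It assumes
a request starts with two references and a completion counter of 1, and
that each release path drops exactly one reference; model_req and the
functions below are hypothetical stand-ins, not the MVAPICH2 code.

```c
/* Hypothetical stand-in for a request; only the counters matter here. */
typedef struct {
    int ref_count; /* references held on the request */
    int cc;        /* completion counter, 0 means complete */
    int freed;     /* set once the last reference is dropped */
} model_req;

/* Drop one reference and free the request when none remain. */
static void model_release(model_req *r)
{
    if (--r->ref_count == 0)
        r->freed = 1;
}

/* Put/Acc header request: only the PSM completion path releases it. */
int put_header_freed(int apply_fix)
{
    model_req hdr = { 2, 1, 0 };   /* assumed initial state */
    if (apply_fix)
        model_release(&hdr);       /* extra release in the rndv branch */
    hdr.cc = 0;                    /* header send completes */
    model_release(&hdr);           /* stand-in for psm_process_completion */
    return hdr.freed;
}

/* Get/Get_Acc control request: CH3 releases it once it is complete. */
int get_control_freed(int apply_fix)
{
    model_req ctl = { 2, 1, 0 };   /* assumed initial state */
    ctl.cc = 0;                    /* both versions mark completion */
    if (apply_fix)
        model_release(&ctl);       /* completion also drops a reference */
    if (ctl.cc == 0)
        model_release(&ctl);       /* stand-in for rma_list_gc */
    return ctl.freed;
}
```

Under these assumptions, both functions return 0 without the fix (one
reference is left over, so the request and its vbuf stay allocated) and 1
with it.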

Thanks,
Min
-------------- next part --------------
diff --git a/src/mpid/ch3/channels/psm/src/psm_1sided.c b/src/mpid/ch3/channels/psm/src/psm_1sided.c
index 0645585..bb63f5f 100644
--- a/src/mpid/ch3/channels/psm/src/psm_1sided.c
+++ b/src/mpid/ch3/channels/psm/src/psm_1sided.c
@@ -253,6 +253,7 @@ int psm_1sided_putpkt(MPIDI_CH3_Pkt_put_t *pkt, MPID_IOV *iov, int iov_n,
     int rank, i;
     MPIDI_msg_sz_t buflen = 0, len;
     MPID_Request *req;
+    int inuse = 0;
 
     req = psm_create_req();
     req->kind = MPID_REQUEST_SEND;
@@ -287,6 +288,9 @@ int psm_1sided_putpkt(MPIDI_CH3_Pkt_put_t *pkt, MPID_IOV *iov, int iov_n,
         pkt->rndv_len = iov[iov_n-1].MPID_IOV_LEN;
         buflen = 0;
         
+        /* decrease header req's ref_count, since CH3 only checks the rndv one.*/
+        MPIU_Object_release_ref(req, &inuse);
+
         /* last iov is the packet */
         for(i = 0; i < (iov_n-1); i++) {
             iovp = (void *)iov[i].MPID_IOV_BUF;
@@ -316,6 +320,7 @@ int psm_1sided_accumpkt(MPIDI_CH3_Pkt_accum_t *pkt, MPID_IOV *iov, int iov_n,
     int mpi_errno = MPI_SUCCESS;
     MPIDI_msg_sz_t buflen = 0, len;
     MPID_Request *req;
+    int inuse = 0;
 
     req = psm_create_req();
     req->kind = MPID_REQUEST_SEND;
@@ -349,6 +354,9 @@ int psm_1sided_accumpkt(MPIDI_CH3_Pkt_accum_t *pkt, MPID_IOV *iov, int iov_n,
         pkt->rndv_len = iov[iov_n-1].MPID_IOV_LEN;
         buflen = 0;
         
+        /* decrease header req's ref_count, since CH3 only checks the rndv one.*/
+        MPIU_Object_release_ref(req, &inuse);
+
         /* last iov is the packet */
         for(i = 0; i < (iov_n-1); i++) {
             iovp = (void *)iov[i].MPID_IOV_BUF;
@@ -381,6 +389,7 @@ int psm_1sided_getaccumpkt(MPIDI_CH3_Pkt_accum_t *pkt, MPID_IOV *iov, int iov_n,
         uint64_t rtag, rtagsel;
     #endif
     PSM_ERROR_T psmerr;
+    int inuse = 0;
 
     req = psm_create_req();
     req->kind = MPID_REQUEST_SEND;
@@ -416,6 +425,9 @@ int psm_1sided_getaccumpkt(MPIDI_CH3_Pkt_accum_t *pkt, MPID_IOV *iov, int iov_n,
         /*tag for resp packet*/
         pkt->resp_rndv_tag = psm_get_rndvtag();
 
+        /* decrease header req's ref_count, since CH3 only checks the rndv one.*/
+        MPIU_Object_release_ref(req, &inuse);
+
         /* last iov is the packet */
         buflen = 0;
         for(i = 0; i < (iov_n-1); i++) {
@@ -1216,8 +1228,12 @@ int psm_getresp_rndv_complete(MPID_Request *req, int inlen)
         MPID_Request *savq = req->savedreq;
         psm_do_unpack(savq->dev.user_count, savq->dev.datatype, NULL, savq->dev.user_buf,
                 0, savq->dev.real_user_buf, inlen);
-        MPID_cc_set(req->savedreq->cc_ptr, 0);
         MPIU_Free(savq->dev.user_buf);
+
+        /* complete the control request and decrease ref_count,
+         * thus it can be freed in CH3. */
+        MPIDI_CH3U_Request_complete(savq);
+
         MPIU_Object_set_ref(req, 0);
         MPIDI_CH3_Request_destroy(req);
     }

