[mvapich-discuss] PSM netmod does not release vbufs in large RMA data transfers
Min Si
msi at anl.gov
Thu Sep 1 18:36:02 EDT 2016
Hi,
I have observed heavy memory consumption in the PSM netmod when performing
RMA communication with large messages.
After looking into the PSM netmod code, I found that this is because the
first request in the *rndv* protocol is never actually released after the
data transfer completes. For example, in a rndv PUT, the first request is
for the packet header and the second is for the rndv data (see function
psm_1sided_putpkt). The second request can be released in rma_list_gc in
ch3u_rma_sync.c, but the first one is not exposed to CH3 and cannot be
released in psm_process_completion, because its ref_count never reaches 0.
Consequently, the vbuf allocated for the first request is never freed.
Once the available vbufs in the pool are used up, new vbufs are allocated
in batches (64 * 16KB). That is why I observed very heavy memory usage in
the osu_put_bw/osu_get_bw benchmarks: every message size runs loop = 64
iterations, each leaking one vbuf, so every message size that goes through
the rndv protocol (>16KB) forces the allocation of 64 new vbufs.
I have attached a patch, based on MVAPICH2-2.2rc1, that fixes this issue
for PUT/ACC/GET/GET_ACC.
- For Put/Acc, I think the header request's ref_count should be decreased
to 1 in the rndv branch, since only the PSM layer checks it; it can then
be released in function psm_process_completion.
- For Get/Get_Acc, I think the first request needs to be completed in
psm_getresp_rndv_complete (ref_count--, and completion counter = 0), so
that it can be correctly released in the CH3 function rma_list_gc.
Thanks,
Min
-------------- next part --------------
diff --git a/src/mpid/ch3/channels/psm/src/psm_1sided.c b/src/mpid/ch3/channels/psm/src/psm_1sided.c
index 0645585..bb63f5f 100644
--- a/src/mpid/ch3/channels/psm/src/psm_1sided.c
+++ b/src/mpid/ch3/channels/psm/src/psm_1sided.c
@@ -253,6 +253,7 @@ int psm_1sided_putpkt(MPIDI_CH3_Pkt_put_t *pkt, MPID_IOV *iov, int iov_n,
int rank, i;
MPIDI_msg_sz_t buflen = 0, len;
MPID_Request *req;
+ int inuse = 0;
req = psm_create_req();
req->kind = MPID_REQUEST_SEND;
@@ -287,6 +288,9 @@ int psm_1sided_putpkt(MPIDI_CH3_Pkt_put_t *pkt, MPID_IOV *iov, int iov_n,
pkt->rndv_len = iov[iov_n-1].MPID_IOV_LEN;
buflen = 0;
+ /* decrease header req's ref_count, since CH3 only checks the rndv one.*/
+ MPIU_Object_release_ref(req, &inuse);
+
/* last iov is the packet */
for(i = 0; i < (iov_n-1); i++) {
iovp = (void *)iov[i].MPID_IOV_BUF;
@@ -316,6 +320,7 @@ int psm_1sided_accumpkt(MPIDI_CH3_Pkt_accum_t *pkt, MPID_IOV *iov, int iov_n,
int mpi_errno = MPI_SUCCESS;
MPIDI_msg_sz_t buflen = 0, len;
MPID_Request *req;
+ int inuse = 0;
req = psm_create_req();
req->kind = MPID_REQUEST_SEND;
@@ -349,6 +354,9 @@ int psm_1sided_accumpkt(MPIDI_CH3_Pkt_accum_t *pkt, MPID_IOV *iov, int iov_n,
pkt->rndv_len = iov[iov_n-1].MPID_IOV_LEN;
buflen = 0;
+ /* decrease header req's ref_count, since CH3 only checks the rndv one.*/
+ MPIU_Object_release_ref(req, &inuse);
+
/* last iov is the packet */
for(i = 0; i < (iov_n-1); i++) {
iovp = (void *)iov[i].MPID_IOV_BUF;
@@ -381,6 +389,7 @@ int psm_1sided_getaccumpkt(MPIDI_CH3_Pkt_accum_t *pkt, MPID_IOV *iov, int iov_n,
uint64_t rtag, rtagsel;
#endif
PSM_ERROR_T psmerr;
+ int inuse = 0;
req = psm_create_req();
req->kind = MPID_REQUEST_SEND;
@@ -416,6 +425,9 @@ int psm_1sided_getaccumpkt(MPIDI_CH3_Pkt_accum_t *pkt, MPID_IOV *iov, int iov_n,
/*tag for resp packet*/
pkt->resp_rndv_tag = psm_get_rndvtag();
+ /* decrease header req's ref_count, since CH3 only checks the rndv one.*/
+ MPIU_Object_release_ref(req, &inuse);
+
/* last iov is the packet */
buflen = 0;
for(i = 0; i < (iov_n-1); i++) {
@@ -1216,8 +1228,12 @@ int psm_getresp_rndv_complete(MPID_Request *req, int inlen)
MPID_Request *savq = req->savedreq;
psm_do_unpack(savq->dev.user_count, savq->dev.datatype, NULL, savq->dev.user_buf,
0, savq->dev.real_user_buf, inlen);
- MPID_cc_set(req->savedreq->cc_ptr, 0);
MPIU_Free(savq->dev.user_buf);
+
+ /* complete the control request and decrease ref_count,
+ * thus it can be freed in CH3. */
+ MPIDI_CH3U_Request_complete(savq);
+
MPIU_Object_set_ref(req, 0);
MPIDI_CH3_Request_destroy(req);
}