[mvapich-discuss] BUG in MVAPICH2-1.2p1 - OFA (RDMA) inside vbuf.c file while calling deallocate_vbufs()

gossips J polk678 at gmail.com
Tue Aug 11 01:18:18 EDT 2009


Hi,
I noticed that deallocate_vbufs() handles a failure of the ibv_dereg_mr()
call by going straight to ibv_error_abort().

Before it reaches that point, however, the function has already acquired the
vbuf spin lock:
++++
pthread_spin_lock(&vbuf_lock);
++++

So ideally the function should release this spin lock before calling
ibv_error_abort().
If it does not and the MR deregistration fails, the OS gives a kernel panic,
since the spin lock is never released.

This seems to be a bug in mvapich2-1.2p1-1.src.rpm, which ships with
OFED-1.4.1-GA.

The following patch should fix this:
++++++++
--- src/mpid/ch3/channels/mrail/src/gen2/vbuf.c
+++ src/mpid/ch3/channels/mrail/src/gen2/vbuf_fixed.c
@@ -105,6 +105,7 @@ int init_vbuf_lock()
 void deallocate_vbufs(int hca_num)
 {
     vbuf_region *r = vbuf_region_head;
+    int err = 0;

 #if !defined(CKPT)
     if (MPIDI_CH3I_RDMA_Process.has_srq
@@ -122,7 +123,8 @@ void deallocate_vbufs(int hca_num)
         if (r->mem_handle[hca_num] != NULL
             && ibv_dereg_mr(r->mem_handle[hca_num]))
         {
-            ibv_error_abort(IBV_RETURN_ERR, "could not deregister MR");
+            err = -1;
+            break;
         }

         DEBUG_PRINT("deregister vbufs\n");
@@ -139,6 +141,9 @@ void deallocate_vbufs(int hca_num)
     {
          pthread_spin_unlock(&vbuf_lock);
     }
+
+    if (err < 0)
+        ibv_error_abort(IBV_RETURN_ERR, "could not deregister MR");
 }

 static int allocate_vbuf_region(int nvbufs)
++++++++
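
The same idea can be shown outside MVAPICH2 as a small standalone sketch:
record the failure while holding the lock, leave the loop, unlock, and only
then let the caller act on the error. This is only an illustration. The names
region, demo_lock, fake_dereg(), release_regions() and lock_is_free() are
made up for the example and are not part of MVAPICH2 or libibverbs;
fake_dereg() just stands in for ibv_dereg_mr().

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a vbuf region; not the MVAPICH2 structure. */
typedef struct region {
    struct region *next;
    int dereg_fails;            /* nonzero simulates a failed deregistration */
} region;

static pthread_spinlock_t demo_lock;

/* Initialise the demo lock once before any other call. */
static void demo_init(void)
{
    int rc = pthread_spin_init(&demo_lock, PTHREAD_PROCESS_PRIVATE);
    assert(rc == 0);
    (void)rc;
}

/* Stand-in for ibv_dereg_mr(): returns 0 on success, nonzero on failure. */
static int fake_dereg(const region *r)
{
    return r->dereg_fails;
}

/*
 * Walk the list under the lock. On the first failure, remember the error
 * and leave the loop; the lock is released on every path before returning.
 * Returns 0 on success and -1 if any deregistration failed.
 */
static int release_regions(region *head)
{
    int err = 0;

    pthread_spin_lock(&demo_lock);
    for (region *r = head; r != NULL; r = r->next) {
        if (fake_dereg(r)) {
            err = -1;
            break;
        }
    }
    pthread_spin_unlock(&demo_lock);

    return err;
}

/* Returns 1 if the lock can be taken right now (it is then released), else 0. */
static int lock_is_free(void)
{
    if (pthread_spin_trylock(&demo_lock) != 0)
        return 0;
    pthread_spin_unlock(&demo_lock);
    return 1;
}
```

A caller would check the return value of release_regions() and run its abort
handler only after the function has returned, at which point the lock is no
longer held.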

Thanks,
Polk