[mvapich-discuss] problem about ibv_dealloc_pd

强 马 vera_wx_cn at yahoo.com.cn
Thu Apr 17 05:15:51 EDT 2008


I built mvapich-1.0 with make.mvapich.gen2_multirail. I first ran my MPI program on a single HCA (setting NUM_HCAS=1).
I make all MPI tasks catch a signal. The steps in the signal handler are:
1) flush all pending messages;
2) MPIR_BsendRelease(,)
3) MPI_Barrier()
4) MPID_End()
5) checkpoint
6) exit
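
The sequence above can be sketched roughly as follows. This is only an outline, not the actual MVAPICH code: the step functions are hypothetical stubs that just record the order in which they ran. The handler here only sets a flag and the steps run later from normal context, on the assumption (not stated in the original setup) that MPI and verbs calls are not async-signal-safe.

```c
#include <signal.h>
#include <string.h>

/* Set by the handler; polled from normal (non-signal) context. */
volatile sig_atomic_t shutdown_requested = 0;

/* Records the order in which the stub steps ran. */
char step_log[64];

/* Handler does the minimum: remember that the signal arrived. */
void on_signal(int signo)
{
    (void)signo;
    shutdown_requested = 1;
}

/* Hypothetical stand-ins for steps 1-5; each one only logs its name. */
static void log_step(const char *name)
{
    strcat(step_log, name);
    strcat(step_log, ";");
}
static void flush_pending(void) { log_step("flush"); }         /* step 1 */
static void bsend_release(void) { log_step("bsend_release"); } /* step 2 */
static void barrier(void)       { log_step("barrier"); }       /* step 3 */
static void device_end(void)    { log_step("end"); }           /* step 4 */
static void checkpoint(void)    { log_step("checkpoint"); }    /* step 5 */

/* Installs the handler; returns 0 on success, -1 on failure. */
int install_handler(int signo)
{
    return signal(signo, on_signal) == SIG_ERR ? -1 : 0;
}

/* Runs steps 1-5 if a signal arrived. Returns 1 if the sequence ran,
 * in which case the caller would exit (step 6), and 0 otherwise. */
int run_shutdown_if_requested(void)
{
    if (!shutdown_requested)
        return 0;
    shutdown_requested = 0;
    step_log[0] = '\0';
    flush_pending();
    bsend_release();
    barrier();
    device_end();
    checkpoint();
    return 1;
}
```

A task would call install_handler() once at startup and then poll run_shutdown_if_requested() at a point where no MPI call is in progress.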
   
As a result, sometimes a few of the MPI tasks fail in ibv_dealloc_pd() (viainit.c:516) while the others succeed.
Sometimes all tasks finish all of the above steps and exit successfully.
  
When it fails, ibv_dealloc_pd() always returns 16 (the same value as IBV_WC_REM_ABORT_ERR).
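
A hedged aside on reading that return value: the libibverbs man page documents ibv_dealloc_pd() as returning 0 on success or the value of errno on failure, so 16 may be better read as an errno code than as a work-completion status. On Linux, errno 16 is EBUSY. The helpers below are hypothetical (they are not part of libibverbs) and only show how such a return code could be turned into text.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper, not a libibverbs function: describe the return value
 * of a verbs destroy/dealloc call, assuming nonzero values are errno codes. */
const char *describe_verbs_rc(int rc)
{
    return rc == 0 ? "success" : strerror(rc);
}

/* Returns 1 if rc is EBUSY, i.e. something may still be using the object. */
int verbs_rc_is_busy(int rc)
{
    return rc == EBUSY;
}
```

Printing describe_verbs_rc(ret) next to the raw number at the failing call would show which errno the failing tasks actually see.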
  
What InfiniBand resources are still associated with the PD?
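
As a hedged illustration of the question: in the verbs API, objects such as queue pairs, memory regions, shared receive queues and address handles are created against a protection domain, and the ibv_dealloc_pd man page notes that the call may fail while any other resource is still associated with the PD. The sketch below is a toy model only. It does not call libibverbs, and the pd_model type and its functions are invented for illustration. It shows the ordering constraint: deallocation succeeds only once every child object is gone.

```c
#include <errno.h>

/* Toy model of a protection domain: counts child objects still alive.
 * Invented for illustration; this is not a libibverbs structure. */
struct pd_model {
    int qps;   /* queue pairs */
    int mrs;   /* memory regions */
    int srqs;  /* shared receive queues */
    int ahs;   /* address handles */
};

/* Number of child objects still attached to the modelled PD. */
int pd_model_outstanding(const struct pd_model *pd)
{
    return pd->qps + pd->mrs + pd->srqs + pd->ahs;
}

/* Mimics the documented failure mode: refuse while children remain. */
int pd_model_dealloc(const struct pd_model *pd)
{
    return pd_model_outstanding(pd) > 0 ? EBUSY : 0;
}
```

In a real run, similar counters could be kept around each ibv_reg_mr()/ibv_dereg_mr() and ibv_create_qp()/ibv_destroy_qp() pair to see which kind of object is left over on the tasks where the dealloc fails.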
   
I have spent almost two weeks checking and debugging my sources, and I'm tired.
I tested with bt.C.36 in the following InfiniBand environment:
CA type: MT25204, ports: 1, rate: 20
   
Please help me,
thanks in advance.
