[mvapich-discuss] problem about ibv_dealloc_pd
强 马
vera_wx_cn at yahoo.com.cn
Thu Apr 17 05:15:51 EDT 2008
I built mvapich-1.0 with make.mvapich.gen2_multirail. I first ran my MPI program on a single HCA (setting NUM_HCAS=1).
All MPI tasks catch a signal. The steps in the signal handler are:
1) flush all pending messages;
2) MPIR_BsendRelease(,)
3) MPI_Barrier()
4) MPID_End()
5) checkpoint
6) exit
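The steps above can be sketched as a deferred shutdown, where the handler only sets a flag and the sequence runs later from ordinary code. This is an assumption, not what the post describes: the post runs the steps inside the handler itself, and MPI calls are generally not async-signal-safe. The names on_signal, install_shutdown_handler, run_step and maybe_shutdown are hypothetical, and run_step only logs a label in place of the real calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Set by the handler, polled from ordinary (non-signal) context. */
static volatile sig_atomic_t shutdown_requested = 0;

/* Handler does only async-signal-safe work: it records the request. */
static void on_signal(int signo) {
    (void)signo;
    shutdown_requested = 1;
}

/* Install the handler for signo; returns 0 on success, -1 on failure. */
static int install_shutdown_handler(int signo) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Hypothetical stand-in for one shutdown step: logs the label only. */
static int steps_run = 0;
static void run_step(const char *name) {
    assert(name != NULL);
    printf("step %d: %s\n", ++steps_run, name);
}

/* Runs steps 1-5 if a signal was recorded; returns the number of steps
 * run so far, or 0 if no signal is pending. Step 6 (exit) is left to
 * the caller. */
static int maybe_shutdown(void) {
    if (!shutdown_requested)
        return 0;
    run_step("flush pending messages");
    run_step("MPIR_BsendRelease");
    run_step("MPI_Barrier");
    run_step("MPID_End");
    run_step("checkpoint");
    return steps_run;
}
```

An application would call install_shutdown_handler(SIGUSR1) once at startup and maybe_shutdown() at a safe point in its main loop, exiting when it returns nonzero.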
As a result, sometimes a few of the MPI tasks fail in ibv_dealloc_pd() at viainit.c:516, while the others succeed.
Sometimes all tasks finish all of the above steps and exit successfully.
When it fails, ibv_dealloc_pd() always returns 16. That matches the numeric value of IBV_WC_REM_ABORT_ERR, but read as an errno value, 16 is EBUSY on Linux.
What InfiniBand resources are still associated with the PD?
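To make the return code easier to read in logs, here is a minimal sketch that treats it as an errno value. It assumes the documented libibverbs convention that teardown calls such as ibv_dealloc_pd() return 0 on success or an errno value on failure. The helper name format_teardown_rc is hypothetical and not part of MVAPICH or libibverbs.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the return code of a verbs teardown call,
 * interpreting a nonzero value as an errno code (assumption stated in
 * the text above). Returns the value of snprintf(). */
static int format_teardown_rc(const char *call, int rc, char *buf, size_t len) {
    assert(call != NULL && buf != NULL);
    if (rc == 0)
        return snprintf(buf, len, "%s: ok", call);
    return snprintf(buf, len, "%s failed: rc=%d (%s)%s",
                    call, rc, strerror(rc),
                    rc == EBUSY
                        ? "; some resource still references the object"
                        : "");
}
```

In the failing path one could call format_teardown_rc("ibv_dealloc_pd", rc, buf, sizeof buf) and print buf together with the MPI rank, to see which tasks still hold objects created on the PD (for example queue pairs, memory regions, or address handles).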
I have spent almost two weeks checking and debugging my sources, and I'm tired.
I tested with bt.C.36 in the following InfiniBand environment:
CA type: MT25204, ports: 1, rate: 20
Please help me.
Thanks in advance.