[mvapich-discuss] VAPI_register_mr - mpid/vapi/collutils.c

Sayantan Sur surs at cse.ohio-state.edu
Wed Dec 6 20:47:36 EST 2006


Hello Steve,

* On Dec 1, Steve Jones <stevejones at stanford.edu> wrote:
> hi.
> 
> i have a job returning the following error, shown from multiple runs. can
> you provide a more detailed explanation of collutils? i found a few notes
> on the web about a supposed memory leakage for scalapack subroutines, but
> it's not verified. have you had similar reports of this error before?

Thanks for the report.

The collutils are a set of helper functions for the RDMA collectives
used in MVAPICH. This error indicates that they were unable to register
the required amount of memory.
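
One thing that may be worth checking on the compute nodes is the
locked-memory limit, since registered memory has to be pinned. This is
only a possible cause, assumed here and not confirmed by the report. A
minimal sketch that prints the limit for the current shell:

```shell
# Assumption: a low locked-memory limit is one possible reason for
# registration failures; the error report itself does not confirm this.
# ulimit -l reports the limit in kB, or the word "unlimited".
LIMIT=$(ulimit -l)
if [ "$LIMIT" = "unlimited" ]; then
    echo "locked memory: unlimited"
else
    echo "locked memory: ${LIMIT} kB"
fi
```

The value should be read in the same environment the MPI processes
start in, which may differ from an interactive login shell.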

One quick workaround is to disable the RDMA collectives by setting the
DISABLE_RDMA_ALLTOALL=1 and DISABLE_RDMA_ALLGATHER=1 environment
variables. An example of setting one of them on the command line:

$ mpirun_rsh -np N -hostfile mf DISABLE_RDMA_ALLTOALL=1 ./a.out
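
To disable both collectives at once, both variables can go on the same
command line. A minimal sketch: the process count of 4 is a placeholder
(an assumption), and the command is only printed so it can be reviewed
before it is run.

```shell
# Placeholder values; adjust them to the actual job.
NP=4            # number of MPI processes (hypothetical)
HOSTFILE=mf     # host file name from the example above
APP=./a.out     # executable from the example above

# Build the launch line with both RDMA collectives disabled.
CMD="mpirun_rsh -np $NP -hostfile $HOSTFILE DISABLE_RDMA_ALLTOALL=1 DISABLE_RDMA_ALLGATHER=1 $APP"

# Print the command for review; to launch the job, run: eval "$CMD"
echo "$CMD"
```

If only one of the two collectives is involved, dropping the other
variable narrows down which one triggers the registration failure.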

Could you please let us know if this workaround helps the situation?

Thanks,
Sayantan.

> 
> thanks.
> 
> steve
> 
> [0] Abort: VAPI_register_mr (Resources temporary unavailable) at line 66 in
> file mpid/vapi/collutils.c
> done.
> mpirun_rsh: Abort signaled from [0]
> Wed Dec  6 07:34:12 EST 2006
>  crashed ? dfM* & wfM* maybe still on /state/partition1/smjones_ForSteve.7/
> ?
> Wed Dec  6 07:34:12 EST 2006
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> )[40] Abort: VAPI_register_mr (Resources temporary unavailable) at line 66
> in file mpid/vapi/collutils.c
> done.
> mpirun_rsh: Abort signaled from [40]
> Wed Dec  6 08:09:06 EST 2006
>  crashed ? dfM* & wfM* maybe still on /state/partition1/smjones_ForSteve.6/
> ?
> Wed Dec  6 08:09:06 EST 2006
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> )[51] Abort: VAPI_register_mr (Resources temporary unavailable) at line 66
> in file mpid/vapi/collutils.c
> done.
> mpirun_rsh: Abort signaled from [51]
> Wed Dec  6 07:54:10 EST 2006
>  crashed ? dfM* & wfM* maybe still on /state/partition1/smjones_ForSteve.4/
> ?
> Wed Dec  6 07:54:10 EST 2006
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> )[37] Abort: VAPI_register_mr (Resources temporary unavailable) at line 66
> in file mpid/vapi/collutils.c
> done.
> mpirun_rsh: Abort signaled from [37]
> Wed Dec  6 07:59:41 EST 2006
>  crashed ? dfM* & wfM* maybe still on /state/partition1/smjones_ForSteve.5/
> ?
> Wed Dec  6 07:59:41 EST 2006
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-- 
http://www.cse.ohio-state.edu/~surs

