[mvapich-discuss] VAPI_register_mr - mpid/vapi/collutils.c

Sayantan Sur surs at cse.ohio-state.edu
Thu Dec 7 11:55:10 EST 2006


Hi Steve,

* On Dec,3 Steve Jones<stevejones at stanford.edu> wrote :
> Hi Sayantan.
> 
> > Thanks for the report.
> >
> > The collutils are a bunch of helper functions for the RDMA collectives
> > used in MVAPICH. This error shows that they weren't able to register
> > the required amount of memory.
> >
> > One quick workaround is to disable the RDMA collectives by using
> > DISABLE_RDMA_ALLTOALL=1 DISABLE_RDMA_ALLGATHER=1 env. variables. An
> > example of using this would be:
> >
> > $ mpirun_rsh -np N -hostfile mf DISABLE_RDMA_ALLTOALL=1 ./a.out
> >
> > Could you please let us know if this workaround helps the situation?
> 
> I ran using both variables, with the same error generated, one error per
> process, 64 processes in this case (output truncated). I've also included
> the command string:
> 
> /opt/mvapich/pgi/bin/mpirun_rsh -ssh -np $NP -hostfile hf \
> DISABLE_RDMA_ALLTOALL=1 DISABLE_RDMA_ALLGATHER=1 \
> ./Cmdft_${PBS_JOBNAME} >> Cmdft_out 2>&1
> 
> 
> )Failed to register2bcd5000length 249856 (Error:Resources temporary
> unavailable

Sorry it didn't work out. Is this message from collutils too, or from
elsewhere? It's unlikely to be the cause, but could you also try turning
off the RDMA Barrier to see if that helps? The variable to set is:

DISABLE_RDMA_BARRIER=1
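
For reference, here is a rough sketch of how all three switches could go
on your command line. RDMA_OFF is just a name made up for illustration;
the mpirun_rsh path, the hostfile "hf" and the binary name are copied from
your command above, and the default of 64 for NP is only an assumption
based on the process count you reported. It echoes the launch line rather
than running it, so you can check it first:

```shell
# Sketch only: the three RDMA-collective switches mentioned in this thread.
# RDMA_OFF is a hypothetical helper variable, not something MVAPICH reads.
RDMA_OFF="DISABLE_RDMA_ALLTOALL=1 DISABLE_RDMA_ALLGATHER=1 DISABLE_RDMA_BARRIER=1"

# Dry run: print the launch line instead of executing it.
# NP falls back to 64 (assumed); PBS_JOBNAME is empty outside a PBS job.
echo /opt/mvapich/pgi/bin/mpirun_rsh -ssh -np "${NP:-64}" -hostfile hf \
    $RDMA_OFF "./Cmdft_${PBS_JOBNAME:-}"
```

If the printed line looks right, drop the echo and add your
">> Cmdft_out 2>&1" redirection back at the end.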

Thanks,
Sayantan. 


> )Failed to register2bcd5000length 249856 (Error:Resources temporary
> unavailable
> )Failed to register2bcd5000length 249856 (Error:Resources temporary
> unavailable
> )Failed to register2beee000length 249856 (Error:Resources temporary
> unavailable
> )Failed to register2beee000length 249856 (Error:Resources temporary
> unavailable
> )Failed to register2c6db000length 249856 (Error:Resources temporary
> unavailable
> )Failed to register2c6db000length 249856 (Error:Resources temporary
> unavailable
> )Failed to register2c78e000length 249856 (Error:Resources temporary
> unavailable
> )
> 
> 
> 
> 
> 
> **** nit,irc,nhop=       2      0   1049 nb,icnvrgds=     412     38
> [37] Abort: VAPI_register_mr (Resources temporary unavailable) at line 66 in
> file mpid/vapi/collutils.c
> done.
> mpirun_rsh: Abort signaled from [37]
> Thu Dec  7 04:50:51 EST 2006
>  crashed ? dfM* & wfM* maybe still on /state/partition1/smjones_ForSteve.1/
> ?
> Thu Dec  7 04:50:51 EST 2006
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-- 
http://www.cse.ohio-state.edu/~surs

