[mvapich-discuss] rdma_iba_priv.c + Error posting recv

Vishwas Vasisht vvasisht at locuz.com
Mon Nov 6 07:44:33 EST 2006


Hi,

I have a 65-node Opteron cluster with a total of 260 cores (64 nodes + 1 master, each dual-processor, dual-core).
I was trying to submit jobs (cpi, jobfarming, etc.) with -np greater than 260. This worked up to -np 300, but above 300 I get the following errors several times.

--------------------------------------------------------------------------
[rdma_iba_priv.c:406] error(-236): Error posting recv!
rank 12 in job 7  masternode_33851   caused collective abort of all ranks
  exit status of rank 12: killed by signal 9
--------------------------------------------------------------------------
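
For reference, the core count and the degree of oversubscription at the -np values mentioned above can be worked out with a few lines of Python. This is only a minimal sketch of the arithmetic, assuming all 65 nodes (including the master) contribute 2 sockets x 2 cores each; the function names are hypothetical and it says nothing about the cause of the error itself.

```python
# Figures taken from the description above; helper names are hypothetical.
NODES = 65             # 64 nodes + 1 master
SOCKETS_PER_NODE = 2   # dual processor
CORES_PER_SOCKET = 2   # dual core


def total_cores(nodes=NODES, sockets=SOCKETS_PER_NODE, cores=CORES_PER_SOCKET):
    """Total number of cores in the cluster."""
    return nodes * sockets * cores


def oversubscription(np_requested, cores_available):
    """Return (processes beyond one per core, average processes per core)."""
    extra = max(0, np_requested - cores_available)
    return extra, np_requested / cores_available


if __name__ == "__main__":
    cores = total_cores()
    for np_requested in (260, 300):
        extra, ratio = oversubscription(np_requested, cores)
        print(f"-np {np_requested}: {extra} extra processes, "
              f"{ratio:.2f} processes per core")
```

With these assumptions, -np 300 places 40 more processes than there are cores, so some cores run more than one MPI process.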

Can you please help me sort this out?

Regards
Vishwas