[mvapich-discuss] rdma_iba_priv.c + Error posting recv

Vishwas vvasisht at locuz.com
Mon Nov 6 12:30:02 EST 2006


Hi,

Should I apply this patch, rebuild mvapich (using make.mvapich.vapi), and
install it again?

Vishwas
-----Original Message-----
From: Matthew Koop [mailto:koop at cse.ohio-state.edu] 
Sent: Monday, November 06, 2006 10:52 PM
To: Vishwas Vasisht
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] rdma_iba_priv.c + Error posting recv

Vishwas,

I've attached a patch to address the issue you're seeing with the VAPI
device. Can you try this out and verify that it solves your problem?

I'd also like to strongly suggest that you move towards the OFED/Gen2
stack, since it has more support from vendors and the overall community.
Our support for OFED/Gen2 also has various features not found in the VAPI
version.

Let us know if you have any other questions.

Thanks,

Matt


On Mon, 6 Nov 2006, Vishwas Vasisht wrote:

> Hi,
>
> I have a 65-node Opteron cluster with a total of 260 cores (64 nodes + 1
> master, each dual-processor, dual-core). I was trying to submit a job
> (cpi, jobfarming..) with -np greater than 260. It worked up to
> -np 300, but above 300 I am getting these errors several
> times.
>
> --------------------------------------------------------------------------
> [rdma_iba_priv.c:406] error(-236): Error posting recv!
> rank 12 in job 7  masternode_33851   caused collective abort of all ranks
>   exit status of rank 12: killed by signal 9
> --------------------------------------------------------------------------
>
> Can you please help me sort this out?
>
> Regards
> Vishwas
>


