[mvapich-discuss] rdma_iba_priv.c + Error posting recv

Vishwas vvasisht at locuz.com
Tue Nov 7 11:13:27 EST 2006


Hi,

I applied the patch and it's working now. Thanks.

Vishwas

-----Original Message-----
From: wei huang [mailto:huanwei at cse.ohio-state.edu] 
Sent: Monday, November 06, 2006 11:07 PM
To: Vishwas
Cc: 'Matthew Koop'; mvapich-discuss at cse.ohio-state.edu
Subject: RE: [mvapich-discuss] rdma_iba_priv.c + Error posting recv

Hi Vishwas,

You need to apply the patch and rebuild the package.

Thanks.

Regards,
Wei Huang

774 Dreese Lab, 2015 Neil Ave,
Dept. of Computer Science and Engineering
Ohio State University
OH 43210
Tel: (614)292-8501


On Mon, 6 Nov 2006, Vishwas wrote:

> Hi,
>
> Should I apply this patch, rebuild MVAPICH (using make.mvapich.vapi)
> and install it again?
>
> Vishwas
> -----Original Message-----
> From: Matthew Koop [mailto:koop at cse.ohio-state.edu]
> Sent: Monday, November 06, 2006 10:52 PM
> To: Vishwas Vasisht
> Cc: mvapich-discuss at cse.ohio-state.edu
> Subject: Re: [mvapich-discuss] rdma_iba_priv.c + Error posting recv
>
> Vishwas,
>
> I've attached a patch to address the issue you're seeing with the VAPI
> device. Can you try this out and verify that it solves your problem?
>
> I'd also like to strongly suggest that you move towards the OFED/Gen2
> stack, since it has more support from vendors and the overall community.
> Our support for OFED/Gen2 also has various features not found in the VAPI
> version.
>
> Let us know if you have any other questions.
>
> Thanks,
>
> Matt
>
>
> On Mon, 6 Nov 2006, Vishwas Vasisht wrote:
>
> > Hi,
> >
> > I have a 65-node Opteron cluster with a total of 260 cores (64 nodes + 1
> > master, each with two dual-core processors). I was trying to submit a job
> > (cpi, jobfarming, ...) using -np greater than 260. It works up to
> > -np 300, but above 300 I get the following error several times:
> >
> >
> > --------------------------------------------------------------------------
> > [rdma_iba_priv.c:406] error(-236): Error posting recv!
> > rank 12 in job 7  masternode_33851   caused collective abort of all ranks
> >   exit status of rank 12: killed by signal 9
> > --------------------------------------------------------------------------
> >
> > Can you please help me sort this out?
> >
> > Regards
> > Vishwas
> >
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>






