[mvapich-discuss] MVAPICH Error

Dhabaleswar Panda panda at cse.ohio-state.edu
Wed Aug 8 15:50:07 EDT 2007


> As it turns out, my system admin tells me that the installation of
> OpenFabrics wasn't complete. Once he fixed that, it installed without
> a hitch.
> 
> With this new 1.0-beta version I'm not getting that error any longer.

Glad to know that you do not see any errors with 1.0-beta.

> Although, run times seem to be a bit longer; but we are still
> investigating that.

OK. Please keep us updated on the results of your investigation so that
we can take a look.

> Thanks for your help.

You are welcome. 

Best Regards, 

DK

> Tom
> 
> -----Original Message-----
> From: Dhabaleswar Panda [mailto:panda at cse.ohio-state.edu] 
> Sent: Monday, August 06, 2007 8:36 PM
> To: OShea, Thomas T.
> Cc: panda at cse.ohio-state.edu; mvapich-discuss at cse.ohio-state.edu
> Subject: Re: [mvapich-discuss] MVAPICH Error
> 
> > Hello,
> > 
> > Actually we would like to be running 1.0-beta, but we are having
> > trouble compiling it. The configure script bombs out while trying to
> > find the size of 'bool' or something.
> 
> Sorry to hear that you are having trouble compiling 1.0-beta. Could
> you please let us know the exact error you are seeing? It will help us
> to solve this problem. We have not seen any such errors on our
> systems.
> 
> > The version we are currently using is the 0.9.8p3 with the patch you
> > gave me earlier applied. 
> 
> Thanks for this information. We will investigate the assertion error
> issue.
> 
> Thanks, 
> 
> DK
> 
> > Thanks,
> > Tom
> > 
> > -----Original Message-----
> > From: Dhabaleswar Panda [mailto:panda at cse.ohio-state.edu] 
> > Sent: Saturday, August 04, 2007 5:15 AM
> > To: OShea, Thomas T.
> > Cc: mvapich-discuss at cse.ohio-state.edu
> > Subject: Re: [mvapich-discuss] MVAPICH Error
> > 
> > Hi Thomas, 
> > 
> > Are you seeing this behavior with MVAPICH2 0.9.8p2 with the patch
> > Gopal had sent to you on July 7th?
> > 
> > Have you tried MVAPICH2 0.9.8p3 or the latest release, MVAPICH2
> > 1.0-beta? Do you see the same behavior with these two versions
> > also? In these versions we have applied a better solution to the
> > problem you had reported originally.
> > 
> > If you can let us know which version you are using currently, it will
> > help us to narrow down the problem further.
> > 
> > Best Regards, 
> > 
> > DK
> > 
> > > Hello again,
> > > 
> > > Thanks for all your help in the past; I've been able to get my code
> > > up and running on a small 32-processor cluster. I'm doing scaling
> > > tests and I ran with an array size of 16x16x16 with 1, 2, 4, 8 and 16
> > > processors and saw fairly good scaling. When I increased the array
> > > sizes to 32x32x32, my code runs fine for all but the 8-processor
> > > case. The odd part is that it doesn't crash until the 15th iteration,
> > > and I'm doing 21 iterations for each case. Here is the error it
> > > produces:
> > > 
> > > ch3_rndvtransfer.c:614: MPIDI_CH3_Get_rndv_push: Assertion
> > > '(get_resp_pkt->seqnum) + 1 == (vc)->seqnum_send' failed.
> > > 
> > > I imagine this will be a pain for me to debug since it takes about
> > > 30 minutes to get to the point where it fails. Ever seen this error
> > > or have any idea what might be causing it? Any tips would be greatly
> > > appreciated.
> > > 
> > > Thanks,
> > > 
> > > Thomas O'Shea
> > 
> > 
> 


