[mvapich-discuss] MVAPICH Error

OShea, Thomas T. THOMAS.T.O'SHEA at saic.com
Wed Aug 8 13:36:58 EDT 2007


As it turns out, my system admin tells me that the installation of
OpenFabrics wasn't complete enough; once he fixed that, it installed
without a hitch.

With this new 1.0-beta version I'm no longer getting that error.
However, run times seem to be a bit longer; we are still
investigating that.

Thanks for your help.

Tom

-----Original Message-----
From: Dhabaleswar Panda [mailto:panda at cse.ohio-state.edu] 
Sent: Monday, August 06, 2007 8:36 PM
To: OShea, Thomas T.
Cc: panda at cse.ohio-state.edu; mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] MVAPICH Error

> Hello,
> 
> Actually we would like to be running 1.0-beta, but we are having
> trouble compiling it. The configure script bombs out while trying to
> find the size of 'bool' or something.

Sorry to hear that you are having trouble compiling 1.0-beta. Could
you please let us know the exact error you are seeing? It will help us
solve this problem. We have not seen any such errors on our
systems.

> The version we are currently using is the 0.9.8p3 with the patch you
> gave me earlier applied. 

Thanks for this information. We will investigate the assertion error
issue.

Thanks, 

DK

> Thanks,
> Tom
> 
> -----Original Message-----
> From: Dhabaleswar Panda [mailto:panda at cse.ohio-state.edu] 
> Sent: Saturday, August 04, 2007 5:15 AM
> To: OShea, Thomas T.
> Cc: mvapich-discuss at cse.ohio-state.edu
> Subject: Re: [mvapich-discuss] MVAPICH Error
> 
> Hi Thomas, 
> 
> Are you seeing this behavior with MVAPICH2 0.9.8p2 with the patch
> Gopal had sent to you on July 7th?
> 
> Have you tried MVAPICH2 0.9.8p3 or the latest release, MVAPICH2
> 1.0-beta? Do you see the same behavior with these two versions
> also? In these versions we have applied a better solution to the
> problem you had reported originally.
> 
> If you can let us know which version you are using currently, it will
> help us to narrow down the problem further.
> 
> Best Regards, 
> 
> DK
> 
> > Hello again,
> > 
> > Thanks for all your help in the past; I've been able to get my code up
> > and running on a small 32-processor cluster. I'm doing scaling tests, and
> > I ran with an array size of 16x16x16 with 1, 2, 4, 8 and 16 processors and
> > saw fairly good scaling. When I increased the array sizes to 32x32x32, my
> > code ran fine for all but the 8-processor case. The odd part is that it
> > doesn't crash until the 15th iteration, and I'm doing 21 iterations for
> > each case. Here is the error it produces:
> > 
> > ch3_rndvtransfer.c:614: MPIDI_CH3_Get_rndv_push: Assertion
> > '(get_resp_pkt->seqnum) + 1 == (vc)->seqnum_send' failed.
> > 
> > I imagine this will be a pain for me to debug, since it takes about 30
> > minutes to get to the point where it fails. Have you ever seen this error,
> > or do you have any idea what might be causing it? Any tips would be
> > greatly appreciated.
> > 
> > Thanks,
> > 
> > Thomas O'Shea
> 
> 