[mvapich-discuss] MVAPICH Error
OShea, Thomas T.
THOMAS.T.O'SHEA at saic.com
Wed Aug 8 13:36:58 EDT 2007
As it turns out, my system admin tells me that the OpenFabrics
installation was incomplete; once he fixed that, it installed without
a hitch.
With this new 1.0-beta version I'm not getting that error any longer.
Run times seem to be a bit longer, though; we are still
investigating that.
Thanks for your help.
Tom
-----Original Message-----
From: Dhabaleswar Panda [mailto:panda at cse.ohio-state.edu]
Sent: Monday, August 06, 2007 8:36 PM
To: OShea, Thomas T.
Cc: panda at cse.ohio-state.edu; mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] MVAPICH Error
> Hello,
>
> Actually we would like to be running 1.0-beta, but we are having
> trouble compiling it. The configure script bombs out while trying to
> find the size of 'bool' or something.
Sorry to hear that you are having trouble compiling 1.0-beta. Could
you please let us know the exact error you are seeing? It will help us
to solve this problem. We have not seen any such errors on our
systems.
> The version we are currently using is the 0.9.8p3 with the patch you
> gave me earlier applied.
Thanks for this information. We will investigate the assertion error
issue.
Thanks,
DK
> Thanks,
> Tom
>
> -----Original Message-----
> From: Dhabaleswar Panda [mailto:panda at cse.ohio-state.edu]
> Sent: Saturday, August 04, 2007 5:15 AM
> To: OShea, Thomas T.
> Cc: mvapich-discuss at cse.ohio-state.edu
> Subject: Re: [mvapich-discuss] MVAPICH Error
>
> Hi Thomas,
>
> Are you seeing this behavior with MVAPICH2 0.9.8p2 with the patch
> Gopal had sent to you on July 7th?
>
> Have you tried MVAPICH2 0.9.8p3 or the latest release, MVAPICH2
> 1.0-beta? Do you see the same behavior with these two versions
> also? In these versions we have applied a better solution to the
> problem you had reported originally.
>
> If you can let us know which version you are using currently, it will
> help us to narrow down the problem further.
>
> Best Regards,
>
> DK
>
> > Hello again,
> >
> > Thanks for all your help in the past; I've been able to get my code
> > up and running on a small 32-processor cluster. I'm doing scaling
> > tests, and I ran with an array size of 16x16x16 with 1, 2, 4, 8, and
> > 16 processors and saw fairly good scaling. When I increased the array
> > size to 32x32x32, my code ran fine for all but the 8-processor case.
> > The odd part is that it doesn't crash until the 15th iteration, and
> > I'm doing 21 iterations for each case. Here is the error it produces:
> >
> > ch3_rndvtransfer.c:614: MPIDI_CH3_Get_rndv_push: Assertion
> > '(get_resp_pkt->seqnum) + 1 == (vc)->seqnum_send' failed.
> >
> > I imagine this will be a pain for me to debug since it takes about
> > 30 minutes to get to the point where it fails. Ever seen this error
> > or have any idea what might be causing it? Any tips would be greatly
> > appreciated.
> >
> > Thanks,
> >
> > Thomas O'Shea
>
>