[mvapich-discuss] Problem solved: program crashing running mvapich over infiniband

Matthew Koop koop at cse.ohio-state.edu
Thu Feb 5 00:31:36 EST 2009


Derek,

Glad to know that the problem is solved and it is running smoothly. Let us
know if you encounter any other issues.

Thanks,
Matt

On Thu, 5 Feb 2009, Derek Stewart wrote:

> Hi Matthew,
>
> I just wanted to update you and let you know I solved the problem.  It turns
> out the local copies of the input file on the different nodes were not
> identical.  As a result, the process on the local node, which was set up for
> a larger-memory calculation, was receiving much less data than it expected
> from the smaller-memory runs started on the other nodes.  Once I made the
> input files identical, it ran smoothly.
>
> Thanks again,
>
> Derek
>
>
> Matthew Koop writes:
>
> > Hi Derek,
> >
> > Thanks for reporting this problem. Can you give us some additional
> > information about the run/system? How many processes are you running with
> > and what HCAs are you using?
> >
> > We're also interested in trying to reproduce the problem here on our
> > machines. Is there a dataset that you are using that you could send to us?
> >
> > Matt
> >
> > On Wed, 4 Feb 2009, Derek Stewart wrote:
> >
> >> Hi all,
> >>
> >> I was wondering if anyone would have a suggestion for the error below.  I am
> >> running abinit version 5.4.4p compiled with mvapich2-1.2p1, gcc (GCC)
> >> 3.4.6, and gfortran 4.1.2 on 64-bit Linux 2.6.9-78.0.13.ELsmp.
> >>
> >> Warning! Rndv Receiver is receiving (36680 < 1263624) less than as expected
> >> rank 1 in job 1
> >>
> >> c32_32836   caused collective abort of all ranks
> >>   exit status of rank 1: killed by signal 9
> >>
> >>
> >> Thanks,
> >>
> >> Derek
> >>
> >> ################################
> >> Derek Stewart, Ph. D.
> >> Scientific Computation Associate
> >> http://www.people.cornell.edu/pages/das248/
> >> 250 Duffield Hall
> >> Cornell Nanoscale Facility (CNF)
> >> Ithaca, NY 14853
> >> stewart (at) cnf.cornell.edu
> >> (607) 255-2856
> >>
> >
>
>
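
The root cause reported in this thread was an input file that differed between nodes. One way to catch that before launching a job is to compare a checksum of the file from each node. The following is only a minimal sketch, not something taken from the thread: the function names, the node names, and the idea of collecting one digest per node into a dictionary are all assumptions made for illustration.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large input files are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def nodes_differing_from(reference, digests_by_node):
    """Return the sorted names of nodes whose digest differs from the reference node.

    digests_by_node is a hypothetical mapping {node_name: hex_digest}, filled
    in by running file_digest() on each node's local copy of the input file.
    """
    ref_digest = digests_by_node[reference]
    return sorted(
        node for node, digest in digests_by_node.items() if digest != ref_digest
    )
```

How the digests are gathered from each node (a job script, a remote shell, a shared log) is left open here. An empty list from `nodes_differing_from` would indicate that every node sees the same input file as the reference node.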


