[mvapich-discuss] program crashing running mvapich over infiniband

Matthew Koop koop at cse.ohio-state.edu
Wed Feb 4 18:25:23 EST 2009


Hi Derek,

Thanks for reporting this problem. Can you give us some additional
information about the run and the system? How many processes are you
running, and which HCAs are you using?

We'd also like to try to reproduce the problem on our own machines.
Could you send us the dataset you are using?

Matt

On Wed, 4 Feb 2009, Derek Stewart wrote:

> Hi all,
>
> I was wondering if anyone would have a suggestion for this error.  I am
> running abinit version 5.4.4p compiled with mvapich2-1.2p1, gcc (GCC)
> 3.4.6, and gfortran 4.1.2 on Linux 2.6.9-78.0.13.ELsmp (64-bit).
>
> Warning! Rndv Receiver is receiving (36680 < 1263624) less than as expected
> rank 1 in job 1
>
> c32_32836   caused collective abort of all ranks
>   exit status of rank 1: killed by signal 9
>
>
> Thanks,
>
> Derek
>
> ################################
> Derek Stewart, Ph. D.
> Scientific Computation Associate
> http://www.people.cornell.edu/pages/das248/
> 250 Duffield Hall
> Cornell Nanoscale Facility (CNF)
> Ithaca, NY 14853
> stewart (at) cnf.cornell.edu
> (607) 255-2856
>


