[mvapich-discuss] cpmd job failure

Sangamesh B forum.san at gmail.com
Sun Feb 1 10:22:04 EST 2009


Hello Sir,

On Sat, Jan 31, 2009 at 8:08 PM, Dhabaleswar Panda
<panda at cse.ohio-state.edu> wrote:
> Thanks for reporting this. Are you running MVAPICH2 1.2p1 in the
> 'default' mode or with any environment variables set? Can you also give
> details of your platform (processor, number of cores/node, amount of
> memory per core, IB HCA speed, etc.)?
>
I'm running it in the 'default' mode; I haven't set any additional environment variables.

Dual-socket quad-core Intel Xeon (8 cores/node)
4 GB RAM/node (512 MB/core)

Intel compilers, version 10

The same job runs fine with Open MPI.

Thanks,
Sangamesh

> Thanks,
>
> DK
>
> On Sat, 31 Jan 2009, Sangamesh B wrote:
>
>> Hello MVAPICH2 team,
>>
>>      The CPMD (www.cpmd.org) application is built with the Intel
>> compilers and MVAPICH2 1.2p1 on a Rocks 4.3 Linux cluster with
>> InfiniBand.
>>
>> A 40-process job runs for some time and then fails with the following output:
>>
>>  LINE SEARCH : LAMBDA=.164E-01 PREDICTED ENERGY = -1890.824133217
>>   57  9.731E-05   7.571E-06   -1890.824133   -8.483E-07     47.38
>>  LINE SEARCH : LAMBDA=.166E-01 PREDICTED ENERGY = -1890.824133946
>>   58  9.831E-05   7.265E-06   -1890.824134   -7.234E-07     47.41
>>  LINE SEARCH : LAMBDA=.178E-01 PREDICTED ENERGY = -1890.824134657
>>   59  9.529E-05   6.389E-06   -1890.824135   -6.945E-07     47.36
>> rank 17 in job 1  node-0-5.local_32810   caused collective abort of all ranks
>>   exit status of rank 17: killed by signal 9
>> rank 1 in job 1  node-0-5.local_32810   caused collective abort of all ranks
>>   exit status of rank 1: killed by signal 9
>>
>> Repeated runs of the same job fail at around the same point (though not
>> at exactly the same step).
>>
>> What could be causing this, and how can I fix it?
>>
>> Thanks,
>> Sangamesh
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>
>
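For readers decoding the abort messages in the quoted log: "killed by signal 9" means the rank was terminated by SIGKILL (signal 9 on Linux), which a process cannot catch or ignore, so the application gets no chance to print its own error before dying. The following is a minimal Python sketch, not MVAPICH2's actual launcher code, of how a parent process can distinguish a signal-killed child from a normal exit using the POSIX waitpid status macros. The helper names `describe_wait_status` and `run_child_killed_by_sigkill` are hypothetical, and the child sends SIGKILL to itself purely for illustration (POSIX systems only, since it uses `os.fork`).

```python
import os
import signal


def describe_wait_status(status: int) -> str:
    """Turn a raw waitpid() status into a launcher-style message (hypothetical helper)."""
    if os.WIFSIGNALED(status):
        # The child was terminated by a signal; report the signal number.
        return f"killed by signal {os.WTERMSIG(status)}"
    if os.WIFEXITED(status):
        # The child called exit(); report its exit code.
        return f"exited with code {os.WEXITSTATUS(status)}"
    return "stopped or unknown status"


def run_child_killed_by_sigkill() -> str:
    """Fork a child that receives SIGKILL, then decode its status (hypothetical helper)."""
    pid = os.fork()
    if pid == 0:
        # Child: simulate an external SIGKILL; it cannot be caught or ignored.
        os.kill(os.getpid(), signal.SIGKILL)
        os._exit(0)  # not reached
    _, status = os.waitpid(pid, 0)
    return describe_wait_status(status)


if __name__ == "__main__":
    print(run_child_killed_by_sigkill())  # on Linux: killed by signal 9
```

Because SIGKILL cannot be handled, the signal almost always comes from outside the process (a user, a resource manager, or the kernel), which is why the log shows only the launcher's report and no message from CPMD itself.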

