[mvapich-discuss] Possible mvapich bug (possibly not as well).

Bharat mbkumar at gmail.com
Fri Sep 19 18:30:12 EDT 2008


I observed a similar problem with a program that calls ScaLAPACK routines.
For certain job distribution patterns, the program runs fine up to a certain
stage and then produces no further output, while the threads keep running
at 100% CPU utilization.
The same program runs to completion in the following cases:
64 threads on 16 nodes (8 cores/node), 64 threads on 8 machines, and some
others,
but it shows the problem with 32 threads, 16 threads on 16 machines, etc.
I was using mvapich2-1.0.3 and the MKL 10.0.1.014 libraries.

After replacing mvapich2-1.0.3 with mvapich2-1.2RC2, the problem disappeared.


Rgds,
Bharat

On Fri, 19 Sep 2008 14:47:33 -0600, Laurence Marks  
<L-marks at northwestern.edu> wrote:

> I have a highly reproducible, but so far untraceable problem. It could
> be due to mvapich, but also not.
>
> In a code which calls the ScaLAPACK subroutine PDSYGST (which uses two
> distributed matrices), if the matrices are 36927x36927 it works fine;
> if they are 38381x38381 it runs forever, i.e. until I kill it.
>
> This behavior occurs with the Intel MKL versions 10.0.3.020,
> 10.0.4.023, 10.1.0.009 and ifort/icc versions 10.1.015 and 10.1.018.
> It occurs with both an April 2008 svn checkout of mvapich and an svn
> checkout from a few days ago. It also occurs with OFED-1.2.5.5 and OFED-1.3.
>
> I would welcome any suggestions.
>




