[mvapich-discuss] Possible mvapich bug (possibly not as well).
Bharat
mbkumar at gmail.com
Fri Sep 19 18:30:12 EDT 2008
I observed a similar problem with a program that calls ScaLAPACK routines.
For certain job distribution patterns, the program runs fine up to a certain
stage and then produces no further output, while the threads keep running
at 100% CPU utilization.
The same program runs to completion in the following cases:
64 threads on 16 nodes (8 cores/node), 64 threads on 8 machines, and some
others,
but it hangs for 32 threads, 16 threads on 16 machines....
I was using mvapich2-1.0.3 and the MKL 10.0.1.014 libraries.
After replacing mvapich2-1.0.3 with mvapich2-1.2RC2, the problem
disappeared.
Rgds,
Bharat
On Fri, 19 Sep 2008 14:47:33 -0600, Laurence Marks
<L-marks at northwestern.edu> wrote:
> I have a highly reproducible, but so far untraceable problem. It could
> be due to mvapich, but possibly not.
>
> In a code which calls the ScaLAPACK subroutine PDSYGST (which uses two
> distributed matrices), if the matrices are 36927x36927 it works fine;
> if they are 38381x38381 it runs forever, i.e. until I kill it.
>
> This behavior occurs for the Intel mkl versions 10.0.3.020,
> 10.0.4.023, 10.1.0.009 and ifort/icc versions 10.1.015 and 10.1.018.
> It occurs for both an April 2008 svn of mvapich, and an svn of a few
> days ago. It also occurs with OFED-1.2.5.5 and OFED-1.3.
>
> I would welcome any suggestions.
>
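
For reference, the arithmetic behind the two matrix sizes quoted above can
be sketched as below. This is only a rough check, not a diagnosis. It
assumes 8-byte elements (the "D" in PDSYGST denotes double precision real)
and counts a single global matrix. The per-process share depends on the
process grid and block size, which are not given in this thread. The helper
name global_footprint is made up for this illustration.

```python
# Global footprint of one N x N double-precision matrix for the two sizes
# reported in the quoted message. The 8-byte element size is an assumption
# based on the "D" (double precision real) in PDSYGST.
INT32_MAX = 2**31 - 1
BYTES_PER_DOUBLE = 8


def global_footprint(n):
    """Return (element_count, byte_count) for one n x n matrix of doubles."""
    elements = n * n
    return elements, elements * BYTES_PER_DOUBLE


for n in (36927, 38381):
    elements, nbytes = global_footprint(n)
    print(f"N={n}: {elements} elements, {nbytes / 2**30:.2f} GiB, "
          f"element count fits in int32: {elements <= INT32_MAX}")
```

For both sizes the global element count stays below 2**31 - 1, so the
global size alone does not cross a 32-bit boundary between the working and
the hanging case. Local sizes and message sizes would have to be checked
against the actual process grid.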