[mvapich-discuss] Errors in BSBR when running xCbtest and xFbtest of the BLACS

Claudio J. Margulis claudio-margulis at uiowa.edu
Thu May 16 10:18:16 EDT 2013


Dear Krishna, I don't think there are any special options. These were the 
commands:


gunzip mvapich2-1.9.tgz
tar -xvf mvapich2-1.9.tar
cd mvapich2-1.9
export LD_LIBRARY_PATH=/shared/gcc-4.5.1/lib64:/shared/gcc-4.5.1/lib:/shared/mpc-0.8.2/lib:/shared/mpfr-3.0.0/lib:/shared/gmp-4.3.2/lib
./configure \
  --prefix=/usr/local/chemistry_software/mvapich2-1.9/gcc-4.5.1 \
  CC=/shared/gcc-4.5.1/bin/gcc CXX=/shared/gcc-4.5.1/bin/g++ \
  F77=/shared/gcc-4.5.1/bin/gfortran FC=/shared/gcc-4.5.1/bin/gfortran
make -j 16 >&make.log &
make install


cd scalapack-mvapich2-1.9/
tar -xvf scalapack-2.0.2.tar
cd scalapack-2.0.2
export LD_LIBRARY_PATH=/usr/local/chemistry_software/mvapich2-1.9/gcc-4.5.1/lib:$LD_LIBRARY_PATH

make all
cd BLACS/TESTING/
/usr/local/chemistry_software/mvapich2-1.9/gcc-4.5.1/bin/mpirun -np 16 ./xCbtest

I don't want to paste all the errors I get but a sample follows for the 
BSBR section:

INTEGER BSBR TESTS: BEGIN.

PROCESS {   0,   1} REPORTS ERRORS IN TEST#  2161:
    Invalid element at A(   2,   1):
    Expected=     -995413; Received=          -2
    Complementory triangle overwrite at A(   1,   1):
    Expected=          -2; Received=          -1
PROCESS {   0,   1} DONE ERROR REPORT FOR TEST#  2161.

PROCESS {   0,   1} REPORTS ERRORS IN TEST#  2162:
    Invalid element at A(   2,   1):
    Expected=     -219319; Received=          -2
PROCESS {   0,   1} DONE ERROR REPORT FOR TEST#  2162.

PROCESS {   0,   1} REPORTS ERRORS IN TEST#  3761:
    Invalid element at A(   2,   1):
    Expected=      574430; Received=          -2
    Complementory triangle overwrite at A(   1,   1):
    Expected=          -2; Received=          -1
PROCESS {   0,   1} DONE ERROR REPORT FOR TEST#  3761.

PROCESS {   0,   1} REPORTS ERRORS IN TEST#  4561:
    Invalid element at A(   2,   1):
    Expected=      716842; Received=          -2
    Complementory triangle overwrite at A(   1,   1):
    Expected=          -2; Received=          -1
PROCESS {   0,   1} DONE ERROR REPORT FOR TEST#  4561.

PROCESS {   1,   0} REPORTS ERRORS IN TEST#  1361:
    Invalid element at A(   2,   1):
    Expected=      862174; Received=          -2
    Complementory triangle overwrite at A(   1,   1):
    Expected=          -2; Received=          -1
PROCESS {   1,   0} DONE ERROR REPORT FOR TEST#  1361.

PROCESS {   1,   0} REPORTS ERRORS IN TEST#  2161:
    Invalid element at A(   2,   1):
    Expected=     -995413; Received=          -2
    Complementory triangle overwrite at A(   1,   1):
    Expected=          -2; Received=          -1
PROCESS {   1,   0} DONE ERROR REPORT FOR TEST#  2161.

PROCESS {   1,   0} REPORTS ERRORS IN TEST#  3761:
    Invalid element at A(   2,   1):
    Expected=      574430; Received=          -2


These errors do not occur when using the old broadcast method (i.e. with 
the environment variable MV2_USE_OLD_BCAST set to 1). There is also the 
issue of timing, but let's deal with one thing at a time.
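
For anyone who wants to reproduce the comparison, here is a minimal sketch of running a command with the variable set for that command only. MV2_USE_OLD_BCAST is the variable discussed in this thread; the helper name with_old_bcast is made up, and the sketch assumes the launcher passes the caller's environment on to the MPI ranks (if it does not, the variable has to be handed over through the launcher's own mechanism).

```shell
#!/bin/sh
# Hypothetical helper: run any command with MV2_USE_OLD_BCAST=1 in its
# environment, without changing the environment of the calling shell.
with_old_bcast() {
    env MV2_USE_OLD_BCAST=1 "$@"
}

# Intended use (commented out so the snippet runs without an MPI install):
#   with_old_bcast mpirun -np 16 ./xCbtest > bcast_old.log 2>&1
#   mpirun -np 16 ./xCbtest                > bcast_new.log 2>&1
```

Keeping the setting scoped to a single command makes it harder to carry the workaround over into later runs by accident.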

Furthermore, it seems like I am not the only one getting these errors. 
If you look at my original posting there is a link to:
http://fpmd.ucdavis.edu/qbox-list/viewtopic.php?p=290
which reports exactly the same issues.

Do you have any special setting that I may not be aware of that might 
result in successful output in your case?

Thanks for your help.
Cheers,
Claudio

Krishna Kandalla wrote:
> Hello Claudio,
>
>     I just tried running the xdqr test with 16 processes (one node) on 
> the TACC Stampede cluster. The overall execution time for this test, 
> with or without this flag does not seem to vary much. I am seeing 
> about 1.02 - 1.06s as the total time. This test also completes 
> correctly without the env variable that we had discussed. And, if it 
> helps, I am also seeing that this test takes about 1.7s with 
> Open-MPI-1.6.4.
>     If you are using any specific configure/run-time options for the 
> MVAPICH2-1.9 library, could you please share the details?
>
> Thanks,
> Krishna
>
> On Wed, May 15, 2013 at 10:31 AM, Claudio J. Margulis 
> <claudio-margulis at uiowa.edu> wrote:
>
>     It seems that my mail didn't go through so I am resending it.
>     Please read below.
>     Claudio
>
>
>     Claudio J. Margulis wrote:
>
>         Dear Krishna, thanks for responding.
>         Yes, with that environmental variable the errors are gone.
>         However, the run time for the tests becomes extremely long.
>         As an example, a typical ScaLAPACK test,
>         mpirun -np 16 ./xdqr <QR.dat, which takes a second to run with
>         Open MPI, takes on the order of minutes with MVAPICH2.
>
>         Claudio
>
>
>     -- 
>     Claudio J. Margulis
>
>     Associate Professor of Chemistry
>     The University of Iowa
>     Margulis Group Page
>     <http://www.chem.uiowa.edu/faculty/margulis/group/first.html>
>
>

-- 
Claudio J. Margulis
Associate Professor of Chemistry
The University of Iowa
Margulis Group Page 
<http://www.chem.uiowa.edu/faculty/margulis/group/first.html>


