[mvapich-discuss] Errors in BSBR when running xCbtest and xFbtest
of the BLACS
Claudio J. Margulis
claudio-margulis at uiowa.edu
Thu May 16 10:18:16 EDT 2013
Dear Krishna, I don't think there are any special options. These were the
commands:
gunzip mvapich2-1.9.tgz
tar -xvf mvapich2-1.9.tar
cd mvapich2-1.9
export LD_LIBRARY_PATH=/shared/gcc-4.5.1/lib64:/shared/gcc-4.5.1/lib:/shared/mpc-0.8.2/lib:/shared/mpfr-3.0.0/lib:/shared/gmp-4.3.2/lib
./configure \
  --prefix=/usr/local/chemistry_software/mvapich2-1.9/gcc-4.5.1 \
  CC=/shared/gcc-4.5.1/bin/gcc CXX=/shared/gcc-4.5.1/bin/g++ \
  F77=/shared/gcc-4.5.1/bin/gfortran FC=/shared/gcc-4.5.1/bin/gfortran
make -j 16 >&make.log &
make install
cd scalapack-mvapich2-1.9/
tar -xvf scalapack-2.0.2.tar
cd scalapack-2.0.2
export LD_LIBRARY_PATH=/usr/local/chemistry_software/mvapich2-1.9/gcc-4.5.1/lib:$LD_LIBRARY_PATH
make all
cd BLACS/TESTING/
/usr/local/chemistry_software/mvapich2-1.9/gcc-4.5.1/bin/mpirun -np 16 ./xCbtest
I don't want to paste all the errors I get but a sample follows for the
BSBR section:
INTEGER BSBR TESTS: BEGIN.
PROCESS { 0, 1} REPORTS ERRORS IN TEST# 2161:
Invalid element at A( 2, 1):
Expected= -995413; Received= -2
Complementory triangle overwrite at A( 1, 1):
Expected= -2; Received= -1
PROCESS { 0, 1} DONE ERROR REPORT FOR TEST# 2161.
PROCESS { 0, 1} REPORTS ERRORS IN TEST# 2162:
Invalid element at A( 2, 1):
Expected= -219319; Received= -2
PROCESS { 0, 1} DONE ERROR REPORT FOR TEST# 2162.
PROCESS { 0, 1} REPORTS ERRORS IN TEST# 3761:
Invalid element at A( 2, 1):
Expected= 574430; Received= -2
Complementory triangle overwrite at A( 1, 1):
Expected= -2; Received= -1
PROCESS { 0, 1} DONE ERROR REPORT FOR TEST# 3761.
PROCESS { 0, 1} REPORTS ERRORS IN TEST# 4561:
Invalid element at A( 2, 1):
Expected= 716842; Received= -2
Complementory triangle overwrite at A( 1, 1):
Expected= -2; Received= -1
PROCESS { 0, 1} DONE ERROR REPORT FOR TEST# 4561.
PROCESS { 1, 0} REPORTS ERRORS IN TEST# 1361:
Invalid element at A( 2, 1):
Expected= 862174; Received= -2
Complementory triangle overwrite at A( 1, 1):
Expected= -2; Received= -1
PROCESS { 1, 0} DONE ERROR REPORT FOR TEST# 1361.
PROCESS { 1, 0} REPORTS ERRORS IN TEST# 2161:
Invalid element at A( 2, 1):
Expected= -995413; Received= -2
Complementory triangle overwrite at A( 1, 1):
Expected= -2; Received= -1
PROCESS { 1, 0} DONE ERROR REPORT FOR TEST# 2161.
PROCESS { 1, 0} REPORTS ERRORS IN TEST# 3761:
Invalid element at A( 2, 1):
Expected= 574430; Received= -2
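Since the full output is long, a per-process count of the error reports may be easier to compare than the raw text. The following is only a sketch: the function name summarize_errors and the temporary sample file are hypothetical, and the sample contains just the report headers from the excerpt above.

```shell
# Sketch: count BLACS tester error reports per process coordinate.
# summarize_errors is a hypothetical helper, not part of BLACS or MVAPICH2.
summarize_errors() {
  # Keep only the report headers, reduce each to "{row,col}", then count.
  grep 'REPORTS ERRORS IN TEST#' "$1" |
    sed 's/^ *PROCESS *{ *\([0-9]*\), *\([0-9]*\)}.*/{\1,\2}/' |
    sort | uniq -c | awk '{print $2, $1}'
}

# Sample input: the report headers from the excerpt above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
PROCESS { 0, 1} REPORTS ERRORS IN TEST# 2161:
PROCESS { 0, 1} REPORTS ERRORS IN TEST# 2162:
PROCESS { 0, 1} REPORTS ERRORS IN TEST# 3761:
PROCESS { 0, 1} REPORTS ERRORS IN TEST# 4561:
PROCESS { 1, 0} REPORTS ERRORS IN TEST# 1361:
PROCESS { 1, 0} REPORTS ERRORS IN TEST# 2161:
PROCESS { 1, 0} REPORTS ERRORS IN TEST# 3761:
EOF

summarize_errors "$sample"
# prints:
# {0,1} 4
# {1,0} 3
```

On a real run the argument would be whatever file the tester output was redirected to.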
These errors do not occur when using the old broadcast method (i.e. with
the environment variable MV2_USE_OLD_BCAST set to 1). There is also the
issue of timing, but let's deal with one thing at a time.
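For anyone who wants to repeat the comparison, here is a minimal sketch of how the variable can be set for a single run. The helper name with_old_bcast and the log file names are hypothetical, and the sketch assumes the launcher forwards the caller's environment to the ranks; if it does not, the variable has to be passed through the launcher's own mechanism instead.

```shell
# Sketch: run one command with the old broadcast path enabled, without
# changing the environment of the calling shell.
# with_old_bcast is a hypothetical helper name.
with_old_bcast() {
  MV2_USE_OLD_BCAST=1 "$@"
}

# Possible use, with the launcher and process count from the commands above
# (log file names are placeholders):
#   with_old_bcast mpirun -np 16 ./xCbtest > old_bcast.log 2>&1
#   mpirun -np 16 ./xCbtest > new_bcast.log 2>&1
#   grep -c 'REPORTS ERRORS IN TEST#' old_bcast.log new_bcast.log
```

Scoping the variable to one command keeps the two runs from contaminating each other, which matters when the same shell is used for both.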
Furthermore, it seems like I am not the only one getting these errors.
If you look at my original posting there is a link to:
http://fpmd.ucdavis.edu/qbox-list/viewtopic.php?p=290
which reports exactly the same issues.
Do you have any special setting that I may not be aware of that might
result in successful output in your case?
Thanks for your help.
Cheers,
Claudio
Krishna Kandalla wrote:
> Hello Claudio,
>
> I just tried running the xdqr test with 16 processes (one node) on
> the TACC Stampede cluster. The overall execution time for this test,
> with or without this flag does not seem to vary much. I am seeing
> about 1.02 - 1.06s as the total time. This test also completes
> correctly without the env variable that we had discussed. And, if it
> helps, I am also seeing that this test takes about 1.7s with
> Open-MPI-1.6.4.
> If you are using any specific configure/run-time options for the
> MVAPICH2-1.9 library, could you please share the details?
>
> Thanks,
> Krishna
>
> On Wed, May 15, 2013 at 10:31 AM, Claudio J. Margulis
> <claudio-margulis at uiowa.edu> wrote:
>
> It seems that my mail didn't go through so I am resending it.
> Please read below.
> Claudio
>
>
> Claudio J. Margulis wrote:
>
> Dear Krishna, thanks for responding.
> Yes, with that environmental variable the errors are gone.
> However, the run time for the tests becomes extremely long.
> As an example a typical scalapack test
> mpirun -np 16 ./xdqr <QR.dat that takes a second to run with
> openmpi takes on the order of minutes with mvapich2.
>
> Claudio
>
>
> --
> Claudio J. Margulis
>
> Associate Professor of Chemistry
> The University of Iowa
> Margulis Group Page
> <http://www.chem.uiowa.edu/faculty/margulis/group/first.html>
>
>
--
Claudio J. Margulis
Associate Professor of Chemistry
The University of Iowa
Margulis Group Page
<http://www.chem.uiowa.edu/faculty/margulis/group/first.html>