[mvapich-discuss] Errors in BSBR when running xCbtest and xFbtest of the BLACS

Margulis, Claudio J claudio-margulis at uiowa.edu
Mon May 20 16:07:48 EDT 2013


Dear Krishna, I very much appreciate you looking into this and passing it on to the developers. I am replying to the mailing list so that the thread does not remain without a conclusion. Hopefully MVAPICH2 will be able to handle ScaLAPACK and the BLACS properly in future releases. I think it is important for people compiling their codes with MVAPICH2 to know that this release version does not pass the BLACS tests.

Is there a way we can be notified when a patch that resolves this issue is released? Many quantum mechanics codes use these linear algebra routines.

Thanks,
cheers
Claudio

________________________________
From: krishna.kandalla at gmail.com [krishna.kandalla at gmail.com] on behalf of Krishna Kandalla [kandalla at cse.ohio-state.edu]
Sent: Thursday, May 16, 2013 10:19 AM
To: Margulis, Claudio J
Cc: MVAPICH-Core
Subject: Re: [mvapich-discuss] Errors in BSBR when running xCbtest and xFbtest of the BLACS

Hi Claudio,
          Thanks for sharing the details. We see the same error message with the xCbtest. We will continue working on this issue.
(I am CC'ing our internal developer list)

Thanks,
Krishna

On Thu, May 16, 2013 at 10:28 AM, Claudio J. Margulis <claudio-margulis at uiowa.edu> wrote:
I guess it would be useful if I also paste my SLmake.inc for ScaLAPACK:

############################################################################
#
#  Program:         ScaLAPACK
#
#  Module:          SLmake.inc
#
#  Purpose:         Top-level Definitions
#
#  Creation date:   February 15, 2000
#
#  Modified:        October 13, 2011
#
#  Send bug reports, comments or suggestions to scalapack at cs.utk.edu
#
############################################################################
#
#  C preprocessor definitions:  set CDEFS to one of the following:
#
#     -DNoChange (fortran subprogram names are lower case without any suffix)
#     -DUpCase   (fortran subprogram names are upper case without any suffix)
#     -DAdd_     (fortran subprogram names are lower case with "_" appended)

CDEFS         = -DAdd_

#
#  The fortran and C compilers, loaders, and their flags
#

FC            = /usr/local/chemistry_software/mvapich2-1.9/gcc-4.5.1/bin/mpif90
CC            = /usr/local/chemistry_software/mvapich2-1.9/gcc-4.5.1/bin/mpicc
NOOPT         = -O0
FCFLAGS       = -O3
CCFLAGS       = -O3
FCLOADER      = $(FC)
CCLOADER      = $(CC)
FCLOADFLAGS   = $(FCFLAGS)
CCLOADFLAGS   = $(CCFLAGS)

#
#  The archiver and the flag(s) to use when building archive (library)
#  Also the ranlib routine.  If your system has no ranlib, set RANLIB = echo
#

ARCH          = ar
ARCHFLAGS     = cr
RANLIB        = ranlib

#
#  The name of the ScaLAPACK library to be created
#

SCALAPACKLIB  = libscalapack.a

#
#  BLAS, LAPACK (and possibly other) libraries needed for linking test programs
#

#BLASLIB       =
LAPACKLIB     =
LIBS          = /shared/acml-4.4.0/gfortran64/lib/libacml.a




Claudio J. Margulis wrote:
Dear Krishna, I don't think there are any special options. These were the commands:


gunzip mvapich2-1.9.tgz
tar -xvf mvapich2-1.9.tar
cd mvapich2-1.9
export LD_LIBRARY_PATH=/shared/gcc-4.5.1/lib64:/shared/gcc-4.5.1/lib:/shared/mpc-0.8.2/lib:/shared/mpfr-3.0.0/lib:/shared/gmp-4.3.2/lib
./configure --prefix=/usr/local/chemistry_software/mvapich2-1.9/gcc-4.5.1 CC=/shared/gcc-4.5.1/bin/gcc CXX=/shared/gcc-4.5.1/bin/g++ F77=/shared/gcc-4.5.1/bin/gfortran FC=/shared/gcc-4.5.1/bin/gfortran
 make -j 16 >&make.log &
make install


cd scalapack-mvapich2-1.9/
tar -xvf scalapack-2.0.2.tar
cd scalapack-2.0.2
export LD_LIBRARY_PATH=/usr/local/chemistry_software/mvapich2-1.9/gcc-4.5.1/lib:$LD_LIBRARY_PATH
make all
cd BLACS/TESTING/
 /usr/local/chemistry_software/mvapich2-1.9/gcc-4.5.1/bin/mpirun -np 16 ./xCbtest

I don't want to paste all the errors I get, but a sample for the BSBR section follows:

INTEGER BSBR TESTS: BEGIN.

PROCESS {   0,   1} REPORTS ERRORS IN TEST#  2161:
   Invalid element at A(   2,   1):
   Expected=     -995413; Received=          -2
   Complementory triangle overwrite at A(   1,   1):
   Expected=          -2; Received=          -1
PROCESS {   0,   1} DONE ERROR REPORT FOR TEST#  2161.

PROCESS {   0,   1} REPORTS ERRORS IN TEST#  2162:
   Invalid element at A(   2,   1):
   Expected=     -219319; Received=          -2
PROCESS {   0,   1} DONE ERROR REPORT FOR TEST#  2162.

PROCESS {   0,   1} REPORTS ERRORS IN TEST#  3761:
   Invalid element at A(   2,   1):
   Expected=      574430; Received=          -2
   Complementory triangle overwrite at A(   1,   1):
   Expected=          -2; Received=          -1
PROCESS {   0,   1} DONE ERROR REPORT FOR TEST#  3761.

PROCESS {   0,   1} REPORTS ERRORS IN TEST#  4561:
   Invalid element at A(   2,   1):
   Expected=      716842; Received=          -2
   Complementory triangle overwrite at A(   1,   1):
   Expected=          -2; Received=          -1
PROCESS {   0,   1} DONE ERROR REPORT FOR TEST#  4561.

PROCESS {   1,   0} REPORTS ERRORS IN TEST#  1361:
   Invalid element at A(   2,   1):
   Expected=      862174; Received=          -2
   Complementory triangle overwrite at A(   1,   1):
   Expected=          -2; Received=          -1
PROCESS {   1,   0} DONE ERROR REPORT FOR TEST#  1361.

PROCESS {   1,   0} REPORTS ERRORS IN TEST#  2161:
   Invalid element at A(   2,   1):
   Expected=     -995413; Received=          -2
   Complementory triangle overwrite at A(   1,   1):
   Expected=          -2; Received=          -1
PROCESS {   1,   0} DONE ERROR REPORT FOR TEST#  2161.

PROCESS {   1,   0} REPORTS ERRORS IN TEST#  3761:
   Invalid element at A(   2,   1):
   Expected=      574430; Received=          -2


These errors do not occur when using the old broadcast method (i.e., with the environment variable MV2_USE_OLD_BCAST set to 1). There is also the issue of timing, but let's deal with one thing at a time.
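
For anyone else hitting this, the workaround looks roughly like the following (a sketch only, not my exact session; how the exported variable reaches the MPI ranks depends on the launcher, e.g. with mpirun_rsh it is typically passed on the command line as VAR=value):

export MV2_USE_OLD_BCAST=1      # fall back to the old broadcast algorithm
/usr/local/chemistry_software/mvapich2-1.9/gcc-4.5.1/bin/mpirun -np 16 ./xCbtest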

Furthermore, it seems like I am not the only one getting these errors. If you look at my original posting there is a link to:
http://fpmd.ucdavis.edu/qbox-list/viewtopic.php?p=290
which reports exactly the same issues.

Do you have any special setting that I may not be aware of that might result in successful output in your case?

Thanks for your help.
Cheers,
Claudio

Krishna Kandalla wrote:
Hello Claudio,

    I just tried running the xdqr test with 16 processes (one node) on the TACC Stampede cluster. The overall execution time for this test, with or without this flag, does not seem to vary much: I am seeing about 1.02 - 1.06s as the total time. This test also completes correctly without the env variable that we had discussed. And, if it helps, I am also seeing that this test takes about 1.7s with Open MPI 1.6.4.
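
    (For reference, a rough way to reproduce this comparison; this is an illustrative sketch rather than the exact commands used, and it assumes xdqr and QR.dat sit in ScaLAPACK's TESTING directory and that the exported variable is propagated to the ranks by the launcher:)

    unset MV2_USE_OLD_BCAST                # default broadcast algorithm
    time mpirun -np 16 ./xdqr < QR.dat
    export MV2_USE_OLD_BCAST=1             # old broadcast algorithm (the workaround discussed earlier in this thread)
    time mpirun -np 16 ./xdqr < QR.dat
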
    If you are using any specific configure/run-time options for the MVAPICH2-1.9 library, could you please share the details?

Thanks,
Krishna

On Wed, May 15, 2013 at 10:31 AM, Claudio J. Margulis <claudio-margulis at uiowa.edu> wrote:

    It seems that my mail didn't go through, so I am resending it.
    Please read below.
    Claudio


    Claudio J. Margulis wrote:

        Dear Krishna, thanks for responding.
        Yes, with that environment variable the errors are gone.
        However, the run time for the tests becomes extremely long.
        As an example, a typical ScaLAPACK test,
        mpirun -np 16 ./xdqr < QR.dat, which takes a second to run with
        Open MPI, takes on the order of minutes with MVAPICH2.

        Claudio


    --
    Claudio J. Margulis

    Associate Professor of Chemistry
    The University of Iowa
    Margulis Group Page
<http://www.chem.uiowa.edu/faculty/margulis/group/first.html>




--
Claudio J. Margulis
Associate Professor of Chemistry
The University of Iowa
Margulis Group Page <http://www.chem.uiowa.edu/faculty/margulis/group/first.html>

