[mvapich-discuss] Errors in BSBR when running xCbtest and xFbtest of the BLACS

Krishna Kandalla kandalla at cse.ohio-state.edu
Mon May 20 16:57:39 EDT 2013


Hello Claudio,

   Thanks for checking back with us. We are working on this issue. We will
be posting an update on the mvapich-discuss list soon.

Thanks,
Krishna

On Mon, May 20, 2013 at 4:07 PM, Margulis, Claudio J <
claudio-margulis at uiowa.edu> wrote:

>  Dear Krishna, I very much appreciate you looking into this and passing
> it to developers. I am replying to the mailing list so that the thread does
> not remain without conclusion. Hopefully MVAPICH2 will be able to properly
> deal with scalapack and the BLACS in future releases. I think it is
> important for people compiling their codes with MVAPICH2 to know that this
> release version does not pass the BLACS tests.
>
> Is there a way we can get informed when a patch is released that will
> resolve this issue? Many quantum mechanics codes use these linear algebra
> routines.
>
> Thanks,
> cheers
> Claudio
>
>  ------------------------------
> *From:* krishna.kandalla at gmail.com [krishna.kandalla at gmail.com] on behalf
> of Krishna Kandalla [kandalla at cse.ohio-state.edu]
> *Sent:* Thursday, May 16, 2013 10:19 AM
> *To:* Margulis, Claudio J
> *Cc:* MVAPICH-Core
> *Subject:* Re: [mvapich-discuss] Errors in BSBR when running xCbtest and
> xFbtest of the BLACS
>
>  Hi Claudio,
>           Thanks for sharing the details. We see the same error message
> with the xCbtest. We will continue working on this issue.
> (I am CC'ing our internal developer list)
>
>  Thanks,
> Krishna
>
> On Thu, May 16, 2013 at 10:28 AM, Claudio J. Margulis <
> claudio-margulis at uiowa.edu> wrote:
>
>> I guess it would be useful if I also paste my SLmake.inc for scalapack:
>>
>> ############################################################################
>> #
>> #  Program:         ScaLAPACK
>> #
>> #  Module:          SLmake.inc
>> #
>> #  Purpose:         Top-level Definitions
>> #
>> #  Creation date:   February 15, 2000
>> #
>> #  Modified:        October 13, 2011
>> #
>> #  Send bug reports, comments or suggestions to scalapack at cs.utk.edu
>> #
>> ############################################################################
>> #
>> #  C preprocessor definitions:  set CDEFS to one of the following:
>> #
>> #     -DNoChange (fortran subprogram names are lower case without any suffix)
>> #     -DUpCase   (fortran subprogram names are upper case without any suffix)
>> #     -DAdd_     (fortran subprogram names are lower case with "_" appended)
>>
>> CDEFS         = -DAdd_
>>
>> #
>> #  The fortran and C compilers, loaders, and their flags
>> #
>>
>> FC            = /usr/local/chemistry_software/mvapich2-1.9/gcc-4.5.1/bin/mpif90
>> CC            = /usr/local/chemistry_software/mvapich2-1.9/gcc-4.5.1/bin/mpicc
>> NOOPT         = -O0
>> FCFLAGS       = -O3
>> CCFLAGS       = -O3
>> FCLOADER      = $(FC)
>> CCLOADER      = $(CC)
>> FCLOADFLAGS   = $(FCFLAGS)
>> CCLOADFLAGS   = $(CCFLAGS)
>>
>> #
>> #  The archiver and the flag(s) to use when building archive (library)
>> #  Also the ranlib routine.  If your system has no ranlib, set RANLIB = echo
>> #
>>
>> ARCH          = ar
>> ARCHFLAGS     = cr
>> RANLIB        = ranlib
>>
>> #
>> #  The name of the ScaLAPACK library to be created
>> #
>>
>> SCALAPACKLIB  = libscalapack.a
>>
>> #
>> #  BLAS, LAPACK (and possibly other) libraries needed for linking test programs
>> #
>>
>> #BLASLIB       =
>> LAPACKLIB     =
>> LIBS          = /shared/acml-4.4.0/gfortran64/lib/libacml.a
>>
>>
>>
>>
>> Claudio J. Margulis wrote:
>>
>>> Dear Krishna, I don't think there are any special options. These were the
>>> commands:
>>>
>>>
>>> gunzip mvapich2-1.9.tgz
>>> tar -xvf mvapich2-1.9.tar
>>> cd mvapich2-1.9
>>> export LD_LIBRARY_PATH=/shared/gcc-4.5.1/lib64:/shared/gcc-4.5.1/lib:/shared/mpc-0.8.2/lib:/shared/mpfr-3.0.0/lib:/shared/gmp-4.3.2/lib
>>> ./configure --prefix=/usr/local/chemistry_software/mvapich2-1.9/gcc-4.5.1 \
>>>     CC=/shared/gcc-4.5.1/bin/gcc CXX=/shared/gcc-4.5.1/bin/g++ \
>>>     F77=/shared/gcc-4.5.1/bin/gfortran FC=/shared/gcc-4.5.1/bin/gfortran
>>> make -j 16 >&make.log &
>>> make install
>>>
>>>
>>> cd scalapack-mvapich2-1.9/
>>> tar -xvf scalapack-2.0.2.tar
>>> cd scalapack-2.0.2
>>> export LD_LIBRARY_PATH=/usr/local/chemistry_software/mvapich2-1.9/gcc-4.5.1/lib:$LD_LIBRARY_PATH
>>> make all
>>> cd BLACS/TESTING/
>>> /usr/local/chemistry_software/mvapich2-1.9/gcc-4.5.1/bin/mpirun -np 16 ./xCbtest
>>>
>>> I don't want to paste all the errors I get but a sample follows for the
>>> BSBR section:
>>>
>>> INTEGER BSBR TESTS: BEGIN.
>>>
>>> PROCESS {   0,   1} REPORTS ERRORS IN TEST#  2161:
>>>    Invalid element at A(   2,   1):
>>>    Expected=     -995413; Received=          -2
>>>    Complementory triangle overwrite at A(   1,   1):
>>>    Expected=          -2; Received=          -1
>>> PROCESS {   0,   1} DONE ERROR REPORT FOR TEST#  2161.
>>>
>>> PROCESS {   0,   1} REPORTS ERRORS IN TEST#  2162:
>>>    Invalid element at A(   2,   1):
>>>    Expected=     -219319; Received=          -2
>>> PROCESS {   0,   1} DONE ERROR REPORT FOR TEST#  2162.
>>>
>>> PROCESS {   0,   1} REPORTS ERRORS IN TEST#  3761:
>>>    Invalid element at A(   2,   1):
>>>    Expected=      574430; Received=          -2
>>>    Complementory triangle overwrite at A(   1,   1):
>>>    Expected=          -2; Received=          -1
>>> PROCESS {   0,   1} DONE ERROR REPORT FOR TEST#  3761.
>>>
>>> PROCESS {   0,   1} REPORTS ERRORS IN TEST#  4561:
>>>    Invalid element at A(   2,   1):
>>>    Expected=      716842; Received=          -2
>>>    Complementory triangle overwrite at A(   1,   1):
>>>    Expected=          -2; Received=          -1
>>> PROCESS {   0,   1} DONE ERROR REPORT FOR TEST#  4561.
>>>
>>> PROCESS {   1,   0} REPORTS ERRORS IN TEST#  1361:
>>>    Invalid element at A(   2,   1):
>>>    Expected=      862174; Received=          -2
>>>    Complementory triangle overwrite at A(   1,   1):
>>>    Expected=          -2; Received=          -1
>>> PROCESS {   1,   0} DONE ERROR REPORT FOR TEST#  1361.
>>>
>>> PROCESS {   1,   0} REPORTS ERRORS IN TEST#  2161:
>>>    Invalid element at A(   2,   1):
>>>    Expected=     -995413; Received=          -2
>>>    Complementory triangle overwrite at A(   1,   1):
>>>    Expected=          -2; Received=          -1
>>> PROCESS {   1,   0} DONE ERROR REPORT FOR TEST#  2161.
>>>
>>> PROCESS {   1,   0} REPORTS ERRORS IN TEST#  3761:
>>>    Invalid element at A(   2,   1):
>>>    Expected=      574430; Received=          -2
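
Since only a sample of the error reports is pasted, a per-test count can make the full output easier to compare between runs. The following is a minimal sketch, not part of the original exchange. It assumes the tester output was saved to a file; the file name xCbtest.log and the function name summarize_blacs_errors are hypothetical.

```shell
# Sketch: count error reports per test number in a saved BLACS tester log.
# Assumption: the output was captured beforehand, for example with
#   mpirun -np 16 ./xCbtest > xCbtest.log 2>&1
# (xCbtest.log is a hypothetical file name).
summarize_blacs_errors() {
    # Match lines such as
    #   PROCESS {   0,   1} REPORTS ERRORS IN TEST#  2161:
    # keep only the test number, then count how often each number occurs.
    grep 'REPORTS ERRORS IN TEST#' "$1" |
        sed 's/.*TEST# *\([0-9][0-9]*\).*/\1/' |
        sort -n | uniq -c
}
```

Calling `summarize_blacs_errors xCbtest.log` prints one line per failing test number, preceded by the number of processes that reported it.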
>>>
>>>
>>> These errors do not occur when using the old broadcast method (i.e. with
>>> the environment variable MV2_USE_OLD_BCAST set to 1). There is also the
>>> issue of timing, but let's deal with one thing at a time.
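
The comparison described here, the same tester run with and without MV2_USE_OLD_BCAST set to 1, can be sketched as a small script. This is only an illustration under assumptions that are not stated in the thread: that mpirun on the PATH is MVAPICH2's Hydra launcher, which forwards the launching environment to the ranks, and that the tester is in the current directory. NP, TEST_BIN, DRY_RUN and run_blacs_test are hypothetical names used only by this sketch.

```shell
#!/bin/sh
# Sketch: run a BLACS tester with and without the old broadcast path.
# Assumptions (not from the thread): mpirun is MVAPICH2's Hydra launcher
# and passes environment variables set at launch on to the MPI ranks.
# NP, TEST_BIN and DRY_RUN are hypothetical knobs of this script only.

NP=${NP:-16}
TEST_BIN=${TEST_BIN:-./xCbtest}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the command, 0 = actually run it

run_blacs_test() {
    # $1: value for MV2_USE_OLD_BCAST; empty means leave it unset
    if [ -n "$1" ]; then
        cmd="env MV2_USE_OLD_BCAST=$1 mpirun -np $NP $TEST_BIN"
    else
        cmd="mpirun -np $NP $TEST_BIN"
    fi
    if [ "$DRY_RUN" = 1 ]; then
        echo "$cmd"
    else
        $cmd
    fi
}

run_blacs_test ""   # default broadcast: the configuration that showed BSBR errors
run_blacs_test 1    # old broadcast: the configuration reported as error-free
```

With DRY_RUN left at 1 the script only prints the two command lines, which is a safe way to check them before launching on a cluster. A different launcher (for example mpirun_rsh) may expect the variable to be passed differently.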
>>>
>>> Furthermore, it seems like I am not the only one getting these errors.
>>> If you look at my original posting there is a link to:
>>> http://fpmd.ucdavis.edu/qbox-list/viewtopic.php?p=290
>>> which reports exactly the same issues.
>>>
>>> Do you have any special setting that I may not be aware of that might
>>> result in successful output in your case?
>>>
>>> Thanks for your help.
>>> Cheers,
>>> Claudio
>>>
>>> Krishna Kandalla wrote:
>>>
>>>> Hello Claudio,
>>>>
>>>>     I just tried running the xdqr test with 16 processes (one node) on
>>>> the TACC Stampede cluster. The overall execution time for this test, with
>>>> or without this flag does not seem to vary much. I am seeing about 1.02 -
>>>> 1.06s as the total time. This test also completes correctly without the env
>>>> variable that we had discussed. And, if it helps, I am also seeing that
>>>> this test takes about 1.7s with Open-MPI-1.6.4.
>>>>     If you are using any specific configure/run-time options for the
>>>> MVAPICH2-1.9 library, could you please share the details?
>>>>
>>>> Thanks,
>>>> Krishna
>>>>
>>>> On Wed, May 15, 2013 at 10:31 AM, Claudio J. Margulis <
>>>> claudio-margulis at uiowa.edu> wrote:
>>>>
>>>>     It seems that my mail didn't go through so I am resending it.
>>>>     Please read below.
>>>>     Claudio
>>>>
>>>>
>>>>     Claudio J. Margulis wrote:
>>>>
>>>>         Dear Krishna, thanks for responding.
>>>>         Yes, with that environmental variable the errors are gone.
>>>>         However, the run time for the tests becomes extremely long.
>>>>         As an example, a typical scalapack test,
>>>>         mpirun -np 16 ./xdqr <QR.dat, which takes a second to run with
>>>>         openmpi, takes on the order of minutes with mvapich2.
>>>>
>>>>         Claudio
>>>>
>>>>
>>>>     --
>>>>     Claudio J. Margulis
>>>>     Associate Professor of Chemistry
>>>>     The University of Iowa
>>>>     Margulis Group Page
>>>>     <http://www.chem.uiowa.edu/faculty/margulis/group/first.html>
>>>>
>>>>
>>>>
>>>
>> --
>> Claudio J. Margulis
>> Associate Professor of Chemistry
>> The University of Iowa
>> Margulis Group Page <http://www.chem.uiowa.edu/faculty/margulis/group/first.html>
>>
>>
>