[mvapich-discuss] xcbrd tests

Shaun Rowland rowland at cse.ohio-state.edu
Sat Apr 7 00:03:08 EDT 2007


Bas van der Vlies wrote:
> Just tried mvapich1 version 0.9.9 from svn and this also fails the xcbrd 
> test:
> 
> Relative machine precision (eps) is taken to be       0.596046E-07
> Routines pass computational tests if scaled residual is less than 10.000
> 
> TIME      M      N  NB     P     Q  BRD Time      MFLOPS Residual  CHECK
> ---- ------ ------ --- ----- ----- --------- ----------- -------- ------
> 
> WALL      4      4   2     1     1      0.00        0.00     0.58 PASSED
> ||A - Q*B*P|| / (||A|| * N * eps) =                       NaN
> WALL      4      4   3     1     1      0.00        0.00      NaN FAILED
> ||A - Q*B*P|| / (||A|| * N * eps) =                       NaN
> WALL      4      4   4     1     1      0.00        0.00      NaN FAILED

Hi Bas. I've been looking into this issue for a while. I believe I
know what the problem is. I built the following packages:

BLACS
ATLAS
ScaLAPACK (using the two above)

I built these four times for the following MPI installations:

MVAPICH 0.9.9 with gfortran
MVAPICH 0.9.9 with g77
MVAPICH2 0.9.8 with gfortran
MVAPICH2 0.9.8 with g77

The only time I had a problem was when I accidentally built the ATLAS
package with g77 and tried to use it with the other packages that were
built with gfortran. I had already suspected a compiler mismatch was the
cause here, and that accidental mixed build confirmed it: it produced
exactly the errors you reported. I believe the problem is that your
packages, including MVAPICH/MVAPICH2, were not all built with the same
Fortran compiler. The Fortran compiler must be common across every
build; otherwise the mismatch won't be apparent until you try to use the
libraries and get strange results. This is exactly the type of situation
you've reported.

If you make sure that ScaLAPACK, and all of its dependencies as well, is
built with either g77 or gfortran, matching the MVAPICH/MVAPICH2 build
you want to test, then these strange problems should go away. The
Fortran compiler needs to be the same all around. I have not yet run
every ScaLAPACK test, but I ran the ones you reported and had no issues
when the Fortran compilers were uniform. Only when a library built with
the other compiler (g77 vs. gfortran) was introduced did I see the same
behavior.
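One quick way to check which compiler a given library was built with is
to look at the Fortran runtime symbols it references (a diagnostic
sketch; the library path is a placeholder for your own build). Objects
compiled by gfortran pull in _gfortran_* symbols from libgfortran, while
g77 objects reference libg2c entry points such as s_wsle and do_fio:

```shell
# Diagnostic sketch -- the archive path is a placeholder.
# A gfortran-built ScaLAPACK references _gfortran_* runtime symbols;
# a g77-built one references libg2c symbols like s_wsle/do_fio/e_wsle.
nm libscalapack.a | grep -c '_gfortran_'
nm libscalapack.a | grep -cE '(s_wsle|do_fio|e_wsle)'
```

If both greps find matches, objects from both compilers were mixed into
the same archive, which is the situation to avoid.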

I have notes on how I built the packages listed above:

BLACS
-----
http://www.cse.ohio-state.edu/~rowland/work/blacs.html

ATLAS
-----
http://www.cse.ohio-state.edu/~rowland/work/atlas.html

ScaLAPACK
---------
http://www.cse.ohio-state.edu/~rowland/work/scalapack.html

Maybe those notes will be useful for comparison. On a side note: if you
are testing with shared library builds of MVAPICH/MVAPICH2, be sure that
the path to libmpich.a does not appear in any of these configuration
files. The mpicc and mpif77 commands embed the path of the shared
library into the resulting binary, and statically linking against
libmpich.a on top of that causes problems. No one would or should link
this way on purpose, but with the configuration files you need to edit,
it's an easy mistake to make. If you are using a static library build of
MVAPICH/MVAPICH2, this does not matter. The steps I've outlined note
this appropriately and do the right thing.
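One way to see which linkage a test binary actually ended up with (a
sketch; xcbrd stands in for whichever test executable you built):

```shell
# With a shared library build of MVAPICH, the MPI library should be
# listed here as a runtime dependency; if the binary was statically
# linked against libmpich.a instead, nothing will match.
ldd ./xcbrd | grep -i mpich
```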

Also, you do need GFORTRAN_UNBUFFERED_ALL=y set in your environment for
the gfortran cases. For MVAPICH2, simply export that variable. For
MVAPICH it needs to be specified on the mpirun_rsh command line:

mpirun_rsh -np 4 host1 host2 host3 host4 GFORTRAN_UNBUFFERED_ALL=y ./test

for example. This is noted on the web pages above too.
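For the MVAPICH2 case, the equivalent is to export the variable before
launching (a sketch, assuming the mpd-based mpiexec launcher that
MVAPICH2 0.9.8 uses):

```shell
# Exported variables are propagated to the MPI processes by the
# MVAPICH2 launcher, so no per-command setting is needed.
export GFORTRAN_UNBUFFERED_ALL=y
mpiexec -n 4 ./test
```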
-- 
Shaun Rowland	rowland at cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/

