[mvapich-discuss] RE: mvapich 1.01 / scalapack: xcnep and xznep fail with MPI_RECV : Invalid buffer pointer

Mehdi Bozzo-Rey mbozzore at platform.com
Mon Sep 8 08:00:41 EDT 2008


From what I can see in the archive (http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2008-September/001888.html), part of my message is missing, so I am resending my original email.

Mehdi


=======================================
From: Mehdi Bozzo-Rey 
Sent: September-08-08 7:48 AM
To: 'mvapich-discuss at cse.ohio-state.edu'
Subject: mvapich 1.01 / scalapack: xcnep and xznep fail with MPI_RECV : Invalid buffer pointer

Hello,

I recompiled mvapich 1.0.1, BLACS and ScaLAPACK.

- I am able to run the tests included in the BLACS distribution
- I am able to run most of the tests included in the ScaLAPACK distribution, except xcnep and xznep. Both abort with "MPI_RECV : Invalid buffer pointer"; the full output of each run is included below.

Do you have any idea what the root cause could be?

xcnep:

--------------------------------------------------------------------------
[mbozzore at compute-00-02 TESTING]$ mpirun_rsh -ssh -np 4 -hostfile ./hosts GFORTRAN_UNBUFFERED_ALL=yes ./xcnep
ScaLAPACK QSQ^H by Schur Decomposition.
'MPI machine'

Tests of the parallel complex single precision Schur decomposition.
The following scaled residual checks will be computed:
 Residual               = ||H-QSQ^H|| / (||H|| * eps * N )
 Orthogonality residual = ||I - Q^HQ|| / ( eps * N )
The matrix A is randomly generated for each test.

An explanation of the input/output parameters follows:
TIME    : Indicates whether WALL or CPU time was used.
N       : The number of columns in the matrix A.
NB      : The size of the square blocks the matrix A is split into.
P       : The number of process rows.
Q       : The number of process columns.
THRESH  : If a residual value is less than THRESH, CHECK is flagged as PASSED
NEP time : Time in seconds to decompose the  matrix
MFLOPS  : Rate of execution

The following parameter values will be used:
  N       :             1     2     3     4     6    10    50
  NB      :             6     8    17
  P       :             1     2
  Q       :             1     2

Relative machine precision (eps) is taken to be       0.596046E-07
Routines pass computational tests if scaled residual is less than   20.000

TIME     N  NB    P    Q NEP Time   MFLOPS  CHECK
---- ----- --- ---- ---- -------- -------- ------

WALL     1   6    1    1     0.00     1.06 PASSED
WALL     1   8    1    1     0.00    18.00 PASSED
WALL     1  17    1    1     0.00     9.00 PASSED
WALL     2   6    1    1     0.00     2.77 PASSED
WALL     2   8    1    1     0.00    20.57 PASSED
WALL     2  17    1    1     0.00    20.57 PASSED
WALL     3   6    1    1     0.00     4.26 PASSED
WALL     3   8    1    1     0.00    12.15 PASSED
WALL     3  17    1    1     0.00    12.15 PASSED
WALL     4   6    1    1     0.00    19.53 PASSED
WALL     4   8    1    1     0.00    21.33 PASSED
WALL     4  17    1    1     0.00    20.21 PASSED
WALL     6   6    1    1     0.00    30.61 PASSED
WALL     6   8    1    1     0.00    32.67 PASSED
WALL     6  17    1    1     0.00    29.91 PASSED
WALL    10   6    1    1     0.00    72.87 PASSED
WALL    10   8    1    1     0.00    80.00 PASSED
WALL    10  17    1    1     0.00    88.67 PASSED
WALL    50   6    1    1     0.01   408.57 PASSED
WALL    50   8    1    1     0.01   428.08 PASSED
WALL    50  17    1    1     0.00   481.49 PASSED
WALL     1   6    2    2     0.00     0.90 PASSED
WALL     1   8    2    2     0.00     1.50 PASSED
WALL     1  17    2    2     0.00     1.50 PASSED
WALL     2   6    2    2     0.00     1.22 PASSED
WALL     2   8    2    2     0.00     2.53 PASSED
WALL     2  17    2    2     0.00     2.62 PASSED
WALL     3   6    2    2     0.00     1.65 PASSED
WALL     3   8    2    2     0.00     2.09 PASSED
WALL     3  17    2    2     0.00     2.05 PASSED
WALL     4   6    2    2     0.00     4.04 PASSED
WALL     4   8    2    2     0.00     4.13 PASSED
WALL     4  17    2    2     0.00     4.07 PASSED
WALL     6   6    2    2     0.00     6.81 PASSED
WALL     6   8    2    2     0.00     7.51 PASSED
WALL     6  17    2    2     0.00     7.64 PASSED
0 - MPI_RECV : Invalid buffer pointer
2 - MPI_RECV : Invalid buffer pointer
[2] [] Aborting Program!
[0] [] Aborting Program!
Abort signaled by rank 0:  Aborting program !
Exit code -3 signaled from compute-00-02
Killing remote processes...Abort signaled by rank 2:  Aborting program !
MPI process terminated unexpectedly
DONE
[mbozzore at compute-00-02 TESTING]$ Signal 15 received.
Signal 15 received.
--------------------------------------------------------------------------
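
Only ranks 0 and 2 report the error in this run. As a rough aid for reading that, the sketch below maps MPI ranks to process-grid coordinates. It assumes (this is not confirmed anywhere in the output) that the tester lays out the 2 x 2 grid in row-major order; `grid_coords` is a hypothetical helper written for this note, not a BLACS routine.

```python
def grid_coords(rank, nprow, npcol):
    """Map an MPI rank to (process row, process column).

    Assumption: the process grid is filled in row-major order.
    """
    if not 0 <= rank < nprow * npcol:
        raise ValueError("rank outside the process grid")
    return divmod(rank, npcol)


# Ranks that printed "MPI_RECV : Invalid buffer pointer" in the log above.
failing_ranks = [0, 2]
for rank in failing_ranks:
    print(rank, grid_coords(rank, 2, 2))
```

If the row-major assumption holds, both failing ranks sit in process column 0 of the 2 x 2 grid, which may help narrow down which receive is involved.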


And xznep:

--------------------------------------------------------------------------
[mbozzore at compute-00-02 TESTING]$ mpirun_rsh -ssh -np 4 -hostfile ./hosts GFORTRAN_UNBUFFERED_ALL=yes ./xznep
ScaLAPACK QSQ^H by Schur Decomposition.
'MPI machine'

Tests of the parallel complex double precision Schur decomposition.
The following scaled residual checks will be computed:
 Residual               = ||H-QSQ^H|| / (||H|| * eps * N )
 Orthogonality residual = ||I - Q^HQ|| / ( eps * N )
The matrix A is randomly generated for each test.

An explanation of the input/output parameters follows:
TIME    : Indicates whether WALL or CPU time was used.
N       : The number of columns in the matrix A.
NB      : The size of the square blocks the matrix A is split into.
P       : The number of process rows.
Q       : The number of process columns.
THRESH  : If a residual value is less than THRESH, CHECK is flagged as PASSED
NEP time : Time in seconds to decompose the  matrix
MFLOPS  : Rate of execution

The following parameter values will be used:
  N       :             1     2     3     4     6    10    50
  NB      :             6     8    17
  P       :             1     2
  Q       :             1     2

Relative machine precision (eps) is taken to be       0.111022E-15
Routines pass computational tests if scaled residual is less than   20.000

TIME     N  NB    P    Q NEP Time   MFLOPS  CHECK
---- ----- --- ---- ---- -------- -------- ------

WALL     1   6    1    1     0.00     1.50 PASSED
WALL     1   8    1    1     0.00    18.00 PASSED
WALL     1  17    1    1     0.00    18.00 PASSED
WALL     2   6    1    1     0.00     2.15 PASSED
WALL     2   8    1    1     0.00    16.00 PASSED
WALL     2  17    1    1     0.00    16.00 PASSED
WALL     3   6    1    1     0.00     3.80 PASSED
WALL     3   8    1    1     0.00     8.10 PASSED
WALL     3  17    1    1     0.00     8.24 PASSED
WALL     4   6    1    1     0.00    14.22 PASSED
WALL     4   8    1    1     0.00    15.16 PASSED
WALL     4  17    1    1     0.00    15.16 PASSED
WALL     6   6    1    1     0.00    23.01 PASSED
WALL     6   8    1    1     0.00    23.71 PASSED
WALL     6  17    1    1     0.00    22.74 PASSED
WALL    10   6    1    1     0.00    46.39 PASSED
WALL    10   8    1    1     0.00    51.14 PASSED
WALL    10  17    1    1     0.00    55.56 PASSED
WALL    50   6    1    1     0.01   263.62 PASSED
WALL    50   8    1    1     0.01   283.73 PASSED
WALL    50  17    1    1     0.01   328.23 PASSED
WALL     1   6    2    2     0.00     0.90 PASSED
WALL     1   8    2    2     0.00     1.50 PASSED
WALL     1  17    2    2     0.00     1.50 PASSED
WALL     2   6    2    2     0.00     1.04 PASSED
WALL     2   8    2    2     0.00     2.48 PASSED
WALL     2  17    2    2     0.00     2.53 PASSED
WALL     3   6    2    2     0.00     1.28 PASSED
WALL     3   8    2    2     0.00     1.51 PASSED
WALL     3  17    2    2     0.00     1.51 PASSED
WALL     4   6    2    2     0.00     2.92 PASSED
WALL     4   8    2    2     0.00     2.98 PASSED
WALL     4  17    2    2     0.00     2.95 PASSED
WALL     6   6    2    2     0.00     6.12 PASSED
WALL     6   8    2    2     0.00     6.52 PASSED
WALL     6  17    2    2     0.00     6.92 PASSED
0 - MPI_RECV : Invalid buffer pointer
2 - MPI_RECV : Invalid buffer pointer
[2] [] Aborting Program!
[0] [] Aborting Program!
Abort signaled by rank 2:  Aborting program !
Abort signaled by rank 0:  Aborting program !
Exit code -3 signaled from compute-00-02
Killing remote processes...MPI process terminated unexpectedly
DONE
[mbozzore at compute-00-02 TESTING]$ Signal 15 received.
Signal 15 received.
--------------------------------------------------------------------------
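
To pin down where each run stops, the result rows can be compared against the full parameter grid printed in the header. The following is a minimal sketch, assuming the row format shown in the logs above and the test order seen there (process grid outermost, then N, then NB); `completed_cases` and `missing_cases` are hypothetical helper names.

```python
import re

# Parameter values copied from the "parameter values will be used" header.
N_VALUES = [1, 2, 3, 4, 6, 10, 50]
NB_VALUES = [6, 8, 17]
GRIDS = [(1, 1), (2, 2)]  # (P, Q) pairs that appear in the output

# Matches one result row, e.g.
# "WALL     6  17    2    2     0.00     7.64 PASSED"
ROW = re.compile(
    r"^(?:WALL|CPU)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+\S+\s+\S+\s+(\w+)\s*$"
)


def completed_cases(log_text):
    """Return the set of (N, NB, P, Q) tuples that have a result row."""
    done = set()
    for line in log_text.splitlines():
        match = ROW.match(line.strip())
        if match:
            n, nb, p, q = (int(g) for g in match.groups()[:4])
            done.add((n, nb, p, q))
    return done


def missing_cases(log_text):
    """List expected cases with no result row, in the order they would run."""
    done = completed_cases(log_text)
    expected = [
        (n, nb, p, q)
        for (p, q) in GRIDS
        for n in N_VALUES
        for nb in NB_VALUES
    ]
    return [case for case in expected if case not in done]
```

Applied to either failing log above, this leaves six cases without a result row, all on the 2 x 2 grid: N = 10 and N = 50, each with NB = 6, 8 and 17. The abort therefore happens while running N = 10, NB = 6, P = Q = 2.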



My Bmake.inc and SLmake.inc are attached to this email.

Note: with MPICH 1.2.7p1, Open MPI 1.2.4 (IB) and Open MPI 1.2.5 (IB), the same tests run without errors.

For example:
--------------------------------------------------------------------------
[mbozzore at compute-00-02 openmpi1.2.5]$ ompi_info | less
                Open MPI: 1.2.5
   Open MPI SVN revision: r16989
--------------------------------------------------------------------------


--------------------------------------------------------------------------
[mbozzore at compute-00-02 openmpi1.2.5]$ mpirun -np 4 --machinefile ./hosts --mca btl openib,self ./xcnep
ScaLAPACK QSQ^H by Schur Decomposition.
'MPI machine'

Tests of the parallel complex single precision Schur decomposition.
The following scaled residual checks will be computed:
 Residual               = ||H-QSQ^H|| / (||H|| * eps * N )
 Orthogonality residual = ||I - Q^HQ|| / ( eps * N )
The matrix A is randomly generated for each test.

An explanation of the input/output parameters follows:
TIME    : Indicates whether WALL or CPU time was used.
N       : The number of columns in the matrix A.
NB      : The size of the square blocks the matrix A is split into.
P       : The number of process rows.
Q       : The number of process columns.
THRESH  : If a residual value is less than THRESH, CHECK is flagged as PASSED
NEP time : Time in seconds to decompose the  matrix
MFLOPS  : Rate of execution

The following parameter values will be used:
  N       :             1     2     3     4     6    10    50
  NB      :             6     8    17
  P       :             1     2
  Q       :             1     2

Relative machine precision (eps) is taken to be       0.596046E-07
Routines pass computational tests if scaled residual is less than   20.000

TIME     N  NB    P    Q NEP Time   MFLOPS  CHECK
---- ----- --- ---- ---- -------- -------- ------

WALL     1   6    1    1     0.00     1.51 PASSED
WALL     1   8    1    1     0.00    18.87 PASSED
WALL     1  17    1    1     0.00    18.87 PASSED
WALL     2   6    1    1     0.00     1.87 PASSED
WALL     2   8    1    1     0.00    10.24 PASSED
WALL     2  17    1    1     0.00    18.30 PASSED
WALL     3   6    1    1     0.00     4.19 PASSED
WALL     3   8    1    1     0.00    11.08 PASSED
WALL     3  17    1    1     0.00    11.52 PASSED
WALL     4   6    1    1     0.00    18.58 PASSED
WALL     4   8    1    1     0.00    20.22 PASSED
WALL     4  17    1    1     0.00    20.13 PASSED
WALL     6   6    1    1     0.00    27.78 PASSED
WALL     6   8    1    1     0.00    29.49 PASSED
WALL     6  17    1    1     0.00    28.61 PASSED
WALL    10   6    1    1     0.00    66.17 PASSED
WALL    10   8    1    1     0.00    72.04 PASSED
WALL    10  17    1    1     0.00    81.09 PASSED
WALL    50   6    1    1     0.01   392.33 PASSED
WALL    50   8    1    1     0.01   409.76 PASSED
WALL    50  17    1    1     0.00   463.93 PASSED
WALL     1   6    2    2     0.00     0.31 PASSED
WALL     1   8    2    2     0.00     0.72 PASSED
WALL     1  17    2    2     0.00     0.75 PASSED
WALL     2   6    2    2     0.00     0.76 PASSED
WALL     2   8    2    2     0.00     1.00 PASSED
WALL     2  17    2    2     0.00     1.11 PASSED
WALL     3   6    2    2     0.00     0.55 PASSED
WALL     3   8    2    2     0.00     1.12 PASSED
WALL     3  17    2    2     0.00     1.11 PASSED
WALL     4   6    2    2     0.00     2.20 PASSED
WALL     4   8    2    2     0.00     2.19 PASSED
WALL     4  17    2    2     0.00     2.19 PASSED
WALL     6   6    2    2     0.00     3.96 PASSED
WALL     6   8    2    2     0.00     4.13 PASSED
WALL     6  17    2    2     0.00     4.46 PASSED
WALL    10   6    2    2     0.02     1.16 PASSED
WALL    10   8    2    2     0.00     7.40 PASSED
WALL    10  17    2    2     0.00    13.08 PASSED
WALL    50   6    2    2     0.05    47.80 PASSED
WALL    50   8    2    2     0.04    57.51 PASSED
WALL    50  17    2    2     0.02   128.66 PASSED

Finished     42 tests, with the following results:
   42 tests completed and passed residual checks.
    0 tests completed and failed residual checks.
    0 tests skipped because of illegal input values.


END OF TESTS.
--------------------------------------------------------------------------


Thanks,

Mehdi


Mehdi Bozzo-Rey
HPC Solution Developer
Platform OCS5
Platform Computing
Phone: +1 905 948 4649





