[mvapich-discuss] RE: mvapich 1.01 / scalapack: xcnep and xznep fail with MPI_RECV : Invalid buffer pointer
Mehdi Bozzo-Rey
mbozzore at platform.com
Mon Sep 8 08:00:41 EDT 2008
From what I can see in the archive (http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2008-September/001888.html), part of my message is missing, so I am resending my original email.
Mehdi
=======================================
From: Mehdi Bozzo-Rey
Sent: September-08-08 7:48 AM
To: 'mvapich-discuss at cse.ohio-state.edu'
Subject: mvapich 1.01 / scalapack: xcnep and xznep fail with MPI_RECV : Invalid buffer pointer
Hello,
I recompiled mvapich 1.0.1, BLACS and ScaLAPACK.
- I am able to run the tests included in the BLACS distribution
- I am able to run most of the tests included in the ScaLAPACK distribution, except xcnep and xznep. Both abort with "MPI_RECV : Invalid buffer pointer" on the 2x2 process grid, right after the N = 6 cases (full output below).
Do you have any idea what the root cause could be?
xcnep:
--------------------------------------------------------------------------
[mbozzore at compute-00-02 TESTING]$ mpirun_rsh -ssh -np 4 -hostfile ./hosts GFORTRAN_UNBUFFERED_ALL=yes ./xcnep
ScaLAPACK QSQ^H by Schur Decomposition.
'MPI machine'
Tests of the parallel complex single precision Schur decomposition.
The following scaled residual checks will be computed:
Residual = ||H-QSQ^H|| / (||H|| * eps * N )
Orthogonality residual = ||I - Q^HQ|| / ( eps * N )
The matrix A is randomly generated for each test.
An explanation of the input/output parameters follows:
TIME : Indicates whether WALL or CPU time was used.
N : The number of columns in the matrix A.
NB : The size of the square blocks the matrix A is split into.
P : The number of process rows.
Q : The number of process columns.
THRESH : If a residual value is less than THRESH, CHECK is flagged as PASSED
NEP time : Time in seconds to decompose the matrix
MFLOPS : Rate of execution
The following parameter values will be used:
N : 1 2 3 4 6 10 50
NB : 6 8 17
P : 1 2
Q : 1 2
Relative machine precision (eps) is taken to be 0.596046E-07
Routines pass computational tests if scaled residual is less than 20.000
TIME N NB P Q NEP Time MFLOPS CHECK
---- ----- --- ---- ---- -------- -------- ------
WALL 1 6 1 1 0.00 1.06 PASSED
WALL 1 8 1 1 0.00 18.00 PASSED
WALL 1 17 1 1 0.00 9.00 PASSED
WALL 2 6 1 1 0.00 2.77 PASSED
WALL 2 8 1 1 0.00 20.57 PASSED
WALL 2 17 1 1 0.00 20.57 PASSED
WALL 3 6 1 1 0.00 4.26 PASSED
WALL 3 8 1 1 0.00 12.15 PASSED
WALL 3 17 1 1 0.00 12.15 PASSED
WALL 4 6 1 1 0.00 19.53 PASSED
WALL 4 8 1 1 0.00 21.33 PASSED
WALL 4 17 1 1 0.00 20.21 PASSED
WALL 6 6 1 1 0.00 30.61 PASSED
WALL 6 8 1 1 0.00 32.67 PASSED
WALL 6 17 1 1 0.00 29.91 PASSED
WALL 10 6 1 1 0.00 72.87 PASSED
WALL 10 8 1 1 0.00 80.00 PASSED
WALL 10 17 1 1 0.00 88.67 PASSED
WALL 50 6 1 1 0.01 408.57 PASSED
WALL 50 8 1 1 0.01 428.08 PASSED
WALL 50 17 1 1 0.00 481.49 PASSED
WALL 1 6 2 2 0.00 0.90 PASSED
WALL 1 8 2 2 0.00 1.50 PASSED
WALL 1 17 2 2 0.00 1.50 PASSED
WALL 2 6 2 2 0.00 1.22 PASSED
WALL 2 8 2 2 0.00 2.53 PASSED
WALL 2 17 2 2 0.00 2.62 PASSED
WALL 3 6 2 2 0.00 1.65 PASSED
WALL 3 8 2 2 0.00 2.09 PASSED
WALL 3 17 2 2 0.00 2.05 PASSED
WALL 4 6 2 2 0.00 4.04 PASSED
WALL 4 8 2 2 0.00 4.13 PASSED
WALL 4 17 2 2 0.00 4.07 PASSED
WALL 6 6 2 2 0.00 6.81 PASSED
WALL 6 8 2 2 0.00 7.51 PASSED
WALL 6 17 2 2 0.00 7.64 PASSED
0 - MPI_RECV : Invalid buffer pointer
2 - MPI_RECV : Invalid buffer pointer
[2] [] Aborting Program!
[0] [] Aborting Program!
Abort signaled by rank 0: Aborting program !
Exit code -3 signaled from compute-00-02
Killing remote processes...Abort signaled by rank 2: Aborting program !
MPI process terminated unexpectedly
DONE
[mbozzore at compute-00-02 TESTING]$ Signal 15 received.
Signal 15 received.
--------------------------------------------------------------------------
And xznep:
--------------------------------------------------------------------------
[mbozzore at compute-00-02 TESTING]$ mpirun_rsh -ssh -np 4 -hostfile ./hosts GFORTRAN_UNBUFFERED_ALL=yes ./xznep
ScaLAPACK QSQ^H by Schur Decomposition.
'MPI machine'
Tests of the parallel complex double precision Schur decomposition.
The following scaled residual checks will be computed:
Residual = ||H-QSQ^H|| / (||H|| * eps * N )
Orthogonality residual = ||I - Q^HQ|| / ( eps * N )
The matrix A is randomly generated for each test.
An explanation of the input/output parameters follows:
TIME : Indicates whether WALL or CPU time was used.
N : The number of columns in the matrix A.
NB : The size of the square blocks the matrix A is split into.
P : The number of process rows.
Q : The number of process columns.
THRESH : If a residual value is less than THRESH, CHECK is flagged as PASSED
NEP time : Time in seconds to decompose the matrix
MFLOPS : Rate of execution
The following parameter values will be used:
N : 1 2 3 4 6 10 50
NB : 6 8 17
P : 1 2
Q : 1 2
Relative machine precision (eps) is taken to be 0.111022E-15
Routines pass computational tests if scaled residual is less than 20.000
TIME N NB P Q NEP Time MFLOPS CHECK
---- ----- --- ---- ---- -------- -------- ------
WALL 1 6 1 1 0.00 1.50 PASSED
WALL 1 8 1 1 0.00 18.00 PASSED
WALL 1 17 1 1 0.00 18.00 PASSED
WALL 2 6 1 1 0.00 2.15 PASSED
WALL 2 8 1 1 0.00 16.00 PASSED
WALL 2 17 1 1 0.00 16.00 PASSED
WALL 3 6 1 1 0.00 3.80 PASSED
WALL 3 8 1 1 0.00 8.10 PASSED
WALL 3 17 1 1 0.00 8.24 PASSED
WALL 4 6 1 1 0.00 14.22 PASSED
WALL 4 8 1 1 0.00 15.16 PASSED
WALL 4 17 1 1 0.00 15.16 PASSED
WALL 6 6 1 1 0.00 23.01 PASSED
WALL 6 8 1 1 0.00 23.71 PASSED
WALL 6 17 1 1 0.00 22.74 PASSED
WALL 10 6 1 1 0.00 46.39 PASSED
WALL 10 8 1 1 0.00 51.14 PASSED
WALL 10 17 1 1 0.00 55.56 PASSED
WALL 50 6 1 1 0.01 263.62 PASSED
WALL 50 8 1 1 0.01 283.73 PASSED
WALL 50 17 1 1 0.01 328.23 PASSED
WALL 1 6 2 2 0.00 0.90 PASSED
WALL 1 8 2 2 0.00 1.50 PASSED
WALL 1 17 2 2 0.00 1.50 PASSED
WALL 2 6 2 2 0.00 1.04 PASSED
WALL 2 8 2 2 0.00 2.48 PASSED
WALL 2 17 2 2 0.00 2.53 PASSED
WALL 3 6 2 2 0.00 1.28 PASSED
WALL 3 8 2 2 0.00 1.51 PASSED
WALL 3 17 2 2 0.00 1.51 PASSED
WALL 4 6 2 2 0.00 2.92 PASSED
WALL 4 8 2 2 0.00 2.98 PASSED
WALL 4 17 2 2 0.00 2.95 PASSED
WALL 6 6 2 2 0.00 6.12 PASSED
WALL 6 8 2 2 0.00 6.52 PASSED
WALL 6 17 2 2 0.00 6.92 PASSED
0 - MPI_RECV : Invalid buffer pointer
2 - MPI_RECV : Invalid buffer pointer
[2] [] Aborting Program!
[0] [] Aborting Program!
Abort signaled by rank 2: Aborting program !
Abort signaled by rank 0: Aborting program !
Exit code -3 signaled from compute-00-02
Killing remote processes...MPI process terminated unexpectedly
DONE
[mbozzore at compute-00-02 TESTING]$ Signal 15 received.
Signal 15 received.
--------------------------------------------------------------------------
My Bmake.inc and SLmake.inc are attached to this email.
Note: the same tests pass with mpich 1.27p1, Open MPI 1.2.4 (IB) and Open MPI 1.2.5 (IB).
For example:
--------------------------------------------------------------------------
[mbozzore at compute-00-02 openmpi1.2.5]$ ompi_info | less
Open MPI: 1.2.5
Open MPI SVN revision: r16989
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[mbozzore at compute-00-02 openmpi1.2.5]$ mpirun -np 4 --machinefile ./hosts --mca btl openib,self ./xcnep
ScaLAPACK QSQ^H by Schur Decomposition.
'MPI machine'
Tests of the parallel complex single precision Schur decomposition.
The following scaled residual checks will be computed:
Residual = ||H-QSQ^H|| / (||H|| * eps * N )
Orthogonality residual = ||I - Q^HQ|| / ( eps * N )
The matrix A is randomly generated for each test.
An explanation of the input/output parameters follows:
TIME : Indicates whether WALL or CPU time was used.
N : The number of columns in the matrix A.
NB : The size of the square blocks the matrix A is split into.
P : The number of process rows.
Q : The number of process columns.
THRESH : If a residual value is less than THRESH, CHECK is flagged as PASSED
NEP time : Time in seconds to decompose the matrix
MFLOPS : Rate of execution
The following parameter values will be used:
N : 1 2 3 4 6 10 50
NB : 6 8 17
P : 1 2
Q : 1 2
Relative machine precision (eps) is taken to be 0.596046E-07
Routines pass computational tests if scaled residual is less than 20.000
TIME N NB P Q NEP Time MFLOPS CHECK
---- ----- --- ---- ---- -------- -------- ------
WALL 1 6 1 1 0.00 1.51 PASSED
WALL 1 8 1 1 0.00 18.87 PASSED
WALL 1 17 1 1 0.00 18.87 PASSED
WALL 2 6 1 1 0.00 1.87 PASSED
WALL 2 8 1 1 0.00 10.24 PASSED
WALL 2 17 1 1 0.00 18.30 PASSED
WALL 3 6 1 1 0.00 4.19 PASSED
WALL 3 8 1 1 0.00 11.08 PASSED
WALL 3 17 1 1 0.00 11.52 PASSED
WALL 4 6 1 1 0.00 18.58 PASSED
WALL 4 8 1 1 0.00 20.22 PASSED
WALL 4 17 1 1 0.00 20.13 PASSED
WALL 6 6 1 1 0.00 27.78 PASSED
WALL 6 8 1 1 0.00 29.49 PASSED
WALL 6 17 1 1 0.00 28.61 PASSED
WALL 10 6 1 1 0.00 66.17 PASSED
WALL 10 8 1 1 0.00 72.04 PASSED
WALL 10 17 1 1 0.00 81.09 PASSED
WALL 50 6 1 1 0.01 392.33 PASSED
WALL 50 8 1 1 0.01 409.76 PASSED
WALL 50 17 1 1 0.00 463.93 PASSED
WALL 1 6 2 2 0.00 0.31 PASSED
WALL 1 8 2 2 0.00 0.72 PASSED
WALL 1 17 2 2 0.00 0.75 PASSED
WALL 2 6 2 2 0.00 0.76 PASSED
WALL 2 8 2 2 0.00 1.00 PASSED
WALL 2 17 2 2 0.00 1.11 PASSED
WALL 3 6 2 2 0.00 0.55 PASSED
WALL 3 8 2 2 0.00 1.12 PASSED
WALL 3 17 2 2 0.00 1.11 PASSED
WALL 4 6 2 2 0.00 2.20 PASSED
WALL 4 8 2 2 0.00 2.19 PASSED
WALL 4 17 2 2 0.00 2.19 PASSED
WALL 6 6 2 2 0.00 3.96 PASSED
WALL 6 8 2 2 0.00 4.13 PASSED
WALL 6 17 2 2 0.00 4.46 PASSED
WALL 10 6 2 2 0.02 1.16 PASSED
WALL 10 8 2 2 0.00 7.40 PASSED
WALL 10 17 2 2 0.00 13.08 PASSED
WALL 50 6 2 2 0.05 47.80 PASSED
WALL 50 8 2 2 0.04 57.51 PASSED
WALL 50 17 2 2 0.02 128.66 PASSED
Finished 42 tests, with the following results:
42 tests completed and passed residual checks.
0 tests completed and failed residual checks.
0 tests skipped because of illegal input values.
END OF TESTS.
--------------------------------------------------------------------------
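To make it easier to see where the mvapich runs stop, here is a minimal Python sketch that compares a pasted log against the parameter table and lists the cases that never reported PASSED. It is not part of the ScaLAPACK test suite; the function names `passed_cases` and `missing_cases` are made up for illustration, and it assumes that the P and Q values are paired into process grids (1x1 and 2x2), which is what the total of 42 tests in the Open MPI run suggests.

```python
import re

# Parameter values copied from the test header in the logs above.
N_VALUES = [1, 2, 3, 4, 6, 10, 50]
NB_VALUES = [6, 8, 17]
# Assumption: P and Q are paired into grids, not combined as a cross product.
GRIDS = [(1, 1), (2, 2)]

# One result row: TIME N NB P Q <NEP time> <MFLOPS> CHECK
ROW = re.compile(
    r"^(WALL|CPU)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+\S+\s+\S+\s+(\w+)$"
)


def passed_cases(log_text):
    """Return the set of (N, NB, P, Q) tuples flagged as PASSED in the log."""
    cases = set()
    for line in log_text.splitlines():
        match = ROW.match(line.strip())
        if match and match.group(6) == "PASSED":
            cases.add(tuple(int(match.group(i)) for i in range(2, 6)))
    return cases


def missing_cases(log_text):
    """Return the expected (N, NB, P, Q) cases that have no PASSED row."""
    done = passed_cases(log_text)
    expected = [
        (n, nb, p, q)
        for (p, q) in GRIDS
        for n in N_VALUES
        for nb in NB_VALUES
    ]
    return [case for case in expected if case not in done]


# Short excerpt of the failing xcnep log: the last rows before the abort.
sample = """\
WALL 6 8 2 2 0.00 7.51 PASSED
WALL 6 17 2 2 0.00 7.64 PASSED
0 - MPI_RECV : Invalid buffer pointer
"""

print(sorted(passed_cases(sample)))
print(len(missing_cases(sample)))
```

Fed the complete xcnep or xznep output from the mvapich runs instead of the short excerpt, `missing_cases` should return only the six 2x2 cases with N = 10 and N = 50, since every other row in those logs is marked PASSED.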
Thanks,
Mehdi
Mehdi Bozzo-Rey
HPC Solution Developer
Platform OCS5
Platform Computing
Phone: +1 905 948 4649