[mvapich-discuss] RE: mvapich 1.01 / scalapack: xcnep and xznep fail with MPI_RECV : Invalid buffer pointer

Lei Chai chai.15 at osu.edu
Wed Sep 24 17:14:47 EDT 2008


For the information of the mailing list, this problem has been solved 
offline and is fixed by the patch below. The patch removes a check in the 
non-contiguous receive path (mpid/ch_gen2/mpid_hrecv.c) that flagged a 
null, non-MPI_BOTTOM user buffer with count > 0 as MPI_ERR_BUFFER, which 
is the "Invalid buffer pointer" error the xcnep and xznep tests were 
hitting. The patch has been checked into the mvapich trunk and will also 
be in future releases.

Lei

Patch:

Index: mpid/ch_gen2/mpid_hrecv.c
===================================================================
--- mpid/ch_gen2/mpid_hrecv.c   (revision 2989)
+++ mpid/ch_gen2/mpid_hrecv.c   (working copy)
@@ -118,20 +118,6 @@
     }


-    /* We have a non-contiguous buffer.
-     * Normally we would check for a null user buffer inside
-     * MPID_VIA_Irecv, but in this case we will pass the allocated
-     * buffer, not the user buffer, so check the user buffer
-     * here.
-     */
-
-    if (Is_MPI_Bottom(buf, count, dtype_ptr)) {
-        /* do not have to adjust ptr here */
-    } else if (buf == 0 && count > 0) {
-        *error_code = MPI_ERR_BUFFER;
-        return;
-    }
-
     /* Increment reference count for this type
      */
     MPIR_Type_dup(dtype_ptr);

===========================================================
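(The Index: path in the diff is relative to the mvapich source tree, so it
should apply from the top-level directory with patch -p0.)

For anyone debugging a similar failure: the removed block rejected any
receive on the non-contiguous (derived datatype) path whose user buffer
pointer was null with count > 0 and was not recognized as MPI_BOTTOM. A
null buffer pointer can be perfectly legal there, because MPI-1 lets a
datatype carry absolute addresses in its displacements, in which case the
buffer argument itself carries no information. The sketch below is my own
illustration of that pattern, not the BLACS call sequence that actually
triggered the error; it builds such a datatype with
MPI_Address/MPI_Type_hindexed and receives with MPI_BOTTOM:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    int blocklen = 4;
    double data[4] = {0.0, 0.0, 0.0, 0.0};
    MPI_Aint disp;
    MPI_Datatype abs_type;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Record the absolute address of the receive area inside the
     * datatype itself (MPI-1 style calls, matching the mvapich 1.x
     * code base). */
    MPI_Address(data, &disp);
    MPI_Type_hindexed(1, &blocklen, &disp, MPI_DOUBLE, &abs_type);
    MPI_Type_commit(&abs_type);

    if (rank == 0) {
        double src[4] = {1.0, 2.0, 3.0, 4.0};
        MPI_Send(src, 4, MPI_DOUBLE, 1, 99, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Status status;
        /* The buffer argument is MPI_BOTTOM: the real address lives
         * in abs_type's displacement, so a "null" buffer pointer is
         * legal here even though count > 0. */
        MPI_Recv(MPI_BOTTOM, 1, abs_type, 0, 99, MPI_COMM_WORLD, &status);
        printf("rank 1 received %g %g %g %g\n",
               data[0], data[1], data[2], data[3]);
    }

    MPI_Type_free(&abs_type);
    MPI_Finalize();
    return 0;
}

Recognizing the buffer argument as the MPI_BOTTOM sentinel is the fragile
part, especially when the call arrives through the Fortran bindings as it
does with BLACS, which is presumably why the check was dropped rather
than refined.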


Mehdi Bozzo-Rey wrote:
> From what I can see in the archive (http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2008-September/001888.html), part of my message is missing, so I am resending my original email.
>
> Mehdi
>
>
> =======================================
> From: Mehdi Bozzo-Rey 
> Sent: September-08-08 7:48 AM
> To: 'mvapich-discuss at cse.ohio-state.edu'
> Subject: mvapich 1.01 / scalapack: xcnep and xznep fail with MPI_RECV : Invalid buffer pointer
>
> Hello,
>
> I recompiled mvapich 1.0.1, BLACS and ScaLAPACK.
>
> - I am able to run the tests included in the BLACS distribution
> - I am able to run most of the tests included in the ScaLAPACK distribution, except xcnep and xznep, which fail with the errors shown below.
>
> Do you have any idea what the root cause could be?
>
> xcnep:
>
> --------------------------------------------------------------------------
> [mbozzore at compute-00-02 TESTING]$ mpirun_rsh -ssh -np 4 -hostfile ./hosts GFORTRAN_UNBUFFERED_ALL=yes ./xcnep
> ScaLAPACK QSQ^H by Schur Decomposition.
> 'MPI machine'
>
> Tests of the parallel complex single precision Schur decomposition.
> The following scaled residual checks will be computed:
>  Residual               = ||H-QSQ^H|| / (||H|| * eps * N )
>  Orthogonality residual = ||I - Q^HQ|| / ( eps * N )
> The matrix A is randomly generated for each test.
>
> An explanation of the input/output parameters follows:
> TIME    : Indicates whether WALL or CPU time was used.
> N       : The number of columns in the matrix A.
> NB      : The size of the square blocks the matrix A is split into.
> P       : The number of process rows.
> Q       : The number of process columns.
> THRESH  : If a residual value is less than THRESH, CHECK is flagged as PASSED
> NEP time : Time in seconds to decompose the  matrix
> MFLOPS  : Rate of execution
>
> The following parameter values will be used:
>   N       :             1     2     3     4     6    10    50
>   NB      :             6     8    17
>   P       :             1     2
>   Q       :             1     2
>
> Relative machine precision (eps) is taken to be       0.596046E-07
> Routines pass computational tests if scaled residual is less than   20.000
>
> TIME     N  NB    P    Q NEP Time   MFLOPS  CHECK
> ---- ----- --- ---- ---- -------- -------- ------
>
> WALL     1   6    1    1     0.00     1.06 PASSED
> WALL     1   8    1    1     0.00    18.00 PASSED
> WALL     1  17    1    1     0.00     9.00 PASSED
> WALL     2   6    1    1     0.00     2.77 PASSED
> WALL     2   8    1    1     0.00    20.57 PASSED
> WALL     2  17    1    1     0.00    20.57 PASSED
> WALL     3   6    1    1     0.00     4.26 PASSED
> WALL     3   8    1    1     0.00    12.15 PASSED
> WALL     3  17    1    1     0.00    12.15 PASSED
> WALL     4   6    1    1     0.00    19.53 PASSED
> WALL     4   8    1    1     0.00    21.33 PASSED
> WALL     4  17    1    1     0.00    20.21 PASSED
> WALL     6   6    1    1     0.00    30.61 PASSED
> WALL     6   8    1    1     0.00    32.67 PASSED
> WALL     6  17    1    1     0.00    29.91 PASSED
> WALL    10   6    1    1     0.00    72.87 PASSED
> WALL    10   8    1    1     0.00    80.00 PASSED
> WALL    10  17    1    1     0.00    88.67 PASSED
> WALL    50   6    1    1     0.01   408.57 PASSED
> WALL    50   8    1    1     0.01   428.08 PASSED
> WALL    50  17    1    1     0.00   481.49 PASSED
> WALL     1   6    2    2     0.00     0.90 PASSED
> WALL     1   8    2    2     0.00     1.50 PASSED
> WALL     1  17    2    2     0.00     1.50 PASSED
> WALL     2   6    2    2     0.00     1.22 PASSED
> WALL     2   8    2    2     0.00     2.53 PASSED
> WALL     2  17    2    2     0.00     2.62 PASSED
> WALL     3   6    2    2     0.00     1.65 PASSED
> WALL     3   8    2    2     0.00     2.09 PASSED
> WALL     3  17    2    2     0.00     2.05 PASSED
> WALL     4   6    2    2     0.00     4.04 PASSED
> WALL     4   8    2    2     0.00     4.13 PASSED
> WALL     4  17    2    2     0.00     4.07 PASSED
> WALL     6   6    2    2     0.00     6.81 PASSED
> WALL     6   8    2    2     0.00     7.51 PASSED
> WALL     6  17    2    2     0.00     7.64 PASSED
> 0 - MPI_RECV : Invalid buffer pointer
> 2 - MPI_RECV : Invalid buffer pointer
> [2] [] Aborting Program!
> [0] [] Aborting Program!
> Abort signaled by rank 0:  Aborting program !
> Exit code -3 signaled from compute-00-02
> Killing remote processes...Abort signaled by rank 2:  Aborting program !
> MPI process terminated unexpectedly
> DONE
> [mbozzore at compute-00-02 TESTING]$ Signal 15 received.
> Signal 15 received.
> --------------------------------------------------------------------------
>
>
> And xznep:
>
> --------------------------------------------------------------------------
> [mbozzore at compute-00-02 TESTING]$ mpirun_rsh -ssh -np 4 -hostfile ./hosts GFORTRAN_UNBUFFERED_ALL=yes ./xznep
> ScaLAPACK QSQ^H by Schur Decomposition.
> 'MPI machine'
>
> Tests of the parallel complex double precision Schur decomposition.
> The following scaled residual checks will be computed:
>  Residual               = ||H-QSQ^H|| / (||H|| * eps * N )
>  Orthogonality residual = ||I - Q^HQ|| / ( eps * N )
> The matrix A is randomly generated for each test.
>
> An explanation of the input/output parameters follows:
> TIME    : Indicates whether WALL or CPU time was used.
> N       : The number of columns in the matrix A.
> NB      : The size of the square blocks the matrix A is split into.
> P       : The number of process rows.
> Q       : The number of process columns.
> THRESH  : If a residual value is less than THRESH, CHECK is flagged as PASSED
> NEP time : Time in seconds to decompose the  matrix
> MFLOPS  : Rate of execution
>
> The following parameter values will be used:
>   N       :             1     2     3     4     6    10    50
>   NB      :             6     8    17
>   P       :             1     2
>   Q       :             1     2
>
> Relative machine precision (eps) is taken to be       0.111022E-15
> Routines pass computational tests if scaled residual is less than   20.000
>
> TIME     N  NB    P    Q NEP Time   MFLOPS  CHECK
> ---- ----- --- ---- ---- -------- -------- ------
>
> WALL     1   6    1    1     0.00     1.50 PASSED
> WALL     1   8    1    1     0.00    18.00 PASSED
> WALL     1  17    1    1     0.00    18.00 PASSED
> WALL     2   6    1    1     0.00     2.15 PASSED
> WALL     2   8    1    1     0.00    16.00 PASSED
> WALL     2  17    1    1     0.00    16.00 PASSED
> WALL     3   6    1    1     0.00     3.80 PASSED
> WALL     3   8    1    1     0.00     8.10 PASSED
> WALL     3  17    1    1     0.00     8.24 PASSED
> WALL     4   6    1    1     0.00    14.22 PASSED
> WALL     4   8    1    1     0.00    15.16 PASSED
> WALL     4  17    1    1     0.00    15.16 PASSED
> WALL     6   6    1    1     0.00    23.01 PASSED
> WALL     6   8    1    1     0.00    23.71 PASSED
> WALL     6  17    1    1     0.00    22.74 PASSED
> WALL    10   6    1    1     0.00    46.39 PASSED
> WALL    10   8    1    1     0.00    51.14 PASSED
> WALL    10  17    1    1     0.00    55.56 PASSED
> WALL    50   6    1    1     0.01   263.62 PASSED
> WALL    50   8    1    1     0.01   283.73 PASSED
> WALL    50  17    1    1     0.01   328.23 PASSED
> WALL     1   6    2    2     0.00     0.90 PASSED
> WALL     1   8    2    2     0.00     1.50 PASSED
> WALL     1  17    2    2     0.00     1.50 PASSED
> WALL     2   6    2    2     0.00     1.04 PASSED
> WALL     2   8    2    2     0.00     2.48 PASSED
> WALL     2  17    2    2     0.00     2.53 PASSED
> WALL     3   6    2    2     0.00     1.28 PASSED
> WALL     3   8    2    2     0.00     1.51 PASSED
> WALL     3  17    2    2     0.00     1.51 PASSED
> WALL     4   6    2    2     0.00     2.92 PASSED
> WALL     4   8    2    2     0.00     2.98 PASSED
> WALL     4  17    2    2     0.00     2.95 PASSED
> WALL     6   6    2    2     0.00     6.12 PASSED
> WALL     6   8    2    2     0.00     6.52 PASSED
> WALL     6  17    2    2     0.00     6.92 PASSED
> 0 - MPI_RECV : Invalid buffer pointer
> 2 - MPI_RECV : Invalid buffer pointer
> [2] [] Aborting Program!
> [0] [] Aborting Program!
> Abort signaled by rank 2:  Aborting program !
> Abort signaled by rank 0:  Aborting program !
> Exit code -3 signaled from compute-00-02
> Killing remote processes...MPI process terminated unexpectedly
> DONE
> [mbozzore at compute-00-02 TESTING]$ Signal 15 received.
> Signal 15 received.
> --------------------------------------------------------------------------
>
>
>
> My Bmake.inc and SLmake.inc are attached to this email.
>
> Note: MPICH 1.2.7p1, Open MPI 1.2.4 (IB), and Open MPI 1.2.5 (IB) all run these tests successfully.
>
> For example:
> --------------------------------------------------------------------------
> [mbozzore at compute-00-02 openmpi1.2.5]$ ompi_info | less
>                 Open MPI: 1.2.5
>    Open MPI SVN revision: r16989
> --------------------------------------------------------------------------
>
>
> --------------------------------------------------------------------------
> [mbozzore at compute-00-02 openmpi1.2.5]$ mpirun -np 4 --machinefile ./hosts --mca btl openib,self ./xcnep
> ScaLAPACK QSQ^H by Schur Decomposition.
> 'MPI machine'
>
> Tests of the parallel complex single precision Schur decomposition.
> The following scaled residual checks will be computed:
>  Residual               = ||H-QSQ^H|| / (||H|| * eps * N )
>  Orthogonality residual = ||I - Q^HQ|| / ( eps * N )
> The matrix A is randomly generated for each test.
>
> An explanation of the input/output parameters follows:
> TIME    : Indicates whether WALL or CPU time was used.
> N       : The number of columns in the matrix A.
> NB      : The size of the square blocks the matrix A is split into.
> P       : The number of process rows.
> Q       : The number of process columns.
> THRESH  : If a residual value is less than THRESH, CHECK is flagged as PASSED
> NEP time : Time in seconds to decompose the  matrix
> MFLOPS  : Rate of execution
>
> The following parameter values will be used:
>   N       :             1     2     3     4     6    10    50
>   NB      :             6     8    17
>   P       :             1     2
>   Q       :             1     2
>
> Relative machine precision (eps) is taken to be       0.596046E-07
> Routines pass computational tests if scaled residual is less than   20.000
>
> TIME     N  NB    P    Q NEP Time   MFLOPS  CHECK
> ---- ----- --- ---- ---- -------- -------- ------
>
> WALL     1   6    1    1     0.00     1.51 PASSED
> WALL     1   8    1    1     0.00    18.87 PASSED
> WALL     1  17    1    1     0.00    18.87 PASSED
> WALL     2   6    1    1     0.00     1.87 PASSED
> WALL     2   8    1    1     0.00    10.24 PASSED
> WALL     2  17    1    1     0.00    18.30 PASSED
> WALL     3   6    1    1     0.00     4.19 PASSED
> WALL     3   8    1    1     0.00    11.08 PASSED
> WALL     3  17    1    1     0.00    11.52 PASSED
> WALL     4   6    1    1     0.00    18.58 PASSED
> WALL     4   8    1    1     0.00    20.22 PASSED
> WALL     4  17    1    1     0.00    20.13 PASSED
> WALL     6   6    1    1     0.00    27.78 PASSED
> WALL     6   8    1    1     0.00    29.49 PASSED
> WALL     6  17    1    1     0.00    28.61 PASSED
> WALL    10   6    1    1     0.00    66.17 PASSED
> WALL    10   8    1    1     0.00    72.04 PASSED
> WALL    10  17    1    1     0.00    81.09 PASSED
> WALL    50   6    1    1     0.01   392.33 PASSED
> WALL    50   8    1    1     0.01   409.76 PASSED
> WALL    50  17    1    1     0.00   463.93 PASSED
> WALL     1   6    2    2     0.00     0.31 PASSED
> WALL     1   8    2    2     0.00     0.72 PASSED
> WALL     1  17    2    2     0.00     0.75 PASSED
> WALL     2   6    2    2     0.00     0.76 PASSED
> WALL     2   8    2    2     0.00     1.00 PASSED
> WALL     2  17    2    2     0.00     1.11 PASSED
> WALL     3   6    2    2     0.00     0.55 PASSED
> WALL     3   8    2    2     0.00     1.12 PASSED
> WALL     3  17    2    2     0.00     1.11 PASSED
> WALL     4   6    2    2     0.00     2.20 PASSED
> WALL     4   8    2    2     0.00     2.19 PASSED
> WALL     4  17    2    2     0.00     2.19 PASSED
> WALL     6   6    2    2     0.00     3.96 PASSED
> WALL     6   8    2    2     0.00     4.13 PASSED
> WALL     6  17    2    2     0.00     4.46 PASSED
> WALL    10   6    2    2     0.02     1.16 PASSED
> WALL    10   8    2    2     0.00     7.40 PASSED
> WALL    10  17    2    2     0.00    13.08 PASSED
> WALL    50   6    2    2     0.05    47.80 PASSED
> WALL    50   8    2    2     0.04    57.51 PASSED
> WALL    50  17    2    2     0.02   128.66 PASSED
>
> Finished     42 tests, with the following results:
>    42 tests completed and passed residual checks.
>     0 tests completed and failed residual checks.
>     0 tests skipped because of illegal input values.
>
>
> END OF TESTS.
> --------------------------------------------------------------------------
>
>
> Thanks,
>
> Mehdi
>
>
> Mehdi Bozzo-Rey
> HPC Solution Developer
> Platform OCS5
> Platform Computing
> Phone: +1 905 948 4649
>
>
>
>