[mvapich-discuss] mvapich2-0.9.8 blacs problems

amith rajith mamidala mamidala@cse.ohio-state.edu
Thu Mar 22 11:01:04 EDT 2007


Hi Bas,

Thanks for letting us know about this. We are looking into this and will
get back to you soon,

Thanks,
Amith

On Thu, 22 Mar 2007, Bas van der Vlies wrote:

> Hello,
>
>   We have made two simpler programs that do not use scalapack/blacs
> and also show the same behavior. See attachments.
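>
> Since attachments do not always survive the list archive, here is a
> sketch of what duptest boils down to, in Fortran as the "passed
> to/from Fortran" messages below suggest (the program name and loop
> details are assumptions inferred from the output; the splittest
> sketch follows the results summary):
>
>   program duptest
>     implicit none
>     include 'mpif.h'
>     integer :: ierr, rank, nprocs, newcomm, i
>     call MPI_INIT(ierr)
>     call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
>     call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
>     if (rank == 0) print *, 'Running with', nprocs, 'processes'
>     if (rank == 0) print *, 'will do 100000 dups and frees'
>     do i = 1, 100000
>       ! Duplicate MPI_COMM_WORLD and free the duplicate right away.
>       ! Every handle is released each iteration, so the library
>       ! should never run out of communicators or handle slots.
>       call MPI_COMM_DUP(MPI_COMM_WORLD, newcomm, ierr)
>       call MPI_COMM_FREE(newcomm, ierr)
>     end do
>     if (rank == 0) print *, 'done.'
>     call MPI_FINALIZE(ierr)
>   end program duptest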
>
> Here are the results:
>   mvapich 0.9.8: No problems
>   mvapich 0.9.9 trunk: see below for errors
>   mvapich2 0.9.8: see below for errors
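>
> splittest is the analogous loop over MPI_Comm_split; the even/odd
> coloring below is only illustrative (again a sketch, not necessarily
> the exact attachment):
>
>   program splittest
>     implicit none
>     include 'mpif.h'
>     integer :: ierr, rank, nprocs, newcomm, i
>     call MPI_INIT(ierr)
>     call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
>     call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
>     if (rank == 0) print *, 'Running with', nprocs, 'processes'
>     if (rank == 0) print *, 'will do 100000 splits and frees'
>     do i = 1, 100000
>       ! Split the world into even/odd halves, then free the result;
>       ! because each subcommunicator is freed, no handles should
>       ! accumulate across iterations.
>       call MPI_COMM_SPLIT(MPI_COMM_WORLD, mod(rank, 2), rank, newcomm, ierr)
>       call MPI_COMM_FREE(newcomm, ierr)
>     end do
>     if (rank == 0) print *, 'done.'
>     call MPI_FINALIZE(ierr)
>   end program splittest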
>
>
> Regards, and hope this helps with diagnosing the problem.
>
> =====================================================================
>   mvapich 0.9.9 trunk:
> duptest:
> ====================================================
> Running with 8 processes
> will do 100000 dups and frees
> ............................................0 - <NO ERROR MESSAGE> :
> Pointer conversions exhausted
> Too many MPI objects may have been passed to/from Fortran
> without being freed
> [0] [] Aborting Program!
> 4 - <NO ERROR MESSAGE> : Pointer conversions exhausted
> Too many MPI objects may have been passed to/from Fortran
> without being freed
> 2 - <NO ERROR MESSAGE> : Pointer conversions exhausted
> Too many MPI objects may have been passed to/from Fortran
> without being freed
> 6 - <NO ERROR MESSAGE> : Pointer conversions exhausted
> Too many MPI objects may have been passed to/from Fortran
> without being freed
> mpirun_rsh: Abort signaled from [0]
> [4] [] Aborting Program!
> [2] [] Aborting Program!
> [6] [] Aborting Program!
> done.
> ====================================================
>
> splittest:
> ====================================================
> bas@ib-r21n1:~/src/applications$ mpirun -np 8 ./a.out
>
> Running with 8 processes
> will do 100000 splits and frees
> ......................................0 - <NO ERROR MESSAGE> : Pointer
> conversions exhausted
> Too many MPI objects may have been passed to/from Fortran
> without being freed
> [0] [] Aborting Program!
> 6 - <NO ERROR MESSAGE> : Pointer conversions exhausted
> Too many MPI objects may have been passed to/from Fortran
> without being freed
> 2 - <NO ERROR MESSAGE> : Pointer conversions exhausted
> Too many MPI objects may have been passed to/from Fortran
> without being freed
> 4 - <NO ERROR MESSAGE> : Pointer conversions exhausted
> Too many MPI objects may have been passed to/from Fortran
> without being freed
> mpirun_rsh: Abort signaled from [0]
> [6] [] Aborting Program!
> [2] [] Aborting Program!
> [4] [] Aborting Program!
> done.
> ====================================================
>
>
> mvapich2 0.9.8:
>
> duptest:
> ====================================================
> bas@ib-r21n1:~/src/applications$ mpiexec -n $nprocs  ./a.out
> Running with 8 processes
> will do 100000 dups and frees
> .Fatal error in MPI_Comm_split: Other MPI error, error stack:
> MPI_Comm_split(290).: MPI_Comm_split(comm=0x84000002, color=0, key=0,
> new_comm=0xb7f4d8a4) failed
> MPIR_Comm_create(90): Too many communicatorsFatal error in
> MPI_Comm_split: Other MPI error, error stack:
> MPI_Comm_split(290).: MPI_Comm_split(comm=0x84000001, color=1, key=1,
> new_comm=0xb7f7a7bc) failed
> MPIR_Comm_create(90): Too many communicatorsFatal error in
> MPI_Comm_split: Other MPI error, error stack:
> MPI_Comm_split(290).: MPI_Comm_split(comm=0x84000002, color=2, key=0,
> new_comm=0xb7f778a4) failed
> MPIR_Comm_create(90): Too many communicatorsFatal error in
> MPI_Comm_split: Other MPI error, error stack:
> MPI_Comm_split(290).: MPI_Comm_split(comm=0x84000001, color=2, key=1,
> new_comm=0xb7ecf7bc) failed
> MPIR_Comm_create(90): Too many communicatorsFatal error in
> MPI_Comm_split: Other MPI error, error stack:
> MPI_Comm_split(290).: MPI_Comm_split(comm=0x84000002, color=3, key=0,
> new_comm=0xb7f398a4) failed
> MPIR_Comm_create(90): Too many communicatorsFatal error in
> MPI_Comm_split: Other MPI error, error stack:
> MPI_Comm_split(290).: MPI_Comm_split(comm=0x84000001, color=3, key=1,
> new_comm=0xb7f447bc) failed
> MPIR_Comm_create(90): Too many communicatorsrank 7 in job 1
> ib-r21n1.irc.sara.nl_8763   caused collective abort of all ranks
>    exit status of rank 7: killed by signal 9
> rank 6 in job 1  ib-r21n1.irc.sara.nl_8763   caused collective abort of
> all ranks
>    exit status of rank 6: killed by signal 9
> Fatal error in MPI_Comm_split: Other MPI error, error stack:
> MPI_Comm_split(290).: MPI_Comm_split(comm=0x84000002, color=1, key=0,
> new_comm=0xb7f708a4) failed
> MPIR_Comm_create(90): Too many communicatorsFatal error in
> MPI_Comm_split: Other MPI error, error stack:
> MPI_Comm_split(290).: MPI_Comm_split(comm=0x84000001, color=0, key=1,
> new_comm=0xb7f4d7bc) failed
> MPIR_Comm_create(90): Too many communicatorsrank 5 in job 1
> ib-r21n1.irc.sara.nl_8763   caused collective abort of all ranks
>    exit status of rank 5: return code 13
> rank 4 in job 1  ib-r21n1.irc.sara.nl_8763   caused collective abort of
> all ranks
>    exit status of rank 4: killed by signal 9
> ====================================================
>
> splittest:
> ====================================================
> Running with 8 processes
> will do 100000 splits and frees
> .Fatal error in MPI_Comm_split: Other MPI error, error stack:
> MPI_Comm_split(290).: MPI_Comm_split(comm=0x84000002, color=0, key=0,
> new_comm=0xb7f2b8a4) failed
> MPIR_Comm_create(90): Too many communicatorsFatal error in
> MPI_Comm_split: Other MPI error, error stack:
> MPI_Comm_split(290).: MPI_Comm_split(comm=0x84000002, color=0, key=0,
> new_comm=0xb7f258a4) failed
> MPIR_Comm_create(90): Too many communicatorsFatal error in
> MPI_Comm_split: Other MPI error, error stack:
> MPI_Comm_split(290).: MPI_Comm_split(comm=0x84000002, color=1, key=0,
> new_comm=0xb7f168a4) failed
> MPIR_Comm_create(90): Too many communicatorsFatal error in
> MPI_Comm_split: Other MPI error, error stack:
> MPI_Comm_split(290).: MPI_Comm_split(comm=0x84000002, color=1, key=0,
> new_comm=0xb7f328a4) failed
> MPIR_Comm_create(90): Too many communicatorsrank 2 in job 3
> ib-r21n1.irc.sara.nl_8763   caused collective abort of all ranks
>    exit status of rank 2: killed by signal 9
> rank 1 in job 3  ib-r21n1.irc.sara.nl_8763   caused collective abort of
> all ranks
>    exit status of rank 1: killed by signal 9
> rank 0 in job 3  ib-r21n1.irc.sara.nl_8763   caused collective abort of
> all ranks
>    exit status of rank 0: killed by signal 9
> ====================================================
> --
> ********************************************************************
> *                                                                  *
> *  Bas van der Vlies                     e-mail: basv@sara.nl         *
> *  SARA - Academic Computing Services    phone:  +31 20 592 8012   *
> *  Kruislaan 415                         fax:    +31 20 6683167    *
> *  1098 SJ Amsterdam                                               *
> *                                                                  *
> ********************************************************************
>
