[mvapich-discuss] mvapich2-0.9.8 blacs problems (Another)

amith rajith mamidala mamidala at cse.ohio-state.edu
Mon Mar 26 15:46:37 EDT 2007


Hi Bas,

Attached is a patch that resolves this error, which is related to the
number of communicators/groups being created. The
program runs fine with the patch applied. We are also investigating this in more depth and
will get back to you if we find any further issues,

Thanks,
Amith


On Mon, 26 Mar 2007, Bas van der Vlies wrote:

> Hello,
>
>   We still have problems with mvapich2 0.9.8 + patches and blacs. Here
> is another test file attached; the build and run commands are:
>
> {{{
> mpif90  -o pdgemr2dtest.$brand -ff2c -Wall -g pdgemr2dtest.f90
> -lscalapack -lfblacs -lcblacs -lblacs -llapack -latlas
>
> echo 310 16 1000 | mpiexec -n $nprocs <program_name>
> }}}
>
>
> The problem always occurs when we run many loops. The program consumes
> more and more memory and then crashes with the following error:
> {{{
>
> loop n mb nprocs npcol nprow   83  310   16    8    4    2
> loop n mb nprocs npcol nprow   84  310   16    8    4    2
> loop n mb nprocs npcol nprow   85  310   16    8    4    2
> rank 7 in job 1  ib-r6n18.irc.sara.nl_7000   caused collective abort of
> all ranks
>    exit status of rank 7: killed by signal 9
> rank 6 in job 1  ib-r6n18.irc.sara.nl_7000   caused collective abort of
> all ranks
>    exit status of rank 6: killed by signal 9
> rank 4 in job 1  ib-r6n18.irc.sara.nl_7000   caused collective abort of
> all ranks
>    exit status of rank 4: killed by signal 9
> }}}
>
>
>
> --
> ********************************************************************
> *                                                                  *
> *  Bas van der Vlies                     e-mail: basv at sara.nl      *
> *  SARA - Academic Computing Services    phone:  +31 20 592 8012   *
> *  Kruislaan 415                         fax:    +31 20 6683167    *
> *  1098 SJ Amsterdam                                               *
> *                                                                  *
> ********************************************************************
>

-------------- next part --------------
Index: create_2level_comm.c
===================================================================
--- create_2level_comm.c        (revision 1120)
+++ create_2level_comm.c        (working copy)
@@ -163,6 +163,8 @@
     else{
         comm_ptr->shmem_coll_ok = 0;
        free_2level_comm(comm_ptr);
+       MPI_Group_free(&subgroup1);
+       MPI_Group_free(&comm_group);
     }


