[mvapich-discuss] FW: Possible bug: Segmentation fault with MPI_Reduce and MPI_IN_PLACE

Krishna Kandalla kandalla at cse.ohio-state.edu
Mon Nov 29 11:48:50 EST 2010


Hi Alexander,
            Thanks for the update. Good to know that the patch solved your
problem.

Regards,
Krishna

On Mon, Nov 29, 2010 at 11:18 AM, Alexander Alekhin <
alexander.alekhin at itseez.com> wrote:

> Hi Krishna,
>
> Thank you very much for your patch, it works.
>
> --
> Thanks,
> Alexander Alekhin
>
>
> Hi Alexander,
>                Thanks for confirming that your application works with this
> flag. I have generated a small patch against 1.5.1p1 that fixes this problem;
> it should allow you to run your application without having to use the flag.
> I have tested the patch with the sample code you provided and verified that
> it works. Please let us know if it solves your problem.
>
> Index: src/mpi/coll/reduce_osu.c
> ===================================================================
> --- src/mpi/coll/reduce_osu.c   (revision 4340)
> +++ src/mpi/coll/reduce_osu.c   (working copy)
> @@ -1014,8 +1014,13 @@
>                  } else{
>                      local_buf = (char*)shmem_buf + stride*local_rank;
>                      MPIR_Nest_incr();
> -                    mpi_errno = MPIR_Localcopy(sendbuf, count, datatype, local_buf,
> -                            count, datatype);
> +                    if(sendbuf != MPI_IN_PLACE) {
> +                        mpi_errno = MPIR_Localcopy(sendbuf, count, datatype, local_buf,
> +                               count, datatype);
> +                    } else {
> +                        mpi_errno = MPIR_Localcopy(recvbuf, count, datatype, local_buf,
> +                               count, datatype);
> +                    }
>                      MPIR_Nest_decr();
>                      MPIDI_CH3I_SHMEM_COLL_SetGatherComplete(local_size, local_rank, shmem_comm_rank);
>                  }
>
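> For completeness, here is a minimal, self-contained sketch of the MPI_IN_PLACE
> contract that this code path has to honor (illustrative only, not taken from
> the patch or from the reproducer): at the root, the contribution is read from
> recvbuf and the result overwrites it, while non-root ranks pass their data
> through sendbuf as usual.
>
> #include <mpi.h>
> #include <stdio.h>
>
> int main(int argc, char **argv)
> {
>     int rank, result;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     result = rank;                     /* each rank's contribution */
>     if (rank == 0) {
>         /* Root: input is taken from recvbuf (result), which also
>          * receives the reduced value. */
>         MPI_Reduce(MPI_IN_PLACE, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
>         printf("sum of ranks = %d\n", result);
>     } else {
>         /* Non-root: recvbuf is not significant, so NULL is fine. */
>         MPI_Reduce(&result, NULL, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
>     }
>
>     MPI_Finalize();
>     return 0;
> }
>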
> Regards,
> Krishna
>
> On Wed, Nov 24, 2010 at 3:55 AM, Alexander Alekhin <
> alexander.alekhin at itseez.com> wrote:
>
>> Hi Krishna,
>>
>>
>>
>> I launched the job with the MV2_USE_SHMEM_REDUCE=0 flag and it finished
>> successfully. I assume that this flag causes some performance degradation
>> on SMP systems.
>>
>>
>>
>> --
>>
>> Thanks,
>>
>> Alexander Alekhin
>>
>>
>>
>> From: krishna.kandalla at gmail.com [mailto:krishna.kandalla at gmail.com] On Behalf Of Krishna Kandalla
>> Sent: Monday, November 22, 2010 10:28 PM
>> To: Alexander Alekhin
>> Cc: mvapich-discuss at cse.ohio-state.edu
>> Subject: Re: [mvapich-discuss] FW: Possible bug: Segmentation fault with MPI_Reduce and MPI_IN_PLACE
>>
>>
>>
>> Hi Alexander,
>>
>>                   Thank you for reporting this error. Can you please try
>> running your application with the MV2_USE_SHMEM_REDUCE flag set to 0?
>> You can find more information about this run-time variable at:
>>
>> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.6rc1.html
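>>
>> For example, one way to set it when launching (this assumes your launcher
>> forwards the exported environment to the MPI processes; see the user guide
>> above for the launcher-specific options for passing MV2_* variables):
>>
>> export MV2_USE_SHMEM_REDUCE=0
>> mpiexec -np 2 -host <host_name> <binary_file>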
>>
>>
>>
>> Thanks,
>>
>> Krishna
>>
>> On Mon, Nov 22, 2010 at 11:06 AM, Alexander Alekhin <
>> alexander.alekhin at itseez.com> wrote:
>>
>> Hi,
>>
>>
>>
>> I use MVAPICH2 1.5.1p1 code from svn.
>> My mpiname info:
>> MVAPICH2 1.5.1p1 Unofficial Build ch3:mrail
>>
>> Compilation
>> CC: gcc  -g
>> CXX: c++  -g
>> F77: g77  -g
>> F90: f95  -g
>>
>> Configuration
>> --prefix=$HOME/mvapich2/install --enable-g=all --enable-error-messages=all --enable-fast=none
>>
>>
>>
>> My problem is an application failure when using MPI_IN_PLACE with the
>> MPI_Reduce operation.
>>
>>
>>
>> For example, here is a piece of code that generates a segmentation
>> fault with 2 processes launched on the same node:
>>
>>
>>
>> // myid is rank in MPI_COMM_WORLD (see examples/cpi.c)
>> {
>>     MPI_Group g1, g2;
>>     MPI_Comm comm;
>>     int ranks[2] = { 1, 0 };
>>     MPI_Comm_group(MPI_COMM_WORLD, &g1);
>>     MPI_Group_incl(g1, 2, ranks, &g2);
>>
>>     MPI_Comm_create(MPI_COMM_WORLD, g2, &comm);
>>
>>     if (myid == 0) { // rank 1 of comm (root of Reduce)
>>         int result = myid;
>>         if (MPI_Reduce(MPI_IN_PLACE, &result, 1, MPI_INT, MPI_SUM, 1, comm) != MPI_SUCCESS) // fail is here
>>             exit(1);
>>     } else {
>>         if (MPI_Reduce(&myid, NULL, 1, MPI_INT, MPI_SUM, 1, comm) != MPI_SUCCESS)
>>             exit(1);
>>     }
>>
>>     MPI_Comm_free(&comm);
>>     MPI_Group_free(&g2);
>>     MPI_Group_free(&g1);
>> }
>>
>>
>>
>> Command to launch:
>>
>> mpiexec -np 2 -host <host_name> <binary_file>
>>
>>
>>
>> GDB info:
>>
>> 0:  Program received signal SIGSEGV, Segmentation fault.
>> 0:  0x0000000000411a73 in MPIUI_Memcpy (dst=0x2aaaaadcfc0c,
>> 0:      src=0xffffffffffffffff, len=4) at ../../include/mpiimpl.h:122
>> 0:  122     memcpy(dst, src, len);
>> 0: (gdb) bt
>> 0:  #0  0x0000000000411a73 in MPIUI_Memcpy (dst=0x2aaaaadcfc0c,
>> 0:      src=0xffffffffffffffff, len=4) at ../../include/mpiimpl.h:122
>> 0:  #1  0x0000000000414eb3 in MPIR_Localcopy (sendbuf=0xffffffffffffffff,
>> 0:      sendcount=1, sendtype=1275069445, recvbuf=0x2aaaaadcfc0c, recvcount=1,
>> 0:      recvtype=1275069445) at helper_fns.c:335
>> 0:  #2  0x0000000000410d48 in PMPI_Reduce (sendbuf=0xffffffffffffffff,
>> 0:      recvbuf=0x7fffcb044bcc, count=1, datatype=1275069445, op=1476395011,
>> 0:      root=1, comm=-1006632960) at reduce_osu.c:1017
>>
>>
>>
>> If I replace MPI_IN_PLACE with a variable, everything works fine.
>>
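>> The root branch of the code above, rewritten with a separate send variable
>> instead of MPI_IN_PLACE (a sketch of that workaround; the name sendval is
>> illustrative):
>>
>> if (myid == 0) { // rank 1 of comm (root of Reduce)
>>     int sendval = myid; // separate send buffer instead of MPI_IN_PLACE
>>     int result = 0;
>>     if (MPI_Reduce(&sendval, &result, 1, MPI_INT, MPI_SUM, 1, comm) != MPI_SUCCESS)
>>         exit(1);
>> }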
>>
>>
>> Can somebody check this problem?
>>
>>
>>
>> --
>>
>> Thanks,
>>
>> Alexander Alekhin
>>
>>
>>
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>>
>>
>
>

