[mvapich-discuss] FW: Possible bug: Segmentation fault with
MPI_Reduce and MPI_IN_PLACE
Alexander Alekhin
alexander.alekhin at itseez.com
Mon Nov 29 11:18:08 EST 2010
Hi Krishna,
Thank you very much for your patch, it works.
--
Thanks,
Alexander Alekhin
> Hi Alexander,
>          Thanks for confirming that your application
> works with this flag. I have generated a small patch against 1.5.1p1
> that fixes this problem, and it should allow you to run your application
> without having to use this flag. I have tested this patch with the
> sample code that you had provided and I have verified that it works.
> Please let us know if this patch solves your problem.
> Index: src/mpi/coll/reduce_osu.c
> ===================================================================
> --- src/mpi/coll/reduce_osu.c (revision 4340)
> +++ src/mpi/coll/reduce_osu.c (working copy)
> @@ -1014,8 +1014,13 @@
> } else{
> local_buf = (char*)shmem_buf + stride*local_rank;
> MPIR_Nest_incr();
> -            mpi_errno = MPIR_Localcopy(sendbuf, count, datatype,
> -                                       local_buf, count, datatype);
> +            if(sendbuf != MPI_IN_PLACE) {
> +                mpi_errno = MPIR_Localcopy(sendbuf, count, datatype,
> +                                           local_buf, count, datatype);
> +            } else {
> +                mpi_errno = MPIR_Localcopy(recvbuf, count, datatype,
> +                                           local_buf, count, datatype);
> +            }
> MPIR_Nest_decr();
> MPIDI_CH3I_SHMEM_COLL_SetGatherComplete(local_size,
> local_rank, shmem_comm_rank);
> }
>
> Regards,
> Krishna
> On Wed, Nov 24, 2010 at 3:55 AM, Alexander Alekhin
> <alexander.alekhin at itseez.com> wrote:
>>
>> Hi Krishna,
>>
>>
>> I launched the job with the MV2_USE_SHMEM_REDUCE=0 flag and the job finished
>> successfully. I assume that this flag causes some performance
>> degradation on SMP systems.
>>
>>
>> --
>>
>> Thanks,
>>
>> Alexander Alekhin
>>
>>
>> From: krishna.kandalla at gmail.com [mailto:krishna.kandalla at gmail.com] On
>> Behalf Of Krishna Kandalla
>> Sent: Monday, November 22, 2010 10:28 PM
>> To: Alexander Alekhin
>> Cc: mvapich-discuss at cse.ohio-state.edu
>> Subject: Re: [mvapich-discuss] FW: Possible bug: Segmentation fault
>> with MPI_Reduce and MPI_IN_PLACE
>>
>>
>> Hi Alexander,
>> Thank you for reporting this error. Can you please try
>> running your application with the MV2_USE_SHMEM_REDUCE flag set to
>> 0? You can find more information about this run-time variable at:
>> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.6rc1.html
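For reference, the flag can be passed at launch time; the exact syntax depends on the launcher, and the binary and host names below are placeholders:

```shell
# With mpirun_rsh, run-time variables are listed before the binary:
mpirun_rsh -np 2 host1 host1 MV2_USE_SHMEM_REDUCE=0 ./a.out

# Or set it in the environment before launching with mpiexec:
MV2_USE_SHMEM_REDUCE=0 mpiexec -np 2 -host host1 ./a.out
```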
>>
>>
>> Thanks,
>>
>> Krishna
>>
>> On Mon, Nov 22, 2010 at 11:06 AM, Alexander Alekhin
>> <alexander.alekhin at itseez.com> wrote:
>>
>> Hi,
>>
>>
>> I use MVAPICH2 1.5.1p1 code from svn.
>> My mpiname info:
>> MVAPICH2 1.5.1p1 Unofficial Build ch3:mrail
>>
>>
>> Compilation
>>
>> CC: gcc -g
>>
>> CXX: c++ -g
>>
>> F77: g77 -g
>>
>> F90: f95 -g
>>
>>
>> Configuration
>>
>> --prefix=$HOME/mvapich2/install --enable-g=all
>> --enable-error-messages=all --enable-fast=none
>>
>>
>> My problem is an application failure when MPI_IN_PLACE is used with the
>> MPI_Reduce operation.
>>
>>
>> For example, the following piece of code generates a segmentation
>> fault with 2 processes launched on the same node:
>>
>> // myid is rank in MPI_COMM_WORLD (see examples/cpi.c)
>> {
>>     MPI_Group g1, g2;
>>     MPI_Comm comm;
>>     int ranks[2] = { 1, 0 };
>>     MPI_Comm_group(MPI_COMM_WORLD, &g1);
>>     MPI_Group_incl(g1, 2, ranks, &g2);
>>     MPI_Comm_create(MPI_COMM_WORLD, g2, &comm);
>>     if (myid == 0) { // rank 1 of comm (root of Reduce)
>>         int result = myid;
>>         if (MPI_Reduce(MPI_IN_PLACE, &result, 1, MPI_INT, MPI_SUM, 1,
>>                        comm) != MPI_SUCCESS) // fail is here
>>             exit(1);
>>     } else {
>>         if (MPI_Reduce(&myid, NULL, 1, MPI_INT, MPI_SUM, 1,
>>                        comm) != MPI_SUCCESS)
>>             exit(1);
>>     }
>>     MPI_Comm_free(&comm);
>>     MPI_Group_free(&g2);
>>     MPI_Group_free(&g1);
>> }
>>
>>
>> Command to launch:
>> mpiexec -np 2 -host <host_name> <binary_file>
>>
>>
>> GDB info:
>> 0: Program received signal SIGSEGV, Segmentation fault.
>> 0: 0x0000000000411a73 in MPIUI_Memcpy (dst=0x2aaaaadcfc0c,
>> 0: src=0xffffffffffffffff, len=4) at ../../include/mpiimpl.h:122
>> 0: 122 memcpy(dst, src, len);
>> 0: (gdb) bt
>> 0: #0 0x0000000000411a73 in MPIUI_Memcpy (dst=0x2aaaaadcfc0c,
>> 0: src=0xffffffffffffffff, len=4) at ../../include/mpiimpl.h:122
>> 0: #1 0x0000000000414eb3 in MPIR_Localcopy
>> (sendbuf=0xffffffffffffffff,
>> 0: sendcount=1, sendtype=1275069445, recvbuf=0x2aaaaadcfc0c,
>> recvcount=1,
>> 0: recvtype=1275069445) at helper_fns.c:335
>> 0: #2 0x0000000000410d48 in PMPI_Reduce (sendbuf=0xffffffffffffffff,
>> 0: recvbuf=0x7fffcb044bcc, count=1, datatype=1275069445,
>> op=1476395011,
>> 0: root=1, comm=-1006632960) at reduce_osu.c:1017
>>
>>
>> If I replace MPI_IN_PLACE with a variable, everything works fine.
>>
>>
>> Could somebody look into this problem?
>>
>>
>> --
>>
>> Thanks,
>>
>> Alexander Alekhin
>>
>>
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>