[mvapich-discuss] FW: Possible bug: Segmentation fault with
MPI_Reduce and MPI_IN_PLACE
Krishna Kandalla
kandalla at cse.ohio-state.edu
Wed Nov 24 11:39:09 EST 2010
Hi Alexander,
Thanks for confirming that your application works with this
flag. I have generated a small patch against 1.5.1p1 that fixes this problem;
it should allow you to run your application without having to use the
flag. I have tested the patch with the sample code that you provided
and verified that it works. Please let us know if this patch solves
your problem.
Index: src/mpi/coll/reduce_osu.c
===================================================================
--- src/mpi/coll/reduce_osu.c (revision 4340)
+++ src/mpi/coll/reduce_osu.c (working copy)
@@ -1014,8 +1014,13 @@
         } else{
             local_buf = (char*)shmem_buf + stride*local_rank;
             MPIR_Nest_incr();
-            mpi_errno = MPIR_Localcopy(sendbuf, count, datatype, local_buf,
-                                       count, datatype);
+            if(sendbuf != MPI_IN_PLACE) {
+                mpi_errno = MPIR_Localcopy(sendbuf, count, datatype, local_buf,
+                                           count, datatype);
+            } else {
+                mpi_errno = MPIR_Localcopy(recvbuf, count, datatype, local_buf,
+                                           count, datatype);
+            }
             MPIR_Nest_decr();
             MPIDI_CH3I_SHMEM_COLL_SetGatherComplete(local_size, local_rank, shmem_comm_rank);
         }
Regards,
Krishna
On Wed, Nov 24, 2010 at 3:55 AM, Alexander Alekhin <
alexander.alekhin at itseez.com> wrote:
> Hi Krishna,
>
>
>
> I launched the job with the MV2_USE_SHMEM_REDUCE=0 flag and it finished
> successfully. I assume, however, that this flag causes some performance
> degradation on SMP systems.
>
>
>
> --
>
> Thanks,
>
> Alexander Alekhin
>
>
>
> From: krishna.kandalla at gmail.com [mailto:krishna.kandalla at gmail.com] On
> Behalf Of Krishna Kandalla
> Sent: Monday, November 22, 2010 10:28 PM
> To: Alexander Alekhin
> Cc: mvapich-discuss at cse.ohio-state.edu
> Subject: Re: [mvapich-discuss] FW: Possible bug: Segmentation fault with
> MPI_Reduce and MPI_IN_PLACE
>
>
>
> Hi Alexander,
>
> Thank you for reporting this error. Could you please try
> running your application with the MV2_USE_SHMEM_REDUCE flag set to 0?
> You can find more information about this run-time variable at:
>
> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.6rc1.html
>
>
>
> Thanks,
>
> Krishna
>
> On Mon, Nov 22, 2010 at 11:06 AM, Alexander Alekhin <
> alexander.alekhin at itseez.com> wrote:
>
> Hi,
>
>
>
> I use MVAPICH2 1.5.1p1 code from svn.
> My mpiname info:
> MVAPICH2 1.5.1p1 Unofficial Build ch3:mrail
>
> Compilation
> CC: gcc -g
> CXX: c++ -g
> F77: g77 -g
> F90: f95 -g
>
> Configuration
> --prefix=$HOME/mvapich2/install --enable-g=all
> --enable-error-messages=all --enable-fast=none
>
>
>
> My problem is an application failure when using MPI_IN_PLACE with the
> MPI_Reduce operation.
>
> For example, the following piece of code generates a segmentation fault
> with 2 processes launched on the same node:
>
> // myid is rank in MPI_COMM_WORLD (see examples/cpi.c)
> {
>     MPI_Group g1, g2;
>     MPI_Comm comm;
>     int ranks[2] = { 1, 0 };
>     MPI_Comm_group(MPI_COMM_WORLD, &g1);
>     MPI_Group_incl(g1, 2, ranks, &g2);
>
>     MPI_Comm_create(MPI_COMM_WORLD, g2, &comm);
>
>     if (myid == 0) { // rank 1 of comm (root of Reduce)
>         int result = myid;
>         if (MPI_Reduce(MPI_IN_PLACE, &result, 1, MPI_INT, MPI_SUM, 1, comm) != MPI_SUCCESS) // fail is here
>             exit(1);
>     } else {
>         if (MPI_Reduce(&myid, NULL, 1, MPI_INT, MPI_SUM, 1, comm) != MPI_SUCCESS)
>             exit(1);
>     }
>
>     MPI_Comm_free(&comm);
>     MPI_Group_free(&g2);
>     MPI_Group_free(&g1);
> }
>
>
>
> Command to launch:
>
> mpiexec -np 2 -host <host_name> <binary_file>
>
>
>
> GDB info:
>
> 0: Program received signal SIGSEGV, Segmentation fault.
> 0: 0x0000000000411a73 in MPIUI_Memcpy (dst=0x2aaaaadcfc0c,
> 0:     src=0xffffffffffffffff, len=4) at ../../include/mpiimpl.h:122
> 0: 122             memcpy(dst, src, len);
> 0: (gdb) bt
> 0: #0  0x0000000000411a73 in MPIUI_Memcpy (dst=0x2aaaaadcfc0c,
> 0:     src=0xffffffffffffffff, len=4) at ../../include/mpiimpl.h:122
> 0: #1  0x0000000000414eb3 in MPIR_Localcopy (sendbuf=0xffffffffffffffff,
> 0:     sendcount=1, sendtype=1275069445, recvbuf=0x2aaaaadcfc0c, recvcount=1,
> 0:     recvtype=1275069445) at helper_fns.c:335
> 0: #2  0x0000000000410d48 in PMPI_Reduce (sendbuf=0xffffffffffffffff,
> 0:     recvbuf=0x7fffcb044bcc, count=1, datatype=1275069445, op=1476395011,
> 0:     root=1, comm=-1006632960) at reduce_osu.c:1017
>
>
>
> If I replace MPI_IN_PLACE with a variable, then everything works fine.
>
>
>
> Can somebody check this problem?
>
>
>
> --
>
> Thanks,
>
> Alexander Alekhin
>
>
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
>