[mvapich-discuss] segmentation fault from MPI_Send

Hari Subramoni subramoni.1 at osu.edu
Wed Oct 12 10:53:25 EDT 2016


Hello,

Can you try the mpirun_rsh job launcher instead of mpiexec and see if
things work?
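
For reference, with mpirun_rsh the equivalent launch line would look
something like this (assuming you reuse the same hostfile 'hf' for the
parent process; environment variables such as MV2_SUPPORT_DPM are passed
as VAR=value before the executable):

    mpirun_rsh -np 1 -hostfile hf MV2_SUPPORT_DPM=1 ./father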

Regards,
Hari.

On Tue, Oct 11, 2016 at 8:36 AM, 吴雪 <sy1406125 at buaa.edu.cn> wrote:

> Hi, all,
> I'm using MVAPICH2-2.2rc2. I have a program called father, and the father
> process uses MPI_Comm_spawn to start 8 child processes called child.
> The source code is as follows.
> father:
> #include <mpi.h>
> #include <stdlib.h>
>
> int main(int argc, char **argv)
> {
>     MPI_Init(&argc, &argv);
>
>     /* Pass the hostfile for the spawned children via an MPI_Info object. */
>     MPI_Info info = MPI_INFO_NULL;
>     char deviceHosts[10] = "hf";
>     MPI_Info_create(&info);
>     MPI_Info_set(info, "hostfile", deviceHosts);
>
>     /* Spawn 8 child processes and get an intercommunicator to them. */
>     MPI_Comm childComm;
>     MPI_Comm_spawn("./child", MPI_ARGV_NULL, 8, info, 0, MPI_COMM_WORLD,
>                    &childComm, MPI_ERRCODES_IGNORE);
>
>     int size = 64 * 1024;
>     int i, j;
>     int *a, *b;
>     a = (int *)malloc(size * sizeof(int));
>     b = (int *)malloc(size * sizeof(int));
>
>     /* Ping-pong a 64 KB message with each of the 8 children, 500 rounds. */
>     for (j = 0; j < 500; j++)
>     {
>         for (i = 0; i < 8; i++)
>         {
>             MPI_Send(a, size, MPI_BYTE, i, 0, childComm);
>             MPI_Recv(b, size, MPI_BYTE, i, 0, childComm, MPI_STATUS_IGNORE);
>         }
>     }
>
>     free(a);
>     free(b);
>     MPI_Finalize();
>     return 0;
> }
> child:
> #include <mpi.h>
> #include <stdio.h>
> #include <stdlib.h>
>
> int main(int argc, char **argv)
> {
>     int provided = 0;
>     //MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>     MPI_Init(&argc, &argv);
>
>     int rank;
>     MPI_Comm fatherComm;
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     printf("child %d start\n", rank);
>
>     /* Intercommunicator back to the parent (father) process. */
>     MPI_Comm_get_parent(&fatherComm);
>
>     int size = 64 * 1024;
>     int i;
>     int *b;
>     b = (int *)malloc(size * sizeof(int));
>
>     /* Echo each 64 KB message back to the parent, 500 rounds. */
>     for (i = 0; i < 500; i++)
>     {
>         printf("child %d receive round %d\n", rank, i);
>         MPI_Recv(b, size, MPI_BYTE, 0, 0, fatherComm, MPI_STATUS_IGNORE);
>         MPI_Send(b, size, MPI_BYTE, 0, 0, fatherComm);
>     }
>
>     printf("child %d exit\n", rank);
>     free(b);
>     MPI_Finalize();
>     return 0;
> }
>
> The backtrace from the core file is:
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00007fb379bf2a50 in vma_compare_search () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
> (gdb) bt
> #0  0x00007fb379bf2a50 in vma_compare_search () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
> #1  0x00007fb379c11342 in avl_find () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
> #2  0x00007fb379bf311e in dreg_find () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
> #3  0x00007fb379bf539a in dreg_register () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
> #4  0x00007fb379c0e669 in MPIDI_CH3I_MRAIL_Prepare_rndv () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
> #5  0x00007fb379bd63db in MPIDI_CH3_iStartRndvMsg () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
> #6  0x00007fb379bd0916 in MPID_MRAIL_RndvSend () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
> #7  0x00007fb379bca91d in MPID_Send () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
> #8  0x00007fb379b574e5 in PMPI_Send () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
> #9  0x0000000000400a5e in main ()
>
> The hostfile 'hf' contains '192.168.2.2:8'. I use mpiexec to launch the
> job: 'mpiexec -genv MV2_SUPPORT_DPM 1 -n 1 ./father'.
> I have not been able to figure out what causes the segmentation fault or
> how to fix it. I would appreciate any advice.
> Looking forward to your reply.
>
> xue
>
>
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>