[mvapich-discuss] [SPAM] Re: segment fault from MPI_Send

Hari Subramoni subramoni.1 at osu.edu
Thu Oct 13 00:08:57 EDT 2016


With mpirun_rsh, you need to either pass the environment variables directly
on the command line (mpirun_rsh -np ... MV2_SUPPORT_DPM=1 ... <exec>) or use
the "-export" option to export everything in your environment to all
processes (mpirun_rsh -export -np ...).
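For example, with the hostfile 'hf' and the './father' binary from your
message, the two variants would look roughly like this (a sketch only;
adjust the process count and paths for your setup):

    mpirun_rsh -np 1 -hostfile hf MV2_SUPPORT_DPM=1 ./father

or, exporting the launching shell's whole environment (assuming
MV2_SUPPORT_DPM has already been set there):

    export MV2_SUPPORT_DPM=1
    mpirun_rsh -export -np 1 -hostfile hf ./father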

Can you please try either one of these options and see if it works for you?

Thx,
Hari.

On Oct 13, 2016 12:02 AM, "吴雪" <sy1406125 at buaa.edu.cn> wrote:

> Hi,
> Thanks for your reply. I tried launching with mpirun_rsh, but it still
> didn't work; the errors are below. I also tried setting MV2_SUPPORT_DPM=1 to
> enable MPI_Comm_spawn and got the same result. In addition, I want to use
> '-genv' to pass some variables to each process, but mpirun_rsh does not seem
> to support '-genv'. Is there any alternative?
>
> Best wishes,
> xue
>
> run at gpu-cluster-2:~/wx-cuda-workplace/mpiSpawn$ mpirun_rsh -hostfile hf
> -np 1 ./father
> [cli_0]: aborting job:
> Fatal error in MPI_Comm_spawn:
> Other MPI error, error stack:
> MPI_Comm_spawn(144)...........: MPI_Comm_spawn(cmd="./child", argv=(nil),
> maxprocs=8, info=0x9c000000, root=0, MPI_COMM_WORLD,
> intercomm=0x7fff46c654cc, errors=(nil)) failed
> MPIDI_Comm_spawn_multiple(147):
> MPID_Open_port(70)............: Function not implemented
>
> [gpu-cluster-2:mpispawn_0][readline] Unexpected End-Of-File on file
> descriptor 5. MPI process died?
> [gpu-cluster-2:mpispawn_0][mtpmi_processops] Error while reading PMI
> socket. MPI process died?
> [gpu-cluster-2:mpispawn_0][child_handler] MPI process (rank: 0, pid:
> 25579) exited with status 1
>
>
>
> -----Original Message-----
> *From:* "Hari Subramoni" <subramoni.1 at osu.edu>
> *Sent:* Wednesday, October 12, 2016
> *To:* "吴雪" <sy1406125 at buaa.edu.cn>
> *Cc:* "mvapich-discuss at cse.ohio-state.edu" <mvapich-discuss at
> cse.ohio-state.edu>
> *Subject:* [SPAM] Re: [mvapich-discuss] segment fault from MPI_Send
>
> Hello,
>
> Can you try the mpirun_rsh job launcher instead of mpiexec and see if
> things work?
>
> Regards,
> Hari.
>
> On Tue, Oct 11, 2016 at 8:36 AM, 吴雪 <sy1406125 at buaa.edu.cn> wrote:
>
>> Hi,all
>> I'm using MVAPICH2-2.2rc2. I have a program called father, and the father
>> process uses MPI_Comm_spawn to start 8 child processes called child.
>> The source code is as follows.
>> father:
>> #include <mpi.h>
>> #include <stdlib.h>
>>
>> int main(int argc, char **argv)
>> {
>>     MPI_Init(&argc, &argv);
>>
>>     /* Ask that the children be spawned on the hosts listed in "hf". */
>>     MPI_Info info = MPI_INFO_NULL;
>>     char deviceHosts[10] = "hf";
>>     MPI_Info_create(&info);
>>     MPI_Info_set(info, "hostfile", deviceHosts);
>>
>>     MPI_Comm childComm;
>>     MPI_Comm_spawn("./child", MPI_ARGV_NULL, 8, info, 0, MPI_COMM_WORLD,
>>                    &childComm, MPI_ERRCODES_IGNORE);
>>
>>     int size = 64 * 1024;
>>     int i, j;
>>     int *a = (int *)malloc(size * sizeof(int));
>>     int *b = (int *)malloc(size * sizeof(int));
>>
>>     /* Ping-pong a 64 KB message with each of the 8 children, 500 times. */
>>     for (j = 0; j < 500; j++)
>>     {
>>         for (i = 0; i < 8; i++)
>>         {
>>             MPI_Send(a, size, MPI_BYTE, i, 0, childComm);
>>             MPI_Recv(b, size, MPI_BYTE, i, 0, childComm, MPI_STATUS_IGNORE);
>>         }
>>     }
>>
>>     free(a);
>>     free(b);
>>     MPI_Finalize();
>>     return 0;
>> }
>> child:
>> #include <mpi.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>>
>> int main(int argc, char **argv)
>> {
>>     /* int provided = 0;
>>        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided); */
>>     MPI_Init(&argc, &argv);
>>
>>     int rank;
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     printf("child %d start\n", rank);
>>
>>     /* Intercommunicator back to the spawning (father) process. */
>>     MPI_Comm fatherComm;
>>     MPI_Comm_get_parent(&fatherComm);
>>
>>     int size = 64 * 1024;
>>     int i;
>>     int *b = (int *)malloc(size * sizeof(int));
>>
>>     for (i = 0; i < 500; i++)
>>     {
>>         printf("child %d receive round %d\n", rank, i);
>>         MPI_Recv(b, size, MPI_BYTE, 0, 0, fatherComm, MPI_STATUS_IGNORE);
>>         MPI_Send(b, size, MPI_BYTE, 0, 0, fatherComm);
>>     }
>>
>>     printf("child %d exit\n", rank);
>>     free(b);
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>> The backtrace from the core file is:
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> #0  0x00007fb379bf2a50 in vma_compare_search () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
>> (gdb) bt
>> #0  0x00007fb379bf2a50 in vma_compare_search () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
>> #1  0x00007fb379c11342 in avl_find () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
>> #2  0x00007fb379bf311e in dreg_find () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
>> #3  0x00007fb379bf539a in dreg_register () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
>> #4  0x00007fb379c0e669 in MPIDI_CH3I_MRAIL_Prepare_rndv () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
>> #5  0x00007fb379bd63db in MPIDI_CH3_iStartRndvMsg () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
>> #6  0x00007fb379bd0916 in MPID_MRAIL_RndvSend () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
>> #7  0x00007fb379bca91d in MPID_Send () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
>> #8  0x00007fb379b574e5 in PMPI_Send () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
>> #9  0x0000000000400a5e in main ()
>>
>> The file 'hf' contains '192.168.2.2:8'. I use mpiexec to launch the
>> job: 'mpiexec -genv MV2_SUPPORT_DPM 1 -n 1 ./father'.
>> I have not been able to find out what causes the segmentation fault or how
>> to fix it. I would appreciate any advice.
>> Looking forward to your reply.
>>
>> xue
>>
>>
>>
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>>
>

