[mvapich-discuss] [SPAM] Re: segment fault from MPI_Send

Hari Subramoni subramoni.1 at osu.edu
Thu Oct 13 13:00:54 EDT 2016


This issue has been resolved through off-list discussion. The user was able
to run the program using the following command-line options:

  ./bin/mpirun_rsh -np 1 -hostfile hosts MV2_SUPPORT_DPM=1 MV2_USE_RDMA_FAST_PATH=0 ./father

Thx,
Hari.

On Thu, Oct 13, 2016 at 12:08 AM, Hari Subramoni <subramoni.1 at osu.edu>
wrote:

> With mpirun_rsh, you need to either pass the environment variables
> directly on the command line (mpirun_rsh -np... MV2_SUPPORT_DPM=1....
> <exec>) or use the "-export" option to export everything in the environment
> to all processes (mpirun_rsh -export -np....)
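>
> For example (using the ./father binary and the hostfile from this thread;
> adjust the names for your setup), the two forms would look like:
>
>   mpirun_rsh -np 1 -hostfile hf MV2_SUPPORT_DPM=1 ./father
>   mpirun_rsh -export -np 1 -hostfile hf ./father
>
> where the second form assumes MV2_SUPPORT_DPM=1 has already been exported
> in the shell environment before launching.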
>
> Can you please try either one of these options and see if it works for
> you?
>
> Thx,
> Hari.
>
> On Oct 13, 2016 12:02 AM, "吴雪" <sy1406125 at buaa.edu.cn> wrote:
>
>> Hi,
>> Thanks for your reply. I tried launching with mpirun_rsh, but it still did
>> not work; the errors are below. I also tried setting MV2_SUPPORT_DPM=1 to
>> enable MPI_Comm_spawn and got the same result. In addition, I would like to
>> use '-genv' to pass some variables to each process, but mpirun_rsh does not
>> seem to support '-genv'. Is there an alternative?
>>
>> Best wishes,
>> xue
>>
>> run at gpu-cluster-2:~/wx-cuda-workplace/mpiSpawn$ mpirun_rsh -hostfile hf
>> -np 1 ./father
>> [cli_0]: aborting job:
>> Fatal error in MPI_Comm_spawn:
>> Other MPI error, error stack:
>> MPI_Comm_spawn(144)...........: MPI_Comm_spawn(cmd="./child",
>> argv=(nil), maxprocs=8, info=0x9c000000, root=0, MPI_COMM_WORLD,
>> intercomm=0x7fff46c654cc, errors=(nil)) failed
>> MPIDI_Comm_spawn_multiple(147):
>> MPID_Open_port(70)............: Function not implemented
>>
>> [gpu-cluster-2:mpispawn_0][readline] Unexpected End-Of-File on file
>> descriptor 5. MPI process died?
>> [gpu-cluster-2:mpispawn_0][mtpmi_processops] Error while reading PMI
>> socket. MPI process died?
>> [gpu-cluster-2:mpispawn_0][child_handler] MPI process (rank: 0, pid:
>> 25579) exited with status 1
>>
>>
>>
>> -----Original Message-----
>> *From:* "Hari Subramoni" <subramoni.1 at osu.edu>
>> *Sent:* Wednesday, October 12, 2016
>> *To:* "吴雪" <sy1406125 at buaa.edu.cn>
>> *Cc:* "mvapich-discuss at cse.ohio-state.edu" <mvapich-discuss at cse.ohio-state.edu>
>> *Subject:* [SPAM] Re: [mvapich-discuss] segment fault from MPI_Send
>>
>> Hello,
>>
>> Can you try the mpirun_rsh job launcher instead of mpiexec and see if
>> things work?
>>
>> Regards,
>> Hari.
>>
>> On Tue, Oct 11, 2016 at 8:36 AM, 吴雪 <sy1406125 at buaa.edu.cn> wrote:
>>
>>> Hi all,
>>> I'm using MVAPICH2-2.2rc2. I have a program called father, and the father
>>> process uses MPI_Comm_spawn to start 8 child processes called child. The
>>> source code is as follows.
>>> father:
>>> #include <mpi.h>
>>> #include <stdlib.h>
>>>
>>> int main(int argc, char **argv)
>>> {
>>>     MPI_Init(&argc, &argv);
>>>
>>>     /* Ask that the spawned children be placed according to hostfile "hf". */
>>>     MPI_Info info = MPI_INFO_NULL;
>>>     char deviceHosts[10] = "hf";
>>>     MPI_Info_create(&info);
>>>     MPI_Info_set(info, "hostfile", deviceHosts);
>>>
>>>     MPI_Comm childComm;
>>>     MPI_Comm_spawn("./child", MPI_ARGV_NULL, 8, info, 0, MPI_COMM_WORLD,
>>>                    &childComm, MPI_ERRCODES_IGNORE);
>>>
>>>     int size = 64 * 1024;
>>>     int i, j;
>>>     int *a = (int *)malloc(size * sizeof(int));
>>>     int *b = (int *)malloc(size * sizeof(int));
>>>
>>>     /* Exchange 'size' bytes with each of the 8 children, 500 rounds. */
>>>     for (j = 0; j < 500; j++)
>>>     {
>>>         for (i = 0; i < 8; i++)
>>>         {
>>>             MPI_Send(a, size, MPI_BYTE, i, 0, childComm);
>>>             MPI_Recv(b, size, MPI_BYTE, i, 0, childComm, MPI_STATUS_IGNORE);
>>>         }
>>>     }
>>>
>>>     free(a);
>>>     free(b);
>>>     MPI_Finalize();
>>>     return 0;
>>> }
>>> child:
>>> #include <mpi.h>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>>
>>> int main(int argc, char **argv)
>>> {
>>>     int provided = 0;
>>>     /* MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided); */
>>>     MPI_Init(&argc, &argv);
>>>
>>>     int rank;
>>>     MPI_Comm fatherComm;
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>     printf("child %d start\n", rank);
>>>
>>>     /* Intercommunicator back to the spawning (father) process. */
>>>     MPI_Comm_get_parent(&fatherComm);
>>>
>>>     int size = 64 * 1024;
>>>     int i;
>>>     int *b = (int *)malloc(size * sizeof(int));
>>>     for (i = 0; i < 500; i++)
>>>     {
>>>         printf("child %d receive round %d\n", rank, i);
>>>         MPI_Recv(b, size, MPI_BYTE, 0, 0, fatherComm, MPI_STATUS_IGNORE);
>>>         MPI_Send(b, size, MPI_BYTE, 0, 0, fatherComm);
>>>     }
>>>     printf("child %d exit\n", rank);
>>>
>>>     free(b);
>>>     MPI_Finalize();
>>>     return 0;
>>> }
>>>
>>> The backtrace from the core file is:
>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>> #0  0x00007fb379bf2a50 in vma_compare_search () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
>>> (gdb) bt
>>> #0  0x00007fb379bf2a50 in vma_compare_search () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
>>> #1  0x00007fb379c11342 in avl_find () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
>>> #2  0x00007fb379bf311e in dreg_find () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
>>> #3  0x00007fb379bf539a in dreg_register () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
>>> #4  0x00007fb379c0e669 in MPIDI_CH3I_MRAIL_Prepare_rndv () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
>>> #5  0x00007fb379bd63db in MPIDI_CH3_iStartRndvMsg () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
>>> #6  0x00007fb379bd0916 in MPID_MRAIL_RndvSend () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
>>> #7  0x00007fb379bca91d in MPID_Send () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
>>> #8  0x00007fb379b574e5 in PMPI_Send () from /home/run/wx-workplace/mvapich2-2.2rc2/lib/libmpi.so.12
>>> #9  0x0000000000400a5e in main ()
>>> The hostfile 'hf' contains '192.168.2.2:8'. I launch the job with
>>> mpiexec: 'mpiexec -genv MV2_SUPPORT_DPM 1 -n 1 ./father'.
>>> I have not been able to figure out what causes the segmentation fault or
>>> how to fix it. I would appreciate any advice.
>>> Looking forward to your reply.
>>>
>>> xue
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> mvapich-discuss mailing list
>>> mvapich-discuss at cse.ohio-state.edu
>>> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>
>>>
>>

