[mvapich-discuss] Segfault when no user driver installed

Hari Subramoni subramoni.1 at osu.edu
Sat Jun 4 11:18:09 EDT 2016


Hi Maksym,

We will take a look at it.

However, as the error was triggered by a system issue, I don't think it's
a code-related issue.

Regards,
Hari.

On Sat, Jun 4, 2016 at 11:03 AM, Maksym Planeta
<mplaneta at os.inf.tu-dresden.de> wrote:

> The thing I wanted to report is that there is some issue with the memory
> allocator. I just described the conditions that trigger the segfault.
>
> On 06/04/2016 04:59 PM, Hari Subramoni wrote:
>
>> Hi Maksym,
>>
>> Good to know that installing the missing libmlx4-1 package fixed the
>> problem. We will see if we can add a FAQ entry to our user guide to
>> address this issue.
>>
>> Regards,
>> Hari.
>>
>> On Fri, Jun 3, 2016 at 2:41 PM, Maksym Planeta
>> <mplaneta at os.inf.tu-dresden.de> wrote:
>>
>>     Hi,
>>
>>     I was testing MVAPICH on a new installation and got the following
>>     error message:
>>
>>     $ mpiexec -envall -np 2 -hosts 141.76.49.40,141.76.49.25 \
>>         $HOME/mpi/libexec/osu-micro-benchmarks/mpi/startup/osu_init
>>
>>     [os-dhcp040:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
>>
>>     ===================================================================================
>>     =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>     =   PID 22406 RUNNING AT 141.76.49.40
>>     =   EXIT CODE: 139
>>     =   CLEANING UP REMAINING PROCESSES
>>     =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>     ===================================================================================
>>     [proxy:0:1@os-dhcp025] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:912): assert (!closed) failed
>>     [proxy:0:1@os-dhcp025] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
>>     [proxy:0:1@os-dhcp025] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
>>     [mpiexec@os-dhcp040] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
>>     [mpiexec@os-dhcp040] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
>>     [mpiexec@os-dhcp040] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion
>>     [mpiexec@os-dhcp040] main (ui/mpich/mpiexec.c:344): process manager error waiting for completion
>>
>>     It turned out that I was missing the libmlx4-1 package, and
>>     installing it fixed the problem.
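>>
>>     For anyone hitting the same symptom: a quick way to check whether a
>>     userspace verbs provider such as libmlx4 is actually installed is to
>>     ask libibverbs for the device list. This is just a sketch of my own,
>>     not part of the benchmarks; with the provider missing, the list
>>     typically comes back empty even though the kernel modules are loaded:
>>
>>     /* check_verbs.c -- build with: gcc check_verbs.c -o check_verbs -libverbs */
>>     #include <stdio.h>
>>     #include <infiniband/verbs.h>
>>
>>     int main(void)
>>     {
>>         int n = 0;
>>         struct ibv_device **list = ibv_get_device_list(&n);
>>
>>         if (list == NULL || n == 0) {
>>             fprintf(stderr, "no verbs devices found -- userspace driver missing?\n");
>>             return 1;
>>         }
>>         for (int i = 0; i < n; i++)
>>             printf("found device: %s\n", ibv_get_device_name(list[i]));
>>         ibv_free_device_list(list);
>>         return 0;
>>     }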
>>
>>     And here is the backtrace:
>>
>>     #0  0x00002aaaab200dbb in do_check_chunk (av=0x2aaaab618760 <main_arena>, p=0x636f6c2f6374652f)
>>         at src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:2452
>>     #1  0x00002aaaab20128b in do_check_inuse_chunk (av=0x2aaaab618760 <main_arena>, p=0x636f6c2f6374652f)
>>         at src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:2541
>>     #2  0x00002aaaab20629d in malloc_consolidate (av=0x2aaaab618760 <main_arena>)
>>         at src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:4553
>>     #3  0x00002aaaab20532f in _int_malloc (av=0x2aaaab618760 <main_arena>, bytes=552)
>>         at src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:4043
>>     #4  0x00002aaaab2040a1 in malloc (bytes=552)
>>         at src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:3408
>>     #5  0x00002aaaac5fbeed in __fopen_internal (filename=0x2aaaac6f991c "/etc/localtime",
>>         mode=0x2aaaac6f7f70 "rce", is32=1) at iofopen.c:69
>>     #6  0x00002aaaac5fbf8a in _IO_new_fopen (filename=<optimized out>, mode=<optimized out>)
>>         at iofopen.c:97
>>     #7  0x00002aaaac63e007 in __tzfile_read (file=file@entry=0x2aaaac6f991c "/etc/localtime",
>>         extra=extra@entry=0, extrap=extrap@entry=0x0) at tzfile.c:168
>>     #8  0x00002aaaac63da39 in tzset_internal (always=<optimized out>, explicit=explicit@entry=1)
>>         at tzset.c:443
>>     #9  0x00002aaaac63ddab in __tz_convert (timer=0x7fffffffcd88, use_localtime=1,
>>         tp=0x2aaaac936560 <_tmbuf>) at tzset.c:628
>>     #10 0x00002aaaab15b10c in MPID_Abort (comm=0x2aaaab5e78c0 <MPID_Comm_builtin>, mpi_errno=0, exit_code=1,
>>         error_msg=0x7fffffffd2c0 "Fatal error in MPI_Init:\nOther MPI error, error stack:\nMPIR_Init_thread(514)",
>>         '.' <repeats 12 times>, ": \nMPID_Init(365)", '.' <repeats 19 times>,
>>         ": channel initialization failed\nMPIDI_CH3_Init(414)", '.' <repeats 14 times>,
>>         ": rdma_get_"...) at src/mpid/ch3/src/mpid_abort.c:110
>>     #11 0x00002aaaab0f5992 in handleFatalError (comm_ptr=0x2aaaab5e78c0 <MPID_Comm_builtin>,
>>         fcname=0x2aaaab2ea0e0 <FCNAME.22795> "MPI_Init", errcode=2143631) at src/mpi/errhan/errutil.c:487
>>     #12 0x00002aaaab0f557b in MPIR_Err_return_comm (comm_ptr=0x0,
>>         fcname=0x2aaaab2ea0e0 <FCNAME.22795> "MPI_Init", errcode=2143631) at src/mpi/errhan/errutil.c:264
>>     #13 0x00002aaaab036ad8 in PMPI_Init (argc=0x7fffffffe36c, argv=0x7fffffffe360)
>>         at src/mpi/init/init.c:223
>>     #14 0x00000000004008f1 in main (argc=1, argv=0x7fffffffe4a8) at osu_init.c:23
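>>
>>     For what it's worth, do_check_chunk() starts by reading the header of
>>     the chunk it is handed (chunksize(p) reads p->size). Roughly, a
>>     ptmalloc2 chunk header looks like this (a simplified sketch of my
>>     own, not the exact MVAPICH source):
>>
>>     #include <stdio.h>
>>     #include <stddef.h>
>>
>>     /* simplified ptmalloc2 chunk header (see malloc.c for the real one) */
>>     struct malloc_chunk {
>>         size_t prev_size;          /* size of previous chunk, if free   */
>>         size_t size;               /* chunk size plus status bits       */
>>         struct malloc_chunk *fd;   /* forward link, used when free      */
>>         struct malloc_chunk *bk;   /* backward link, used when free     */
>>     };
>>
>>     int main(void)
>>     {
>>         /* the value do_check_chunk() received as its chunk pointer */
>>         struct malloc_chunk *p = (struct malloc_chunk *) 0x636f6c2f6374652full;
>>         printf("chunk header is %zu bytes; bogus chunk pointer: %p\n",
>>                sizeof *p, (void *) p);
>>         /* reading p->size would fault: nothing is mapped at that address */
>>         return 0;
>>     }
>>
>>     so as soon as the allocator touches that "chunk", it segfaults.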
>>
>>     The contents of p are:
>>
>>
>>     (gdb) p p
>>     $2 = (mchunkptr) 0x636f6c2f6374652f
>>     (gdb) p (char *)&p
>>     $3 = 0x7fffffffcac0 "/etc/loc"
>>     (gdb)
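>>
>>     Note that the "pointer" is not a pointer at all: interpreted as
>>     little-endian bytes on x86-64, 0x636f6c2f6374652f is exactly the
>>     first eight characters of "/etc/localtime", which matches what gdb
>>     prints for (char *)&p. A tiny standalone check (my own sketch, not
>>     MVAPICH code):
>>
>>     #include <stdio.h>
>>     #include <string.h>
>>     #include <stdint.h>
>>
>>     int main(void)
>>     {
>>         uint64_t p = 0x636f6c2f6374652full;  /* value gdb shows for p */
>>         char buf[sizeof p + 1] = {0};
>>
>>         memcpy(buf, &p, sizeof p);           /* reinterpret the bits as bytes */
>>         printf("%s\n", buf);                 /* prints "/etc/loc" */
>>         return 0;
>>     }
>>
>>     So it looks like the allocator's chunk metadata got overwritten with
>>     the "/etc/localtime" filename somewhere on the MPID_Abort error path.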
>>
>>     --
>>     Regards,
>>     Maksym Planeta
>>
>>
>>     _______________________________________________
>>     mvapich-discuss mailing list
>>     mvapich-discuss at cse.ohio-state.edu
>>     http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>>
>>
>