[mvapich-discuss] Segfault when no user driver installed

Maksym Planeta mplaneta at os.inf.tu-dresden.de
Sat Jun 4 11:03:07 EDT 2016


What I wanted to report is that there is an issue with the memory 
allocator; I just described the conditions that trigger the segfault.

On 06/04/2016 04:59 PM, Hari Subramoni wrote:
> Hi Maksym,
>
> Good to know that installing the missing libmlx4-1 package fixed the
> problem. We will see if we can add a FAQ to our userguide to address
> this issue.
>
> Regards,
> Hari.
>
> On Fri, Jun 3, 2016 at 2:41 PM, Maksym Planeta
> <mplaneta at os.inf.tu-dresden.de <mailto:mplaneta at os.inf.tu-dresden.de>>
> wrote:
>
>     Hi,
>
>     I was testing MVAPICH on a new installation and got the following
>     error message:
>
>     $ mpiexec -envall -np 2 -hosts 141.76.49.40,141.76.49.25
>     $HOME/mpi/libexec/osu-micro-benchmarks/mpi/startup/osu_init
>
>     [os-dhcp040:mpi_rank_0][error_sighandler] Caught error: Segmentation
>     fault (signal 11)
>
>     ===================================================================================
>     =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>     =   PID 22406 RUNNING AT 141.76.49.40
>     =   EXIT CODE: 139
>     =   CLEANING UP REMAINING PROCESSES
>     =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>     ===================================================================================
>     [proxy:0:1 at os-dhcp025] HYD_pmcd_pmip_control_cmd_cb
>     (pm/pmiserv/pmip_cb.c:912): assert (!closed) failed
>     [proxy:0:1 at os-dhcp025] HYDT_dmxu_poll_wait_for_event
>     (tools/demux/demux_poll.c:76): callback returned error status
>     [proxy:0:1 at os-dhcp025] main (pm/pmiserv/pmip.c:206): demux engine
>     error waiting for event
>     [mpiexec at os-dhcp040] HYDT_bscu_wait_for_completion
>     (tools/bootstrap/utils/bscu_wait.c:76): one of the processes
>     terminated badly; aborting
>     [mpiexec at os-dhcp040] HYDT_bsci_wait_for_completion
>     (tools/bootstrap/src/bsci_wait.c:23): launcher returned error
>     waiting for completion
>     [mpiexec at os-dhcp040] HYD_pmci_wait_for_completion
>     (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for
>     completion
>     [mpiexec at os-dhcp040] main (ui/mpich/mpiexec.c:344): process manager
>     error waiting for completion
>
>     It turned out that I was missing the libmlx4-1 package, and
>     installing it fixed the problem.
>
>     And here is backtrace:
>
>     #0  0x00002aaaab200dbb in do_check_chunk (av=0x2aaaab618760
>     <main_arena>,
>          p=0x636f6c2f6374652f)
>          at
>     src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:2452
>     #1  0x00002aaaab20128b in do_check_inuse_chunk (
>          av=0x2aaaab618760 <main_arena>, p=0x636f6c2f6374652f)
>          at
>     src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:2541
>     #2  0x00002aaaab20629d in malloc_consolidate (av=0x2aaaab618760
>     <main_arena>)
>          at
>     src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:4553
>     #3  0x00002aaaab20532f in _int_malloc (av=0x2aaaab618760 <main_arena>,
>          bytes=552)
>          at
>     src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:4043
>     #4  0x00002aaaab2040a1 in malloc (bytes=552)
>          at
>     src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:3408
>     #5  0x00002aaaac5fbeed in __fopen_internal (
>          filename=0x2aaaac6f991c "/etc/localtime", mode=0x2aaaac6f7f70
>     "rce",
>          is32=1) at iofopen.c:69
>     #6  0x00002aaaac5fbf8a in _IO_new_fopen (filename=<optimized out>,
>          mode=<optimized out>) at iofopen.c:97
>     #7  0x00002aaaac63e007 in __tzfile_read (
>          file=file at entry=0x2aaaac6f991c "/etc/localtime",
>     extra=extra at entry=0,
>          extrap=extrap at entry=0x0) at tzfile.c:168
>     #8  0x00002aaaac63da39 in tzset_internal (always=<optimized out>,
>          explicit=explicit at entry=1) at tzset.c:443
>     #9  0x00002aaaac63ddab in __tz_convert (timer=0x7fffffffcd88,
>          use_localtime=1, tp=0x2aaaac936560 <_tmbuf>) at tzset.c:628
>     #10 0x00002aaaab15b10c in MPID_Abort (
>          comm=0x2aaaab5e78c0 <MPID_Comm_builtin>, mpi_errno=0, exit_code=1,
>          error_msg=0x7fffffffd2c0 "Fatal error in MPI_Init:\nOther MPI
>     error, error stack:\nMPIR_Init_thread(514)", '.' <repeats 12 times>,
>     ": \nMPID_Init(365)", '.' <repeats 19 times>, ": channel
>     initialization failed\nMPIDI_CH3_Init(414)", '.' <repeats 14 times>,
>     ": rdma_get_"...) at src/mpid/ch3/src/mpid_abort.c:110
>     #11 0x00002aaaab0f5992 in handleFatalError (
>          comm_ptr=0x2aaaab5e78c0 <MPID_Comm_builtin>,
>          fcname=0x2aaaab2ea0e0 <FCNAME.22795> "MPI_Init", errcode=2143631)
>          at src/mpi/errhan/errutil.c:487
>     #12 0x00002aaaab0f557b in MPIR_Err_return_comm (comm_ptr=0x0,
>          fcname=0x2aaaab2ea0e0 <FCNAME.22795> "MPI_Init", errcode=2143631)
>          at src/mpi/errhan/errutil.c:264
>     #13 0x00002aaaab036ad8 in PMPI_Init (argc=0x7fffffffe36c,
>     argv=0x7fffffffe360)
>          at src/mpi/init/init.c:223
>     #14 0x00000000004008f1 in main (argc=1, argv=0x7fffffffe4a8) at
>     osu_init.c:23
>
>     The contents of p are:
>
>
>     (gdb) p p
>     $2 = (mchunkptr) 0x636f6c2f6374652f
>     (gdb) p (char *)&p
>     $3 = 0x7fffffffcac0 "/etc/loc"
>     (gdb)
>
>     --
>     Regards,
>     Maksym Planeta
>
>
>     _______________________________________________
>     mvapich-discuss mailing list
>     mvapich-discuss at cse.ohio-state.edu
>     <mailto:mvapich-discuss at cse.ohio-state.edu>
>     http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
