[mvapich-discuss] Segfault when no user driver installed
Hari Subramoni
subramoni.1 at osu.edu
Sat Jun 4 11:18:09 EDT 2016
Hi Maksym,
We will take a look at it.
However, as the error was triggered by a system issue, I don't think it's
a code-related issue.
Regards,
Hari.
On Sat, Jun 4, 2016 at 11:03 AM, Maksym Planeta <
mplaneta at os.inf.tu-dresden.de> wrote:
> The thing I wanted to report is that there is some issue with the memory
> allocator. I just described the conditions that trigger the segfault.
>
> On 06/04/2016 04:59 PM, Hari Subramoni wrote:
>
>> Hi Maksym,
>>
>> Good to know that installing the missing libmlx4-1 package fixed the
>> problem. We will see if we can add a FAQ to our userguide to address
>> this issue.
>>
>> Regards,
>> Hari.
>>
>>
>> On Fri, Jun 3, 2016 at 2:41 PM, Maksym Planeta
>> <mplaneta at os.inf.tu-dresden.de> wrote:
>>
>> Hi,
>>
>> I was testing mvapich on a new installation and got the following error
>> message:
>>
>> $ mpiexec -envall -np 2 -hosts 141.76.49.40,141.76.49.25
>> $HOME/mpi/libexec/osu-micro-benchmarks/mpi/startup/osu_init
>>
>> [os-dhcp040:mpi_rank_0][error_sighandler] Caught error: Segmentation
>> fault (signal 11)
>>
>>
>> ===================================================================================
>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> = PID 22406 RUNNING AT 141.76.49.40
>> = EXIT CODE: 139
>> = CLEANING UP REMAINING PROCESSES
>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>
>> ===================================================================================
>> [proxy:0:1 at os-dhcp025] HYD_pmcd_pmip_control_cmd_cb
>> (pm/pmiserv/pmip_cb.c:912): assert (!closed) failed
>> [proxy:0:1 at os-dhcp025] HYDT_dmxu_poll_wait_for_event
>> (tools/demux/demux_poll.c:76): callback returned error status
>> [proxy:0:1 at os-dhcp025] main (pm/pmiserv/pmip.c:206): demux engine
>> error waiting for event
>> [mpiexec at os-dhcp040] HYDT_bscu_wait_for_completion
>> (tools/bootstrap/utils/bscu_wait.c:76): one of the processes
>> terminated badly; aborting
>> [mpiexec at os-dhcp040] HYDT_bsci_wait_for_completion
>> (tools/bootstrap/src/bsci_wait.c:23): launcher returned error
>> waiting for completion
>> [mpiexec at os-dhcp040] HYD_pmci_wait_for_completion
>> (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for
>> completion
>> [mpiexec at os-dhcp040] main (ui/mpich/mpiexec.c:344): process manager
>> error waiting for completion
>>
>> It turned out that I was missing the libmlx4-1 package, and installing it
>> fixed the problem.
>>
>> And here is the backtrace:
>>
>> #0 0x00002aaaab200dbb in do_check_chunk (av=0x2aaaab618760
>> <main_arena>,
>> p=0x636f6c2f6374652f)
>> at
>>
>> src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:2452
>> #1 0x00002aaaab20128b in do_check_inuse_chunk (
>> av=0x2aaaab618760 <main_arena>, p=0x636f6c2f6374652f)
>> at
>>
>> src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:2541
>> #2 0x00002aaaab20629d in malloc_consolidate (av=0x2aaaab618760
>> <main_arena>)
>> at
>>
>> src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:4553
>> #3 0x00002aaaab20532f in _int_malloc (av=0x2aaaab618760 <main_arena>,
>> bytes=552)
>> at
>>
>> src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:4043
>> #4 0x00002aaaab2040a1 in malloc (bytes=552)
>> at
>>
>> src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:3408
>> #5 0x00002aaaac5fbeed in __fopen_internal (
>> filename=0x2aaaac6f991c "/etc/localtime", mode=0x2aaaac6f7f70
>> "rce",
>> is32=1) at iofopen.c:69
>> #6 0x00002aaaac5fbf8a in _IO_new_fopen (filename=<optimized out>,
>> mode=<optimized out>) at iofopen.c:97
>> #7 0x00002aaaac63e007 in __tzfile_read (
>> file=file at entry=0x2aaaac6f991c "/etc/localtime",
>> extra=extra at entry=0,
>> extrap=extrap at entry=0x0) at tzfile.c:168
>> #8 0x00002aaaac63da39 in tzset_internal (always=<optimized out>,
>> explicit=explicit at entry=1) at tzset.c:443
>> #9 0x00002aaaac63ddab in __tz_convert (timer=0x7fffffffcd88,
>> use_localtime=1, tp=0x2aaaac936560 <_tmbuf>) at tzset.c:628
>> #10 0x00002aaaab15b10c in MPID_Abort (
>> comm=0x2aaaab5e78c0 <MPID_Comm_builtin>, mpi_errno=0,
>> exit_code=1,
>> error_msg=0x7fffffffd2c0 "Fatal error in MPI_Init:\nOther MPI
>> error, error stack:\nMPIR_Init_thread(514)", '.' <repeats 12 times>,
>> ": \nMPID_Init(365)", '.' <repeats 19 times>, ": channel
>> initialization failed\nMPIDI_CH3_Init(414)", '.' <repeats 14 times>,
>> ": rdma_get_"...) at src/mpid/ch3/src/mpid_abort.c:110
>> #11 0x00002aaaab0f5992 in handleFatalError (
>> comm_ptr=0x2aaaab5e78c0 <MPID_Comm_builtin>,
>> fcname=0x2aaaab2ea0e0 <FCNAME.22795> "MPI_Init", errcode=2143631)
>> at src/mpi/errhan/errutil.c:487
>> #12 0x00002aaaab0f557b in MPIR_Err_return_comm (comm_ptr=0x0,
>> fcname=0x2aaaab2ea0e0 <FCNAME.22795> "MPI_Init", errcode=2143631)
>> at src/mpi/errhan/errutil.c:264
>> #13 0x00002aaaab036ad8 in PMPI_Init (argc=0x7fffffffe36c,
>> argv=0x7fffffffe360)
>> at src/mpi/init/init.c:223
>> #14 0x00000000004008f1 in main (argc=1, argv=0x7fffffffe4a8) at
>> osu_init.c:23
>>
>> The contents of p are:
>>
>> (gdb) p p
>> $2 = (mchunkptr) 0x636f6c2f6374652f
>> (gdb) p (char *)&p
>> $3 = 0x7fffffffcac0 "/etc/loc"
>> (gdb)
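
The bogus chunk pointer is itself a clue: interpreted as eight little-endian
ASCII bytes, 0x636f6c2f6374652f spells out part of the "/etc/localtime" path
that fopen() was opening (frame #5), consistent with heap metadata having
been overwritten by that string. A minimal Python sketch of the decoding
(the sketch is an editorial illustration, not from the original report, and
assumes the usual little-endian x86-64 layout):

```python
import struct

# Bogus chunk pointer reported by gdb in frame #0 of the backtrace.
p = 0x636F6C2F6374652F

# On little-endian x86-64, the low-order byte of the pointer is the first
# byte in memory, so pack it little-endian and decode as ASCII.
decoded = struct.pack("<Q", p).decode("ascii")
print(decoded)  # -> "/etc/loc"
```

This matches the `$3 = ... "/etc/loc"` that gdb prints when the same stack
slot is reinterpreted as a C string.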
>>
>> --
>> Regards,
>> Maksym Planeta
>>
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>>
>>
>