[mvapich-discuss] Segfault when no user driver installed
Maksym Planeta
mplaneta at os.inf.tu-dresden.de
Fri Jun 3 14:41:33 EDT 2016
Hi,
I was testing MVAPICH on a new installation and got the following error message:
$ mpiexec -envall -np 2 -hosts 141.76.49.40,141.76.49.25 \
    $HOME/mpi/libexec/osu-micro-benchmarks/mpi/startup/osu_init
[os-dhcp040:mpi_rank_0][error_sighandler] Caught error: Segmentation
fault (signal 11)
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 22406 RUNNING AT 141.76.49.40
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:1@os-dhcp025] HYD_pmcd_pmip_control_cmd_cb
(pm/pmiserv/pmip_cb.c:912): assert (!closed) failed
[proxy:0:1@os-dhcp025] HYDT_dmxu_poll_wait_for_event
(tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:1@os-dhcp025] main (pm/pmiserv/pmip.c:206): demux engine error
waiting for event
[mpiexec@os-dhcp040] HYDT_bscu_wait_for_completion
(tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated
badly; aborting
[mpiexec@os-dhcp040] HYDT_bsci_wait_for_completion
(tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting
for completion
[mpiexec@os-dhcp040] HYD_pmci_wait_for_completion
(pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for
completion
[mpiexec@os-dhcp040] main (ui/mpich/mpiexec.c:344): process manager
error waiting for completion
It turned out that I was missing the libmlx4-1 package, and installing it
fixed the problem.
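To check for this situation before launching a job, here is a minimal sketch
that lists the user-space verbs drivers declared on a node. It assumes that
libibverbs reads driver declarations of the form "driver <name>" from files
in /etc/libibverbs.d; that path and the helper name
configured_verbs_drivers are assumptions for illustration, and the layout
may differ between distributions and libibverbs versions.

```python
from pathlib import Path


def configured_verbs_drivers(config_dir="/etc/libibverbs.d"):
    """Return driver names declared in libibverbs-style config files.

    Assumption: each file in config_dir holds lines like "driver mlx4".
    """
    drivers = []
    root = Path(config_dir)
    if not root.is_dir():
        # No config directory at all: nothing is declared.
        return drivers
    for path in sorted(root.iterdir()):
        if not path.is_file():
            continue
        for line in path.read_text(errors="replace").splitlines():
            fields = line.split()
            if len(fields) >= 2 and fields[0] == "driver":
                drivers.append(fields[1])
    return drivers


if __name__ == "__main__":
    found = configured_verbs_drivers()
    print(found if found else "no user-space verbs drivers configured")
```

An empty result on a node with a Mellanox HCA would be consistent with the
missing libmlx4-1 package described above, though it does not prove it.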
Here is the backtrace of the segfault:
#0 0x00002aaaab200dbb in do_check_chunk (av=0x2aaaab618760 <main_arena>,
p=0x636f6c2f6374652f)
at
src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:2452
#1 0x00002aaaab20128b in do_check_inuse_chunk (
av=0x2aaaab618760 <main_arena>, p=0x636f6c2f6374652f)
at
src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:2541
#2 0x00002aaaab20629d in malloc_consolidate (av=0x2aaaab618760
<main_arena>)
at
src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:4553
#3 0x00002aaaab20532f in _int_malloc (av=0x2aaaab618760 <main_arena>,
bytes=552)
at
src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:4043
#4 0x00002aaaab2040a1 in malloc (bytes=552)
at
src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:3408
#5 0x00002aaaac5fbeed in __fopen_internal (
filename=0x2aaaac6f991c "/etc/localtime", mode=0x2aaaac6f7f70 "rce",
is32=1) at iofopen.c:69
#6 0x00002aaaac5fbf8a in _IO_new_fopen (filename=<optimized out>,
mode=<optimized out>) at iofopen.c:97
#7 0x00002aaaac63e007 in __tzfile_read (
file=file@entry=0x2aaaac6f991c "/etc/localtime", extra=extra@entry=0,
extrap=extrap@entry=0x0) at tzfile.c:168
#8 0x00002aaaac63da39 in tzset_internal (always=<optimized out>,
explicit=explicit@entry=1) at tzset.c:443
#9 0x00002aaaac63ddab in __tz_convert (timer=0x7fffffffcd88,
use_localtime=1, tp=0x2aaaac936560 <_tmbuf>) at tzset.c:628
#10 0x00002aaaab15b10c in MPID_Abort (
comm=0x2aaaab5e78c0 <MPID_Comm_builtin>, mpi_errno=0, exit_code=1,
error_msg=0x7fffffffd2c0 "Fatal error in MPI_Init:\nOther MPI
error, error stack:\nMPIR_Init_thread(514)", '.' <repeats 12 times>, ":
\nMPID_Init(365)", '.' <repeats 19 times>, ": channel initialization
failed\nMPIDI_CH3_Init(414)", '.' <repeats 14 times>, ": rdma_get_"...)
at src/mpid/ch3/src/mpid_abort.c:110
#11 0x00002aaaab0f5992 in handleFatalError (
comm_ptr=0x2aaaab5e78c0 <MPID_Comm_builtin>,
fcname=0x2aaaab2ea0e0 <FCNAME.22795> "MPI_Init", errcode=2143631)
at src/mpi/errhan/errutil.c:487
#12 0x00002aaaab0f557b in MPIR_Err_return_comm (comm_ptr=0x0,
fcname=0x2aaaab2ea0e0 <FCNAME.22795> "MPI_Init", errcode=2143631)
at src/mpi/errhan/errutil.c:264
#13 0x00002aaaab036ad8 in PMPI_Init (argc=0x7fffffffe36c,
argv=0x7fffffffe360)
at src/mpi/init/init.c:223
#14 0x00000000004008f1 in main (argc=1, argv=0x7fffffffe4a8) at
osu_init.c:23
The content of p in frame #0 is:
(gdb) p p
$2 = (mchunkptr) 0x636f6c2f6374652f
(gdb) p (char *)&p
$3 = 0x7fffffffcac0 "/etc/loc"
(gdb)
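The chunk pointer therefore holds string data instead of an address: its
eight bytes are the beginning of "/etc/localtime". As a small sketch for
checking such values outside gdb, the following decodes a 64-bit pointer
value as little-endian ASCII (the helper name pointer_as_ascii is made up
for this example, and little-endian byte order is assumed, as on x86-64):

```python
import struct


def pointer_as_ascii(value: int) -> str:
    """Interpret a 64-bit pointer value as 8 little-endian ASCII bytes."""
    raw = struct.pack("<Q", value)  # the bytes as they sit in memory
    return raw.decode("ascii", errors="replace")


if __name__ == "__main__":
    print(pointer_as_ascii(0x636F6C2F6374652F))  # prints "/etc/loc"
```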
--
Regards,
Maksym Planeta