[mvapich-discuss] Segfault when no user driver installed

Maksym Planeta mplaneta at os.inf.tu-dresden.de
Fri Jun 3 14:41:33 EDT 2016


Hi,

I was testing MVAPICH on a new installation and got the following error message:

$ mpiexec -envall -np 2 -hosts 141.76.49.40,141.76.49.25 $HOME/mpi/libexec/osu-micro-benchmarks/mpi/startup/osu_init

[os-dhcp040:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 22406 RUNNING AT 141.76.49.40
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:1@os-dhcp025] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:912): assert (!closed) failed
[proxy:0:1@os-dhcp025] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:1@os-dhcp025] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@os-dhcp040] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@os-dhcp040] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@os-dhcp040] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion
[mpiexec@os-dhcp040] main (ui/mpich/mpiexec.c:344): process manager error waiting for completion

It turned out that I was missing the libmlx4-1 package, and installing it
fixed the problem.
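One way to check for this condition before launching a job might be to ask libibverbs directly how many devices it can open. The sketch below uses Python's ctypes and relies only on the libibverbs calls `ibv_get_device_list` and `ibv_free_device_list`. The helper name `count_verbs_devices` is hypothetical, and the idea that a missing userspace provider (such as libmlx4) makes the device list come back empty is an assumption, not something established in this thread.

```python
import ctypes
import ctypes.util


def count_verbs_devices():
    """Hypothetical helper: number of RDMA devices libibverbs can open.

    Returns None if libibverbs itself cannot be found or loaded.
    Assumption: 0 may also mean the kernel sees the adapter but no matching
    userspace provider (e.g. libmlx4) is installed.
    """
    path = ctypes.util.find_library("ibverbs")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None

    # struct ibv_device **ibv_get_device_list(int *num_devices);
    lib.ibv_get_device_list.argtypes = [ctypes.POINTER(ctypes.c_int)]
    lib.ibv_get_device_list.restype = ctypes.c_void_p
    # void ibv_free_device_list(struct ibv_device **list);
    lib.ibv_free_device_list.argtypes = [ctypes.c_void_p]
    lib.ibv_free_device_list.restype = None

    num = ctypes.c_int(0)
    devices = lib.ibv_get_device_list(ctypes.byref(num))
    if not devices:
        # A NULL return signals an error; treat it as "no usable devices".
        return 0
    count = num.value
    lib.ibv_free_device_list(devices)
    return count


if __name__ == "__main__":
    n = count_verbs_devices()
    if n is None:
        print("libibverbs not found")
    elif n == 0:
        print("libibverbs loaded, but it reports no usable devices")
    else:
        print(f"libibverbs reports {n} device(s)")
```

A result of 0 on a node that does have an InfiniBand adapter would be the situation worth investigating.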

And here is the backtrace:

#0  0x00002aaaab200dbb in do_check_chunk (av=0x2aaaab618760 <main_arena>,
     p=0x636f6c2f6374652f)
     at src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:2452
#1  0x00002aaaab20128b in do_check_inuse_chunk (
     av=0x2aaaab618760 <main_arena>, p=0x636f6c2f6374652f)
     at src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:2541
#2  0x00002aaaab20629d in malloc_consolidate (av=0x2aaaab618760 <main_arena>)
     at src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:4553
#3  0x00002aaaab20532f in _int_malloc (av=0x2aaaab618760 <main_arena>,
     bytes=552)
     at src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:4043
#4  0x00002aaaab2040a1 in malloc (bytes=552)
     at src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:3408
#5  0x00002aaaac5fbeed in __fopen_internal (
     filename=0x2aaaac6f991c "/etc/localtime", mode=0x2aaaac6f7f70 "rce",
     is32=1) at iofopen.c:69
#6  0x00002aaaac5fbf8a in _IO_new_fopen (filename=<optimized out>,
     mode=<optimized out>) at iofopen.c:97
#7  0x00002aaaac63e007 in __tzfile_read (
     file=file@entry=0x2aaaac6f991c "/etc/localtime", extra=extra@entry=0,
     extrap=extrap@entry=0x0) at tzfile.c:168
#8  0x00002aaaac63da39 in tzset_internal (always=<optimized out>,
     explicit=explicit@entry=1) at tzset.c:443
#9  0x00002aaaac63ddab in __tz_convert (timer=0x7fffffffcd88,
     use_localtime=1, tp=0x2aaaac936560 <_tmbuf>) at tzset.c:628
#10 0x00002aaaab15b10c in MPID_Abort (
     comm=0x2aaaab5e78c0 <MPID_Comm_builtin>, mpi_errno=0, exit_code=1,
     error_msg=0x7fffffffd2c0 "Fatal error in MPI_Init:\nOther MPI error, error stack:\nMPIR_Init_thread(514)", '.' <repeats 12 times>, ": \nMPID_Init(365)", '.' <repeats 19 times>, ": channel initialization failed\nMPIDI_CH3_Init(414)", '.' <repeats 14 times>, ": rdma_get_"...)
     at src/mpid/ch3/src/mpid_abort.c:110
#11 0x00002aaaab0f5992 in handleFatalError (
     comm_ptr=0x2aaaab5e78c0 <MPID_Comm_builtin>,
     fcname=0x2aaaab2ea0e0 <FCNAME.22795> "MPI_Init", errcode=2143631)
     at src/mpi/errhan/errutil.c:487
#12 0x00002aaaab0f557b in MPIR_Err_return_comm (comm_ptr=0x0,
     fcname=0x2aaaab2ea0e0 <FCNAME.22795> "MPI_Init", errcode=2143631)
     at src/mpi/errhan/errutil.c:264
#13 0x00002aaaab036ad8 in PMPI_Init (argc=0x7fffffffe36c, argv=0x7fffffffe360)
     at src/mpi/init/init.c:223
#14 0x00000000004008f1 in main (argc=1, argv=0x7fffffffe4a8) at osu_init.c:23

The contents of p are:

(gdb) p p
$2 = (mchunkptr) 0x636f6c2f6374652f
(gdb) p (char *)&p
$3 = 0x7fffffffcac0 "/etc/loc"
(gdb)
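The value of p is not a plausible heap address: read as eight bytes in x86-64 (little-endian) order, 0x636f6c2f6374652f spells "/etc/loc", the start of the "/etc/localtime" path that appears in frames #5 and #7. That suggests the allocator's chunk metadata had been overwritten with string data before this malloc call. A minimal Python sketch of the decoding (the helper name `pointer_as_bytes` is hypothetical):

```python
import struct


def pointer_as_bytes(value: int) -> bytes:
    """Return the 8 bytes of a 64-bit value in little-endian (x86-64) order."""
    return struct.pack("<Q", value)


p = 0x636f6c2f6374652f  # value of p printed by gdb in frames #0 and #1
print(pointer_as_bytes(p))  # b'/etc/loc'
```

This is the same reinterpretation that `p (char *)&p` performs in the gdb session: it treats the storage of the pointer variable as characters.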

-- 
Regards,
Maksym Planeta
