[mvapich-discuss] segmentation fault

Jonathan Perkins perkinjo at cse.ohio-state.edu
Wed Jun 20 12:48:09 EDT 2012


On Mon, Jun 18, 2012 at 09:41:28PM -0400, Jonathan Perkins wrote:
> On Mon, Jun 18, 2012 at 03:50:06PM -0400, Hoot Thompson wrote:
> > A little background: I've been working with Mellanox to get SR-IOV
> > working between two Virtual Machines (VM). As of today, I have two
> > real machines each with a VM and a virtual IB connection up between
> > them. Logged into one of the VMs, I can ping and run the rdma_bw and
> > rdma_lat tests between the VMs just fine. Attempts to run osu_bw
> > (compiled with the Intel compiler) fail with the following:
> > 
> > 
> > [root@penguin1-vm1 mvapich2-1.8-r5435]# mpiexec -n 2 -hosts 10.10.10.1,10.10.10.2 /root/osu/mvapich2-1.8-r5435/osu_benchmarks/osu_bw
> > [penguin1-vm1:mpi_rank_0][error_sighandler] Caught error:
> > Segmentation fault (signal 11)
> > [pengui2-vm1:mpi_rank_1][error_sighandler] Caught error:
> > Segmentation fault (signal 11)
> > 
> > =====================================================================================
> > =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> > =   EXIT CODE: 139
> > =   CLEANING UP REMAINING PROCESSES
> > =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> > =====================================================================================
> > [proxy:0:1@pengui2-vm1] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:955): assert (!closed) failed
> > [proxy:0:1@pengui2-vm1] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> > [proxy:0:1@pengui2-vm1] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
> > APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
> > 
> > 
> > Any thoughts?
> 
> Using a debug build of MVAPICH2 (built with --disable-fast --enable-g=dbg),
> please set the environment variable MV2_DEBUG_CORESIZE=unlimited or
> MV2_DEBUG_SHOW_BACKTRACE=1 so that we can get more information about
> why it's segfaulting.
> 
> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.8.html#x1-1120009.1.10
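
As a rough sketch of the suggestion quoted above, the two variables can be
placed in the launcher's environment before the benchmark is started. The
Python wrapper below is illustrative only: the helper names debug_env and
run_with_debug are hypothetical, it sets both variables although either one
may be enough, and it assumes that mpiexec forwards the caller's environment
to the MPI ranks (some launchers may need an explicit option for that).

```python
import os
import subprocess

# Debug variables named in the quoted message; per that message they are
# meant for a debug build of MVAPICH2 (--disable-fast --enable-g=dbg).
MV2_DEBUG_VARS = {
    "MV2_DEBUG_SHOW_BACKTRACE": "1",
    "MV2_DEBUG_CORESIZE": "unlimited",
}


def debug_env(base_env=None):
    """Return a copy of the environment with the MV2 debug variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(MV2_DEBUG_VARS)
    return env


def run_with_debug(command):
    """Run an mpiexec command line (a list of strings) with debugging on.

    Assumption: the launcher passes these variables on to the MPI ranks.
    """
    return subprocess.run(command, env=debug_env(), check=False)
```

For the run shown earlier in the thread, this might be called as
run_with_debug(["mpiexec", "-n", "2", "-hosts", "10.10.10.1,10.10.10.2",
"/root/osu/mvapich2-1.8-r5435/osu_benchmarks/osu_bw"]).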

For everyone's info:
After getting the debugging info from the above options we found that
there was a hostname resolution problem on the virtual machines.  This
problem has now been resolved.
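
For anyone who hits something similar, a quick way to rule out name
resolution is to check that each node can resolve every hostname used in the
job. The sketch below is a generic check, not the exact diagnosis from this
thread (the specific resolution failure is not described here); the function
name unresolved_hosts and its resolve parameter are hypothetical.

```python
import socket


def unresolved_hosts(hostnames, resolve=socket.gethostbyname):
    """Return (hostname, error message) pairs for names that fail to resolve."""
    failures = []
    for name in hostnames:
        try:
            resolve(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            failures.append((name, str(exc)))
    return failures
```

Running unresolved_hosts([...]) on each VM with the hostnames that appear in
the job output (for example the names printed in the rank prefixes) should
return an empty list when every name resolves locally.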

-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

