[mvapich-discuss] bus error in MPIDI_CH3I_CM_SHMEM_Sync

Subramoni, Hari subramoni.1 at osu.edu
Fri Dec 4 17:29:19 EST 2020


Hi, Lana.

Sorry to hear that you’re facing issues. If possible, could you please try out your program with the new MVAPICH2 2.3.5 release we made a few days ago and see if it resolves your issues?

Best,
Hari.

From: mvapich-discuss-bounces at cse.ohio-state.edu <mvapich-discuss-bounces at mailman.cse.ohio-state.edu> On Behalf Of Lana Deere
Sent: Friday, December 4, 2020 4:23 PM
To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: [mvapich-discuss] bus error in MPIDI_CH3I_CM_SHMEM_Sync

I'm having a rarely occurring problem using mvapich2 2.3.4 GA on a CentOS7 cluster.

I've run our proprietary program using the same input dataset on the same cluster several hundred times, and about 1% of the time the run crashes with a bus error and a traceback that looks like this:
...ty/lib/libmpi.so.12 MPIDI_CH3I_CM_SHMEM_Sync
...ty/lib/libmpi.so.12 MPIDI_CH3I_SMP_init
...ty/lib/libmpi.so.12 MPIDI_CH3_Init
...ty/lib/libmpi.so.12 MPID_Init
...ty/lib/libmpi.so.12 MPIR_Init_thread
...ty/lib/libmpi.so.12 MPI_Init_thread
I'm not sure where it is exactly inside MPIDI_CH3I_CM_SHMEM_Sync.

The process which gets the bus error is always a child subprocess created using MPI_Comm_spawn.  The rest of the child subprocesses are hung somewhere in MPI_Init_thread (or its subfunctions) and the parent processes are all hung somewhere in MPI_Comm_spawn (or its subfunctions).

Has anyone seen anything like this before?  Does anyone have any suggestions on how to try debugging it?  I see some PRINT_DEBUG statements in the function but I don't know how to turn them on.
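
For a failure this rare, one thing that may help is a small harness that reruns the job many times and records which runs die from a signal and which ones hang. The following is only a minimal sketch, not part of MVAPICH2: the function name run_repeatedly, the timeout value, and the example command line are all hypothetical. Because the bus error described above happens in a spawned child while the other processes hang, the launcher itself would probably show up as a timeout rather than a signal death, so the sketch tracks both. Note that at a roughly 1% failure rate, and assuming independent runs, 300 clean runs in a row would still happen by chance about 5% of the time (0.99^300 ≈ 0.049), so a few hundred clean runs is not strong evidence of a fix.

```python
import signal
import subprocess


def run_repeatedly(cmd, runs, timeout_s=None):
    """Run cmd `runs` times; return (signal_deaths, timeouts, nonzero_exits)."""
    signal_deaths = []   # (run index, signal name) for runs killed by a signal
    timeouts = []        # run indices that exceeded timeout_s (possible hang)
    nonzero_exits = []   # (run index, exit code) for ordinary failures
    for i in range(runs):
        try:
            # subprocess.run kills the direct child if the timeout expires;
            # processes started by that child may need separate cleanup.
            proc = subprocess.run(cmd, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            timeouts.append(i)
            continue
        if proc.returncode < 0:
            # On POSIX a negative return code means "terminated by signal -N".
            signal_deaths.append((i, signal.Signals(-proc.returncode).name))
        elif proc.returncode != 0:
            nonzero_exits.append((i, proc.returncode))
    return signal_deaths, timeouts, nonzero_exits


# Hypothetical usage (the command line and timeout are placeholders):
#   deaths, hung, failed = run_repeatedly(
#       ["mpirun", "-np", "4", "./my_program"], runs=300, timeout_s=600)
#   print(len(deaths), "signal deaths,", len(hung), "timeouts,",
#         len(failed), "other failures")
```

The indices of the failing runs can then be matched against per-run logs or core files, if those are being collected.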

.. Lana (lana.deere at gmail.com<mailto:lana.deere at gmail.com>)
