[mvapich-discuss] bus error in MPIDI_CH3I_CM_SHMEM_Sync

Lana Deere lana.deere at gmail.com
Fri Dec 4 17:42:41 EST 2020


OK, I will try the new version.

.. Lana (lana.deere at gmail.com)




On Fri, Dec 4, 2020 at 5:29 PM Subramoni, Hari <subramoni.1 at osu.edu> wrote:

> Hi, Lana.
>
>
>
> Sorry to hear that you’re facing issues. If possible, could you please try
> out your program with the new MVAPICH2 2.3.5 release we made a few days ago
> and see if it resolves your issues?
>
>
>
> Best,
>
> Hari.
>
>
>
> *From:* mvapich-discuss-bounces at cse.ohio-state.edu
> <mvapich-discuss-bounces at mailman.cse.ohio-state.edu> *On Behalf Of*
> Lana Deere
> *Sent:* Friday, December 4, 2020 4:23 PM
> *To:* mvapich-discuss at cse.ohio-state.edu
> <mvapich-discuss at mailman.cse.ohio-state.edu>
> *Subject:* [mvapich-discuss] bus error in MPIDI_CH3I_CM_SHMEM_Sync
>
>
>
> I'm having an intermittent problem using mvapich2 2.3.4 GA on a CentOS 7
> cluster.
>
> I've run our proprietary program with the same input dataset on the same
> cluster several hundred times, and about 1% of the time the run crashes
> with a bus error. The traceback looks like this:
> ...ty/lib/libmpi.so.12 MPIDI_CH3I_CM_SHMEM_Sync
> ...ty/lib/libmpi.so.12 MPIDI_CH3I_SMP_init
> ...ty/lib/libmpi.so.12 MPIDI_CH3_Init
> ...ty/lib/libmpi.so.12 MPID_Init
> ...ty/lib/libmpi.so.12 MPIR_Init_thread
> ...ty/lib/libmpi.so.12 MPI_Init_thread
> I'm not sure exactly where inside MPIDI_CH3I_CM_SHMEM_Sync the error
> occurs.
>
> The process which gets the bus error is always a child process created
> using MPI_Comm_spawn.  The rest of the child processes are hung
> somewhere in MPI_Init_thread (or its subfunctions) and the parent
> processes are all hung somewhere in MPI_Comm_spawn (or its subfunctions).
>
> Has anyone seen anything like this before?  Does anyone have any
> suggestions for debugging it?  I see some PRINT_DEBUG statements in the
> function, but I don't know how to turn them on.
>
>
> .. Lana (lana.deere at gmail.com)
>
>