[mvapich-discuss] Hang in MVAPICH2-2.2 in PSM with MPI_THREAD_MULTIPLE

Moody, Adam T. moody20 at llnl.gov
Thu Aug 3 21:37:46 EDT 2017


Thanks, Hari.  I can reproduce it here, but I don't yet have a simple reproducer to send you.  I'll keep working on this tomorrow under a debugger, but if you have ideas before then, I'm happy to try them.
-Adam
________________________________________
From: hari.subramoni at gmail.com <hari.subramoni at gmail.com> on behalf of Hari Subramoni <subramoni.1 at osu.edu>
Sent: Thursday, August 3, 2017 6:35:49 PM
To: Moody, Adam T.
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] Hang in MVAPICH2-2.2 in PSM with MPI_THREAD_MULTIPLE

Hi Adam,

Thanks for the report. We will take a look at it and get back to you soon.

Do you happen to have any reproducer for this issue?

Regards,
Hari.

On Aug 3, 2017 9:32 PM, "Moody, Adam T." <moody20 at llnl.gov<mailto:moody20 at llnl.gov>> wrote:
Hello MVAPICH team,
We've got a user reporting an hang in MPI_Test when using MVAPICH2-2.2 for PSM.  This only happens when using MPI_THREAD_MULTIPLE.  Although the app is using MPI_THREAD_MULTIPLE, they don't actually use threads in this case.  They do not hit the hang if they call MPI_Init instead.

The stack trace on the main thread looks like the following (innermost frame first):

pthread_spin_lock
psm_irecv
MPID_Irecv
MPIDU_Sched_continue
MPIDU_Sched_progress
psm_progress_wait
MPIR_Test_impl
PMPI_Test

I suspect the main thread has deadlocked itself on the primary PSM lock, but I'm not entirely sure why.

Based on this stack trace, can you see where the main thread might have acquired the PSM lock the first time?

Or perhaps some function bailed out early without releasing the lock?
Thanks,
-Adam
_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu<mailto:mvapich-discuss at cse.ohio-state.edu>
http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss


