[mvapich-discuss] MPI Init hard failure

Subramoni, Hari subramoni.1 at osu.edu
Fri Mar 16 11:28:44 EDT 2018


Hello.

I apologize for the delay in getting back to you. Can you please try setting MV2_SHMEM_DIR="/tmp" and see if things pass?
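
Before pointing MV2_SHMEM_DIR at a different location, it may help to confirm that the candidate directory exists, is writable, and has free space. The following is only a minimal sketch: the helper name check_shmem_dir and the 64 MiB threshold are illustrative assumptions, not MVAPICH2 values, and the check does not reproduce what SHMEM_COLL_init does internally. It compares /tmp with /dev/shm, which the MVAPICH2 user guide lists as the Linux default for this parameter.

```python
import os
import shutil
import tempfile


def check_shmem_dir(path, min_free_bytes=64 * 1024 * 1024):
    """Return (ok, reason) for a candidate shared-memory directory.

    The 64 MiB default is an arbitrary illustrative threshold,
    not a value taken from MVAPICH2.
    """
    if not os.path.isdir(path):
        return False, "not a directory"
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        return False, f"only {free} bytes free"
    try:
        # Create and immediately delete a scratch file to confirm write access.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        return False, f"cannot create files: {exc}"
    return True, "ok"


if __name__ == "__main__":
    for candidate in ("/dev/shm", "/tmp"):
        ok, reason = check_shmem_dir(candidate)
        status = "usable" if ok else "NOT usable"
        print(f"{candidate}: {status} ({reason})")
```

If /tmp looks usable, the variable can be passed on the launcher command line, for example with mpirun_rsh: mpirun_rsh -np 2 -hostfile hosts MV2_SHMEM_DIR=/tmp ./app (hosts and ./app are placeholders for your own host file and executable).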

Best Regards,
Hari.

From: mvapich-discuss-bounces at cse.ohio-state.edu On Behalf Of Sashi Balasingam
Sent: Wednesday, March 7, 2018 8:06 PM
To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: [mvapich-discuss] MPI Init hard failure

Hi,
I am using mvapich2-2.2a in our application on an Intel server (x64) running SUSE Linux Enterprise Server 12 SP1 (x86_64). Occasionally, I run into a hard failure during the MPI Init call. We see this issue only after our application itself has had a catastrophic failure with a 'SIGBUS Abort Error' and the process has crashed. The next time we launch our app, which does an MPI init, we see the error listed below. Currently, we have to reboot the system to recover from the failure.


Fatal error in PMPI_Init_thread:
Other MPI error, error stack:
MPIR_Init_thread(514)....:
MPID_Init(359)...........: channel initialization failed
MPIDI_CH3_Init(474)......:
MPIDI_CH3I_SMP_Init(1921): SHMEM_COLL_init failed

Please provide any suggestions on (a) what might be the cause, (b) how to avoid it if possible, and (c) how to recover from this hard failure without a reboot.

Thanks,
Sashi