[mvapich-discuss] MPI Init hard failure
Subramoni, Hari
subramoni.1 at osu.edu
Fri Mar 16 11:28:44 EDT 2018
Hello.
I apologize for the delay in getting back to you. Could you please try setting MV2_SHMEM_DIR="/tmp" and see if things pass?
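In case it helps, here is a minimal sketch of one way to try this from a small Python wrapper instead of editing your shell setup. It is only an illustration: the helper names (env_with_shmem_dir, free_bytes, launch) are made up for this example, and it assumes your launcher passes the calling environment through to the MPI processes, which you should confirm for your setup.

```python
import os
import shutil
import subprocess


def env_with_shmem_dir(shmem_dir, base_env=None):
    """Return a copy of the environment with MV2_SHMEM_DIR set to shmem_dir."""
    env = dict(os.environ if base_env is None else base_env)
    env["MV2_SHMEM_DIR"] = shmem_dir
    return env


def free_bytes(path):
    """Return the free space, in bytes, on the filesystem that holds path."""
    return shutil.disk_usage(path).free


def launch(cmd, shmem_dir="/tmp"):
    """Run cmd (a list of arguments) with MV2_SHMEM_DIR set; return its exit code.

    Assumption: the launcher named in cmd forwards this environment
    to the MPI ranks it starts.
    """
    return subprocess.call(cmd, env=env_with_shmem_dir(shmem_dir))
```

For example, launch(["mpiexec", "-n", "2", "./app"]) would start a hypothetical ./app with the variable set, and free_bytes("/tmp") gives a quick check that the chosen directory has room before the run.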
Best Regards,
Hari.
From: mvapich-discuss-bounces at cse.ohio-state.edu On Behalf Of Sashi Balasingam
Sent: Wednesday, March 7, 2018 8:06 PM
To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: [mvapich-discuss] MPI Init hard failure
Hi,
I am using mvapich2-2.2a in our application on an Intel server (x64) running SUSE Linux Enterprise Server 12 SP1 (x86_64). Occasionally, I run into a hard failure during the MPI Init call. We see this issue only after our application itself has had a catastrophic failure with a 'SIGBUS Abort Error' and the process has crashed. The next time we launch our app, which calls MPI Init, we see the error listed below. Currently, we have to reboot the system to recover from the failure.
Fatal error in PMPI_Init_thread:
Other MPI error, error stack:
MPIR_Init_thread(514)....:
MPID_Init(359)...........: channel initialization failed
MPIDI_CH3_Init(474)......:
MPIDI_CH3I_SMP_Init(1921): SHMEM_COLL_init failed
Please provide any suggestions on: (a) what might be the cause, (b) how to avoid it if possible, and (c) how to recover from this hard failure without a reboot.
Thanks,
Sashi