[mvapich-discuss] MPI Init hard failure

Sashi Balasingam sashibala2 at yahoo.com
Wed Mar 7 20:06:16 EST 2018


Hi,

I am using mvapich2-2.2a in our application on an Intel server (x64) running SUSE Linux Enterprise Server 12 SP1 (x86_64). Occasionally, I run into a hard failure during the MPI Init call. We see this issue only after our application itself has had a catastrophic failure with a 'SIGBUSS Abort Error' and the process has crashed. The next time we launch our app, which does an MPI init, we see the error listed below. Currently, we have to reboot the system to recover from the failure.

Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(514)....:
MPID_Init(359)...........: channel initialization failed
MPIDI_CH3_Init(474)......:
MPIDI_CH3I_SMP_Init(1921): SHMEM_COLL_init failed

Please provide any suggestions on: (a) what might be the cause, (b) how to avoid it, if possible, and (c) how to recover from this hard failure without a reboot.

Thanks,
Sashi