[mvapich-discuss] VAPI_PORT_ERROR

Sayantan Sur surs at cse.ohio-state.edu
Wed Nov 22 20:03:19 EST 2006


Hello Axel,

Thanks for reporting this. We haven't seen this error before. I suspect 
that it might be due to some loose InfiniBand cables. Can you run the 
Pallas (IMB) benchmarks on the cluster to make sure that all global 
communications are OK? Also, if possible, could you (or the sysadmin of 
your cluster) consider moving to the OpenFabrics Enterprise Distribution 
(OFED)? It is the latest InfiniBand software stack.
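
To help narrow down which nodes or ports to check, the ranks named in the 
abort messages can be collected from the job output. The following is only 
a rough sketch in Python, assuming the output has the same format as the 
messages quoted below; the function name failing_ranks and the regular 
expression are illustrative and are not part of MVAPICH.

```python
import re
from collections import defaultdict

# Assumed message format, taken from the quoted job output, e.g.
#   [40] Abort: Got an asynchronous event: VAPI_PORT_ERROR
#   [24] Abort: [opt09:24] Got completion with error, code=VAPI_RETRY_EXC_ERR,
ABORT_RE = re.compile(r"^\s*\[(\d+)\] Abort: .*?(VAPI_[A-Z_]+)")


def failing_ranks(log_text):
    """Map each VAPI error name to the sorted list of ranks that reported it."""
    ranks = defaultdict(set)
    for line in log_text.splitlines():
        match = ABORT_RE.match(line)
        if match:
            rank, error = int(match.group(1)), match.group(2)
            ranks[error].add(rank)
    return {error: sorted(found) for error, found in ranks.items()}


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python failing_ranks.py < job_output.txt
    for error, found in failing_ranks(sys.stdin.read()).items():
        print(error, found)
```

The resulting ranks can then be matched against the host file passed to 
mpirun_rsh to see which machines reported the error.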

Thanks,
Sayantan.

Axel Rimanek wrote:
> Hello,
> I'm currently working on the InfiniBand cluster of TU-Muenchen with the 
> following software installed:
>
> SuSE Professional 9.1,
> Kernel 2.6.5
> gcc version 3.3.3
> OSU MVAPICH VERSION 0.9.7-SingleRail 
>
> When I execute my parallel program, I get the following error after some
> time (an intensive calculation that runs fine for 90 min):
>
> [40] Abort: Got an asynchronous event: VAPI_PORT_ERROR
> (VAPI_EV_SYNDROME_NONE) at line 199 in file viainit.c
> [24] Abort: Got an asynchronous event: VAPI_PORT_ERROR
> (VAPI_EV_SYNDROME_NONE) at line 199 in file viainit.c
> mpirun_rsh: Abort signaled from [40]
> [8] Abort: Got an asynchronous event: VAPI_PORT_ERROR
> (VAPI_EV_SYNDROME_NONE) at line 199 in file viainit.c
> [56] Abort: Got an asynchronous event: VAPI_PORT_ERROR
> (VAPI_EV_SYNDROME_NONE) at line 199 in file viainit.c
> done.
>
> When I try to restart the program, I immediately get
>
> [24] Abort: [opt09:24] Got completion with error, code=VAPI_RETRY_EXC_ERR,
> vendor code=85
>  at line 2044 in file viacheck.c
> mpirun_rsh: Abort signaled from [24]
> done.
>
> I first ran the program on half of the available machines of the
> cluster and then repeated everything on the rest. I got the same error
> messages, and now the program constantly returns the last set of errors.
>
> I have already tried osu_bw afterwards on all machines, and it works...
>
> Thx,
> Axel
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>   

-- 
http://www.cse.ohio-state.edu/~surs


