[mvapich-discuss] VAPI_PORT_ERROR
Axel Rimanek
Axel at Rimanek.de
Wed Nov 22 15:09:27 EST 2006
Hello,
I'm currently working on the InfiniBand cluster at TU-Muenchen, which has the
following software installed:
SuSE Professional 9.1,
Kernel 2.6.5
gcc version 3.3.3
OSU MVAPICH VERSION 0.9.7-SingleRail
When I execute my parallel program, it runs fine through about 90 minutes of
intensive calculation, and then I get the following errors:
[40] Abort: Got an asynchronous event: VAPI_PORT_ERROR (VAPI_EV_SYNDROME_NONE) at line 199 in file viainit.c
[24] Abort: Got an asynchronous event: VAPI_PORT_ERROR (VAPI_EV_SYNDROME_NONE) at line 199 in file viainit.c
mpirun_rsh: Abort signaled from [40]
[8] Abort: Got an asynchronous event: VAPI_PORT_ERROR (VAPI_EV_SYNDROME_NONE) at line 199 in file viainit.c
[56] Abort: Got an asynchronous event: VAPI_PORT_ERROR (VAPI_EV_SYNDROME_NONE) at line 199 in file viainit.c
done.
When I try to restart the program, I immediately get:
[24] Abort: [opt09:24] Got completion with error, code=VAPI_RETRY_EXC_ERR, vendor code=85 at line 2044 in file viacheck.c
mpirun_rsh: Abort signaled from [24]
done.
I first ran the program on half of the cluster's available machines and have
now repeated everything on the other half. I got the same error messages, and
now the program fails with the last set of errors every time I start it.
Afterwards I ran osu_bw on all machines, and it works fine.
Thx,
Axel