[mvapich-discuss] Errors with MVAPICH2-0.9.3
Amit H Kumar
AHKumar at odu.edu
Tue Sep 5 11:56:42 EDT 2006
Hi Abhinav,
I believe that, as you mentioned, it only happens with specific nodes.
My perf_main test fails as follows:
SENDER:
========
11:16] node-0-0:bin] perf_main --send -trc -mbw -s1280 -n1000
********************************************
********* perf_main version 8.0 *********
********* CPU is: 1.00 MHz *********
********************************************
************* RC BW Test started *********************
Completion with error on send queue (syndrome=0x81=VAPI_RETRY_EXC_ERR ,
opcode=0=VAPI_CQE_SQ_SEND_DATA)
PERF_poll_cq: completion with error (12) detected
RECEIVER:
==========
perf_main -a172.25.23.254
********************************************
********* perf_main version 8.0 *********
********* CPU is: 1.00 MHz *********
********************************************
************* RC BW Test started *********************
..... Nothing shows up; it hangs ...
Any idea what could be wrong?
Thank you,
-Amit
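A note on the sender-side error above: on a reliable-connection (RC) queue pair, VAPI_RETRY_EXC_ERR means the sender kept retransmitting until its transport retry counter ran out without an acknowledgement from the peer. The sender then reports the error and stops, while the receiver keeps waiting for data. That would be consistent both with the hang on the receiving side and with a link that drops out for a while. The minimal Python sketch below estimates how long such an outage must last, using the InfiniBand nominal local ACK timeout of 4.096 µs × 2^timeout per attempt. The exponent and retry count in the example are assumptions for illustration, not the values perf_main or MVAPICH2-0.9.3 actually configure. The hardware may also wait somewhat longer than the nominal value, so treat the result as a rough minimum.

```python
# Minimal sketch: how long an RC QP keeps retransmitting before the sender
# sees a retry-exceeded completion (VAPI_RETRY_EXC_ERR in VAPI terms).
# ASSUMPTION: the timeout exponent and retry count used below are
# illustrative only, not the values perf_main or MVAPICH2 actually set.

IB_TIMEOUT_UNIT_US = 4.096  # InfiniBand local ACK timeout unit (microseconds)


def ack_timeout_us(timeout_exp: int) -> float:
    """Nominal local ACK timeout for one attempt: 4.096 us * 2**timeout_exp.

    An exponent of 0 means "no timeout" in InfiniBand, so it is rejected here.
    """
    if not 1 <= timeout_exp <= 31:
        raise ValueError("timeout exponent must be in 1..31 (0 means infinite)")
    return IB_TIMEOUT_UNIT_US * (2 ** timeout_exp)


def time_to_retry_exceeded_us(timeout_exp: int, retry_cnt: int) -> float:
    """Rough minimum outage before retry-exceeded: the first transmission
    plus retry_cnt retransmissions, each waiting one nominal ACK timeout."""
    if not 0 <= retry_cnt <= 7:
        raise ValueError("retry count is a 3-bit field (0..7)")
    return (retry_cnt + 1) * ack_timeout_us(timeout_exp)


if __name__ == "__main__":
    # Assumed example values: timeout exponent 14, maximum retry count 7.
    total_us = time_to_retry_exceeded_us(timeout_exp=14, retry_cnt=7)
    print(f"sender gives up after ~{total_us / 1000:.0f} ms without ACKs")
```

With these assumed values the sender gives up after roughly half a second of lost acknowledgements. A port flap of that length would therefore be enough to break an RC connection, which fits Abhinav's explanation below.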
Abhinav Vishnu <vishnu at cse.ohio-state.edu> wrote on 09/02/2006 12:57:03 AM:
> Hi Amit,
>
> > > The problem of getting "VAPI_PORT_ACTIVE" and "VAPI_PORT_ERROR" events
> > > seems like a system setup issue. These are the events
> > > generated by the VAPI layer when a port goes down and comes back up.
> > >
> >
> > Does this mean that the VAPI library setup on our system needs to be
> > revisited, or is this something we can ignore? Also, jobs that hit
> > these errors never terminate.
>
> Thanks for reporting this problem.
>
> Actually, getting VAPI_PORT_ACTIVE events in the middle of a run means
> that the port state of the HCA is fluctuating. This is a typical symptom
> of a bad connector at either the HCA end or the switch end. Do you see
> this error with the same two nodes? Because of this asynchronous event,
> the QPs will go into the error state, and I do not expect the processes
> to run to completion.
>
> IMO, you should be able to see these errors with VAPI-level tests too,
> such as perf_main. Can you please run perf_main on the two nodes that
> are showing the error and update us on your findings?
>
> Thanks again,
>
> -- Abhinav
>
> >
> > Thank you,
> > Amit
> >
> >
> > > Thanks.
> > >
> > > Lei
> > >
> > > ----- Original Message -----
> > > >
> > > > Hi, MVAPICH2-0.9.3
> > > >
> > > > Kernel Version : 2.4.21-20.ELsmp
> > > > Arch : x86_64
> > > > Compiler : INTEL8.1
> > > > mvapich2 : 0.9.3
> > > >
> > > >
> > > > Another error scenario:
> > > >
> > > >
> > > > Thank you for any feedback,
> > > > -Amit
> > > >
> > > > <===================
> > > > sched_setaffinity: Bad address
> > > > sched_setaffinity: Bad address
> > > > sched_setaffinity: Bad address
> > > > sched_setaffinity: Bad address
> > > > sched_setaffinity: Bad address
> > > > sched_setaffinity: Bad address
> > > > sched_setaffinity: Bad address
> > > > sched_setaffinity: Bad address
> > > > sched_setaffinity: Bad address
> > > > sched_setaffinity: Bad address
> > > > # OSU MPI Bandwidth Test (Version 2.2)
> > > > # Size Bandwidth (MB/s)
> > > > 1 0.640062
> > > > 2 1.280123
> > > > 4 2.560247
> > > > 8 5.120615
> > > > 16 10.240986
> > > > 32 20.481973
> > > > 64 40.963946
> > > > 128 81.927891
> > > > 256 163.855783
> > > > 512 327.719380
> > > > 1024 655.485649
> > > > 2048 655.423131
> > > > 4096 655.427038
> > > > 8192 1048.682011
> > > > 16384 1048.727022
> > > > 32768 1048.677010
> > > > 65536 1048.680135
> > > > 131072 1048.687949
> > > > rank 6, Got an asynchronous event: VAPI_PORT_ACTIVE
> > > > rank 7, Got an asynchronous event: VAPI_PORT_ACTIVE
> > > > 262144 986.999817
> > > > 524288 828.593573
> > > > 1048576 803.785572
> > > > 2097152 816.001536
> > > > 4194304 822.250940
> > > > rank 6, Got an asynchronous event: VAPI_PORT_ERROR(VAPI_EV_SYNDROME_NONE)
> > > > rank 7, Got an asynchronous event: VAPI_PORT_ERROR(VAPI_EV_SYNDROME_NONE)
> > > > rank 7, Got an asynchronous event: VAPI_PORT_ACTIVE
> > > > rank 6, Got an asynchronous event: VAPI_PORT_ACTIVE
> > > > <==================
> > > >
> > > > _______________________________________________
> > > > mvapich-discuss mailing list
> > > > mvapich-discuss at mail.cse.ohio-state.edu
> > > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> > > >
> > >
> >
>
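Abhinav's question of whether the errors always involve the same two nodes can be checked against the job output itself. The minimal Python sketch below tallies the MVAPICH2 asynchronous-event messages per rank, using the message format from the log above. Because several ranks write to the same terminal, these messages can be interleaved and hard-wrapped mid-word (for example "ran" / "k 7"). The sketch therefore assumes that stripping all line breaks before matching is acceptable, which is a heuristic rather than a real parser. The `rank_to_host` mapping and the host names `hostA`/`hostB` are hypothetical; in practice they would come from the host list the job was launched with.

```python
import re
from collections import Counter

# Matches messages such as
#   "rank 6, Got an asynchronous event: VAPI_PORT_ERROR"
EVENT_RE = re.compile(
    r"rank\s*(\d+),\s*Got an asynchronous event:\s*(VAPI_PORT_(?:ACTIVE|ERROR))"
)


def tally_port_events(log_text: str) -> Counter:
    """Count (rank, event) pairs in MVAPICH2 output.

    ASSUMPTION: messages may be interleaved or wrapped mid-word, so all line
    breaks are removed before matching (a heuristic, not a parser).
    """
    flat = log_text.replace("\r", "").replace("\n", "")
    return Counter((int(rank), event) for rank, event in EVENT_RE.findall(flat))


def flapping_hosts(counts: Counter, rank_to_host: dict) -> list:
    """Hosts whose ranks reported at least one VAPI_PORT_ERROR."""
    return sorted({
        rank_to_host.get(rank, f"rank {rank}")
        for (rank, event), n in counts.items()
        if event == "VAPI_PORT_ERROR" and n > 0
    })


# Excerpt of the interleaved output as it appeared in the report above.
SAMPLE = """131072 1048.687949
rank 6, Got an asynchronous event: VAPI_PORT_ACTIVE
rank 7, Got an asynchronous event: VAPI_PORT_ACTIVE
262144 986.999817
rank 6, Got an asynchronous event: VAPI_PORT_ERROR
(VAPI_EV_SYNDROME_NONE)rank 7, Got an asynchronous event:
VAPI_PORT_ERROR(VAPI_EV_SYNDROME_NONE)ran
k 7, Got an asynchronous event: VAPI_PORT_ACTIVE
rank 6, Got an asynchronous event: VAPI_PORT_ACTIVE
"""

if __name__ == "__main__":
    counts = tally_port_events(SAMPLE)
    for (rank, event), n in sorted(counts.items()):
        print(f"rank {rank}: {event} x{n}")
    # Hypothetical mapping; replace with the hosts the job actually ran on.
    print(flapping_hosts(counts, {6: "hostA", 7: "hostB"}))
```

In this excerpt ranks 6 and 7 both report the ERROR/ACTIVE pair. If the same hosts show up across repeated runs, that would point at the connector or cable rather than at the VAPI library setup.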