[mvapich-discuss] Errors with MVAPICH2-0.9.3
Abhinav Vishnu
vishnu at cse.ohio-state.edu
Sat Sep 2 00:57:03 EDT 2006
Hi Amit,
> > The problem of getting "VAPI_PORT_ACTIVE" and "VAPI_PORT_ERROR" events
> > seems like a system setup issue. These are the events
> > generated by the VAPI layer once a port goes down and comes back up.
> >
>
> Does this mean that the VAPI library setup needs to be revisited on our
> system,
> or is this something we can ignore? Also, jobs that hit these
> errors never terminate.
Thanks for reporting this problem.
Actually, seeing VAPI_PORT_ACTIVE together with VAPI_PORT_ERROR means that
the port state of the HCA is fluctuating (the port goes down and comes back
up). This is a typical symptom of a bad connector, either at the HCA end or
at the switch end. Do you see this error with the same two nodes every time?
Because of this asynchronous event, the QPs will go into the error state,
and I do not expect the processes to run to completion.
IMO, you should see the same errors in VAPI-level tests too, such as
perf_main. Can you please run perf_main on the two nodes which are
showing the error and update us on your findings?
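To help answer the "same two nodes" question, here is a minimal Python sketch
that tallies the asynchronous port events per rank from captured job output.
It assumes the message format shown in the output quoted below ("rank N, Got
an asynchronous event: VAPI_PORT_..."); the function name and the sample text
are only illustrative. It also does not map ranks to hosts, so you would
still need the rank-to-node mapping from your job launcher.

```python
import re
from collections import Counter

# Matches messages such as "rank 6, Got an asynchronous event: VAPI_PORT_ACTIVE".
# The \s* after the colon tolerates a line wrap before the event name.
# Messages whose "rank" token itself is split by interleaved output are missed.
EVENT_RE = re.compile(
    r"rank (\d+), Got an asynchronous event:\s*(VAPI_PORT_[A-Z]+)"
)


def count_port_events(log_text):
    """Return a Counter keyed by (rank, event_name)."""
    counts = Counter()
    for rank, event in EVENT_RE.findall(log_text):
        counts[(int(rank), event)] += 1
    return counts


# Illustrative sample in the same format as the reported output.
sample = """\
131072 1048.687949
rank 6, Got an asynchronous event: VAPI_PORT_ACTIVE
rank 7, Got an asynchronous event: VAPI_PORT_ACTIVE
262144 986.999817
rank 6, Got an asynchronous event: VAPI_PORT_ERROR
"""

if __name__ == "__main__":
    for (rank, event), n in sorted(count_port_events(sample).items()):
        print(f"rank {rank}: {event} x{n}")
```

If the same ranks (and therefore the same hosts) show up across several runs,
that points at the connector or port on those particular nodes rather than at
the MPI library.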
Thanks again,
-- Abhinav
>
> Thank you,
> Amit
>
>
> > Thanks.
> >
> > Lei
> >
> > ----- Original Message -----
> > >
> > > Hi MVAPICH----2-0.9.3
> > >
> > > Kernel Version : 2.4.21-20.ELsmp
> > > Arch : x86_64
> > > Compiler : INTEL8.1
> > > mvapich2 : 0.9.3
> > >
> > >
> > > Another Scenario of errors:
> > >
> > >
> > > Thank you for any feedback,
> > > -Amit
> > >
> > > <===================
> > > sched_setaffinity: Bad address
> > > sched_setaffinity: Bad address
> > > sched_setaffinity: Bad address
> > > sched_setaffinity: Bad address
> > > sched_setaffinity: Bad address
> > > sched_setaffinity: Bad address
> > > sched_setaffinity: Bad address
> > > sched_setaffinity: Bad address
> > > sched_setaffinity: Bad address
> > > sched_setaffinity: Bad address
> > > # OSU MPI Bandwidth Test (Version 2.2)
> > > # Size Bandwidth (MB/s)
> > > 1 0.640062
> > > 2 1.280123
> > > 4 2.560247
> > > 8 5.120615
> > > 16 10.240986
> > > 32 20.481973
> > > 64 40.963946
> > > 128 81.927891
> > > 256 163.855783
> > > 512 327.719380
> > > 1024 655.485649
> > > 2048 655.423131
> > > 4096 655.427038
> > > 8192 1048.682011
> > > 16384 1048.727022
> > > 32768 1048.677010
> > > 65536 1048.680135
> > > 131072 1048.687949
> > > rank 6, Got an asynchronous event: VAPI_PORT_ACTIVE
> > > rank 7, Got an asynchronous event: VAPI_PORT_ACTIVE
> > > 262144 986.999817
> > > 524288 828.593573
> > > 1048576 803.785572
> > > 2097152 816.001536
> > > 4194304 822.250940
> > > rank 6, Got an asynchronous event: VAPI_PORT_ERROR
> > > (VAPI_EV_SYNDROME_NONE)rank 7, Got an asynchronous event:
> > > VAPI_PORT_ERROR(VAPI_EV_SYNDROME_NONE)ran
> > > k 7, Got an asynchronous event: VAPI_PORT_ACTIVE
> > > rank 6, Got an asynchronous event: VAPI_PORT_ACTIVE
> > > <==================
> > >
> > > _______________________________________________
> > > mvapich-discuss mailing list
> > > mvapich-discuss at mail.cse.ohio-state.edu
> > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> > >
> >
>
>