[mvapich-discuss] Errors with MVAPICH2-0.9.3

Abhinav Vishnu vishnu at cse.ohio-state.edu
Sat Sep 2 00:57:03 EDT 2006


Hi Amit,

> > The problem of getting "VAPI_PORT_ACTIVE" and "VAPI_PORT_ERROR" events
> > seems like a system setup issue. These are the events
> > generated by the VAPI layer when a port goes down and comes back up.
> >
>
> Does this mean that the VAPI library setup needs to be revisited on our
> system, or is this something we can ignore? Also, jobs that hit these
> errors never terminate.

Thanks for reporting this problem.

Actually, these VAPI_PORT_ACTIVE and VAPI_PORT_ERROR events mean that the
port status of the HCA is fluctuating. This is a typical symptom of a bad
connector at either the HCA end or the switch end. Do you see this error
with the same two nodes?  Because of this asynchronous event, the QPs will
go into the error state, and I do not expect the processes to run to
completion.

IMO, you will be able to see the errors in the VAPI-level tests too, such
as perf_main.  Can you please try perf_main on the two nodes which are
showing the error and update us on your findings?
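
To help answer the "same two nodes" question, a small script along these
lines could tally the asynchronous events per rank from saved job output.
This is only a sketch: the name tally_port_events and the sample text are
illustrative, and the pattern is modeled on the message format in the output
quoted below. It will miss messages that were split mid-word by interleaved
output, and mapping ranks back to hosts depends on your own machine file.

```python
import re
from collections import Counter

# Pattern modeled on the "rank N, Got an asynchronous event: ..." lines
# in the quoted job output; adjust it if your messages differ.
EVENT_RE = re.compile(
    r"rank (\d+), Got an asynchronous event:\s*(VAPI_PORT_\w+)"
)


def tally_port_events(log_text):
    """Count (rank, event) pairs found in captured job output."""
    return Counter(
        (int(rank), event) for rank, event in EVENT_RE.findall(log_text)
    )


# Illustrative sample, trimmed from the kind of output shown in this thread.
sample = """\
131072          1048.687949
rank 6, Got an asynchronous event: VAPI_PORT_ACTIVE
rank 7, Got an asynchronous event: VAPI_PORT_ACTIVE
262144          986.999817
rank 6, Got an asynchronous event: VAPI_PORT_ERROR
"""

counts = tally_port_events(sample)
for (rank, event), n in sorted(counts.items()):
    print(f"rank {rank}: {event} x{n}")
```

If the same ranks (and therefore the same hosts) show up across several
runs, that would point toward a specific cable, HCA port, or switch port.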

Thanks again,

-- Abhinav

>
> Thank you,
> Amit
>
>
> > Thanks.
> >
> > Lei
> >
> > ----- Original Message -----
> > >
> > > Hi MVAPICH----2-0.9.3
> > >
> > > Kernel Version    :     2.4.21-20.ELsmp
> > > Arch        :     x86_64
> > > Compiler    :     INTEL8.1
> > > mvapich2    :     0.9.3
> > >
> > >
> > > Another Scenario of errors:
> > >
> > >
> > > Thank you for any feedback,
> > > -Amit
> > >
> > > <===================
> > > sched_setaffinity: Bad address
> > > sched_setaffinity: Bad address
> > > sched_setaffinity: Bad address
> > > sched_setaffinity: Bad address
> > > sched_setaffinity: Bad address
> > > sched_setaffinity: Bad address
> > > sched_setaffinity: Bad address
> > > sched_setaffinity: Bad address
> > > sched_setaffinity: Bad address
> > > sched_setaffinity: Bad address
> > > # OSU MPI Bandwidth Test (Version 2.2)
> > > # Size          Bandwidth (MB/s)
> > > 1               0.640062
> > > 2               1.280123
> > > 4               2.560247
> > > 8               5.120615
> > > 16              10.240986
> > > 32              20.481973
> > > 64              40.963946
> > > 128             81.927891
> > > 256             163.855783
> > > 512             327.719380
> > > 1024            655.485649
> > > 2048            655.423131
> > > 4096            655.427038
> > > 8192            1048.682011
> > > 16384           1048.727022
> > > 32768           1048.677010
> > > 65536           1048.680135
> > > 131072          1048.687949
> > > rank 6, Got an asynchronous event: VAPI_PORT_ACTIVE
> > > rank 7, Got an asynchronous event: VAPI_PORT_ACTIVE
> > > 262144          986.999817
> > > 524288          828.593573
> > > 1048576         803.785572
> > > 2097152         816.001536
> > > 4194304         822.250940
> > > rank 6, Got an asynchronous event: VAPI_PORT_ERROR
> > > (VAPI_EV_SYNDROME_NONE)rank 7, Got an asynchronous event:
> > > VAPI_PORT_ERROR(VAPI_EV_SYNDROME_NONE)ran
> > > k 7, Got an asynchronous event: VAPI_PORT_ACTIVE
> > > rank 6, Got an asynchronous event: VAPI_PORT_ACTIVE
> > > <==================
> > >
> > > _______________________________________________
> > > mvapich-discuss mailing list
> > > mvapich-discuss at mail.cse.ohio-state.edu
> > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> > >
> >
>


