[mvapich-discuss] Problem with fabric combining DDR and SDR cards

Matthew Koop koop at cse.ohio-state.edu
Wed Apr 30 22:08:16 EDT 2008


Craig,

This setting is currently needed because MVAPICH2 auto-selects an MTU of 1K
for SDR cards and 2K for DDR cards. When the fabric mixes the two card types,
you must force all of them to a single value. We will be changing this in the
next version.
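
For example, a minimal sketch of forcing it from a bash job script (the rank
count and binary name below are placeholders, and whether exported variables
reach all ranks depends on your launcher -- with the MPD-based mpiexec you
can also pass the variable explicitly with -env):

  export MV2_DEFAULT_MTU=IBV_MTU_1024      # force every HCA to a 1K MTU
  mpiexec -n 64 ./hpl.exe
  # or pass it through the launcher itself:
  mpiexec -env MV2_DEFAULT_MTU IBV_MTU_1024 -n 64 ./hpl.exe

Running ibv_devinfo on each node should also show the active MTU of every
port (and, with -v, the link width and speed), which is a quick way to see
which hosts have the DDR card.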

You may try running with other MTUs. On our machines at OSU we found 1K to
be the best when running at SDR rates on that card -- particularly for small-
to medium-sized messages. For your application, another MTU may work better.
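
If you want to compare MTUs yourself, here is a rough sketch using the OSU
micro-benchmarks that ship with MVAPICH2 (host setup and benchmark paths are
placeholders for your own environment; adjust the launcher syntax if you are
not using the MPD-based mpiexec):

  for mtu in IBV_MTU_256 IBV_MTU_512 IBV_MTU_1024 IBV_MTU_2048; do
      echo "=== $mtu ==="
      mpiexec -env MV2_DEFAULT_MTU $mtu -n 2 ./osu_latency
      mpiexec -env MV2_DEFAULT_MTU $mtu -n 2 ./osu_bw
  done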

Matt

On Wed, 30 Apr 2008, Craig Tierney wrote:

> Matthew Koop wrote:
> > Craig,
> >
> > So are you running with MVAPICH2? Currently MVAPICH2 will require an
> > additional environment variable when using cards of different types:
> >
> > MV2_DEFAULT_MTU=IBV_MTU_1024
> >
> > We will be adding support for mixing cards of different speeds and
> > types. MVAPICH 1.0 already has this support.
> >
> > Let us know if this does not help,
> >
>
>
> I am running MVAPICH2.  Specifying any IBV_MTU_* setting (256, 512, 1024,
> or 2048) solves the problem for a small program (HPL).
>
> Why is this setting needed?  Are there any performance issues with
> setting this value?  Why not just use the IBV_MTU_2048 value?
>
> Thanks,
> Craig
>
>
>
> > Matt
> >
> > On Fri, 25 Apr 2008, Craig Tierney wrote:
> >
> >> I have an SDR-based fabric running OFED-1.2.5.1 and MVAPICH (both
> >> 1.0 and 1.0.2p1).  My vendor sent a DDR card as a replacement
> >> for a failed SDR card and said 'it should just work'.  I tried to use
> >> it, but I am not able to run jobs.  I get the following error
> >> as codes start up:
> >>
> >> send desc error
> >> [0] Abort: [] Got completion with error 9, vendor code=8a, dest rank=2
> >>   at line 513 in file ibv_channel_manager.c
> >> rank 0 in job 1  w347_44628   caused collective abort of all ranks
> >>    exit status of rank 0: killed by signal 9
> >>
> >> The codes are able to start (for instance, HPL is able to print its
> >> headers).  This problem happens using both 1.0 and 1.0.2p1.  It does
> >> not happen with OpenMPI-1.2.4.
> >>
> >> Should I be able to combine DDR and SDR cards in the same fabric and
> >> run jobs across them?  Are there any performance issues with this (I
> >> don't expect full DDR rates, but could anything run worse than SDR)?
> >>
> >> Thanks,
> >> Craig
> >> --
> >> Craig Tierney (craig.tierney at noaa.gov)
> >
>
>
> --
> Craig Tierney (craig.tierney at noaa.gov)
>


