[mvapich-discuss] Problem with fabric combining DDR and SDR cards

Matthew Koop koop at cse.ohio-state.edu
Tue Apr 29 12:25:07 EDT 2008


Craig,

Are you running with MVAPICH2? MVAPICH2 currently requires an
additional environment variable when a job spans cards of different types:

MV2_DEFAULT_MTU=IBV_MTU_1024
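
In case it helps, here is a minimal sketch of one way to make sure the
variable is set in the environment the job is launched from. The helper
name mtu_env is made up for illustration and is not part of MVAPICH2; the
MTU names are the ibv_mtu enum values from libibverbs. Whether the
environment reaches the remote ranks depends on your launcher, so treat
this as a starting point rather than a recipe.

```python
import os

# ibv_mtu enum names from libibverbs, mapped to their size in bytes.
IBV_MTU_BYTES = {
    "IBV_MTU_256": 256,
    "IBV_MTU_512": 512,
    "IBV_MTU_1024": 1024,
    "IBV_MTU_2048": 2048,
    "IBV_MTU_4096": 4096,
}


def mtu_env(mtu_name="IBV_MTU_1024", base_env=None):
    """Return a copy of the environment with MV2_DEFAULT_MTU set.

    mtu_env is a hypothetical helper; it only validates the name and
    builds the environment dictionary, it does not launch anything.
    """
    if mtu_name not in IBV_MTU_BYTES:
        raise ValueError(f"unknown MTU name: {mtu_name}")
    # Copy so the caller's (or the process's) environment is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["MV2_DEFAULT_MTU"] = mtu_name
    return env


if __name__ == "__main__":
    env = mtu_env()
    print("MV2_DEFAULT_MTU=" + env["MV2_DEFAULT_MTU"])
```

The resulting dictionary can be passed as the env argument of
subprocess.run together with whatever launch command your site uses.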

We will be adding support to MVAPICH2 for mixing cards of different speeds
and types. MVAPICH 1.0 already has this support.

Let us know if this does not help,

Matt

On Fri, 25 Apr 2008, Craig Tierney wrote:

> I have an SDR-based fabric running OFED-1.2.5.1 and MVAPICH (both
> 1.0 and 1.0.2p1).  My vendor sent a DDR card as a replacement
> for a failed SDR card and said 'it should just work'.  I tried to use
> it, but I am not able to run jobs.  I get the following error
> as codes start up:
>
> send desc error
> [0] Abort: [] Got completion with error 9, vendor code=8a, dest rank=2
>   at line 513 in file ibv_channel_manager.c
> rank 0 in job 1  w347_44628   caused collective abort of all ranks
>    exit status of rank 0: killed by signal 9
>
> The codes are able to start (for instance, HPL is able to print its headers).
> This problem happens using both 1.0 and 1.0.2p1.  It does not happen
> with OpenMPI-1.2.4.
>
> Should I be able to combine DDR and SDR cards in the same fabric and
> run jobs across them?  Are there any performance issues with this
> (I do not expect things to run at DDR speed, but could they run worse than SDR)?
>
> Thanks,
> Craig
> --
> Craig Tierney (craig.tierney at noaa.gov)


