[mvapich-discuss] Problem with fabric combining DDR and SDR cards

Craig Tierney Craig.Tierney at noaa.gov
Fri Apr 25 11:44:01 EDT 2008


I have an SDR-based fabric running OFED-1.2.5.1 and MVAPICH (both
1.0 and 1.0.2p1).  My vendor sent a DDR card as a replacement
for a failed SDR card and said 'it should just work'.  I tried to use
it, but I am not able to run jobs.  I get the following error
as the codes start up:

send desc error
[0] Abort: [] Got completion with error 9, vendor code=8a, dest rank=2
  at line 513 in file ibv_channel_manager.c
rank 0 in job 1  w347_44628   caused collective abort of all ranks
   exit status of rank 0: killed by signal 9

The codes are able to start (for instance, HPL is able to print its headers).
This problem happens using both 1.0 and 1.0.2p1.  It does not happen
with OpenMPI-1.2.4.
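To make the abort message easier to read, here is a minimal Python sketch. It assumes that the "error 9" in the message is a value of libibverbs' `enum ibv_wc_status` and that the vendor code is printed in hex (the "8a" suggests this). The table below copies the enum's declaration order from `<infiniband/verbs.h>`. The regex and the helper name `decode_abort` are hypothetical, written only for this log format. In C code, `ibv_wc_status_str()` gives a human-readable description of the same status.

```python
import re

# enum ibv_wc_status from libibverbs, in declaration order, so list index == value.
IBV_WC_STATUS_NAMES = [
    "IBV_WC_SUCCESS", "IBV_WC_LOC_LEN_ERR", "IBV_WC_LOC_QP_OP_ERR",
    "IBV_WC_LOC_EEC_OP_ERR", "IBV_WC_LOC_PROT_ERR", "IBV_WC_WR_FLUSH_ERR",
    "IBV_WC_MW_BIND_ERR", "IBV_WC_BAD_RESP_ERR", "IBV_WC_LOC_ACCESS_ERR",
    "IBV_WC_REM_INV_REQ_ERR", "IBV_WC_REM_ACCESS_ERR", "IBV_WC_REM_OP_ERR",
    "IBV_WC_RETRY_EXC_ERR", "IBV_WC_RNR_RETRY_EXC_ERR", "IBV_WC_LOC_RDD_VIOL_ERR",
    "IBV_WC_REM_INV_RD_REQ_ERR", "IBV_WC_REM_ABORT_ERR", "IBV_WC_INV_EECN_ERR",
    "IBV_WC_INV_EEC_STATE_ERR", "IBV_WC_FATAL_ERR", "IBV_WC_RESP_TIMEOUT_ERR",
    "IBV_WC_GENERAL_ERR",
]

# Hypothetical pattern matching the MVAPICH abort line quoted above.
_ABORT_RE = re.compile(
    r"Got completion with error (\d+), vendor code=([0-9a-fA-F]+), dest rank=(\d+)"
)


def decode_abort(line):
    """Extract status, vendor code and destination rank; None if no match."""
    m = _ABORT_RE.search(line)
    if m is None:
        return None
    status = int(m.group(1))
    if 0 <= status < len(IBV_WC_STATUS_NAMES):
        name = IBV_WC_STATUS_NAMES[status]
    else:
        name = "unknown"
    return {
        "status": status,
        "status_name": name,
        "vendor_code": int(m.group(2), 16),  # assumed hex, e.g. "8a" -> 138
        "dest_rank": int(m.group(3)),
    }


if __name__ == "__main__":
    msg = "[0] Abort: [] Got completion with error 9, vendor code=8a, dest rank=2"
    print(decode_abort(msg))
```

Under these assumptions, status 9 decodes to `IBV_WC_REM_INV_REQ_ERR` (remote invalid request). This means the peer HCA rejected the request. The vendor code itself is hardware-specific and is not interpreted here.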

Should I be able to combine DDR and SDR cards in the same fabric and
run jobs across them?  Are there any performance issues with this?
(I don't expect anything to run at DDR, but could it run worse than SDR?)

Thanks,
Craig
-- 
Craig Tierney (craig.tierney at noaa.gov)

