[mvapich-discuss] Deadlock with CUDA and InfiniBand

Witherden, Freddie freddie.witherden08 at imperial.ac.uk
Thu Sep 11 11:51:19 EDT 2014


Hi Hari,

> Thanks for the details. I understand the issue now. I do not think QLogic HCAs have the proper support for the 
> rdma fast path feature in MVAPICH2. This could be the reason why you saw the hang with that feature enabled. 
> And yes - for QLogic HCA's you should be building MVAPICH2 with ch3:psm for best performance and 
> functionality.

Thank you for suggesting PSM. I installed the Intel OFED stack on the cluster and recompiled MVAPICH2 with PSM support (ch3:psm). Unfortunately, when running my application I get errors along the lines of:

  compute-0-1.local.20860Unexpected error in writev(): Invalid argument (errno=22) (fd=7,iovec=0x7fffe52f4640,len=3) (err=23)

on the nodes. Switching back to the ch3:mrail build (with the fast path disabled, but on the new Intel stack) works, however. Hence, I believe that the stack itself is functioning correctly and that the problem is specific to PSM.
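
For anyone comparing against a similar message, here is a minimal sketch that pulls the numeric fields out of the error line quoted above and decodes the errno symbolically. The function name parse_writev_error and the regular expression are illustrative assumptions based only on the one line shown here, not on any documented PSM log format. The meaning of the len field is not confirmed either, so it is reported as-is.

```python
import errno
import os
import re

# The error line quoted above, copied verbatim from the post.
LINE = ("compute-0-1.local.20860Unexpected error in writev(): Invalid argument "
        "(errno=22) (fd=7,iovec=0x7fffe52f4640,len=3) (err=23)")

# Hypothetical pattern, inferred from the single line above.
PATTERN = re.compile(
    r"\(errno=(\d+)\) \(fd=(\d+),iovec=0x[0-9a-fA-F]+,len=(\d+)\)"
)


def parse_writev_error(line):
    """Return the errno, fd and len fields found in the line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    code, fd, length = (int(group) for group in match.groups())
    return {
        "errno": code,
        # Symbolic name for the numeric errno, e.g. EINVAL.
        "name": errno.errorcode.get(code, "unknown"),
        # Platform-provided description of the errno.
        "text": os.strerror(code),
        "fd": fd,
        "len": length,
    }


if __name__ == "__main__":
    print(parse_writev_error(LINE))
```

This only confirms that errno 22 is EINVAL, which matches the "Invalid argument" text in the message. It says nothing about why the writev() call was rejected.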

Regards, Freddie.

