[mvapich-discuss] Deadlock with CUDA and InfiniBand

Witherden, Freddie freddie.witherden08 at imperial.ac.uk
Sun Sep 14 17:48:41 EDT 2014


Hi Hari,

> We have not seen this error before. Our internal testing environment uses regular
>  OFED (OFED-1.5.3.2), not Intel OFED and it runs fine with PSM
>  (gQLogicIB-Basic.RHEL6-x86_64.7.0.1.0.43). So it could be that there is some
>  conflict between Intel's version of OFED and the PSM libraries.

Thank you for this suggestion.  I switched back to the OFED stack that came with my distribution, along with the corresponding PSM library and headers.  MVAPICH2 is now working with PSM without my having to touch any environment variables.  Performance for my reference simulation is some 10% better than before, too.

One thing I have noticed is that, sometimes, when launching my application with OpenMPI + IB, I get a warning regarding fork() being considered harmful in the current configuration.  Installing a pthread_atfork handler to dump the stack trace showed cuInit() to be the responsible party.
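
For anyone wanting to try the same diagnosis, here is a minimal sketch of such a handler. It is not the exact code I used; it assumes glibc's backtrace facilities from <execinfo.h>, and the names dump_stack_before_fork, install_fork_tracer and prepare_calls are made up for illustration.

```c
#include <execinfo.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Hypothetical counter: how many times the prepare handler has run. */
static int prepare_calls = 0;

/* "prepare" handler: runs in the parent immediately before each fork(). */
static void dump_stack_before_fork(void)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    prepare_calls++;
    fputs("fork() called from:\n", stderr);
    /* Writes one line per frame straight to the descriptor, without malloc. */
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
}

/* Register the handler; returns 0 on success, an error number otherwise. */
int install_fork_tracer(void)
{
    /* No parent-side or child-side post-fork handlers are needed here. */
    return pthread_atfork(dump_stack_before_fork, NULL, NULL);
}
```

Calling install_fork_tracer() early in main(), before MPI and CUDA are initialised, should make every later fork() in the process print its call stack to stderr.  Linking the executable with -rdynamic generally gives more readable symbol names.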

Assuming MVAPICH2 has similar restrictions with regard to fork(), have any issues relating to CUDA initialization been reported?

Regards, Freddie.

