[mvapich-discuss] Hang in Bcast

Matthew Koop koop at cse.ohio-state.edu
Thu Oct 5 02:43:16 EDT 2006


Adam,

> A user is hitting a hang in MPI_Bcast().  We are running a
> Mellanox-modified MVAPICH 0.9.7.  This may have something to do with it
> as the user claims he was able to run with the non-modified 0.9.7.  When
> I get the chance, I'll try to run against a non-modified 0.9.7 or 0.9.8.

I agree that the Mellanox modifications may be part of the problem.

> ...
> Do you have any ideas?  Are there any environment variables to disable
> the rdma fast path as a test?

I'd suggest trying with MVAPICH 0.9.8 if this problem is readily
reproducible. Mellanox has made many changes in their latest OFED release
candidates that could be the cause of this issue and they would be better
able to debug their changes if it is not seen in 0.9.8.

Please let us know if this error is also seen in an unmodified
version of 0.9.8. To answer your other question, you can disable the
adaptive RDMA fast path by setting the environment variable
VIADEV_ADAPTIVE_RDMA_LIMIT to 0.
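
As a rough sketch, assuming the job is launched with mpirun_rsh (which
takes NAME=VALUE pairs placed before the executable), the test run could
look something like the following. The host names, process count, and
binary name are placeholders, not taken from your setup:

```shell
# Placeholder values (assumptions); substitute your own hosts and binary.
NP=2
HOSTS="node01 node02"
APP=./bcast_test

# Put the variable on the mpirun_rsh command line, before the executable,
# so that it is set for every MPI process in the job.
CMD="mpirun_rsh -np $NP $HOSTS VIADEV_ADAPTIVE_RDMA_LIMIT=0 $APP"

# Print the command for review; run it with: eval "$CMD"
echo "$CMD"
```

If a different launcher is in use, the variable would need to reach the
environment of each MPI process some other way.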

Thanks,

Matt



