[mvapich-discuss] Hang in CH3 SMP Rendezvous protocol w/ CUDA w/o Infiniband
Jonathan Perkins
perkinjo at cse.ohio-state.edu
Thu Jan 29 09:17:16 EST 2015
On Thu, Jan 22, 2015 at 05:46:19PM -0500, Paul Sathre wrote:
> Hi Khaled,
>
> Thanks for the feedback. What additional information would be most useful?
> The full config.log or some subset? /proc/cpuinfo? Something else?
>
> I've dug a little deeper and tried two other non-Infiniband systems I have
> access to, both of which succeed. (With a modified configure line to point
> to a userspace build of libibverbs.so v1.1.8-1 from the Debian repos and
> non-standard CUDA 6.0 path:
>
> ../mvapich2-2.1rc1/configure
> --prefix=/home/psath/mvapich2-2.1rc1/build/install --enable-cuda
> --disable-mcast --with-ib-libpath=/home/psath/libibverbs/install/lib
> --with-ib-include=/home/psath/libibverbs/install/include
> --with-libcuda=/usr/local/cuda-6.0/lib64
> --with-libcudart=/usr/local/cuda-6.0/lib64/
> )
>
> One successful system has dual K20Xm's running Nvidia driver version 331.67
>
> The other has a single C2070 running the same Nvidia driver.
>
> The hanging system has 4x Tesla C2070s running Nvidia driver 319.32 and
> libibverbs 1.1.6. (I have tested swapping in libibverbs 1.1.8 and gcc 4.8 to
> make it more like the successful systems, to no avail. Vimdiff examination
> of the config.log of the failing system vs. either succeeding system shows
> no significant differences.)
Hi Paul. Thanks for pointing out the version of the NVIDIA driver and
sorry that I didn't see this earlier. You'll need to update this to
331.20 or later to get things working. Please let us know if you have
any more questions or face any more issues.
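For anyone hitting the same hang, a quick way to sanity-check the driver against the 331.20 minimum mentioned above is to compare version strings with `sort -V`. This is just an illustrative sketch: the `installed` value is hard-coded here as an assumption, and would normally come from `nvidia-smi` or `/proc/driver/nvidia/version` on the affected node.

```shell
#!/bin/sh
# Check whether the installed NVIDIA driver meets the 331.20 minimum.
# "installed" is hard-coded for illustration; on a real system you would
# query it, e.g.: nvidia-smi --query-gpu=driver_version --format=csv,noheader
installed="319.32"
required="331.20"

# sort -V orders version strings numerically; if the smaller of the two
# is the required version, the installed driver is new enough.
lowest=$(printf '%s\n%s\n' "$installed" "$required" | sort -V | head -n1)
if [ "$lowest" = "$required" ]; then
    echo "driver $installed OK (>= $required)"
else
    echo "driver $installed too old, need >= $required"
fi
```

With the driver from the failing system (319.32) this reports it as too old, matching the advice above to upgrade before retrying the CUDA build.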
--
Jonathan Perkins